Chapter 9. Unsupervised and Semi-supervised Learning: Clustering
Discrete latent variables can be used to partition/cluster data into sub-groups.
Continuous latent variables can be used for dimensionality reduction.
Dissimilarity/Distance Function
The choice of dissimilarity/distance function is application-dependent, and one needs to consider the type of features: categorical, ordinal, or quantitative. It is also possible to learn the dissimilarity from data.
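As an illustration (my own sketch, not from the slides), minimal NumPy implementations of distances suited to different feature types:

```python
import numpy as np

def euclidean(x, y):
    """Quantitative features: straight-line distance."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):
    """Quantitative features; less sensitive to outliers than Euclidean."""
    return float(np.sum(np.abs(x - y)))

def hamming(x, y):
    """Categorical features: fraction of attributes that disagree."""
    return float(np.mean(np.asarray(x) != np.asarray(y)))

print(euclidean(np.array([1., 2.]), np.array([4., 6.])))  # 5.0
print(hamming(["red", "small"], ["red", "large"]))        # 0.5
```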
K-medoids
Hierarchical Clustering
Density-based Clustering
Evaluation of Clustering
Performance evaluation of clustering uses validity indices. The evaluation metrics fall into two families: reference models (external indices) and non-reference models (internal indices).
Outline
- Introduction
- Applications of Clustering
- Distance Functions
- Evaluation Metrics
- Clustering Algorithms
  - K-means
  - Gaussian Mixture Models and EM Algorithm
  - K-medoids
  - Hierarchical Clustering
  - Density-based Clustering
Evolution of k-Means
(a) original dataset; (b) random initialization; (c-f) illustration of running two iterations of k-means. (Images from Michael Jordan)
For m samples there are m(m-1)/2 pairs. Each pair is tallied by whether its two samples are placed in the same cluster by the clustering result and by the reference model:

|                      | reference: same | reference: not same |
|----------------------|-----------------|---------------------|
| clustering: same     | a               | b                   |
| clustering: not same | c               | d                   |
External Index
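Given these pair counts, the usual external indices follow directly; as a reference (standard definitions, not spelled out on the slide):

```latex
\mathrm{JC} = \frac{a}{a+b+c}, \qquad
\mathrm{FMI} = \sqrt{\frac{a}{a+b} \cdot \frac{a}{a+c}}, \qquad
\mathrm{RI} = \frac{2(a+d)}{m(m-1)}
```

These are the Jaccard coefficient, the Fowlkes-Mallows index, and the Rand index; all three lie in [0, 1], and larger values indicate better agreement with the reference model.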
Non-reference model
Given only the result of a clustering, how can we evaluate it? Intra-cluster similarity: larger is better. Inter-cluster similarity: smaller is better.
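A rough internal-index sketch (illustrative code of mine, not the slide's): compare the average within-cluster distance against the average between-cluster distance.

```python
import numpy as np

def intra_inter_distances(X, labels):
    """Average pairwise distance within clusters (smaller is better,
    i.e. higher intra-cluster similarity) and across clusters
    (larger is better, i.e. lower inter-cluster similarity)."""
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    i, j = np.triu_indices(len(X), k=1)      # count each pair once
    intra = D[i, j][same[i, j]].mean()
    inter = D[i, j][~same[i, j]].mean()
    return intra, inter
```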
Density-based Clustering
Image Segmentation
http://people.cs.uchicago.edu/~pff/segment
Human Population
Eran Elhaik et al., Nature
Clustering Graphs
Newman, 2008
Why do Unsupervised Learning?
- Raw data is cheap; labeled data is expensive.
- Saves memory/computation.
- Reduces noise in high-dimensional data.
- Useful in exploratory data analysis.
- Often a pre-processing step for supervised learning.
Vector quantization to compress images
Bishop, PRML
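The idea, as a sketch (assuming scikit-learn is available; parameter values are illustrative): run k-means on the pixel colors and replace each pixel by its prototype, so the image can be stored as K colors plus per-pixel indices.

```python
import numpy as np
from sklearn.cluster import KMeans

def vector_quantize(img, n_colors=16):
    """Compress an (H, W, 3) RGB image to n_colors prototype colors."""
    h, w, c = img.shape
    pixels = img.reshape(-1, c).astype(np.float64)
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit(pixels)
    # Store km.cluster_centers_ (the codebook) and km.labels_ (indices);
    # reconstruct by looking each pixel's prototype up in the codebook.
    return km.cluster_centers_[km.labels_].reshape(h, w, c).astype(img.dtype)
```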
Ingredients of cluster analysis
- A dissimilarity/distance function between samples.
- A loss function to evaluate clusters.
- An algorithm that optimizes this loss function.
K-means Algorithms
How do we initialize K-means?
Some heuristics: randomly pick K data points as prototypes, or pick prototype i+1 to be the farthest from prototypes {1, 2, ..., i}. Both appear in the sketch below.
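A minimal NumPy sketch of the loop (my own illustrative code, not from the slides), covering both initialization heuristics:

```python
import numpy as np

def farthest_first(X, K, rng):
    """Pick prototype i+1 to be the farthest point from prototypes {1,...,i}."""
    protos = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        dists = np.min([np.linalg.norm(X - p, axis=1) for p in protos], axis=0)
        protos.append(X[np.argmax(dists)])
    return np.array(protos)

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]   # random-pick heuristic
    for _ in range(n_iters):
        # Assignment step: give each sample to its nearest prototype.
        r = np.argmin(np.linalg.norm(X[:, None] - mu[None, :], axis=2), axis=1)
        # Update step: move each prototype to the mean of its cluster.
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):                     # converged
            break
        mu = new_mu
    return mu, r
```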
[Figure: the loss function J after each iteration]
Convergence of K-means
k-means is exactly coordinate descent on the reconstruction error E. E decreases monotonically and its value converges, and so do the clustering results.
- Reference model (external index): compare the clustering with a reference partition.
- Non-reference model (internal index): measure intra-class and inter-class distances.
Reference Model
How to choose K?
Like choosing K in kNN.
The loss function J generally decreases with K.
How to choose K?
- Gap statistic.
- Cross-validation: partition the data into two sets; estimate prototypes on one and use these to compute the loss function on the other.
- Stability of clusters: measure the change in the clusters obtained by resampling or splitting the data.
- Non-parametric approach: place a prior on K. More details in the Bayesian non-parametric lecture.
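A common practical companion to these criteria (a sketch assuming scikit-learn): compute J for a range of K and look for the "elbow" where the marginal decrease flattens.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))   # toy data
losses = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    losses.append(km.inertia_)   # J: within-cluster sum of squared distances
# J decreases monotonically with K; a pronounced bend suggests a good K.
print(losses)
```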
Supervised vs. Unsupervised Learning
Unsupervised Learning: Clustering
Jiafeng Guo
K-means: Idea
K-means: minimizing the loss function
How do we minimize J w.r.t. $(r_{ik}, \mu_k)$? It is a chicken-and-egg problem:
- If the prototypes are known, we can assign the responsibilities.
- If the responsibilities are known, we can compute the prototypes.
We therefore use an iterative procedure.
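For reference, the loss J that k-means minimizes is the within-cluster sum of squared distances; in the usual notation (with $r_{ik}$ the hard responsibility of prototype $\mu_k$ for sample $x_i$):

```latex
J = \sum_{i=1}^{m} \sum_{k=1}^{K} r_{ik} \, \lVert x_i - \mu_k \rVert^2,
\qquad r_{ik} \in \{0, 1\}, \quad \sum_{k=1}^{K} r_{ik} = 1
```

Fixing $\mu$ and minimizing over $r$ gives the assignment step; fixing $r$ and minimizing over $\mu$ gives the prototype update. Alternating the two is exactly the iterative procedure above.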
Cluster Analysis
Discover groups such that samples within a group are more similar to each other than to samples in other groups.
Unobserved Variables
A variable can be unobserved (latent) because:
- It is an imaginary quantity meant to provide a simplified and abstract view of the data generation process, e.g., speech recognition models, mixture models.
- It is a real-world object and/or phenomenon that is difficult or impossible to measure, e.g., the temperature of a star, the causes of a disease, evolutionary ancestors.
- It is a real-world object and/or phenomenon that sometimes was not measured, because of faulty sensors, or was measured with a noisy channel, etc., e.g., traffic radio, an aircraft signal on a radar screen.
Distance Function
Standardization
[Figure: the same dataset clustered without standardization vs. with standardization]
Standardization not always helpful
[Figure: another dataset, without standardization vs. with standardization]
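A minimal z-scoring sketch (my own code) for the standardization referred to above:

```python
import numpy as np

def standardize(X):
    """Center each feature and scale it to unit variance, so no feature
    dominates the distance function purely because of its units."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0   # leave constant features unchanged
    return (X - mu) / sigma
```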
It is possible for k-means to oscillate between a few different clusterings, but this almost never happens in practice. E is non-convex, so coordinate descent on E is not guaranteed to converge to the global minimum. One common remedy is to run k-means many times and pick the best result.
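A sketch of that remedy (assuming scikit-learn; note that its n_init parameter already performs such restarts internally):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))   # toy data
# Run k-means from several random initializations and keep the run
# with the lowest reconstruction error E (inertia).
runs = [KMeans(n_clusters=3, n_init=1, random_state=s).fit(X) for s in range(10)]
best = min(runs, key=lambda km: km.inertia_)
```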