k-means算法的简单示例
合集下载
相关主题
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
α , β as the centre and K=2.
Step 3 again.
The new centers of the two clusters are P(1.6,4.8) and Q(4.8,1.6)
Example
A
d PA 1 d PB 0.89
B
E C D P(1.6,4.8) A B
P , Q as the centre and K=2.
Step 3 again.
The new centers of the two clusters are equal to the original P(1.6,4.8) and Q(4.8,1.6)
Final
E C D
cluster 2
A B F I J G H
d AE 2.24 d AF 3.61
< > < > > > > > > >
d BA 1 d BB 0
d BC 1.41 d BD 1
d BE 2 d BF 2.83
d AG 4.47
d AH 5.39 d AI 4.24
H I J
d BG 3.61 d BH 4.47
Algorithm Procedure
1. Randomly select K points from complete samples as the initial center.(That's what k means in K-means) 2. Each point in the dataset is assigned to the closed cluster,based upon the Euclidean distance between each point and each cluster center. 3. Each cluster's center is recomputed as the average of the points in that cluster. 4. Iterate step 2 or more until the new center of cluster equals to the original center of cluster or less than a specified threshold,then clustering finished.
Example
x y center ( ,
i
E C α(1,4.5) A B D
j
i
j
)
PA, B,C , D, E (1.6,4.8)
β(3.75,2.875)
F I
cluster 1
new center
H
G J
QF ,G,H ,I , J (4.8,1.6)
cluster 2
B
C
D
E
β(3.75,2.875)
F I
F G
dE 1.8 dF 3.91 dG 4.72 dH 5.59 dI 4.61 dJ 5.32
G J
H
H I J
< < < < < > > > > >
d A 2.97 d B 2.08
dC 3.48 dD 2.75
Example
x y center ( ,
i
E
j
i
j
)
cluster 2
C D
A(1,4)
B(2,4)
ቤተ መጻሕፍቲ ባይዱ
A ,C
F I G J H
11 4 5 ( , ) (1,4.5) 2 2
new center
cluster 1
B,D,E ,F ,G,H ,I , J (3.75,2.875)
C D E F G
F
G H
d PC 0.63 d PD 0.45
d PE 1.26 d PF 3.69
d PG 4.40
d PH 5.22 d PI 4.49
H
Q(4.8,1.6) I J
I
J
d PJ 5.10
< < < < < > > > > >
dQA 4.49
cluster 1
Clustering finished !
Disadvantages
one of the main disadvantages to k-means is the fact that you must specify the number of clusters(K) as an input to the algorithm.As designed,the algorithm is not capable of determining the appropriate number of clusters and depends upon the user to identify this in advance.
Example
E
C D B
A
F I
G J
H
How to cluster A,B...H,J into two clusters?
Example
A
E C D
d AA 0 d AB 1
B C D E F G
F I G J H
d AC
A(1,4)
d BC
B(2,4)
d AC 1 d AD 1.41
dQB 3.69 dQC 5.10 dQD 4.4
dQE 5.22 dQF 0.89 dQG 0.45
dQH 1.26
d QI 1 dQJ 0.63
Step 2 again.
So,we classify A,B,C,D,E as a cluster and F,G,H,I,J as another cluster.
Example
x y center ( ,
i j
i
j
)
E C D P(1.6,4.8) A B
cluster 1
M A,B,C ,D,E (1.6,4.8)
NF ,G,H ,I , J (4.8,1.6)
F H Q(4.8,1.6) J G
new center
cluster 2
I
K-means Clustering
K-means Clustering
K-means clustering is a sort of clustering algorithm and it is a m et hod of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. --From Wikipedia
d BI 3.61 d BJ 4.24
d AJ 5
d AB means distance A→B
Randomly choose A,B as the centre and K=2.
Step 1 and 2.
So,we classify A,C as a cluster and B,E,D,F,G,H,I and J as another cluster.
The new centers of the two clusters are (1,4.5) and (3.75,2.875)
Randomly choose A,B as the centre and K=2.
Step 3.
Example
A
E C α(1,4.5) A B D
dA 0.5
dB 1.12 dC 0.5 dD 1.12
K=2
K=3
d E 3.58 d F 0.91 dG 1.53 dH 2.41 d I 1.89 d J 2.25
α , β as the centre and K=2.
Step 2 again.
So,we classify A,B,C,D,E as a cluster and F,G,H,I,J as another cluster.
Step 3 again.
The new centers of the two clusters are P(1.6,4.8) and Q(4.8,1.6)
Example
A
d PA 1 d PB 0.89
B
E C D P(1.6,4.8) A B
P , Q as the centre and K=2.
Step 3 again.
The new centers of the two clusters are equal to the original P(1.6,4.8) and Q(4.8,1.6)
Final
E C D
cluster 2
A B F I J G H
d AE 2.24 d AF 3.61
< > < > > > > > > >
d BA 1 d BB 0
d BC 1.41 d BD 1
d BE 2 d BF 2.83
d AG 4.47
d AH 5.39 d AI 4.24
H I J
d BG 3.61 d BH 4.47
Algorithm Procedure
1. Randomly select K points from complete samples as the initial center.(That's what k means in K-means) 2. Each point in the dataset is assigned to the closed cluster,based upon the Euclidean distance between each point and each cluster center. 3. Each cluster's center is recomputed as the average of the points in that cluster. 4. Iterate step 2 or more until the new center of cluster equals to the original center of cluster or less than a specified threshold,then clustering finished.
Example
x y center ( ,
i
E C α(1,4.5) A B D
j
i
j
)
PA, B,C , D, E (1.6,4.8)
β(3.75,2.875)
F I
cluster 1
new center
H
G J
QF ,G,H ,I , J (4.8,1.6)
cluster 2
B
C
D
E
β(3.75,2.875)
F I
F G
dE 1.8 dF 3.91 dG 4.72 dH 5.59 dI 4.61 dJ 5.32
G J
H
H I J
< < < < < > > > > >
d A 2.97 d B 2.08
dC 3.48 dD 2.75
Example
x y center ( ,
i
E
j
i
j
)
cluster 2
C D
A(1,4)
B(2,4)
ቤተ መጻሕፍቲ ባይዱ
A ,C
F I G J H
11 4 5 ( , ) (1,4.5) 2 2
new center
cluster 1
B,D,E ,F ,G,H ,I , J (3.75,2.875)
C D E F G
F
G H
d PC 0.63 d PD 0.45
d PE 1.26 d PF 3.69
d PG 4.40
d PH 5.22 d PI 4.49
H
Q(4.8,1.6) I J
I
J
d PJ 5.10
< < < < < > > > > >
dQA 4.49
cluster 1
Clustering finished !
Disadvantages
one of the main disadvantages to k-means is the fact that you must specify the number of clusters(K) as an input to the algorithm.As designed,the algorithm is not capable of determining the appropriate number of clusters and depends upon the user to identify this in advance.
Example
E
C D B
A
F I
G J
H
How to cluster A,B...H,J into two clusters?
Example
A
E C D
d AA 0 d AB 1
B C D E F G
F I G J H
d AC
A(1,4)
d BC
B(2,4)
d AC 1 d AD 1.41
dQB 3.69 dQC 5.10 dQD 4.4
dQE 5.22 dQF 0.89 dQG 0.45
dQH 1.26
d QI 1 dQJ 0.63
Step 2 again.
So,we classify A,B,C,D,E as a cluster and F,G,H,I,J as another cluster.
Example
x y center ( ,
i j
i
j
)
E C D P(1.6,4.8) A B
cluster 1
M A,B,C ,D,E (1.6,4.8)
NF ,G,H ,I , J (4.8,1.6)
F H Q(4.8,1.6) J G
new center
cluster 2
I
K-means Clustering
K-means Clustering
K-means clustering is a sort of clustering algorithm and it is a m et hod of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. --From Wikipedia
d BI 3.61 d BJ 4.24
d AJ 5
d AB means distance A→B
Randomly choose A,B as the centre and K=2.
Step 1 and 2.
So,we classify A,C as a cluster and B,E,D,F,G,H,I and J as another cluster.
The new centers of the two clusters are (1,4.5) and (3.75,2.875)
Randomly choose A,B as the centre and K=2.
Step 3.
Example
A
E C α(1,4.5) A B D
dA 0.5
dB 1.12 dC 0.5 dD 1.12
K=2
K=3
d E 3.58 d F 0.91 dG 1.53 dH 2.41 d I 1.89 d J 2.25
α , β as the centre and K=2.
Step 2 again.
So,we classify A,B,C,D,E as a cluster and F,G,H,I,J as another cluster.