7. •
d(x, y): the distance between data points x and y
Ci, Cj: clusters
$D_{\min}(C_i, C_j) = \min\{\, d(x, y) \mid x \in C_i,\ y \in C_j \,\}$
1.
2. 1
3. 2 1
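The definition of Dmin can be sketched in a few lines of Python. This is only an illustration: the function name `d_min` and the 1-D example with d(x, y) = |x − y| are assumptions made here, not part of the slides.

```python
from itertools import product


def d_min(ci, cj, d):
    """Single-linkage distance: the smallest d(x, y) over x in ci, y in cj."""
    return min(d(x, y) for x, y in product(ci, cj))


# Hypothetical 1-D data; the distance is the absolute difference.
def dist(x, y):
    return abs(x - y)


print(d_min([0.0, 1.0], [3.0, 5.0], dist))  # 2.0 (the pair 1.0 and 3.0)
```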
8. [Figure: single-linkage example on the points A to G, with merges numbered 1 to 5. Panels: (A) B,C and D,F; (B) A,E; (C); (D) dendrogram with leaf order A B C E D F G and merge levels 1 to 6.]
9. (1/2)
•
4.5.
• 59
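(1) Input: N data points x1, ..., xN. Each point starts as a cluster of 1 point:
    x1, ..., xN -> C1, ..., CN
(2) n = N (n is the current number of clusters)
(3) Repeat until n = 1:
  (a) Among C1, ..., Cn, find the closest pair Ci, Cj (i < j)
  (b) Merge Ci and Cj into Ci
  (c) Update the distances between Ci and the other clusters
  (d) Cj = Cn, n = n−1
4.8
[Figure: the cluster array with slots 1 2 3 4 5 6 7]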
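A minimal Python sketch of the agglomerative loop follows. It assumes the reading of the steps used here: step (c), the distance update, is done implicitly by recomputing linkage distances on demand, which is simpler but slower than keeping a distance table. The function name `agglomerative` and the 1-D test data are hypothetical.

```python
def agglomerative(points, dist, link=min):
    """Merge clusters until one remains; return [(merged members, distance)]."""
    clusters = [[p] for p in points]       # (1) one cluster per point
    n = len(clusters)                      # (2) n = N
    merges = []
    while n > 1:                           # (3) repeat until n = 1
        # (a) find the closest pair Ci, Cj with i < j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                dij = link(dist(x, y) for x in clusters[i] for y in clusters[j])
                if best is None or dij < best[0]:
                    best = (dij, i, j)
        dij, i, j = best
        # (b) merge Cj into Ci
        clusters[i] = clusters[i] + clusters[j]
        merges.append((sorted(clusters[i]), dij))
        # (c) distances are recomputed on demand in this sketch
        # (d) move the last cluster into slot j and shrink the array
        clusters[j] = clusters[n - 1]
        n -= 1
    return merges


result = agglomerative([0, 1, 3, 7], lambda x, y: abs(x - y))
print(result)  # [([0, 1], 1), ([0, 1, 3], 2), ([0, 1, 3, 7], 4)]
```

Passing `link=max` instead of `link=min` turns the same loop into complete linkage.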
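10. (2/2)
(1) Input: N data points x1, ..., xN. Each point starts as a cluster of 1 point:
    x1, ..., xN -> C1, ..., CN
(2) n = N (n is the current number of clusters)
(3) Repeat until n = 1:
  (a) Among C1, ..., Cn, find the closest pair Ci, Cj (i < j)
  (b) Merge Ci and Cj into Ci
  (c) Update the distances between Ci and the other clusters
  (d) Cj = Cn, n = n−1
4.8
[Figure: the cluster array with slots 1 to 7 shown at steps (a), (b) and (d), next to the points A, B, C, E with merges numbered 1, 3, 5.]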
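11. (A) distances between clusters, (B) nearest cluster and its distance

              (A)                          (B)
        A     B,C   D,F   E     G
  A     -     1.2   2.3   1.9   4.1        B,C  1.2
  B,C   1.2   -     3.2   2.0   4.0        A    1.2
  D,F   2.3   3.2   -     2.2   3.5        E    2.2
  E     1.9   2.0   2.2   -     2.5        A    1.9
  G     4.1   4.0   3.5   2.5   -          E    2.5

[Figure: the points A to G; A and B,C are the closest pair and are merged.]

(C) distances after the merge, (D) nearest cluster and its distance

              (C)                          (D)
        A,B,C   D,F   E     G
  A,B,C   -     2.3   1.9   4.0            E      1.9
  D,F     2.3   -     2.2   3.5            E      2.2
  E       1.9   2.2   -     2.5            A,B,C  1.9
  G       4.0   3.5   2.5   -              E      2.5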
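The step from table (A) to table (C) can be reproduced with a short Python sketch. The helper names (`make_table`, `nearest`, `merge`) and the dictionary layout are assumptions chosen for this illustration; only the numbers come from the tables.

```python
def make_table(names, rows):
    """Build {frozenset({p, q}): distance} from a full symmetric matrix."""
    return {frozenset((names[i], names[j])): rows[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}


def nearest(table, name):
    """Return (nearest cluster, distance) for one cluster, as in (B) and (D)."""
    cands = {next(iter(pair - {name})): dist
             for pair, dist in table.items() if name in pair}
    other = min(cands, key=cands.get)
    return other, cands[other]


def merge(table, a, b, link=min):
    """Merge clusters a and b; link=min is the single-linkage update."""
    merged = a + "," + b
    others = {n for pair in table for n in pair} - {a, b}
    new = {pair: dist for pair, dist in table.items()
           if a not in pair and b not in pair}
    for k in others:
        new[frozenset((merged, k))] = link(table[frozenset((a, k))],
                                           table[frozenset((b, k))])
    return merged, new


names = ["A", "B,C", "D,F", "E", "G"]
rows = [
    [0.0, 1.2, 2.3, 1.9, 4.1],
    [1.2, 0.0, 3.2, 2.0, 4.0],
    [2.3, 3.2, 0.0, 2.2, 3.5],
    [1.9, 2.0, 2.2, 0.0, 2.5],
    [4.1, 4.0, 3.5, 2.5, 0.0],
]
table_a = make_table(names, rows)
merged, table_c = merge(table_a, "A", "B,C")
print(merged, nearest(table_c, merged))  # A,B,C ('E', 1.9)
```

Only the row of the merged cluster has to be recomputed; every other entry is copied unchanged.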
12. •
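• When Ci and Cj are merged, call the new cluster Ci’
• The distance between Ci’ and every other cluster Ck must be updated
• It is the smaller of the distance between Ci and Ck and the distance between Cj and Ck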
•
• N O(N)
• O(N^2)
•
• Cj Ci’
• 1 O(N) O(N^2)
• N-1 O(N)
O(N^2), O(N^2)
13. 35, 48)
•(Kruskal’s Algorithm) (Minimum Spanning Tree)
•
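4.2 Let G = (V, E) be a graph (V is the vertex set, E is the edge set).
A subgraph T ⊆ G that is a tree and contains every vertex in V is called a spanning tree of G.
Let each edge (u, v) ∈ E of G have a weight w(u, v).
A spanning tree T that minimizes $\sum_{(u,v) \in T} w(u, v)$ is called a minimum spanning tree of G.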
4.13(A) 4.9(P. 59)
4.13(B) 4.13(B)
G 1 BC 6
GE 13 4.9(D) P. 59
15. Kruskal(
72 ) 4
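(1) Input: a graph G = (V, E) with vertex set V and edge set E
(2) A = {} (A holds the edges selected so far)
(3) Put each vertex of V in its own set (a tree with 1 vertex)
(4) For each edge (u, v) ∈ E, in increasing order of weight:
  (a) If u and v belong to different trees, A = A ∪ {(u, v)}
  (b) Merge the trees that contain u and v
(5) Output A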
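The steps above can be sketched in Python with a simple union-find structure. Two assumptions are made here and are not taken from the slides: the graph is the complete graph on the points A to G, and the edge weights are the pairwise distances from the DIANA distance matrix later in this deck.

```python
def kruskal(vertices, edges):
    """edges: iterable of (weight, u, v). Returns the selected edges A."""
    parent = {v: v for v in vertices}      # (3) every vertex is its own tree

    def find(v):
        # Follow parent links to the root, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    A = []                                 # (2) A = {}
    for w, u, v in sorted(edges):          # (4) increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # (a) u and v are in different trees
            A.append((u, v, w))
            parent[ru] = rv                # (b) merge the two trees
    return A                               # (5)


# Assumed input: complete graph on A..G, lower-triangular distance matrix.
names = "ABCDEFG"
lower = [
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
edges = [(lower[i][j], names[j], names[i]) for i in range(7) for j in range(i)]
mst = kruskal(names, edges)
print([(u, v) for u, v, _ in mst])
# [('B', 'C'), ('D', 'F'), ('A', 'B'), ('A', 'E'), ('E', 'F'), ('E', 'G')]
```

Under these assumptions the first three selected edges are (B,C), (D,F) and (A,B), the same order in which the edge set A grows in the slide's figure.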
[Figure 4.14: a cut (C, V−C) with crossing edges e and e’, and the growth of the edge set A on the points A to G:
  A={}: {A}, {B}, {C}, {D}, {E}, {F}
  A={(B,C),(D,F)}: {A}, {B,C}, {D,F}, {E}
  A={(B,C),(D,F),(A,B)}: {A, B,C}, {D,F}, {E}]
16. [Figure: (A), (B) graphs on the points A to G with edges numbered 1 to 6; (C) dendrogram with leaf order A B C E D F G and merge levels 1 to 6.]
17. [Figure: panels (A) to (E) on the points A to G, showing a tree T and a set Q. Surviving caption fragments: (B) A; (C) AB; E, A, B,C, T, B; (D) C, F, T.]
O(E + V log V)
18. X
•
d(x, y): the distance between data points x and y
Ci, Cj: clusters
$D_{\max}(C_i, C_j) = \max\{\, d(x, y) \mid x \in C_i,\ y \in C_j \,\}$
1.
2. 1
3. 2 1
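To see how Dmax differs from Dmin on the same pair of clusters, here is a small sketch. The 2-D points and the use of Euclidean distance (`math.dist`, Python 3.8 or later) are assumptions for illustration only.

```python
import math


def d_max(ci, cj):
    """Complete-linkage distance: the largest pairwise distance."""
    return max(math.dist(x, y) for x in ci for y in cj)


def d_min(ci, cj):
    """Single-linkage distance, shown for comparison."""
    return min(math.dist(x, y) for x in ci for y in cj)


# Hypothetical clusters in the plane.
ci = [(0.0, 0.0), (1.0, 0.0)]
cj = [(4.0, 0.0), (4.0, 3.0)]
print(d_max(ci, cj))  # 5.0 (between (0, 0) and (4, 3))
print(d_min(ci, cj))  # 3.0 (between (1, 0) and (4, 0))
```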
19. [Figure: complete-linkage example on the points A to G, with merges numbered 1 to 5. Panels: (A) B,C and D,F; (B) A,E; (C) G; (D) dendrogram with leaf order A B C D F E G and merge levels 1 to 5.]
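20. (A) distances between clusters, (B) nearest cluster and its distance

              (A)                          (B)
        A     B,C   D,F   E     G
  A     -     1.3   3.0   1.9   4.1        B,C  1.3
  B,C   1.3   -     4.1   2.5   4.5        A    1.3
  D,F   3.0   4.1   -     2.3   4.0        E    2.3
  E     1.9   2.5   2.3   -     2.5        A    1.9
  G     4.1   4.5   4.0   2.5   -          E    2.5

A and B,C are the closest pair and are merged.

(C) distances after the merge, (D) nearest cluster and its distance

              (C)                          (D)
        A,B,C   D,F   E     G
  A,B,C   -     4.1   2.5   4.5            E    2.5
  D,F     4.1   -     2.3   4.0            E    2.3
  E       2.5   2.3   -     2.5            D,F  2.3
  G       4.5   4.0   2.5   -              E    2.5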
21. [Figure: panels (A) to (C) on the points A to G. Surviving caption fragments: (A) E, A; (B) A, B,C, E; (C) A, B,C, E; A, D,F, CE, DE.]
O(N^2) O(N^3)
23. duces the clusters shown in Figure 12, whereas the complete-link algorithm obtains the clustering shown in Figure 13. The clusters obtained by the complete-link algorithm are more compact than those obtained by the single-link algorithm; the cluster labeled 1 obtained using the single-link algorithm is elongated because of the noisy patterns labeled “*”. The single-link algorithm is more versatile than the complete-link algorithm, otherwise. For example, the single-link algorithm can extract the concentric clusters shown in Figure 11, but the complete-link algorithm cannot. However, from a pragmatic viewpoint, it has been observed that the complete-link algorithm produces more useful hierarchies in many applications than the single-link algorithm [Jain and Dubes 1988].

Data Clustering • 277

[Figure 10. The dendrogram obtained using the single-link algorithm. Vertical axis: Similarity; leaves: A B C D E F G.]
[Figure 12. A single-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*). Axes: X1, X2.]
[Figure 13. A complete-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*). Axes: X1, X2.]

Fragments from the overlaid pages:

(3) The output of the algorithm is a nested hierarchy of graphs which can be cut at a desired dissimilarity level forming a partition (clustering) identified by simply connected components in the corresponding graph.

well-separated, chain-like, and concentric clusters, whereas a typical partitional algorithm such as the k-means algorithm works well only on data sets having isotropic clusters [Nagy 1968]. On the other hand, the time and space complexities [Day 1992] of the partitional algorithms are typically lower

Agglomerative Single-Link Clustering Algorithm
(1) Place each pattern in its own cluster. Construct a list of interpattern distances for all distinct unordered pairs of patterns, and sort this list

Agglomerative Complete-Link Clus-
24. •
• Average Group Linkage
  $$D(C_i, C_j) = \frac{1}{|C_i||C_j|} \sum_{x_1 \in C_i} \sum_{x_2 \in C_j} D(x_1, x_2)$$
• Ward’s Method
  $$D(C_i, C_j) = E(C_i \cup C_j) - E(C_i) - E(C_j)$$
  where $E(C_i) = \sum_{x \in C_i} (d(x, c_i))^2$ and $c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$
[Figure: illustrations of Average Group Linkage and Ward’s Method]
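Both cluster distances can be evaluated directly from their definitions. The sketch below assumes Euclidean distance for both D(x1, x2) and d(x, c_i), and uses made-up 2-D points; the function names are chosen here for illustration.

```python
import math


def average_linkage(ci, cj):
    """Mean of all pairwise distances between the two clusters."""
    total = sum(math.dist(x1, x2) for x1 in ci for x2 in cj)
    return total / (len(ci) * len(cj))


def centroid(c):
    """c_i: the coordinate-wise mean of the points in the cluster."""
    return tuple(sum(coords) / len(c) for coords in zip(*c))


def sse(c):
    """E(C): sum of squared distances from each point to the centroid."""
    m = centroid(c)
    return sum(math.dist(x, m) ** 2 for x in c)


def ward_distance(ci, cj):
    """Increase in E caused by merging the two clusters."""
    return sse(ci + cj) - sse(ci) - sse(cj)


# Hypothetical clusters on a line in the plane.
ci = [(0.0, 0.0), (2.0, 0.0)]
cj = [(6.0, 0.0)]
print(average_linkage(ci, cj))  # 5.0
print(ward_distance(ci, cj))    # about 16.67
```

For this example the Ward value agrees with the closed form |Ci||Cj| / (|Ci| + |Cj|) times the squared distance between the centroids: 2 · 1 / 3 · 25.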
29. DIANA (1)
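• V(i, S)
  V: the set of points to be split
  S (⊂ V): a subset of V
  d(i, j): the distance between i and j
  For S and i ∈ V−S, V(i, S) is defined as
  $$V(i, S) = \begin{cases} \dfrac{1}{|V|-1} \sum_{j \in V-\{i\}} d(i, j) & \text{if } S = \emptyset \\ \dfrac{1}{|V-S|-1} \sum_{j \in V-(S \cup \{i\})} d(i, j) - \dfrac{1}{|S|} \sum_{j \in S} d(i, j) & \text{if } S \neq \emptyset \end{cases}$$
  For S ≠ ∅, V(i, S) is the average distance from i to the other points of V−S minus the average distance from i to S.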
30. DIANA (2):
(A) [Figure: layout of the points A to G]
(B) Distance matrix d(i, j):
       A    B    C    D    E    F
  B   1.2
  C   1.3  1.0
  D   3.0  4.0  4.1
  E   1.9  2.0  2.5  2.3
  F   2.3  3.2  3.4  1.1  2.2
  G   4.1  4.0  4.5  3.5  2.5  4.0
6 4
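(1) Initialization: S = {} (S is empty)
(2) Compute V(i, S) for every i ∈ V−S
(3) If the largest V(i, S) > 0, move that i to S and go back to (2)
(4) If V(i, S) ≤ 0 for every i, no point is moved to S; go to (5)
(5) Split V into S and V−S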
4.24
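The splitting procedure can be sketched in Python as follows. It assumes the reading of V(i, S) in which the first sum runs over the points outside S ∪ {i}, which is the reading the worked examples on the following slides use. The names `v_score` and `diana_split` are hypothetical.

```python
def v_score(i, S, V, d):
    """V(i, S): average distance to the rest of V-S minus average distance to S."""
    rest = [j for j in V if j != i and j not in S]
    avg_rest = sum(d(i, j) for j in rest) / len(rest)
    if not S:
        return avg_rest
    return avg_rest - sum(d(i, j) for j in S) / len(S)


def diana_split(V, d):
    """Split V once; returns (V - S, S)."""
    S = []                                   # (1) S = {}
    remaining = list(V)
    while len(remaining) > 1:
        scores = {i: v_score(i, S, V, d) for i in remaining}   # (2)
        best = max(scores, key=scores.get)
        if scores[best] <= 0:                # (4) nothing left to move
            break
        S.append(best)                       # (3) move i to S
        remaining.remove(best)
    return remaining, S                      # (5)


# Distance matrix (B) from this slide, lower triangle.
names = "ABCDEFG"
lower = [
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
D = {frozenset((names[i], names[j])): lower[i][j]
     for i in range(7) for j in range(i)}


def d(i, j):
    return D[frozenset((i, j))]


print(diana_split("ABCDEFG", d))  # (['A', 'B', 'C', 'D', 'E', 'F'], ['G'])
print(diana_split("ABCDEF", d))   # (['A', 'B', 'C', 'E'], ['D', 'F'])
```

The two calls correspond to the first split of all seven points and the second split of the remaining six.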
31. DIANA (3): 1 (1)
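(A), (C) [Figure: layout of the points A to G; G]
(B) Distance matrix d(i, j):
       A    B    C    D    E    F
  B   1.2
  C   1.3  1.0
  D   3.0  4.0  4.1
  E   1.9  2.0  2.5  2.3
  F   2.3  3.2  3.4  1.1  2.2
  G   4.1  4.0  4.5  3.5  2.5  4.0
$$V(i, S) = \begin{cases} \dfrac{1}{|V|-1} \sum_{j \in V-\{i\}} d(i, j) & \text{if } S = \emptyset \\ \dfrac{1}{|V-S|-1} \sum_{j \in V-(S \cup \{i\})} d(i, j) - \dfrac{1}{|S|} \sum_{j \in S} d(i, j) & \text{if } S \neq \emptyset \end{cases}$$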
V (G, {})
= 1/6(d(G, A) + d(G, B) + d(G, C) + d(G, D) + d(G, E) + d(G, F))
= 1/6(4.1 + 4.0 + 4.5 + 3.5 + 2.5 + 4.0) = 3.77
32. DIANA (4): 1 (2)
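(B) Distance matrix d(i, j):
       A    B    C    D    E    F
  B   1.2
  C   1.3  1.0
  D   3.0  4.0  4.1
  E   1.9  2.0  2.5  2.3
  F   2.3  3.2  3.4  1.1  2.2
  G   4.1  4.0  4.5  3.5  2.5  4.0
(D) [Figure: layout of the points A to G; E]
$$V(i, S) = \begin{cases} \dfrac{1}{|V|-1} \sum_{j \in V-\{i\}} d(i, j) & \text{if } S = \emptyset \\ \dfrac{1}{|V-S|-1} \sum_{j \in V-(S \cup \{i\})} d(i, j) - \dfrac{1}{|S|} \sum_{j \in S} d(i, j) & \text{if } S \neq \emptyset \end{cases}$$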
V (E, {})
= 1/6(d(E, A) + d(E, B) + d(E, C) + d(E, D) + d(E, F) + d(E, G))
= 1/6(1.9 + 2.0 + 2.5 + 2.3 + 2.2 + 2.5) = 2.23
33. DIANA (5): 1 (3)
• V(i, {})
V (A, {}) = 13.8/6 = 2.3, V (B, {}) = 15.4/6 = 2.57,
V (C, {}) = 16.8/6 = 2.8, V (D, {}) = 18.0/6 = 3.0,
V (E, {}) = 2.23, V (F, {}) = 16.2/6 = 2.7
V (G, {}) = 3.77
• V(G, {}) is the largest
• V(G, {}) > 0, so G is moved to S
• S = {G}
• S
34. DIANA (6): 2 (1)
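[Figure: layout of the points A to G]
Distance matrix d(i, j):
       A    B    C    D    E    F
  B   1.2
  C   1.3  1.0
  D   3.0  4.0  4.1
  E   1.9  2.0  2.5  2.3
  F   2.3  3.2  3.4  1.1  2.2
  G   4.1  4.0  4.5  3.5  2.5  4.0
$$V(i, S) = \begin{cases} \dfrac{1}{|V|-1} \sum_{j \in V-\{i\}} d(i, j) & \text{if } S = \emptyset \\ \dfrac{1}{|V-S|-1} \sum_{j \in V-(S \cup \{i\})} d(i, j) - \dfrac{1}{|S|} \sum_{j \in S} d(i, j) & \text{if } S \neq \emptyset \end{cases}$$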
V (A, {G})
= 1/5(d(A, B) + d(A, C)+
d(A, D) + d(A, E) + d(A, F)) − 1/1(d(A, G))
= 1/5(1.2 + 1.3 + 3.0 + 1.9 + 2.3) − 4.1 = −2.16
35. DIANA (7): 2 (2)
•
V (B, {G}) = 1/5(1.2 + 1.0 + 4.0 + 2.0 + 3.2) − 4.0 = −1.72
V (C, {G}) = 1/5(1.3 + 1.0 + 4.1 + 2.5 + 3.4) − 4.5 = −2.04
V (D, {G}) = 1/5(3.0 + 4.0 + 4.1 + 2.3 + 1.1) − 3.5 = −0.6
V (E, {G}) = 1/5(1.9 + 2.0 + 2.5 + 2.3 + 2.2) − 2.5 = −0.32
V (F, {G}) = 1/5(2.3 + 3.2 + 3.4 + 1.1 + 2.2) − 4.0 = −1.56
• V(E, {G}) is the largest
• V(E, {G}) < 0, so no further point is moved to S
[Figure: layout of the points A to G]
36. DIANA (8): 2 (1)
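• V = {A,B,C,D,E,F,G} has been split into {A,B,C,D,E,F} and {G}
• V = {G}: a single point, so it cannot be split further
• V = {A,B,C,D,E,F}: this set is split next
$$V(i, S) = \begin{cases} \dfrac{1}{|V|-1} \sum_{j \in V-\{i\}} d(i, j) & \text{if } S = \emptyset \\ \dfrac{1}{|V-S|-1} \sum_{j \in V-(S \cup \{i\})} d(i, j) - \dfrac{1}{|S|} \sum_{j \in S} d(i, j) & \text{if } S \neq \emptyset \end{cases}$$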
V (A, {}) = 1/5(1.2 + 1.3 + 3.0 + 1.9 + 2.3) = 1.94
V (B, {}) = 1/5(1.2 + 1.0 + 4.0 + 2.0 + 3.2) = 2.28
V (C, {}) = 1/5(1.3 + 1.0 + 4.1 + 2.5 + 3.4) = 2.46
V (D, {}) = 1/5(3.0 + 4.0 + 4.1 + 2.3 + 1.1) = 2.9
V (E, {}) = 1/5(1.9 + 2.0 + 2.5 + 2.3 + 2.2) = 2.18
V (F, {}) = 1/5(2.3 + 3.2 + 3.4 + 1.1 + 2.2) = 2.44
• V(D, {}) is the largest, so D is moved to S: S = {D}
37. DIANA (9): 2 (2)
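• For V = {A,B,C,D,E,F} and S = {D}, compute V(i, S)
$$V(i, S) = \begin{cases} \dfrac{1}{|V|-1} \sum_{j \in V-\{i\}} d(i, j) & \text{if } S = \emptyset \\ \dfrac{1}{|V-S|-1} \sum_{j \in V-(S \cup \{i\})} d(i, j) - \dfrac{1}{|S|} \sum_{j \in S} d(i, j) & \text{if } S \neq \emptyset \end{cases}$$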
V (A, {D}) = 1/4(1.2 + 1.3 + 1.9 + 2.3) − 3.0 = −1.325
V (B, {D}) = 1/4(1.2 + 1.0 + 2.0 + 3.2) − 4.0 = −2.15
V (C, {D}) = 1/4(1.3 + 1.0 + 2.5 + 3.4) − 4.1 = −2.05
V (E, {D}) = 1/4(1.9 + 2.0 + 2.5 + 2.2) − 2.3 = −0.15
V (F, {D}) = 1/4(2.3 + 3.2 + 3.4 + 2.2) − 1.1 = 1.675
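• V(F, {D}) is the largest, so F is the candidate to move to S
• V(F, {D}) > 0, so F is moved to S
[Figure: layout of the points A to G]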
38. DIANA (10): 2 (3)
• V={A,B,C,D,E,F}, S={D,F}
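$$V(i, S) = \begin{cases} \dfrac{1}{|V|-1} \sum_{j \in V-\{i\}} d(i, j) & \text{if } S = \emptyset \\ \dfrac{1}{|V-S|-1} \sum_{j \in V-(S \cup \{i\})} d(i, j) - \dfrac{1}{|S|} \sum_{j \in S} d(i, j) & \text{if } S \neq \emptyset \end{cases}$$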
V (A, {D, F}) = 1/3(1.2 + 1.3 + 1.9) − 1/2(3.0 + 2.3) = −1.183
V (B, {D, F}) = 1/3(1.2 + 1.0 + 2.0) − 1/2(4.0 + 3.2) = −2.2
V (C, {D, F}) = 1/3(1.3 + 1.0 + 2.5) − 1/2(4.1 + 3.4) = −2.15
V (E, {D, F}) = 1/3(1.9 + 2.0 + 2.5) − 1/2(2.3 + 2.2) = −0.117
•
[Figure: layout of the points A to G, and a dendrogram with leaf order G D F E A B C]