5. Graphs - why should we care? C. Faloutsos (CMU) Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] Hadoop Summit '10
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24. Triangle Law: #S.3 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
25. Triangle Law: #S.3 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
26. Triangle Law: #S.4 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) SN Reuters Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~ n 1.6 triangles Hadoop Summit '10
27. Triangle Law: Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? details Hadoop Summit '10
28. Triangle Law: Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum ( i 3 ) (and, because of skewness, we only need the top few eigenvalues! details Hadoop Summit '10
40. Bipartite Communities! magnified bipartite community patents from same inventor(s) cut-and-paste bibliography! C. Faloutsos (CMU) Hadoop Summit '10
41.
42.
43.
44. Observation W.1: Fortification C. Faloutsos (CMU) More donors, more $ ? $10 $5 Hadoop Summit '10 ‘ Reagan’ ‘ Clinton’ $7
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55. More on Time-evolving graphs C. Faloutsos (CMU) M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008 Hadoop Summit '10
56.
57.
58.
59. T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? lag: days after post # in links 1 2 3 @t @t + lag Hadoop Summit '10
60. T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? POWER LAW! Exponent? # in links ( log ) 1 2 3 days after post ( log ) Hadoop Summit '10
61.
62.
63.
64. Ce nter- P iece S ubgraph Discovery [Tong+ KDD 06] Original Graph Q: Who is the most central node wrt the black nodes? (e.g., master-mind criminal, common advisor/collaborator, etc) Input C. Faloutsos (CMU) Hadoop Summit '10 B A C
65. Ce nter- P iece S ubgraph Discovery [Tong+ KDD 06] Q: How to find hub for the query nodes? Input: original graph Output: CePS CePS Node C. Faloutsos (CMU) A: Combine proximity scores (RWR) Hadoop Summit '10 B A C B A C
66.
67.
68.
69. G raph X -Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad KDD’07
70. Output Input Attributed Data Graph Query Graph Matching Subgraph Hadoop Summit '10 C. Faloutsos (CMU)
73. OddBall: Spotting A n o m a l i e s in Weighted Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School of Computer Science To appear in PAKDD 2010, Hyderabad, India
74.
75. What is an egonet? ego egonet C. Faloutsos (CMU) Hadoop Summit '10
80. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
81.
82.
83.
84.
85.
86.
87. Radius Plot of GCC of YahooWeb. C. Faloutsos (CMU) Hadoop Summit '10
88. Running time - Kronecker and Erdos-Renyi Graphs with billions edges. details
89. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
90. Generalized Iterated Matrix Vector Multiplication (GIMV) C. Faloutsos (CMU) PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations . U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ( ICDM ) 2009, Miami, Florida, USA. Best Application Paper (runner-up) . Hadoop Summit '10
91.
92.
93.
94.
95.
96.
97.
98. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
99. Triangles : Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum ( i 3 ) (and, because of skewness, we only need the top few eigenvalues! Mentioned already Hadoop Summit '10
100. Triangle Law: #1 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Mentioned already Hadoop Summit '10
101. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED