SlideShare une entreprise Scribd logo
1  sur  105
CMU SCS

Graph Mining: Laws, Generators
and Tools
Christos Faloutsos
CMU
CMU SCS

Thanks
• Michael Mahoney
• Lek-Heng Lim
• Petros Drineas
• Gunnar Carlsson

MMDS 08

C. Faloutsos

2
CMU SCS

Outline
•
•
•
•

Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (Virus propagation, e-bay
fraud detection)
• Conclusions

MMDS 08

C. Faloutsos

4
CMU SCS

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: How do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time
MMDS 08

C. Faloutsos

5
CMU SCS

Problem#1: Joint work with
Dr. Deepayan Chakrabarti
(CMU/Yahoo R.L.)

MMDS 08

C. Faloutsos

6
CMU SCS

Graphs - why should we care?

Internet Map
[lumeta.com]

Friendship Network
[Moody ’01]

MMDS 08

C. Faloutsos

Food Web
[Martinez ’91]

Protein Interactions
[genomebiology.com]

7
CMU SCS

Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)

D1
DN

• web: hyper-text graph

• ... and more:
MMDS 08

C. Faloutsos

8

...

...

T1
TM
CMU SCS

Graphs - why should we care?
• network of companies & board-of-directors
members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic
and anomaly detection
• ....
MMDS 08

C. Faloutsos

9
CMU SCS

Problem #1 - network and graph
mining
•
•
•
•

MMDS 08

How does the Internet look like?
How does the web look like?
What is ‘normal’/‘abnormal’?
which patterns/laws hold?

C. Faloutsos

10
CMU SCS

Graph mining
• Are real graphs random?

MMDS 08

C. Faloutsos

11
CMU SCS

Laws and patterns
• Are real graphs random?
• A: NO!!
– Diameter
– in- and out- degree distributions
– other (surprising) patterns

MMDS 08

C. Faloutsos

12
CMU SCS

Solution#1
• Power law in the degree distribution
[SIGCOMM99]
internet domains

att.com

log(degree)

ibm.com

-0.82
log(rank)

MMDS 08

C. Faloutsos

13
CMU SCS

Solution#1’: Eigen Exponent E
Eigenvalue

Exponent = slope
E = -0.48
May 2001

Rank of decreasing eigenvalue

• A2: power law in the eigenvalues of the adjacency
matrix
MMDS 08

C. Faloutsos

14
CMU SCS

Solution#1’: Eigen Exponent E
Eigenvalue

Exponent = slope
E = -0.48
May 2001

Rank of decreasing eigenvalue

• [Papadimitriou, Mihail, ’02]: slope is ½ of rank
exponent
MMDS 08

C. Faloutsos

15
CMU SCS

But:
How about graphs from other domains?

MMDS 08

C. Faloutsos

16
CMU SCS

The Peer-to-Peer Topology

[Jovanovic+]

• Count versus degree
• Number of adjacent peers follows a power-law
MMDS 08

C. Faloutsos

17
CMU SCS

More power laws:
citation counts: (citeseer.nj.nec.com 6/2001)

log(count)

Ullman
log(#citations)
MMDS 08

C. Faloutsos

18
CMU SCS

More power laws:
• web hit counts [w/ A. Montgomery]

Web Site Traffic
log(count)
Zipf
``ebay’’

users

log(in-degree)
MMDS 08

C. Faloutsos

19

sites
CMU SCS

epinions.com
• who-trusts-whom
[Richardson +
Domingos, KDD
2001]

count

trusts-2000-people user

(out) degree
MMDS 08

C. Faloutsos

20
CMU SCS

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: How do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time
MMDS 08

C. Faloutsos

21
CMU SCS

Problem#2: Time evolution
• with Jure Leskovec
(CMU/MLD)

• and Jon Kleinberg (Cornell –
sabb. @ CMU)

MMDS 08

C. Faloutsos

22
CMU SCS

Evolution of the Diameter
• Prior work on Power Law graphs hints
at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)

• What is happening in real data?

MMDS 08

C. Faloutsos

23
CMU SCS

Evolution of the Diameter
• Prior work on Power Law graphs hints
at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)

• What is happening in real data?
• Diameter shrinks over time

MMDS 08

C. Faloutsos

24
CMU SCS

Diameter – ArXiv citation graph
• Citations among
physics papers
• 1992 –2003
• One graph per
year

diameter

time [years]
MMDS 08

C. Faloutsos

25
CMU SCS

Diameter – “Autonomous
Systems”
• Graph of Internet
• One graph per
day
• 1997 – 2000

diameter

number of nodes
MMDS 08

C. Faloutsos

26
CMU SCS

Diameter – “Affiliation Network”
• Graph of
collaborations in
physics – authors
linked to papers
• 10 years of data

diameter

time [years]
MMDS 08

C. Faloutsos

27
CMU SCS

Diameter – “Patents”
• Patent citation
network
• 25 years of data

diameter

time [years]
MMDS 08

C. Faloutsos

28
CMU SCS

Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)

• Q: what is your guess for
E(t+1) =? 2 * E(t)

MMDS 08

C. Faloutsos

29
CMU SCS

Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)

• Q: what is your guess for
E(t+1) =? 2 * E(t)

• A: over-doubled!
– But obeying the ``Densification Power Law’’
MMDS 08

C. Faloutsos

30
CMU SCS

Densification – Physics Citations
• Citations among
physics papers E(t)
• 2003:

??

– 29,555 papers,
352,807
citations

N(t)
MMDS 08

C. Faloutsos

31
CMU SCS

Densification – Physics Citations
• Citations among
physics papers E(t)
• 2003:

1.69

– 29,555 papers,
352,807
citations

N(t)
MMDS 08

C. Faloutsos

32
CMU SCS

Densification – Physics Citations
• Citations among
physics papers E(t)
• 2003:

1.69

– 29,555 papers,
352,807
citations

1: tree

N(t)
MMDS 08

C. Faloutsos

33
CMU SCS

Densification – Physics Citations
• Citations among
physics papers E(t)
• 2003:
– 29,555 papers,
352,807
citations

1.69

clique: 2

N(t)
MMDS 08

C. Faloutsos

34
CMU SCS

Densification – Patent Citations
• Citations among
patents granted E(t)
• 1999
1.66

– 2.9 million nodes
– 16.5 million
edges

• Each year is a
datapoint
MMDS 08

N(t)
C. Faloutsos

35
CMU SCS

Densification – Autonomous Systems
• Graph of
Internet
• 2000

E(t)
1.18

– 6,000 nodes
– 26,000 edges

• One graph per
day
N(t)
MMDS 08

C. Faloutsos

36
CMU SCS

Densification – Affiliation
Network
• Authors linked
to their
publications
• 2002

E(t)
1.15

– 60,000 nodes
• 20,000 authors
• 38,000 papers

– 133,000 edges
MMDS 08

C. Faloutsos

N(t)
37
CMU SCS

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: How do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time
MMDS 08

C. Faloutsos

38
CMU SCS

Problem#3: Generation
• Given a growing graph with count of nodes N1,
N2, …
• Generate a realistic sequence of graphs that will
obey all the patterns

MMDS 08

C. Faloutsos

39
CMU SCS

Problem Definition
• Given a growing graph with count of nodes N1, N2,
…
• Generate a realistic sequence of graphs that will
obey all the patterns
– Static Patterns
Power Law Degree Distribution
Power Law eigenvalue and eigenvector distribution
Small Diameter

– Dynamic Patterns
Growth Power Law
Shrinking/Stabilizing Diameters

MMDS 08

C. Faloutsos

40
CMU SCS

Problem Definition
• Given a growing graph with count of
nodes N1, N2, …
• Generate a realistic sequence of graphs that
will obey all the patterns
• Idea: Self-similarity
– Leads to power laws
– Communities within communities
–…

MMDS 08

C. Faloutsos

41
CMU SCS

Kronecker Product – a Graph

Intermediate stage

Adjacency08
MMDS matrix

C. Faloutsos

Adjacency matrix
42
CMU SCS

Kronecker Product – a Graph
• Continuing multiplying with G1 we obtain G4 and
so on …

MMDS 08

G4 adjacency matrix
C. Faloutsos

43
CMU SCS

Kronecker Product – a Graph
• Continuing multiplying with G1 we obtain G4 and
so on …

MMDS 08

G4 adjacency matrix
C. Faloutsos

44
CMU SCS

Kronecker Product – a Graph
• Continuing multiplying with G1 we obtain G4 and
so on …

MMDS 08

G4 adjacency matrix
C. Faloutsos

45
CMU SCS

Properties:
• We can PROVE that
–
–
–
–

Degree distribution is multinomial ~ power law
Diameter: constant
Eigenvalue distribution: multinomial
First eigenvector: multinomial

• See [Leskovec+, PKDD’05] for proofs

MMDS 08

C. Faloutsos

46
CMU SCS

Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all
the patterns
– Static Patterns

 Power Law Degree Distribution
 Power Law eigenvalue and eigenvector distribution
 Small Diameter

– Dynamic Patterns

 Growth Power Law
 Shrinking/Stabilizing Diameters
• First and only generator for which we can prove
all these properties
MMDS 08

C. Faloutsos

47
CMU SCS

skip

Stochastic Kronecker Graphs
• Create N1×N1 probability matrix P1
• Compute the kth Kronecker power Pk
• For each entry puv of Pk include an edge
(u,v) with probability puv
0.4 0.2
0.1 0.3

Kronecker
multiplication

P1

0.16 0.08 0.08 0.04
0.04 0.12 0.02 0.06

Instance
Matrix G2

0.04 0.02 0.12 0.06
0.01 0.03 0.03 0.09

flip biased
coins

Pk
MMDS 08

C. Faloutsos

48
CMU SCS

Experiments
• How well can we match real graphs?
– Arxiv: physics citations:
• 30,000 papers, 350,000 citations
• 10 years of data

– U.S. Patent citation network
• 4 million patents, 16 million citations
• 37 years of data

– Autonomous systems – graph of internet
• Single snapshot from January 2002
• 6,400 nodes, 26,000 edges

• We show both static and temporal patterns
MMDS 08

C. Faloutsos

49
CMU SCS

(Q: how to fit the parm’s?)
A:
• Stochastic version of Kronecker graphs +
• Max likelihood +
• Metropolis sampling
• [Leskovec+, ICML’07]

MMDS 08

C. Faloutsos

50
CMU SCS

Experiments on real AS graph
Degree distribution

Adjacency matrix eigen values

MMDS 08

C. Faloutsos

Hop plot

Network value

51
CMU SCS

Conclusions
• Kronecker graphs have:
– All the static properties
 Heavy tailed degree distributions
 Small diameter
 Multinomial eigenvalues and eigenvectors
– All the temporal properties
 Densification Power Law
 Shrinking/Stabilizing Diameters
– We can formally prove these results
MMDS 08

C. Faloutsos

52
CMU SCS

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: How do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time
MMDS 08

C. Faloutsos

53
CMU SCS

Problem#4: MasterMind – ‘CePS’
• w/ Hanghang Tong,
KDD 2006
• htong <at> cs.cmu.edu

MMDS 08

C. Faloutsos

54
CMU SCS

Center-Piece Subgraph(Ceps)
• Given Q query nodes
• Find Center-piece ( ≤ b )
• App.
– Social Networks
– Law Inforcement, …

• Idea:
– Proximity -> random walk
with restarts
MMDS 08

C. Faloutsos

55
CMU SCS

Case Study: AND query
R. Agrawal

Jiawei Han

V. Vapnik

M. Jordan

MMDS 08

C. Faloutsos

56
CMU SCS

Case Study: AND query

MMDS 08

C. Faloutsos

57
CMU SCS

Case Study: AND query

MMDS 08

C. Faloutsos

58
CMU SCS

databases

ML/Statistics

2_SoftAnd query
MMDS 08

C. Faloutsos

59
CMU SCS

Conclusions
•
•
•
•

Q1:How to measure the importance?
A1: RWR+K_SoftAnd
Q2:How to do it efficiently?
A2:Graph Partition (Fast CePS)
– ~90% quality
– 150x speedup (ICDM’06, b.p. award)

MMDS 08

C. Faloutsos

60
CMU SCS

Outline
•
•
•
•

Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (Virus propagation, e-bay
fraud detection)
• Conclusions

MMDS 08

C. Faloutsos

61
CMU SCS

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: How do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time
MMDS 08

C. Faloutsos

62
CMU SCS

Tensors for time evolving graphs
• [Jimeng Sun+
KDD’06]
• [ “ , SDM’07]
• [ CF, Kolda, Sun,
SDM’07 tutorial]

MMDS 08

C. Faloutsos

63
CMU SCS

Social network analysis
• Static: find community structures

Keywords

MMDS 08

Authors

1990

DB

C. Faloutsos

64
CMU SCS

Social network analysis
• Static: find community structures

MMDS 08

Authors

1992
1991
1990

DB

C. Faloutsos

65
CMU SCS

Social network analysis
• Static: find community structures
• Dynamic: monitor community structure evolution;
spot abnormal individuals; abnormal time-stamps
Keywords

2004

DM

DB

MMDS 08

Authors

1990
DB

C. Faloutsos

66
CMU SCS

Application 1: Multiway latent
semantic indexing (LSI)
Philip Yu

2004
1990
authors

DB
Ukeyword

DB
keyword

Michael
Stonebraker

Uauthors

DM

Pattern

Query

• Projection matrices specify the clusters
• Core tensors give cluster activation level
MMDS 08

C. Faloutsos

67
CMU SCS

Bibliographic data (DBLP)
• Papers from VLDB and KDD conferences
• Construct 2nd order tensors with yearly
windows with <author, keywords>
• Each tensor: 4584×3741
• 11 timestamps (years)

MMDS 08

C. Faloutsos

68
CMU SCS

Multiway LSI
Authors

Keywords

Year

michael carey, michael
stonebraker, h. jagadish,
hector garcia-molina

queri,parallel,optimization,concurr,
objectorient

1995

distribut,systems,view,storage,servic,pr
ocess,cache

2004

streams,pattern,support, cluster,
index,gener,queri

2004

surajit chaudhuri,mitch
cherniack,michael
stonebraker,ugur etintemel

DB

jiawei han,jian pei,philip s. yu,
jianyong wang,charu c. aggarwal

DM
• Two groups are correctly identified: Databases and Data
mining
• People and concepts are drifting over time

MMDS 08

C. Faloutsos

69
CMU SCS

Network forensics
• Directional network flows
• A large ISP with 100 POPs, each POP 10Gbps link
capacity [Hotnets2004]
– 450 GB/hour with compression

• Task: Identify abnormal traffic pattern and find out the
cause
normal traffic

destination

destination

abnormal traffic

source
source
(with Prof. Hui FaloutsosDr. Yinglian Xie) 70
MMDS 08
C. Zhang and
CMU SCS

Conclusions
Tensor-based methods (WTA/DTA/STA):
• spot patterns and anomalies on time
evolving graphs, and
• on streams (monitoring)

MMDS 08

C. Faloutsos

71
CMU SCS

Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: How do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Track communities over time
MMDS 08

C. Faloutsos

72
CMU SCS

Outline
•
•
•
•

Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (Virus propagation, e-bay
fraud detection, blogs)
• Conclusions

MMDS 08

C. Faloutsos

73
CMU SCS

Virus propagation
• How do viruses/rumors propagate?
• Blog influence?
• Will a flu-like virus linger, or will it
become extinct soon?

MMDS 08

C. Faloutsos

74
CMU SCS

The model: SIS
• ‘Flu’ like: Susceptible-Infected-Susceptible
• Virus ‘strength’ s= β/δ
Healthy
Prob. δ

N2

Prob. β

N1

N

Infected
MMDS 08

Pro
b. β

N3
C. Faloutsos

75
CMU SCS

Epidemic threshold τ
of a graph: the value of τ, such that
if strength s = β / δ < τ
an epidemic can not happen
Thus,
• given a graph
• compute its epidemic threshold

MMDS 08

C. Faloutsos

76
CMU SCS

Epidemic threshold τ
What should τ depend on?
• avg. degree? and/or highest degree?
• and/or variance of degree?
• and/or third moment of degree?
• and/or diameter?

MMDS 08

C. Faloutsos

77
CMU SCS

Epidemic threshold
• [Theorem] We have no epidemic, if

β/δ <τ = 1/ λ1,A

MMDS 08

C. Faloutsos

78
CMU SCS

Epidemic threshold
• [Theorem] We have no epidemic, if
epidemic threshold

recovery prob.

β/δ <τ = 1/ λ1,A
largest eigenvalue
of adj. matrix A

attack prob.

Proof: [Wang+03]
MMDS 08

C. Faloutsos

79
CMU SCS

Experiments (Oregon)
β/δ > τ
(above threshold)

β/δ = τ
(at the threshold)
β/δ < τ
(below threshold)
MMDS 08

C. Faloutsos

80
CMU SCS

Outline
•
•
•
•

Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (Virus propagation, e-bay
fraud detection, blogs)
• Conclusions

MMDS 08

C. Faloutsos

81
CMU SCS

E-bay Fraud detection

w/ Polo Chau &
Shashank Pandit, CMU

MMDS 08

C. Faloutsos

82
CMU SCS

E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?

MMDS 08

C. Faloutsos

83
CMU SCS

E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?
• or him/her?

MMDS 08

C. Faloutsos

84
CMU SCS

E-bay Fraud detection - NetProbe

MMDS 08

C. Faloutsos

85
CMU SCS

Outline
•
•
•
•

Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (Virus propagation, e-bay
fraud detection, blogs)
• Conclusions

MMDS 08

C. Faloutsos

86
CMU SCS

Blog analysis
•
•
•
•

with Mary McGlohon (CMU)
Jure Leskovec (CMU)
Natalie Glance (now at Google)
Mat Hurst (now at MSR)
[SDM’07]

MMDS 08

C. Faloutsos

87
CMU SCS

Cascades on the Blogosphere
B1

B2

1
1

B1

a
B2

1
B3

B4

Blogosphere
blogs + posts

1

B3

b

2
B4

Blog network
links among blogs

3

d
e

Post network
links among posts

Q1: popularity-decay of a post?
Q2: degree distributions?
MMDS 08

C. Faloutsos

c

88
CMU SCS

Q1: popularity over time
# in links

1

2

3

days after post

Post popularity drops-off – exponentially?

MMDS 08

C. Faloutsos

89

Days after post
CMU SCS

Q1: popularity over time
# in links
(log)

1

2

3

days after post
(log)

Post popularity drops-off – exponentially?
POWER LAW!
Exponent?
MMDS 08

C. Faloutsos

90

Days after post
CMU SCS

Q1: popularity over time
# in links
(log)

-1.6
1

2

3

days after post
(log)

Post popularity drops-off – exponentially?
POWER LAW!
Exponent? -1.6 (close to -1.5: Barabasi’s stack model)
MMDS 08

C. Faloutsos

91

Days after post
CMU SCS

Q2: degree distribution
44,356 nodes, 122,153 edges. Half of blogs
belong to largest connected component.
count
B
1

??

1
1
1

B
2

2

B

B

3

3

4

blog in-degree
MMDS 08

C. Faloutsos

92
CMU SCS

Q2: degree distribution
44,356 nodes, 122,153 edges. Half of blogs
belong to largest connected component.
count
B
1

1
1
1

B
2

2

B

B

3

3

4

blog in-degree
MMDS 08

C. Faloutsos

93
CMU SCS

Q2: degree distribution
44,356 nodes, 122,153 edges. Half of blogs
belong to largest connected component.
count

in-degree slope: -1.7
out-degree: -3
‘rich get richer’
MMDS 08

C. Faloutsos

blog in-degree
94
CMU SCS

Next steps:
• edges with categorical attributes and/or timestamps
• nodes with attributes
• scalability (hadoop – PetaByte scale)
– first eigenvalue; diameter [done]
– rest eigenvalues; community detection [to be done]
– modularity, anomalies etc etc

• visualization (-> summarization)
MMDS 08

C. Faloutsos

95
CMU SCS

E.g.: self-* system @ CMU
• >200 nodes
• target: 1 PetaByte

MMDS 08

C. Faloutsos

96
CMU SCS

D.I.S.C.
• ‘Data Intensive Scientific Computing’
[R. Bryant, CMU]
– ‘big data’
– http://www.cs.cmu.edu/~bryant/pubdir/cmucs-07-128.pdf

MMDS 08

C. Faloutsos

97
CMU SCS

Scalability
• Google: > 450,000 processors in clusters of ~2000
processors each
Barroso, Dean, Hölzle, “Web Search for a
Planet: The Google Cluster Architecture”
IEEE Micro 2003

•
•
•
•

Yahoo: 5Pb of data [Fayyad, KDD’07]
Problem: machine failures, on a daily basis
How to parallelize data mining tasks, then?
A: map/reduce – hadoop (open-source clone) http://
hadoop.apache.org/

MMDS 08

C. Faloutsos

98
CMU SCS

2’ intro to hadoop
• master-slave architecture; n-way replication
(default n=3)
• ‘group by’ of SQL (in parallel, fault-tolerant way)
• e.g, find histogram of word frequency
– slaves compute local histograms
– master merges into global histogram

select course-id, count(*)
from ENROLLMENT
group by course-id
MMDS 08

C. Faloutsos

99
CMU SCS

2’ intro to hadoop
• master-slave architecture; n-way replication
(default n=3)
• ‘group by’ of SQL (in parallel, fault-tolerant way)
• e.g, find histogram of word frequency
– slaves compute local histograms
– master merges into global histogram

select course-id, count(*)
from ENROLLMENT
group by course-id
MMDS 08

C. Faloutsos

reduce
map
100
CMU SCS

OVERALL CONCLUSIONS
• Graphs: Self-similarity and power laws
work, when textbook methods fail!
• New patterns (shrinking diameter!)
• New generator: Kronecker
• SVD / tensors / RWR: valuable tools
• hadoop/mapReduce for scalability
MMDS 08

C. Faloutsos

101
CMU SCS

References
• Hanghang Tong, Christos Faloutsos, and Jia-Yu
Pan
Fast Random Walk with Restart and Its Applications
ICDM 2006, Hong Kong.
• Hanghang Tong, Christos Faloutsos Center-Piece
Subgraphs
: Problem Definition and Fast Solutions, KDD
2006, Philadelphia, PA

MMDS 08

C. Faloutsos

102
CMU SCS

References

• Jure Leskovec, Jon Kleinberg and Christos
Faloutsos
Graphs over Time: Densification Laws, Shrinking Diame
KDD 2005, Chicago, IL. ("Best Research Paper"
award).
• Jure Leskovec, Deepayan Chakrabarti, Jon
Kleinberg, Christos Faloutsos
Realistic, Mathematically Tractable Graph Generation
(ECML/PKDD 2005), Porto, Portugal, 2005.

MMDS 08

C. Faloutsos

103
CMU SCS

References

• Jure Leskovec and Christos Faloutsos, Scalable
Modeling of Real Graphs using Kronecker
Multiplication, ICML 2007, Corvallis, OR, USA
• Shashank Pandit, Duen Horng (Polo) Chau,
Samuel Wang and Christos Faloutsos NetProbe
: A Fast and Scalable System for Fraud Detection in Onl
WWW 2007, Banff, Alberta, Canada, May 8-12,
2007.
• Jimeng Sun, Dacheng Tao, Christos Faloutsos
Beyond Streams and Graphs: Dynamic Tensor Analysis,
KDD 2006, Philadelphia, PA
MMDS 08

C. Faloutsos

104
CMU SCS

References
• Jimeng Sun, Yinglian Xie, Hui Zhang, Christos
Faloutsos. Less is More: Compact Matrix
Decomposition for Large Sparse Graphs, SDM,
Minneapolis, Minnesota, Apr 2007. [pdf]

MMDS 08

C. Faloutsos

105
CMU SCS

Contact info:
www. cs.cmu.edu /~christos
(w/ papers, datasets, code, etc)

MMDS 08

C. Faloutsos

106

Contenu connexe

En vedette

Complex and Social Network Analysis in Python
Complex and Social Network Analysis in PythonComplex and Social Network Analysis in Python
Complex and Social Network Analysis in Pythonrik0
 
Kick start graph visualization projects
Kick start graph visualization projectsKick start graph visualization projects
Kick start graph visualization projectsLinkurious
 
Prof. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisProf. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisHendrik Speck
 
Large Graph Mining
Large Graph MiningLarge Graph Mining
Large Graph MiningSabri Skhiri
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network AnalysisPatti Anklam
 
Data Mining Seminar - Graph Mining and Social Network Analysis
Data Mining Seminar - Graph Mining and Social Network AnalysisData Mining Seminar - Graph Mining and Social Network Analysis
Data Mining Seminar - Graph Mining and Social Network Analysisvwchu
 
Social Network Analysis: applications for education research
Social Network Analysis: applications for education researchSocial Network Analysis: applications for education research
Social Network Analysis: applications for education researchChristian Bokhove
 
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...Social Network Analysis - Lecture 4 in Introduction to Computational Social S...
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...Lauri Eloranta
 
Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011guillaume ereteo
 
Social network analysis & Big Data - Telecommunications and more
Social network analysis & Big Data - Telecommunications and moreSocial network analysis & Big Data - Telecommunications and more
Social network analysis & Big Data - Telecommunications and moreWael Elrifai
 
Social network analysis intro part I
Social network analysis intro part ISocial network analysis intro part I
Social network analysis intro part ITHomas Plotkowiak
 
Social Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsSocial Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsPatti Anklam
 

En vedette (12)

Complex and Social Network Analysis in Python
Complex and Social Network Analysis in PythonComplex and Social Network Analysis in Python
Complex and Social Network Analysis in Python
 
Kick start graph visualization projects
Kick start graph visualization projectsKick start graph visualization projects
Kick start graph visualization projects
 
Prof. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisProf. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network Analysis
 
Large Graph Mining
Large Graph MiningLarge Graph Mining
Large Graph Mining
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
 
Data Mining Seminar - Graph Mining and Social Network Analysis
Data Mining Seminar - Graph Mining and Social Network AnalysisData Mining Seminar - Graph Mining and Social Network Analysis
Data Mining Seminar - Graph Mining and Social Network Analysis
 
Social Network Analysis: applications for education research
Social Network Analysis: applications for education researchSocial Network Analysis: applications for education research
Social Network Analysis: applications for education research
 
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...Social Network Analysis - Lecture 4 in Introduction to Computational Social S...
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...
 
Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011
 
Social network analysis & Big Data - Telecommunications and more
Social network analysis & Big Data - Telecommunications and moreSocial network analysis & Big Data - Telecommunications and more
Social network analysis & Big Data - Telecommunications and more
 
Social network analysis intro part I
Social network analysis intro part ISocial network analysis intro part I
Social network analysis intro part I
 
Social Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsSocial Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to Tools
 

Similaire à Graph mining

Time Series Data Mining - from PhD to Startup
Time Series Data Mining - from PhD to StartupTime Series Data Mining - from PhD to Startup
Time Series Data Mining - from PhD to StartupPeter Laurinec
 
Tetiana Bogodorova "Data Science for Electric Power Systems"
Tetiana Bogodorova "Data Science for Electric Power Systems"Tetiana Bogodorova "Data Science for Electric Power Systems"
Tetiana Bogodorova "Data Science for Electric Power Systems"Lviv Startup Club
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Experfy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsJason Riedy
 
Lecture 1
Lecture 1Lecture 1
Lecture 1eseem
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering odsc
 
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...Luigi Vanfretti
 
2.ME VLSI 233-251.doc
2.ME VLSI 233-251.doc2.ME VLSI 233-251.doc
2.ME VLSI 233-251.docSRI NISHITH
 
Macaluso antonio meetup dli 2020-12-15
Macaluso antonio  meetup dli 2020-12-15Macaluso antonio  meetup dli 2020-12-15
Macaluso antonio meetup dli 2020-12-15Deep Learning Italia
 
1 Unit-1 DEC B.Tech ECE III Sem Syllabus & Intro.pptx
1 Unit-1 DEC B.Tech ECE III Sem Syllabus & Intro.pptx1 Unit-1 DEC B.Tech ECE III Sem Syllabus & Intro.pptx
1 Unit-1 DEC B.Tech ECE III Sem Syllabus & Intro.pptxSatish Chandra
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfcookie1969
 
012675925c0f652bb179b6a33cd3d13b_MIT6_003F11_lec01.pdf
012675925c0f652bb179b6a33cd3d13b_MIT6_003F11_lec01.pdf012675925c0f652bb179b6a33cd3d13b_MIT6_003F11_lec01.pdf
012675925c0f652bb179b6a33cd3d13b_MIT6_003F11_lec01.pdfBasavaRajeshwari2
 

Similaire à Graph mining (20)

L10-handout.pdf
L10-handout.pdfL10-handout.pdf
L10-handout.pdf
 
L10-handout.pdf
L10-handout.pdfL10-handout.pdf
L10-handout.pdf
 
L10.pdf
L10.pdfL10.pdf
L10.pdf
 
L10.pdf
L10.pdfL10.pdf
L10.pdf
 
Time Series Data Mining - from PhD to Startup
Time Series Data Mining - from PhD to StartupTime Series Data Mining - from PhD to Startup
Time Series Data Mining - from PhD to Startup
 
Graph Evolution Models
Graph Evolution ModelsGraph Evolution Models
Graph Evolution Models
 
Tetiana Bogodorova "Data Science for Electric Power Systems"
Tetiana Bogodorova "Data Science for Electric Power Systems"Tetiana Bogodorova "Data Science for Electric Power Systems"
Tetiana Bogodorova "Data Science for Electric Power Systems"
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Lect21
Lect21Lect21
Lect21
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Distributed Systems in the Post-Moore Era.pptx
Distributed Systems in the Post-Moore Era.pptxDistributed Systems in the Post-Moore Era.pptx
Distributed Systems in the Post-Moore Era.pptx
 
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...
 
Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
2.ME VLSI 233-251.doc
2.ME VLSI 233-251.doc2.ME VLSI 233-251.doc
2.ME VLSI 233-251.doc
 
Macaluso antonio meetup dli 2020-12-15
Macaluso antonio  meetup dli 2020-12-15Macaluso antonio  meetup dli 2020-12-15
Macaluso antonio meetup dli 2020-12-15
 
1 Unit-1 DEC B.Tech ECE III Sem Syllabus & Intro.pptx
1 Unit-1 DEC B.Tech ECE III Sem Syllabus & Intro.pptx1 Unit-1 DEC B.Tech ECE III Sem Syllabus & Intro.pptx
1 Unit-1 DEC B.Tech ECE III Sem Syllabus & Intro.pptx
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
012675925c0f652bb179b6a33cd3d13b_MIT6_003F11_lec01.pdf
012675925c0f652bb179b6a33cd3d13b_MIT6_003F11_lec01.pdf012675925c0f652bb179b6a33cd3d13b_MIT6_003F11_lec01.pdf
012675925c0f652bb179b6a33cd3d13b_MIT6_003F11_lec01.pdf
 

Plus de Houw Liong The

Museumgeologi 130427165857-phpapp02
Museumgeologi 130427165857-phpapp02Museumgeologi 130427165857-phpapp02
Museumgeologi 130427165857-phpapp02Houw Liong The
 
Research in data mining
Research in data miningResearch in data mining
Research in data miningHouw Liong The
 
Fisika & komputasi cerdas
Fisika & komputasi cerdasFisika & komputasi cerdas
Fisika & komputasi cerdasHouw Liong The
 
Sharma : social networks
Sharma : social networksSharma : social networks
Sharma : social networksHouw Liong The
 
Introduction to-graph-theory-1204617648178088-2
Introduction to-graph-theory-1204617648178088-2Introduction to-graph-theory-1204617648178088-2
Introduction to-graph-theory-1204617648178088-2Houw Liong The
 
Chaper 13 trend, Han & Kamber
Chaper 13 trend, Han & KamberChaper 13 trend, Han & Kamber
Chaper 13 trend, Han & KamberHouw Liong The
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberHouw Liong The
 
Chapter 11 cluster advanced, Han & Kamber
Chapter 11 cluster advanced, Han & KamberChapter 11 cluster advanced, Han & Kamber
Chapter 11 cluster advanced, Han & KamberHouw Liong The
 
Web & text mining lecture10
Web & text mining lecture10Web & text mining lecture10
Web & text mining lecture10Houw Liong The
 
Graph mining seminar_2009
Graph mining seminar_2009Graph mining seminar_2009
Graph mining seminar_2009Houw Liong The
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningHouw Liong The
 
Chapter 09 classification advanced
Chapter 09 classification advancedChapter 09 classification advanced
Chapter 09 classification advancedHouw Liong The
 
System dynamics prof nagurney
System dynamics prof nagurneySystem dynamics prof nagurney
System dynamics prof nagurneyHouw Liong The
 

Plus de Houw Liong The (20)

Museumgeologi 130427165857-phpapp02
Museumgeologi 130427165857-phpapp02Museumgeologi 130427165857-phpapp02
Museumgeologi 130427165857-phpapp02
 
Space weather
Space weather Space weather
Space weather
 
Research in data mining
Research in data miningResearch in data mining
Research in data mining
 
Indonesia
IndonesiaIndonesia
Indonesia
 
Canfis
CanfisCanfis
Canfis
 
Climate Change
Climate ChangeClimate Change
Climate Change
 
Space Weather
Space Weather Space Weather
Space Weather
 
Fisika komputasi
Fisika komputasiFisika komputasi
Fisika komputasi
 
Fisika & komputasi cerdas
Fisika & komputasi cerdasFisika & komputasi cerdas
Fisika & komputasi cerdas
 
Climate model
Climate modelClimate model
Climate model
 
Sharma : social networks
Sharma : social networksSharma : social networks
Sharma : social networks
 
Introduction to-graph-theory-1204617648178088-2
Introduction to-graph-theory-1204617648178088-2Introduction to-graph-theory-1204617648178088-2
Introduction to-graph-theory-1204617648178088-2
 
Chaper 13 trend, Han & Kamber
Chaper 13 trend, Han & KamberChaper 13 trend, Han & Kamber
Chaper 13 trend, Han & Kamber
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
Chapter 11 cluster advanced, Han & Kamber
Chapter 11 cluster advanced, Han & KamberChapter 11 cluster advanced, Han & Kamber
Chapter 11 cluster advanced, Han & Kamber
 
Web & text mining lecture10
Web & text mining lecture10Web & text mining lecture10
Web & text mining lecture10
 
Graph mining seminar_2009
Graph mining seminar_2009Graph mining seminar_2009
Graph mining seminar_2009
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text mining
 
Chapter 09 classification advanced
Chapter 09 classification advancedChapter 09 classification advanced
Chapter 09 classification advanced
 
System dynamics prof nagurney
System dynamics prof nagurneySystem dynamics prof nagurney
System dynamics prof nagurney
 

Dernier

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 

Dernier (20)

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 

Graph mining

  • 1. CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU
  • 2. CMU SCS Thanks • Michael Mahoney • Lek-Heng Lim • Petros Drineas • Gunnar Carlsson MMDS 08 C. Faloutsos 2
  • 3. CMU SCS Outline • • • • Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) • Conclusions MMDS 08 C. Faloutsos 4
  • 4. CMU SCS Motivation Data mining: ~ find patterns (rules, outliers) • Problem#1: How do real graphs look like? • Problem#2: How do they evolve? • Problem#3: How to generate realistic graphs TOOLS • Problem#4: Who is the ‘master-mind’? • Problem#5: Track communities over time MMDS 08 C. Faloutsos 5
  • 5. CMU SCS Problem#1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.) MMDS 08 C. Faloutsos 6
  • 6. CMU SCS Graphs - why should we care? Internet Map [lumeta.com] Friendship Network [Moody ’01] MMDS 08 C. Faloutsos Food Web [Martinez ’91] Protein Interactions [genomebiology.com] 7
  • 7. CMU SCS Graphs - why should we care? • IR: bi-partite graphs (doc-terms) D1 DN • web: hyper-text graph • ... and more: MMDS 08 C. Faloutsos 8 ... ... T1 TM
  • 8. CMU SCS Graphs - why should we care? • network of companies & board-of-directors members • ‘viral’ marketing • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • .... MMDS 08 C. Faloutsos 9
  • 9. CMU SCS Problem #1 - network and graph mining • • • • MMDS 08 How does the Internet look like? How does the web look like? What is ‘normal’/‘abnormal’? which patterns/laws hold? C. Faloutsos 10
  • 10. CMU SCS Graph mining • Are real graphs random? MMDS 08 C. Faloutsos 11
  • 11. CMU SCS Laws and patterns • Are real graphs random? • A: NO!! – Diameter – in- and out- degree distributions – other (surprising) patterns MMDS 08 C. Faloutsos 12
  • 12. CMU SCS Solution#1 • Power law in the degree distribution [SIGCOMM99] internet domains att.com log(degree) ibm.com -0.82 log(rank) MMDS 08 C. Faloutsos 13
  • 13. CMU SCS Solution#1’: Eigen Exponent E Eigenvalue Exponent = slope E = -0.48 May 2001 Rank of decreasing eigenvalue • A2: power law in the eigenvalues of the adjacency matrix MMDS 08 C. Faloutsos 14
  • 14. CMU SCS Solution#1’: Eigen Exponent E Eigenvalue Exponent = slope E = -0.48 May 2001 Rank of decreasing eigenvalue • [Papadimitriou, Mihail, ’02]: slope is ½ of rank exponent MMDS 08 C. Faloutsos 15
  • 15. CMU SCS But: How about graphs from other domains? MMDS 08 C. Faloutsos 16
  • 16. CMU SCS The Peer-to-Peer Topology [Jovanovic+] • Count versus degree • Number of adjacent peers follows a power-law MMDS 08 C. Faloutsos 17
  • 17. CMU SCS More power laws: citation counts: (citeseer.nj.nec.com 6/2001) log(count) Ullman log(#citations) MMDS 08 C. Faloutsos 18
  • 18. CMU SCS More power laws: • web hit counts [w/ A. Montgomery] Web Site Traffic log(count) Zipf ``ebay’’ users log(in-degree) MMDS 08 C. Faloutsos 19 sites
  • 19. CMU SCS epinions.com • who-trusts-whom [Richardson + Domingos, KDD 2001] count trusts-2000-people user (out) degree MMDS 08 C. Faloutsos 20
  • 20. CMU SCS Motivation Data mining: ~ find patterns (rules, outliers) • Problem#1: How do real graphs look like? • Problem#2: How do they evolve? • Problem#3: How to generate realistic graphs TOOLS • Problem#4: Who is the ‘master-mind’? • Problem#5: Track communities over time MMDS 08 C. Faloutsos 21
  • 21. CMU SCS Problem#2: Time evolution • with Jure Leskovec (CMU/MLD) • and Jon Kleinberg (Cornell – sabb. @ CMU) MMDS 08 C. Faloutsos 22
  • 22. CMU SCS Evolution of the Diameter • Prior work on Power Law graphs hints at slowly growing diameter: – diameter ~ O(log N) – diameter ~ O(log log N) • What is happening in real data? MMDS 08 C. Faloutsos 23
  • 23. CMU SCS Evolution of the Diameter • Prior work on Power Law graphs hints at slowly growing diameter: – diameter ~ O(log N) – diameter ~ O(log log N) • What is happening in real data? • Diameter shrinks over time MMDS 08 C. Faloutsos 24
  • 24. CMU SCS Diameter – ArXiv citation graph • Citations among physics papers • 1992 –2003 • One graph per year diameter time [years] MMDS 08 C. Faloutsos 25
  • 25. CMU SCS Diameter – “Autonomous Systems” • Graph of Internet • One graph per day • 1997 – 2000 diameter number of nodes MMDS 08 C. Faloutsos 26
  • 26. CMU SCS Diameter – “Affiliation Network” • Graph of collaborations in physics – authors linked to papers • 10 years of data diameter time [years] MMDS 08 C. Faloutsos 27
  • 27. CMU SCS Diameter – “Patents” • Patent citation network • 25 years of data diameter time [years] MMDS 08 C. Faloutsos 28
  • 28. CMU SCS Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) MMDS 08 C. Faloutsos 29
  • 29. CMU SCS Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) • A: over-doubled! – But obeying the ``Densification Power Law’’ MMDS 08 C. Faloutsos 30
  • 30. CMU SCS Densification – Physics Citations • Citations among physics papers E(t) • 2003: ?? – 29,555 papers, 352,807 citations N(t) MMDS 08 C. Faloutsos 31
  • 31. CMU SCS Densification – Physics Citations • Citations among physics papers E(t) • 2003: 1.69 – 29,555 papers, 352,807 citations N(t) MMDS 08 C. Faloutsos 32
  • 32. CMU SCS Densification – Physics Citations • Citations among physics papers E(t) • 2003: 1.69 – 29,555 papers, 352,807 citations 1: tree N(t) MMDS 08 C. Faloutsos 33
  • 33. CMU SCS Densification – Physics Citations • Citations among physics papers E(t) • 2003: – 29,555 papers, 352,807 citations 1.69 clique: 2 N(t) MMDS 08 C. Faloutsos 34
  • 34. CMU SCS Densification – Patent Citations • Citations among patents granted E(t) • 1999 1.66 – 2.9 million nodes – 16.5 million edges • Each year is a datapoint MMDS 08 N(t) C. Faloutsos 35
  • 35. CMU SCS Densification – Autonomous Systems • Graph of Internet • 2000 E(t) 1.18 – 6,000 nodes – 26,000 edges • One graph per day N(t) MMDS 08 C. Faloutsos 36
  • 36. CMU SCS Densification – Affiliation Network • Authors linked to their publications • 2002 E(t) 1.15 – 60,000 nodes • 20,000 authors • 38,000 papers – 133,000 edges MMDS 08 C. Faloutsos N(t) 37
  • 37. CMU SCS Motivation Data mining: ~ find patterns (rules, outliers) • Problem#1: How do real graphs look like? • Problem#2: How do they evolve? • Problem#3: How to generate realistic graphs TOOLS • Problem#4: Who is the ‘master-mind’? • Problem#5: Track communities over time MMDS 08 C. Faloutsos 38
  • 38. CMU SCS Problem#3: Generation • Given a growing graph with count of nodes N1, N2, … • Generate a realistic sequence of graphs that will obey all the patterns MMDS 08 C. Faloutsos 39
  • 39. CMU SCS Problem Definition • Given a growing graph with count of nodes N1, N2, … • Generate a realistic sequence of graphs that will obey all the patterns – Static Patterns Power Law Degree Distribution Power Law eigenvalue and eigenvector distribution Small Diameter – Dynamic Patterns Growth Power Law Shrinking/Stabilizing Diameters MMDS 08 C. Faloutsos 40
  • 40. CMU SCS Problem Definition • Given a growing graph with count of nodes N1, N2, … • Generate a realistic sequence of graphs that will obey all the patterns • Idea: Self-similarity – Leads to power laws – Communities within communities –… MMDS 08 C. Faloutsos 41
  • 41. CMU SCS Kronecker Product – a Graph Intermediate stage Adjacency08 MMDS matrix C. Faloutsos Adjacency matrix 42
  • 42. CMU SCS Kronecker Product – a Graph • Continuing multiplying with G1 we obtain G4 and so on … MMDS 08 G4 adjacency matrix C. Faloutsos 43
  • 43. CMU SCS Kronecker Product – a Graph • Continuing multiplying with G1 we obtain G4 and so on … MMDS 08 G4 adjacency matrix C. Faloutsos 44
  • 44. CMU SCS Kronecker Product – a Graph • Continuing multiplying with G1 we obtain G4 and so on … MMDS 08 G4 adjacency matrix C. Faloutsos 45
  • 45. CMU SCS Properties: • We can PROVE that – – – – Degree distribution is multinomial ~ power law Diameter: constant Eigenvalue distribution: multinomial First eigenvector: multinomial • See [Leskovec+, PKDD’05] for proofs MMDS 08 C. Faloutsos 46
  • 46. CMU SCS Problem Definition • Given a growing graph with nodes N1, N2, … • Generate a realistic sequence of graphs that will obey all the patterns – Static Patterns  Power Law Degree Distribution  Power Law eigenvalue and eigenvector distribution  Small Diameter – Dynamic Patterns  Growth Power Law  Shrinking/Stabilizing Diameters • First and only generator for which we can prove all these properties MMDS 08 C. Faloutsos 47
  • 47. CMU SCS skip Stochastic Kronecker Graphs • Create N1×N1 probability matrix P1 • Compute the kth Kronecker power Pk • For each entry puv of Pk include an edge (u,v) with probability puv 0.4 0.2 0.1 0.3 Kronecker multiplication P1 0.16 0.08 0.08 0.04 0.04 0.12 0.02 0.06 Instance Matrix G2 0.04 0.02 0.12 0.06 0.01 0.03 0.03 0.09 flip biased coins Pk MMDS 08 C. Faloutsos 48
  • 48. CMU SCS Experiments • How well can we match real graphs? – Arxiv: physics citations: • 30,000 papers, 350,000 citations • 10 years of data – U.S. Patent citation network • 4 million patents, 16 million citations • 37 years of data – Autonomous systems – graph of internet • Single snapshot from January 2002 • 6,400 nodes, 26,000 edges • We show both static and temporal patterns MMDS 08 C. Faloutsos 49
  • 49. CMU SCS (Q: how to fit the parm’s?) A: • Stochastic version of Kronecker graphs + • Max likelihood + • Metropolis sampling • [Leskovec+, ICML’07] MMDS 08 C. Faloutsos 50
  • 50. CMU SCS Experiments on real AS graph Degree distribution Adjacency matrix eigen values MMDS 08 C. Faloutsos Hop plot Network value 51
  • 51. CMU SCS Conclusions • Kronecker graphs have: – All the static properties  Heavy tailed degree distributions  Small diameter  Multinomial eigenvalues and eigenvectors – All the temporal properties  Densification Power Law  Shrinking/Stabilizing Diameters – We can formally prove these results MMDS 08 C. Faloutsos 52
  • 52. CMU SCS Motivation Data mining: ~ find patterns (rules, outliers) • Problem#1: How do real graphs look like? • Problem#2: How do they evolve? • Problem#3: How to generate realistic graphs TOOLS • Problem#4: Who is the ‘master-mind’? • Problem#5: Track communities over time MMDS 08 C. Faloutsos 53
  • 53. CMU SCS Problem#4: MasterMind – ‘CePS’ • w/ Hanghang Tong, KDD 2006 • htong <at> cs.cmu.edu MMDS 08 C. Faloutsos 54
  • 54. CMU SCS Center-Piece Subgraph(Ceps) • Given Q query nodes • Find Center-piece ( ≤ b ) • App. – Social Networks – Law Inforcement, … • Idea: – Proximity -> random walk with restarts MMDS 08 C. Faloutsos 55
  • 55. CMU SCS Case Study: AND query R. Agrawal Jiawei Han V. Vapnik M. Jordan MMDS 08 C. Faloutsos 56
  • 56. CMU SCS Case Study: AND query MMDS 08 C. Faloutsos 57
  • 57. CMU SCS Case Study: AND query MMDS 08 C. Faloutsos 58
  • 59. CMU SCS Conclusions • • • • Q1:How to measure the importance? A1: RWR+K_SoftAnd Q2:How to do it efficiently? A2:Graph Partition (Fast CePS) – ~90% quality – 150x speedup (ICDM’06, b.p. award) MMDS 08 C. Faloutsos 60
  • 60. CMU SCS Outline • • • • Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) • Conclusions MMDS 08 C. Faloutsos 61
  • 61. CMU SCS Motivation Data mining: ~ find patterns (rules, outliers) • Problem#1: How do real graphs look like? • Problem#2: How do they evolve? • Problem#3: How to generate realistic graphs TOOLS • Problem#4: Who is the ‘master-mind’? • Problem#5: Track communities over time MMDS 08 C. Faloutsos 62
  • 62. CMU SCS Tensors for time evolving graphs • [Jimeng Sun+ KDD’06] • [ “ , SDM’07] • [ CF, Kolda, Sun, SDM’07 tutorial] MMDS 08 C. Faloutsos 63
  • 63. CMU SCS Social network analysis • Static: find community structures Keywords MMDS 08 Authors 1990 DB C. Faloutsos 64
  • 64. CMU SCS Social network analysis • Static: find community structures MMDS 08 Authors 1992 1991 1990 DB C. Faloutsos 65
  • 65. CMU SCS Social network analysis • Static: find community structures • Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps Keywords 2004 DM DB MMDS 08 Authors 1990 DB C. Faloutsos 66
  • 66. CMU SCS Application 1: Multiway latent semantic indexing (LSI) Philip Yu 2004 1990 authors DB Ukeyword DB keyword Michael Stonebraker Uauthors DM Pattern Query • Projection matrices specify the clusters • Core tensors give cluster activation level MMDS 08 C. Faloutsos 67
  • 67. CMU SCS Bibliographic data (DBLP) • Papers from VLDB and KDD conferences • Construct 2nd order tensors with yearly windows with <author, keywords> • Each tensor: 4584×3741 • 11 timestamps (years) MMDS 08 C. Faloutsos 68
  • 68. CMU SCS Multiway LSI Authors Keywords Year michael carey, michael stonebraker, h. jagadish, hector garcia-molina queri,parallel,optimization,concurr, objectorient 1995 distribut,systems,view,storage,servic,pr ocess,cache 2004 streams,pattern,support, cluster, index,gener,queri 2004 surajit chaudhuri,mitch cherniack,michael stonebraker,ugur etintemel DB jiawei han,jian pei,philip s. yu, jianyong wang,charu c. aggarwal DM • Two groups are correctly identified: Databases and Data mining • People and concepts are drifting over time MMDS 08 C. Faloutsos 69
  • 69. CMU SCS Network forensics • Directional network flows • A large ISP with 100 POPs, each POP 10Gbps link capacity [Hotnets2004] – 450 GB/hour with compression • Task: Identify abnormal traffic pattern and find out the cause normal traffic destination destination abnormal traffic source source (with Prof. Hui FaloutsosDr. Yinglian Xie) 70 MMDS 08 C. Zhang and
  • 70. CMU SCS Conclusions Tensor-based methods (WTA/DTA/STA): • spot patterns and anomalies on time evolving graphs, and • on streams (monitoring) MMDS 08 C. Faloutsos 71
  • 71. CMU SCS Motivation Data mining: ~ find patterns (rules, outliers) • Problem#1: How do real graphs look like? • Problem#2: How do they evolve? • Problem#3: How to generate realistic graphs TOOLS • Problem#4: Who is the ‘master-mind’? • Problem#5: Track communities over time MMDS 08 C. Faloutsos 72
  • 72. CMU SCS Outline • • • • Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection, blogs) • Conclusions MMDS 08 C. Faloutsos 73
  • 73. CMU SCS Virus propagation • How do viruses/rumors propagate? • Blog influence? • Will a flu-like virus linger, or will it become extinct soon? MMDS 08 C. Faloutsos 74
  • 74. CMU SCS The model: SIS • ‘Flu’ like: Susceptible-Infected-Susceptible • Virus ‘strength’ s= β/δ Healthy Prob. δ N2 Prob. β N1 N Infected MMDS 08 Pro b. β N3 C. Faloutsos 75
  • 75. CMU SCS Epidemic threshold τ of a graph: the value of τ, such that if strength s = β / δ < τ an epidemic can not happen Thus, • given a graph • compute its epidemic threshold MMDS 08 C. Faloutsos 76
  • 76. CMU SCS Epidemic threshold τ What should τ depend on? • avg. degree? and/or highest degree? • and/or variance of degree? • and/or third moment of degree? • and/or diameter? MMDS 08 C. Faloutsos 77
  • 77. CMU SCS Epidemic threshold • [Theorem] We have no epidemic, if β/δ <τ = 1/ λ1,A MMDS 08 C. Faloutsos 78
  • 78. CMU SCS Epidemic threshold • [Theorem] We have no epidemic, if epidemic threshold recovery prob. β/δ <τ = 1/ λ1,A largest eigenvalue of adj. matrix A attack prob. Proof: [Wang+03] MMDS 08 C. Faloutsos 79
  • 79. CMU SCS Experiments (Oregon) β/δ > τ (above threshold) β/δ = τ (at the threshold) β/δ < τ (below threshold) MMDS 08 C. Faloutsos 80
  • 80. CMU SCS Outline • • • • Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection, blogs) • Conclusions MMDS 08 C. Faloutsos 81
  • 81. CMU SCS E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU MMDS 08 C. Faloutsos 82
  • 82. CMU SCS E-bay Fraud detection • lines: positive feedbacks • would you buy from him/her? MMDS 08 C. Faloutsos 83
  • 83. CMU SCS E-bay Fraud detection • lines: positive feedbacks • would you buy from him/her? • or him/her? MMDS 08 C. Faloutsos 84
  • 84. CMU SCS E-bay Fraud detection - NetProbe MMDS 08 C. Faloutsos 85
  • 85. CMU SCS Outline • • • • Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection, blogs) • Conclusions MMDS 08 C. Faloutsos 86
  • 86. CMU SCS Blog analysis • • • • with Mary McGlohon (CMU) Jure Leskovec (CMU) Natalie Glance (now at Google) Mat Hurst (now at MSR) [SDM’07] MMDS 08 C. Faloutsos 87
  • 87. CMU SCS Cascades on the Blogosphere B1 B2 1 1 B1 a B2 1 B3 B4 Blogosphere blogs + posts 1 B3 b 2 B4 Blog network links among blogs 3 d e Post network links among posts Q1: popularity-decay of a post? Q2: degree distributions? MMDS 08 C. Faloutsos c 88
  • 88. CMU SCS Q1: popularity over time # in links 1 2 3 days after post Post popularity drops-off – exponentially? MMDS 08 C. Faloutsos 89 Days after post
  • 89. CMU SCS Q1: popularity over time # in links (log) 1 2 3 days after post (log) Post popularity drops-off – exponentially? POWER LAW! Exponent? MMDS 08 C. Faloutsos 90 Days after post
  • 90. CMU SCS Q1: popularity over time # in links (log) -1.6 1 2 3 days after post (log) Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 (close to -1.5: Barabasi’s stack model) MMDS 08 C. Faloutsos 91 Days after post
  • 91. CMU SCS Q2: degree distribution 44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component. count B 1 ?? 1 1 1 B 2 2 B B 3 3 4 blog in-degree MMDS 08 C. Faloutsos 92
  • 92. CMU SCS Q2: degree distribution 44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component. count B 1 1 1 1 B 2 2 B B 3 3 4 blog in-degree MMDS 08 C. Faloutsos 93
  • 93. CMU SCS Q2: degree distribution 44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component. count in-degree slope: -1.7 out-degree: -3 ‘rich get richer’ MMDS 08 C. Faloutsos blog in-degree 94
  • 94. CMU SCS Next steps: • edges with categorical attributes and/or timestamps • nodes with attributes • scalability (hadoop – PetaByte scale) – first eigenvalue; diameter [done] – rest eigenvalues; community detection [to be done] – modularity, anomalies etc etc • visualization (-> summarization) MMDS 08 C. Faloutsos 95
  • 95. CMU SCS E.g.: self-* system @ CMU • >200 nodes • target: 1 PetaByte MMDS 08 C. Faloutsos 96
  • 96. CMU SCS D.I.S.C. • ‘Data Intensive Scientific Computing’ [R. Bryant, CMU] – ‘big data’ – http://www.cs.cmu.edu/~bryant/pubdir/cmucs-07-128.pdf MMDS 08 C. Faloutsos 97
  • 97. CMU SCS Scalability • Google: > 450,000 processors in clusters of ~2000 processors each Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003 • • • • Yahoo: 5Pb of data [Fayyad, KDD’07] Problem: machine failures, on a daily basis How to parallelize data mining tasks, then? A: map/reduce – hadoop (open-source clone) http:// hadoop.apache.org/ MMDS 08 C. Faloutsos 98
  • 98. CMU SCS 2’ intro to hadoop • master-slave architecture; n-way replication (default n=3) • ‘group by’ of SQL (in parallel, fault-tolerant way) • e.g, find histogram of word frequency – slaves compute local histograms – master merges into global histogram select course-id, count(*) from ENROLLMENT group by course-id MMDS 08 C. Faloutsos 99
  • 99. CMU SCS 2’ intro to hadoop • master-slave architecture; n-way replication (default n=3) • ‘group by’ of SQL (in parallel, fault-tolerant way) • e.g, find histogram of word frequency – slaves compute local histograms – master merges into global histogram select course-id, count(*) from ENROLLMENT group by course-id MMDS 08 C. Faloutsos reduce map 100
  • 100. CMU SCS OVERALL CONCLUSIONS • Graphs: Self-similarity and power laws work, when textbook methods fail! • New patterns (shrinking diameter!) • New generator: Kronecker • SVD / tensors / RWR: valuable tools • hadoop/mapReduce for scalability MMDS 08 C. Faloutsos 101
  • 101. CMU SCS References • Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan Fast Random Walk with Restart and Its Applications ICDM 2006, Hong Kong. • Hanghang Tong, Christos Faloutsos Center-Piece Subgraphs : Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA MMDS 08 C. Faloutsos 102
  • 102. CMU SCS References • Jure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws, Shrinking Diame KDD 2005, Chicago, IL. ("Best Research Paper" award). • Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos Realistic, Mathematically Tractable Graph Generation (ECML/PKDD 2005), Porto, Portugal, 2005. MMDS 08 C. Faloutsos 103
  • 103. CMU SCS References • Jure Leskovec and Christos Faloutsos, Scalable Modeling of Real Graphs using Kronecker Multiplication, ICML 2007, Corvallis, OR, USA • Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos NetProbe : A Fast and Scalable System for Fraud Detection in Onl WWW 2007, Banff, Alberta, Canada, May 8-12, 2007. • Jimeng Sun, Dacheng Tao, Christos Faloutsos Beyond Streams and Graphs: Dynamic Tensor Analysis, KDD 2006, Philadelphia, PA MMDS 08 C. Faloutsos 104
  • 104. CMU SCS References • Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis, Minnesota, Apr 2007. [pdf] MMDS 08 C. Faloutsos 105
  • 105. CMU SCS Contact info: www. cs.cmu.edu /~christos (w/ papers, datasets, code, etc) MMDS 08 C. Faloutsos 106

Notes de l'éditeur

  1. &lt;number&gt; Diameter first, DPL second Check diameter formulas As the network grows the distances between nodes slowly grow
  2. &lt;number&gt; Diameter first, DPL second Check diameter formulas As the network grows the distances between nodes slowly grow
  3. &lt;number&gt;
  4. &lt;number&gt;
  5. &lt;number&gt;
  6. &lt;number&gt;
  7. &lt;number&gt;
  8. &lt;number&gt; CHECKMARKS
  9. &lt;number&gt; There are increasing interests in graph mining. In this talk, we introduce center-piece subgraphs. That is, given Q query nodes, we want to find nodes and the resulting subgraph, that have strong connection to all or most of the query nodes. Our Ceps alg. takes three parameters as input: Q query nodes; the budget b, indicating how large the resulting subgraph is; and the k softand coefficient, which I will explain it later. Here is an illustrating example, the upper figure is the original graph and given three query nodes (the red one), the importance of those intermediate nodes are indicated in the bottom figure. Eh, throughout the talk, we use the color to indicate the importance of the nodes, more read means more important. And subsequently, its center-piece subgraph could be as follows. In social network, say in a authorship network, we can use ceps to discover their common advisor, or an influential author in the field that Q query nodes belongs to. In law inforcement, the query nodes could be the suspects and by ceps, we can find the master-mind.
  10. &lt;number&gt; Since christos is here, I will not make any comments on this face.
  11. &lt;number&gt; Since christos is here, I will not make any comments on this face.
  12. &lt;number&gt; Since christos is here, I will not make any comments on this face.
  13. &lt;number&gt;
  14. &lt;number&gt;
  15. &lt;number&gt;
  16. &lt;number&gt;
  17. &lt;number&gt;
  18. &lt;number&gt;
  19. &lt;number&gt;
  20. &lt;number&gt;
  21. &lt;number&gt;
  22. &lt;number&gt;
  23. &lt;number&gt;
  24. &lt;number&gt;
  25. &lt;number&gt;
  26. &lt;number&gt;
  27. &lt;number&gt;
  28. &lt;number&gt; PT bytes, self-*, linux, gigabit link?