Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Jennifer'Neville
Purdue&University
Ryan'A.'Rossi
PARC
Nick'Duffield
Texas&A&M&University
Ted'Willke
Intel&Research&Labs

Social'Network Internet'(AS)
BiologicalPolitical'Blogs
Graph
Mining

Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]
The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]
2"node'
Graphlets
3"node'
Graphlets
4"node'
Graphlets
Connected(
Disconnected

! Small&k9vertex&induced&subgraphs
2"node'
Graphlets
3"node'
Graphlets
4"node'
Graphlets
Connected(
Disconnected

! Motifs:&Occur&in&real9world&networks&with&frequencies&significantly(
higher than&randomly&generated&networks
2"node'
Graphlets
3"node'
Graphlets
4"node'
Graphlets
Connected(
Disconnected

! Motifs:&Occur&in&real9world&networks&with&frequencies&significantly(
higher than&randomly&generated&networks
! Applied&to&food&web,&genetic,&neural,&web,&and&other&networks
• Found&distinct&graphlets in&each&case
2"node'
Graphlets
3"node'
Graphlets
4"node'
Graphlets
Connected(
Disconnected

AISTATS'2009
Bioinformatics'2006

9
9
9
9
9
! Biological&Networks&
• network&alignment,&protein&function&prediction
! Social&Networks&
• triad&analysis,&community&detection,&Exp.&Random&Models
! Computer&Networks
! Internet&AS
! Cyber&Security&
• spam&detection
! Ecology
.(.(.

Ex:(Given(an(input(graph(G
9 How%many%triangles%in%G?
9 How%many%cliques%of%size%49nodes%in%G?
9 How%many%cycles%of%size%49nodes%in%G?

Ex:(Given(an(input(graph(G
9 How%many%triangles%in%G?
9 How%many%cliques%of%size%49nodes%in%G?
9 How%many%cycles%of%size%49nodes%in%G?
" In%practice,%we%would%like%to%count%all%k9vertex%graphlets

! Enumerate&all&possible&graphlets

" Exhaustive%enumeration% is%too%expensive%

! Count&graphlets for&each&node&– and&combine&all&node&counts
[Shervashidze et.%al%– AISTAT%2009]%

" Still%expensive%for%relatively%large%k%%%[Shervashidze et.%al%– AISTAT%2009]%

" Still%expensive%for%relatively%large%k% [Shervashidze et.%al%– AISTAT%2009]%
! Other&recent&work&counts&only&connected& graphlets of&size&k=4
[Marcus%&%Shavitt – Computer%Networks%2012]%
Not(practical(– scales&only&for&small&graphs&with&few&
hundred/thousand&nodes/edges
9 taking%2400%secs for%a%graph%with%26K%nodes

Most&work&focused&on&graphlets of&k=3&nodes&
In&this&work,&we&focus&on&graphlets of&k=3,4&nodes
Efficient%Graphlet Counting%for%Large%Networks%%[Ahmed%et%al.,%ICDM%2015]
Graphlet Decomposition:% Framework,%Algorithms,%and%Applications
[Ahmed%et%al.,%KAIS%Journal%2016%(to%appear)]

Searching(Edge(
Neighborhoods
① For(each(edge(do
u v
v2 v3v1 v4
v6 v7
edge

Searching(Edge(
Neighborhoods
• Count(All(3<node(graphlets
② Merge(counts(from(all(edges
u v
v2 v3v1 v4
v6 v7
edge
Triangle 2<star 1<edge Independent(

Searching(Edge(
Neighborhoods
• Count(All(3<node(graphlets
② Merge(counts(from(all(edges
u v
v2 v3v1 V4
v6 v7
edge
Triangle 2<star 1<edge Independent(
# We(only(need(to(
find/count(triangles
# Use(equations to(get(
counts(of(others(in(o(1) Triangle

Edge"centric,'Parallel,'Memory"efficient'Framework'

How to count all 4-node graphlets?
4<Clique 4<Cycle4<Chrodal<Cycle Tailed<triangle 4<Path 3<Star
4<node<triangle 4<node<2star 4<node<2edge 4<node<1edge Independent(

Step(1 Step(2 Step(3
Searching(Edge(
Neighborhoods
For%each%edge%
Find%the%triangles
Count(4Gnode(graphlets
For%each%edge%
Count%49node%cliques%
and%49node%cycles only
Count(4Gnode(graphlets
For%each%edge%
Use%combinatorial%%%
relationships%to%compute%
counts%of%other%graphlets
in%constant(time
Step(4 Merge(counts(from(all(edges(

± 1&edge
4<Node(Graphlet Transition(Diagram(

4<Node(Graphlet Transition(Diagram(
± 1&edge
Count(Cliques(&(Cycles(ONLY
Use(relationships(&(transitions(
to(count(all(other(graphlets in(constant(time
4<Cliques
4<Cycles
Maximum&no.&triangles&
Incident&to&an&edge
Maximum&no.&stars
Incident&to&an&edge

T T
Relationship(between(4<cliques(&(4<ChordalCycles
4<Cliques 4<ChordalCycle
e
T T
e
No.&49ChordalCycles No.&&49Cliques
Proof'in'Lemma'1'" Ahmed'et'al.,'ICDM'2015

T T
Relationship(between(4<cliques(&(4<ChordalCycles
T T
No.&49ChordalCycles No.&&49Cliques
4<Cliques 4<ChordalCycle
e e
Proof'in'Lemma'1'" Ahmed'et'al.,'ICDM'2015

! Shared&Memory&Implementation
! Tested&on&graphs&with&over&a&billion&edges
! Largest&systematic&investigation&on&300+&networks
• Social,&web,&technological,& biological,&co9authorship,& infrastructure…&
• Facebook&100&networks&from&a&variety&of&US&schools
• Dense&graphs&from&the&DIMACS&challenge&
• Large&collections&of&biological&and&chemical&graphs
Details'in'the'paper
Data/code'online

Comparison(to(RAGE([Marcus'&'Shavitt – J.'Computer'Networks'2011]'''
Facebook100 Networks from US Schools
Ours RAGE
Time-in-Seconds

|V| |E| Ours RAGE
Time-in-Seconds
Baseline%(RAGE)%
did%not%finish%for%
most%graphs
We'take'~45'mins for'socSorkut (117M'edges)
We'take'~40'secs for'caSdblp (15M'edges)
Most'graphlet counts'in'orders'of'106'– 1015

|V| |E| Ours RAGE
Time-in-Seconds
Baseline%(RAGE)%
did%not%finish%for%
most%graphs
We'take'~4.5'secs for'webSgoogle (4.3M'edges)
We'take'~4'secs for'infSroadSusa (29M'edges)
Most'graphlet counts'in'orders'of'106'– 1015

0 1 2 4 8 16
0
5
10
15
Number of Processing Units
Speedup
socfb−Texas
socfb−OR
socfb−UCLA
socfb−Berkeley13
socfb−MIT
socfb−Penn94
0 1 2 4 8 16
0
5
10
15
Speedup
0 1 2 4 8 16
0
2
4
6
8
10
12
14
Speedup
tech−internet−as
tech−WHOIS
web−it−2004
web−spam
0 1 2 4 8 16
0
2
4
6
8
10
12
14
Speedup
Strong'scaling'results
Intel%Xeon%3.10%Ghz E592687W%server,%16%cores

Label'1
Label'0
Enzyme
NonSEnzyme
Collection'of'Graphs
(e.g.'Protein'Graphs)
.
.
.
Graphs&
Each%Protein%is%represented%by%a%graph
Binary%label%represents%the%function%of%the%protein

Label'1
Label'0
Enzyme
NonSEnzyme
?
?
.
.
.
Graphs&
?
?
?
Collection'of'Graphs
(e.g.'Protein'Graphs)
Assume%we%know%the%labels%of%a%few%graphs
How%to%predict%the%labels%of%the%unlabeled%graphs?

Features
Graphs
Graphlet
Feature&
Extraction
Model
Learning
Predict&Labels&of&
Unlabeled&Graphs
Label'1
Label'0
?
?
?
?
.
.
.
Graphs&
?
?
?
Protein'Graphs

! D&D&– 1178&protein&graphs.&Binary&labeled&as&Enzymes&vs.&
Non.&Enzymes
! MUTAG&– 188&mutagenic&compounds. Binary&labeled&
(whether&or&not&they&have&a&mutagenic&effect&on&the&Gram9
negative&bacterium)
! 109fold&validation,&Support&Vector&Machine
! Used&2,3,4&node&graphlets as&features

Previous)work: in)machine)learning)&)biological)networks)
Shervashidze et.al [AISTATS'2009]
Feature'Extraction'Time:
D&D' 2'hours,'45'mins
MUTAG 4.73'secs

Ranking'by'graphlet counts
Links'are'colored/weighted'
by'stars'of'size'4'nodes
Nodes'are'colored/weighted'
by'triangle'counts
Leukemia
Colon(
cancer
Deafness

! Local&Graphlet Decomposition
Role'discovery,'relational'learning,'multi"label'classification

! Unbiased&Estimation&of&Graphlet Counts
10
4
10
5
0.85
0.9
0.95
1
1.05
1.1
1.15
soc−orkut−dir
Sample Size
10
4
10
5
0.85
0.9
0.95
1
1.05
1.1
1.15
soc−orkut−dir
Sample Size
x/y
10
4
10
5
0.9
0.95
1
1.05
1.1
1.15
soc−flickr
Sample Size
10
4
10
5
0.9
0.95
1
1.05
1.1
1.15
soc−flickr
Sample Size
Estimation'of'counts'of'4"vertex'clique

! Framework&&&Algorithms&
• One&of&the&first&parallel&approaches&for&graphlet counting
• On&average&460x&faster&than&current& methods
• Edge9centric&computations&(only&requires&access&to&edge&
neighborhood)
• Time&and&space9efficient
• Sampling/estimation&methods&
• Local/global&counting
! Applications
• Large<scale graph&comparison,& classification,&and&anomaly&detection
• Visual&analytics&and&real<time graphlet mining

Code
http://nesreenahmed.com/graphlets
https://github.com/nkahmed/PGD
Data
http://networkrepository.com
" Email%us%for%questions%

Thank&you!
Questions?
nesreen.k.ahmed@intel.com
http://nesreenahmed.com

Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Recommandé

Recommandé

Contenu connexe

En vedette

En vedette (11)

Similaire à Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Similaire à Fast Graphlet Decomposition: Theory, Algorithms, and Applications (20)

Dernier

Dernier (20)

Fast Graphlet Decomposition: Theory, Algorithms, and Applications