SlideShare une entreprise Scribd logo
1  sur  31
Télécharger pour lire hors ligne
Dynamic Community Detection for Large-scale e-Commerce data
with Spark Streaming and GraphX
Ming Huang
Meng Zhang, Bin Wei
GuangYuan Huang, Jinkui Shi
Community Detection
Scenarios
•  VIP Customer
•  Reputation Escalator
•  Fraud Seller
•  ………
Algorithms
•  LPA
•  GN
•  Fast Unfolding
•  …….
How to make it Dynamic?
Static Communities Streaming Data
Make sophisticated, real-time decisions
Definition & Solution
Dynamic Community Detection
1.  Decide New Node’s community
2.  Update Graph Physical Topology
3.  Effect communities and modularity
Spark Streaming + GraphX à Streaming Graph
REAL-TIME
Streaming Graph
Edges
DStream
Graph
DStream
merge merge merge
Stock Graph
… … …
Models and Algorithms
Quick Overview of
Fast Unfolding
Modularity:
!
Q=
1
2m
Aij
*
ki
kj
2m
⎡
⎣
⎢
⎢
⎤
⎦
⎥
⎥i,j
∑ δ ci
,cj( )
!
Q = Qi
i
c
∑ =
in∑
2m
)
tot∑
2m
⎛
⎝⎜
⎞
⎠⎟
2
⎡
⎣
⎢
⎢
⎤
⎦
⎥
⎥i
c
∑
Incremental Algorithms
JV(Streaming with RDD ) UMG(Streaming with Graph)
"   Union & Modularity Greedy"   Join & Vote
JV
A B C
C1 C2 C2
D D D
A B C
D D D
C1 C2 C2
D
C2
join
Vote
incEdgeRDD stockCommunityRDD
D
C2
UMG 1 - Union
A
B
C1
C2
C3
C
(C1 or C2) ?
   newGraph = stockGraph.union(incGraph)"
A
B
C
D
UMG 2 - findBestCommunity
A
B
C
D
gain1=G(node(d), community(1))
gain2=G(node(d) , community(2))
C3
incVertexWithNeighbors = newGraph.mapReduceTriplets[Array[VertexData]]
(collectNeighborFunc, _ ++ _,"# # # # #Some((incGraph.vertices, EdgeDirection.Either)))
idCommunity = incVertexWithNeighbors.map {"
case (vid, neighbors) => (vid, findBestCommunity(neighbors))"
}.cache()"
!
Ci
=Cmax
j
G(nodei
,Cj
)
!
ΔQ=
in∑ +ki,in
2m
+
tot+ki∑
2m
⎛
⎝
⎜
⎞
⎠
⎟
2
⎡
⎣
⎢
⎢
⎤
⎦
⎥
⎥
+
in∑
2m
+
tot∑
2m
⎛
⎝
⎜
⎞
⎠
⎟
2
+
ki
2m
⎛
⎝
⎜
⎞
⎠
⎟
2
⎡
⎣
⎢
⎢
⎤
⎦
⎥
⎥
C2
C1
UMG 3 - updateCommunities
A
D
B
C
newCommunityRdd = idCommunity.updateCommunities(commuitiyRdd)"
"
newModularity = newCommunityRdd.map(community=>community.modularity).reduce(_+_)"
C1
C2
!
Q = Qi
i
c
∑ =
in∑
2m
)
tot∑
2m
⎛
⎝⎜
⎞
⎠⎟
2
⎡
⎣
⎢
⎢
⎤
⎦
⎥
⎥i
c
∑
(Q1, Q2)
edgeStreamRDD.foreachRDD { "
  incEdgeRdd => { "
   val incGraph  = buildIncGraph(incEdgeRdd) "
   (communityInfoRDD, modularity) = streamingFU.trainOn(incGraph)"
outputToHBase(communityInfoRDD)"
outputToHBase(modularity)"
edgeRdd "
  }"
} "
Flow Example Code
ssc.start()"
ssc.awaitTermination()"
val conf = new SparkConf().setMaster(……).setAppName(……)"
val ssc = new StreamingContext(conf, Seconds(60))"
"
"
val totalGraph = initGraph(totalEdgesRdd) "
Val streamingFU = new StreamingFU().setTotalGraph(totalGraph)"
"
val onlineDataFlow = getDataFlow(ssc.sparkContext)"
val edgeStreamRDD  = ssc.queueStream(onlineDataFlow, true) "
"
Experiment Results
Autonomous Systems Graphs
Stanford Large Network Dataset Collection(as-733)
https://snap.stanford.edu/data/
Modularity Trend – AS
Online Trading Graph
Buyer Seller
C-C
Modularity Trend – OT
Streaming Graph à Better Result
Key Points
"   Operator
"   Merge Small graph into Large graph
"   Model
"   Local changes
"   Index or summary
"   Algorithm
"   Delicate formula
"   Commutative law & Associative law
"   Parallelly & Incrementally
Complex GraphX
Operators
Graph Union Operator
GRAPH(H)GRAPH(G)
∪ =	

E
F
G
H
B
C
D E
F
A
B
C
D
E
F
A
H
G
GRAPH(G U H)
Graph Union Operator
https://issues.apache.org/jira/browse/SPARK-7894"
"
[GraphX] Complex Operators between Graphs: Union
https://github.com/apache/spark/pull/6685"
"
   newGraph = stockGraph.union(incGraph)"
Complex GraphX Operators
"   Union of Graphs ( G ∪ H )
"   Intersection of Graphs ( G ∩ H)
"   Graph Join
"   Difference of Graphs(G – H)
"   Graph Complement
"   Line Graph ( L(G) )
Issues:"
Complex Operators between Graphs
https://issues.apache.org/jira/browse/SPARK-7893"
Streaming Optimization
Monitoring and Correction
Ω
Data Loading Modularity Threshold CheckingStreaming-FU
FastUnfolding
[Hourly Monitoring]
[Streaming]
[Daily Running]
FastUnfolding
communityID	
 communityInfo	

community1	
 (in1,tot1,degree1,modularity1)	

……	
 ……	

mTime mValue
timestamp1 totalModularity1
…… ……
modularityTablecommRDDTable
Streaming Resource Allocation
•  Driver-Memory: 20G
•  Executors: 100
•  Core: 2
•  Executor-Memory: 20G
Not Enough for Peak Period!
Streaming Buffer
Kafka
Stream
Hdfs
Stream
Join
StreamingFUModel
Streaming-
FU
Streaming-
Buffer
TT
Receiver
Split
HDFS
Modularity Correction Buffer
Resource Peak Buffer
Kafka
Buffer
Writer
Conclusion
"   Streaming Graph
"   Complex Operators will help
"   Daily Rebuild & Threshold Check
"   Costs more memory and time
"   Open Question
checkpoint with Streaming or Graph?
Acknowledgements
1.  Limits of community detection
" http://www.slideshare.net/vtraag/comm-detect
2.  Community Detection
" http://www.traag.net/projects/community-detection/
3.  Social Network Analysis
" http://lorenzopaoliani.info/topics/
4.  Community detection in complex networks using Extremal Optimization
" http://arxiv.org/pdf/cond-mat/0501368.pdf
"   Q & A
Agenda
"   Dynamic Community Detection
"   Streaming Graph
"   Models and Algorithms
"   Complex GraphX Operators
"   Streaming Optimization
"   Conclusion
Static vs. Dynamic
Static Model Dynamic Model

Contenu connexe

Tendances

Η ΓΡΑΜΜΑΤΙΚΗ ΚΑΙ ΤΟ ΣΥΝΤΑΚΤΙΚΟ ΓΙΑ ΤΙΣ ΠΑΝΕΛΛΗΝΙΕΣ
Η ΓΡΑΜΜΑΤΙΚΗ ΚΑΙ ΤΟ ΣΥΝΤΑΚΤΙΚΟ ΓΙΑ ΤΙΣ ΠΑΝΕΛΛΗΝΙΕΣΗ ΓΡΑΜΜΑΤΙΚΗ ΚΑΙ ΤΟ ΣΥΝΤΑΚΤΙΚΟ ΓΙΑ ΤΙΣ ΠΑΝΕΛΛΗΝΙΕΣ
Η ΓΡΑΜΜΑΤΙΚΗ ΚΑΙ ΤΟ ΣΥΝΤΑΚΤΙΚΟ ΓΙΑ ΤΙΣ ΠΑΝΕΛΛΗΝΙΕΣΕΥΗ ΕΛΕΥΘΕΡΙΟΥ
 
μάθημα 3 υλοποίηση αλγορίθμου με υπολογιστή - προγραμματισμός
μάθημα 3   υλοποίηση αλγορίθμου με υπολογιστή - προγραμματισμόςμάθημα 3   υλοποίηση αλγορίθμου με υπολογιστή - προγραμματισμός
μάθημα 3 υλοποίηση αλγορίθμου με υπολογιστή - προγραμματισμόςGeorge Avgeris
 
Το Βιογραφικό Σημείωμα
Το Βιογραφικό ΣημείωμαΤο Βιογραφικό Σημείωμα
Το Βιογραφικό ΣημείωμαSkywalker.gr
 
Διαγώνισμα δομή ακολουθίας ΑΕΠΠ
Διαγώνισμα δομή ακολουθίας ΑΕΠΠΔιαγώνισμα δομή ακολουθίας ΑΕΠΠ
Διαγώνισμα δομή ακολουθίας ΑΕΠΠEleni Kokkinou
 
Enotita3 kef11
Enotita3 kef11Enotita3 kef11
Enotita3 kef11anvaraz
 
Βοήθημα για το μάθημα «Ανάπτυξη Εφαρμογών σε Προγραμματιστικό Περιβάλλον».
Βοήθημα για το μάθημα «Ανάπτυξη Εφαρμογών σε Προγραμματιστικό Περιβάλλον».Βοήθημα για το μάθημα «Ανάπτυξη Εφαρμογών σε Προγραμματιστικό Περιβάλλον».
Βοήθημα για το μάθημα «Ανάπτυξη Εφαρμογών σε Προγραμματιστικό Περιβάλλον».Θανάσης Δρούγας
 
[Φυσική Γ' Γυμνασίου] Διαγώνισμα στις Ταλαντώσεις (ΘΕΜΑΤΑ)
[Φυσική Γ' Γυμνασίου] Διαγώνισμα στις Ταλαντώσεις (ΘΕΜΑΤΑ)[Φυσική Γ' Γυμνασίου] Διαγώνισμα στις Ταλαντώσεις (ΘΕΜΑΤΑ)
[Φυσική Γ' Γυμνασίου] Διαγώνισμα στις Ταλαντώσεις (ΘΕΜΑΤΑ)Dimitris Kontoudakis
 
ιστορικά παραθέματα (συνέχεια)
ιστορικά παραθέματα (συνέχεια)ιστορικά παραθέματα (συνέχεια)
ιστορικά παραθέματα (συνέχεια)Androniki Masaouti
 
Ασκήσεις δομή Επιλογής
Ασκήσεις δομή ΕπιλογήςΑσκήσεις δομή Επιλογής
Ασκήσεις δομή ΕπιλογήςEleni Kokkinou
 
αντικειμενοστραφής προγραμματισμός
αντικειμενοστραφής προγραμματισμόςαντικειμενοστραφής προγραμματισμός
αντικειμενοστραφής προγραμματισμόςkmag388
 
HTML-CSS για αρχάριους :: Μάθημα 1ο
HTML-CSS για αρχάριους :: Μάθημα 1οHTML-CSS για αρχάριους :: Μάθημα 1ο
HTML-CSS για αρχάριους :: Μάθημα 1οDespina Kamilali
 
Θεωρία μαθηματικά προσανατολισμού Γ λυκείου
Θεωρία μαθηματικά προσανατολισμού Γ λυκείουΘεωρία μαθηματικά προσανατολισμού Γ λυκείου
Θεωρία μαθηματικά προσανατολισμού Γ λυκείουΘανάσης Δρούγας
 
Στερεότυπα
ΣτερεότυπαΣτερεότυπα
ΣτερεότυπαAkis Ampelas
 
Εφαρμογές Πληροφορικής Α' ΓΕΛ και Α' ΕΠΑΛ
Εφαρμογές Πληροφορικής Α' ΓΕΛ και Α' ΕΠΑΛΕφαρμογές Πληροφορικής Α' ΓΕΛ και Α' ΕΠΑΛ
Εφαρμογές Πληροφορικής Α' ΓΕΛ και Α' ΕΠΑΛVassilis Efopoulos
 
αεππ κεφάλαιο 2 βασικές έννοιες αλγορίθμων
αεππ κεφάλαιο 2 βασικές έννοιες αλγορίθμωναεππ κεφάλαιο 2 βασικές έννοιες αλγορίθμων
αεππ κεφάλαιο 2 βασικές έννοιες αλγορίθμωνevoyiatz
 
Ερωτήσεις επανάληψης Χημεία Α Λυκείου 2013.
Ερωτήσεις επανάληψης Χημεία Α Λυκείου 2013.Ερωτήσεις επανάληψης Χημεία Α Λυκείου 2013.
Ερωτήσεις επανάληψης Χημεία Α Λυκείου 2013.Stathis Gourzis
 
Διδακτικό σενάριο:"Φιλία"
Διδακτικό σενάριο:"Φιλία"Διδακτικό σενάριο:"Φιλία"
Διδακτικό σενάριο:"Φιλία"Emytse66
 

Tendances (20)

Η ΓΡΑΜΜΑΤΙΚΗ ΚΑΙ ΤΟ ΣΥΝΤΑΚΤΙΚΟ ΓΙΑ ΤΙΣ ΠΑΝΕΛΛΗΝΙΕΣ
Η ΓΡΑΜΜΑΤΙΚΗ ΚΑΙ ΤΟ ΣΥΝΤΑΚΤΙΚΟ ΓΙΑ ΤΙΣ ΠΑΝΕΛΛΗΝΙΕΣΗ ΓΡΑΜΜΑΤΙΚΗ ΚΑΙ ΤΟ ΣΥΝΤΑΚΤΙΚΟ ΓΙΑ ΤΙΣ ΠΑΝΕΛΛΗΝΙΕΣ
Η ΓΡΑΜΜΑΤΙΚΗ ΚΑΙ ΤΟ ΣΥΝΤΑΚΤΙΚΟ ΓΙΑ ΤΙΣ ΠΑΝΕΛΛΗΝΙΕΣ
 
μάθημα 3 υλοποίηση αλγορίθμου με υπολογιστή - προγραμματισμός
μάθημα 3   υλοποίηση αλγορίθμου με υπολογιστή - προγραμματισμόςμάθημα 3   υλοποίηση αλγορίθμου με υπολογιστή - προγραμματισμός
μάθημα 3 υλοποίηση αλγορίθμου με υπολογιστή - προγραμματισμός
 
Το Βιογραφικό Σημείωμα
Το Βιογραφικό ΣημείωμαΤο Βιογραφικό Σημείωμα
Το Βιογραφικό Σημείωμα
 
Διαγώνισμα δομή ακολουθίας ΑΕΠΠ
Διαγώνισμα δομή ακολουθίας ΑΕΠΠΔιαγώνισμα δομή ακολουθίας ΑΕΠΠ
Διαγώνισμα δομή ακολουθίας ΑΕΠΠ
 
Συναρτήσεις, επανάληψη
Συναρτήσεις, επανάληψη Συναρτήσεις, επανάληψη
Συναρτήσεις, επανάληψη
 
Enotita3 kef11
Enotita3 kef11Enotita3 kef11
Enotita3 kef11
 
Βοήθημα για το μάθημα «Ανάπτυξη Εφαρμογών σε Προγραμματιστικό Περιβάλλον».
Βοήθημα για το μάθημα «Ανάπτυξη Εφαρμογών σε Προγραμματιστικό Περιβάλλον».Βοήθημα για το μάθημα «Ανάπτυξη Εφαρμογών σε Προγραμματιστικό Περιβάλλον».
Βοήθημα για το μάθημα «Ανάπτυξη Εφαρμογών σε Προγραμματιστικό Περιβάλλον».
 
[Φυσική Γ' Γυμνασίου] Διαγώνισμα στις Ταλαντώσεις (ΘΕΜΑΤΑ)
[Φυσική Γ' Γυμνασίου] Διαγώνισμα στις Ταλαντώσεις (ΘΕΜΑΤΑ)[Φυσική Γ' Γυμνασίου] Διαγώνισμα στις Ταλαντώσεις (ΘΕΜΑΤΑ)
[Φυσική Γ' Γυμνασίου] Διαγώνισμα στις Ταλαντώσεις (ΘΕΜΑΤΑ)
 
ιστορικά παραθέματα (συνέχεια)
ιστορικά παραθέματα (συνέχεια)ιστορικά παραθέματα (συνέχεια)
ιστορικά παραθέματα (συνέχεια)
 
Ασκήσεις δομή Επιλογής
Ασκήσεις δομή ΕπιλογήςΑσκήσεις δομή Επιλογής
Ασκήσεις δομή Επιλογής
 
αντικειμενοστραφής προγραμματισμός
αντικειμενοστραφής προγραμματισμόςαντικειμενοστραφής προγραμματισμός
αντικειμενοστραφής προγραμματισμός
 
HTML-CSS για αρχάριους :: Μάθημα 1ο
HTML-CSS για αρχάριους :: Μάθημα 1οHTML-CSS για αρχάριους :: Μάθημα 1ο
HTML-CSS για αρχάριους :: Μάθημα 1ο
 
Θεωρία μαθηματικά προσανατολισμού Γ λυκείου
Θεωρία μαθηματικά προσανατολισμού Γ λυκείουΘεωρία μαθηματικά προσανατολισμού Γ λυκείου
Θεωρία μαθηματικά προσανατολισμού Γ λυκείου
 
Kef09
Kef09Kef09
Kef09
 
Στερεότυπα
ΣτερεότυπαΣτερεότυπα
Στερεότυπα
 
Εφαρμογές Πληροφορικής Α' ΓΕΛ και Α' ΕΠΑΛ
Εφαρμογές Πληροφορικής Α' ΓΕΛ και Α' ΕΠΑΛΕφαρμογές Πληροφορικής Α' ΓΕΛ και Α' ΕΠΑΛ
Εφαρμογές Πληροφορικής Α' ΓΕΛ και Α' ΕΠΑΛ
 
πραγματικοι αριθμοι
πραγματικοι αριθμοιπραγματικοι αριθμοι
πραγματικοι αριθμοι
 
αεππ κεφάλαιο 2 βασικές έννοιες αλγορίθμων
αεππ κεφάλαιο 2 βασικές έννοιες αλγορίθμωναεππ κεφάλαιο 2 βασικές έννοιες αλγορίθμων
αεππ κεφάλαιο 2 βασικές έννοιες αλγορίθμων
 
Ερωτήσεις επανάληψης Χημεία Α Λυκείου 2013.
Ερωτήσεις επανάληψης Χημεία Α Λυκείου 2013.Ερωτήσεις επανάληψης Χημεία Α Λυκείου 2013.
Ερωτήσεις επανάληψης Χημεία Α Λυκείου 2013.
 
Διδακτικό σενάριο:"Φιλία"
Διδακτικό σενάριο:"Φιλία"Διδακτικό σενάριο:"Φιλία"
Διδακτικό σενάριο:"Φιλία"
 

En vedette

Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark Summit
 
Limits of community detection
Limits of community detectionLimits of community detection
Limits of community detectionVincent Traag
 
ODSC_Cherven_20160518
ODSC_Cherven_20160518ODSC_Cherven_20160518
ODSC_Cherven_20160518Ken Cherven
 
Visualizing Networks
Visualizing NetworksVisualizing Networks
Visualizing Networksfreshdatabos
 
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)Spark Summit
 
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit
 
Estudio sobre Spark, Storm, Kafka y Hive
Estudio sobre Spark, Storm, Kafka y HiveEstudio sobre Spark, Storm, Kafka y Hive
Estudio sobre Spark, Storm, Kafka y HiveWellness Telecom
 
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...Spark Summit
 
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...Spark Summit
 
Spark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™Databricks
 
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...Spark Summit
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkJen Aman
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social MediaSymeon Papadopoulos
 
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...Spark Summit
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Spark Summit
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisJen Aman
 

En vedette (20)

Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
 
Limits of community detection
Limits of community detectionLimits of community detection
Limits of community detection
 
ODSC_Cherven_20160518
ODSC_Cherven_20160518ODSC_Cherven_20160518
ODSC_Cherven_20160518
 
Emr hive barcamp 2012
Emr hive   barcamp 2012Emr hive   barcamp 2012
Emr hive barcamp 2012
 
Visualizing Networks
Visualizing NetworksVisualizing Networks
Visualizing Networks
 
Xgboost
XgboostXgboost
Xgboost
 
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
Diagnosing Open-Source Community Health with Spark-(William Benton, Red Hat)
 
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan Ravat
 
Estudio sobre Spark, Storm, Kafka y Hive
Estudio sobre Spark, Storm, Kafka y HiveEstudio sobre Spark, Storm, Kafka y Hive
Estudio sobre Spark, Storm, Kafka y Hive
 
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
 
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
 
Spark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken Tsai
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
Monitoring Electronic Trading Environments using Spark by Fergal Toomey and P...
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For Spark
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social Media
 
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
Online Predictive Modeling of Fraud Schemes from Mulitple Live Streams by Cla...
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph Analysis
 

Similaire à Dynamic Community Detection for Large-scale e-Commerce data with Spark Streaming and GraphX-(Ming Huang, Taobao)

RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
Scaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & ExpertsScaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & Expertsgraphistry
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
Using Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryUsing Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryStanka Dalekova
 
Using Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryUsing Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryStanka Dalekova
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsPetr Novotný
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with GoJames Tan
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...TigerGraph
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCAapo Kyrölä
 
Graph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGraph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGene Leybzon
 
Introduction to MapReduce & hadoop
Introduction to MapReduce & hadoopIntroduction to MapReduce & hadoop
Introduction to MapReduce & hadoopColin Su
 
Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...bhargavi804095
 
100X Investigations - Graphistry / Microsoft BlueHat
100X Investigations - Graphistry / Microsoft BlueHat100X Investigations - Graphistry / Microsoft BlueHat
100X Investigations - Graphistry / Microsoft BlueHatgraphistry
 
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017MLconf
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesKonstantinos Xirogiannopoulos
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesPyData
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONScscpconf
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions csandit
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersJen Aman
 

Similaire à Dynamic Community Detection for Large-scale e-Commerce data with Spark Streaming and GraphX-(Ming Huang, Taobao) (20)

RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Scaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & ExpertsScaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & Experts
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Using Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryUsing Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech Industry
 
Using Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryUsing Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech Industry
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
Graph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGraph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d apps
 
Introduction to MapReduce & hadoop
Introduction to MapReduce & hadoopIntroduction to MapReduce & hadoop
Introduction to MapReduce & hadoop
 
Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...
 
100X Investigations - Graphistry / Microsoft BlueHat
100X Investigations - Graphistry / Microsoft BlueHat100X Investigations - Graphistry / Microsoft BlueHat
100X Investigations - Graphistry / Microsoft BlueHat
 
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 

Plus de Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Plus de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Dernier

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 

Dernier (20)

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 

Dynamic Community Detection for Large-scale e-Commerce data with Spark Streaming and GraphX-(Ming Huang, Taobao)

  • 1. Dynamic Community Detection for Large-scale e-Commerce data with Spark Streaming and GraphX Ming Huang Meng Zhang, Bin Wei GuangYuan Huang, Jinkui Shi
  • 2. Community Detection Scenarios •  VIP Customer •  Reputation Escalator •  Fraud Seller •  ……… Algorithms •  LPA •  GN •  Fast Unfolding •  …….
  • 3. How to make it Dynamic? Static Communities Streaming Data Make sophisticated, real-time decisions
  • 4. Definition & Solution Dynamic Community Detection 1.  Decide New Node’s community 2.  Update Graph Physical Topology 3.  Effect communities and modularity Spark Streaming + GraphX à Streaming Graph REAL-TIME
  • 7. Quick Overview of Fast Unfolding Modularity: ! Q= 1 2m Aij * ki kj 2m ⎡ ⎣ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥i,j ∑ δ ci ,cj( ) ! Q = Qi i c ∑ = in∑ 2m ) tot∑ 2m ⎛ ⎝⎜ ⎞ ⎠⎟ 2 ⎡ ⎣ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥i c ∑
  • 8. Incremental Algorithms JV(Streaming with RDD ) UMG(Streaming with Graph) "   Union & Modularity Greedy"   Join & Vote
  • 9. JV A B C C1 C2 C2 D D D A B C D D D C1 C2 C2 D C2 join Vote incEdgeRDD stockCommunityRDD D C2
  • 10. UMG 1 - Union A B C1 C2 C3 C (C1 or C2) ?    newGraph = stockGraph.union(incGraph)" A B C D
  • 11. UMG 2 - findBestCommunity A B C D gain1=G(node(d), community(1)) gain2=G(node(d) , community(2)) C3 incVertexWithNeighbors = newGraph.mapReduceTriplets[Array[VertexData]] (collectNeighborFunc, _ ++ _,"# # # # #Some((incGraph.vertices, EdgeDirection.Either))) idCommunity = incVertexWithNeighbors.map {" case (vid, neighbors) => (vid, findBestCommunity(neighbors))" }.cache()" ! Ci =Cmax j G(nodei ,Cj ) ! ΔQ= in∑ +ki,in 2m + tot+ki∑ 2m ⎛ ⎝ ⎜ ⎞ ⎠ ⎟ 2 ⎡ ⎣ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥ + in∑ 2m + tot∑ 2m ⎛ ⎝ ⎜ ⎞ ⎠ ⎟ 2 + ki 2m ⎛ ⎝ ⎜ ⎞ ⎠ ⎟ 2 ⎡ ⎣ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥ C2 C1
  • 12. UMG 3 - updateCommunities A D B C newCommunityRdd = idCommunity.updateCommunities(commuitiyRdd)" " newModularity = newCommunityRdd.map(community=>community.modularity).reduce(_+_)" C1 C2 ! Q = Qi i c ∑ = in∑ 2m ) tot∑ 2m ⎛ ⎝⎜ ⎞ ⎠⎟ 2 ⎡ ⎣ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥i c ∑ (Q1, Q2)
  • 13. edgeStreamRDD.foreachRDD { "   incEdgeRdd => { "    val incGraph  = buildIncGraph(incEdgeRdd) "    (communityInfoRDD, modularity) = streamingFU.trainOn(incGraph)" outputToHBase(communityInfoRDD)" outputToHBase(modularity)" edgeRdd "   }" } " Flow Example Code ssc.start()" ssc.awaitTermination()" val conf = new SparkConf().setMaster(……).setAppName(……)" val ssc = new StreamingContext(conf, Seconds(60))" " " val totalGraph = initGraph(totalEdgesRdd) " Val streamingFU = new StreamingFU().setTotalGraph(totalGraph)" " val onlineDataFlow = getDataFlow(ssc.sparkContext)" val edgeStreamRDD  = ssc.queueStream(onlineDataFlow, true) " "
  • 15. Autonomous Systems Graphs Stanford Large Network Dataset Collection(as-733) https://snap.stanford.edu/data/
  • 18. Modularity Trend – OT Streaming Graph à Better Result
  • 19. Key Points "   Operator "   Merge Small graph into Large graph "   Model "   Local changes "   Index or summary "   Algorithm "   Delicate formula "   Commutative law & Associative law "   Parallelly & Incrementally
  • 21. Graph Union Operator GRAPH(H)GRAPH(G) ∪ =  E F G H B C D E F A B C D E F A H G GRAPH(G U H) Graph Union Operator https://issues.apache.org/jira/browse/SPARK-7894" " [GraphX] Complex Operators between Graphs: Union https://github.com/apache/spark/pull/6685" "    newGraph = stockGraph.union(incGraph)"
  • 22. Complex GraphX Operators "   Union of Graphs ( G ∪ H ) "   Intersection of Graphs ( G ∩ H) "   Graph Join "   Difference of Graphs(G – H) "   Graph Complement "   Line Graph ( L(G) ) Issues:" Complex Operators between Graphs https://issues.apache.org/jira/browse/SPARK-7893"
  • 24. Monitoring and Correction Ω Data Loading Modularity Threshold CheckingStreaming-FU FastUnfolding [Hourly Monitoring] [Streaming] [Daily Running] FastUnfolding communityID  communityInfo  community1  (in1,tot1,degree1,modularity1)  ……  ……  mTime mValue timestamp1 totalModularity1 …… …… modularityTablecommRDDTable
  • 25. Streaming Resource Allocation •  Driver-Memory: 20G •  Executors: 100 •  Core: 2 •  Executor-Memory: 20G Not Enough for Peak Period!
  • 27. Conclusion "   Streaming Graph "   Complex Operators will help "   Daily Rebuild & Threshold Check "   Costs more memory and time "   Open Question checkpoint with Streaming or Graph?
  • 28. Acknowledgements 1.  Limits of community detection " http://www.slideshare.net/vtraag/comm-detect 2.  Community Detection " http://www.traag.net/projects/community-detection/ 3.  Social Network Analysis " http://lorenzopaoliani.info/topics/ 4.  Community detection in complex networks using Extremal Optimization " http://arxiv.org/pdf/cond-mat/0501368.pdf
  • 29. "   Q & A
  • 30. Agenda "   Dynamic Community Detection "   Streaming Graph "   Models and Algorithms "   Complex GraphX Operators "   Streaming Optimization "   Conclusion
  • 31. Static vs. Dynamic Static Model Dynamic Model