SlideShare une entreprise Scribd logo
1  sur  128
Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU
Thanks! ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Our goal: ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Graphs - why should we care? C. Faloutsos (CMU) Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] Hadoop Summit '10
Graphs - why should we care? ,[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10 D 1 D N T 1 T M ... ...
Graphs - why should we care? ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Problem #1 - network and graph mining ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Problem #1 - network and graph mining ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Problem #1 - network and graph mining ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Graph mining ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Laws and patterns ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Solution# S.1 ,[object Object],C. Faloutsos (CMU) log(rank) log(degree) internet domains att.com ibm.com Hadoop Summit '10 -0.82
Solution# S.2: Eigen Exponent  E ,[object Object],C. Faloutsos (CMU) E = -0.48 Exponent = slope Eigenvalue Rank of decreasing eigenvalue May 2001 Hadoop Summit '10
Solution# S.2: Eigen Exponent  E ,[object Object],C. Faloutsos (CMU) E = -0.48 Exponent = slope Eigenvalue Rank of decreasing eigenvalue May 2001 Hadoop Summit '10
But: ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
More power laws: ,[object Object],C. Faloutsos (CMU) Web Site Traffic in-degree (log scale) Count (log scale) Zipf ``ebay’’ Hadoop Summit '10 users sites
epinions.com ,[object Object],C. Faloutsos (CMU) (out) degree count trusts-2000-people user Hadoop Summit '10
And numerous more ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Solution# S.3: Triangle ‘Laws’ ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Solution# S.3: Triangle ‘Laws’ ,[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Triangle Law: #S.3  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
Triangle Law: #S.3  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
Triangle Law: #S.4  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) SN Reuters Epinions X-axis: degree Y-axis: mean # triangles n  friends -> ~ n 1.6  triangles Hadoop Summit '10
Triangle Law: Computations  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? details Hadoop Summit '10
Triangle Law: Computations  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum (   i 3  ) (and, because of skewness, we only need  the top few eigenvalues! details Hadoop Summit '10
Triangle Law: Computations  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) 1000x+ speed-up, >90% accuracy details Hadoop Summit '10
EigenSpokes ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10 N N details
EigenSpokes ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) u1 u2 Hadoop Summit '10 1 st  Principal  component 2 nd  Principal  component
EigenSpokes ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10 u1 u2 90 o
EigenSpokes - pervasiveness ,[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes - explanation ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes - explanation ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes - explanation ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes - explanation ,[object Object],[object Object],[object Object],[object Object],[object Object],spy plot of top 20 nodes C. Faloutsos (CMU) Hadoop Summit '10
Bipartite Communities! magnified bipartite community patents from same inventor(s) cut-and-paste bibliography! C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Observations on  weighted graphs? ,[object Object],C. Faloutsos (CMU) M. McGlohon, L. Akoglu, and C. Faloutsos  Weighted Graphs and Disconnected Components: Patterns and a Generator.   SIG-KDD  2008  Hadoop Summit '10
Observation W.1: Fortification ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Observation W.1: Fortification C. Faloutsos (CMU) More donors,  more $ ? $10 $5 Hadoop Summit '10 ‘ Reagan’ ‘ Clinton’ $7
Observation W.1: fortification: Snapshot Power Law ,[object Object],[object Object],Edges (# donors) In-weights ($) C. Faloutsos (CMU) Orgs-Candidates e.g. John Kerry,  $10M received, from 1K donors More donors,  even  more $ $10 $5 Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Problem: Time evolution ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.1 Evolution of the Diameter ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.1 Evolution of the Diameter ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.1 Diameter – “Patents” ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) time [years] diameter Hadoop Summit '10
T.2 Temporal Evolution of the Graphs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.2 Temporal Evolution of the Graphs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.2 Densification – Patent Citations ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) N(t) E(t) 1.66 Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
More on Time-evolving   graphs C. Faloutsos (CMU) M. McGlohon, L. Akoglu, and C. Faloutsos  Weighted Graphs and Disconnected Components: Patterns and a Generator.   SIG-KDD  2008  Hadoop Summit '10
Observation T.3: NLCC behavior ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Observation T.3: NLCC behavior ,[object Object],C. Faloutsos (CMU) IMDB CC size Time-stamp Hadoop Summit '10
Timing for Blogs ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? lag: days after post # in links 1 2 3 @t @t +  lag Hadoop Summit '10
T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? POWER LAW! Exponent? # in links ( log ) 1 2 3 days after post ( log ) Hadoop Summit '10
T.4 : popularity over time C. Faloutsos (CMU) ,[object Object],[object Object],[object Object],[object Object],[object Object],# in links ( log ) 1 2 3 -1.6 days after post ( log ) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
CenterPiece Subgraphs ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Ce nter- P iece  S ubgraph Discovery  [Tong+ KDD 06] Original Graph Q: Who is the most central node wrt the black nodes?  (e.g., master-mind criminal, common advisor/collaborator, etc) Input C. Faloutsos (CMU) Hadoop Summit '10 B A C
Ce nter- P iece  S ubgraph Discovery  [Tong+ KDD 06] Q: How to find hub for the query nodes? Input:  original graph Output:  CePS CePS Node C. Faloutsos (CMU) A: Combine proximity scores (RWR) Hadoop Summit '10 B A C B A C
CePS : Example (AND Query) ? C. Faloutsos (CMU) Hadoop Summit '10 ,[object Object],[object Object],[object Object]
CePS : Example (AND Query) C. Faloutsos (CMU) ,[object Object],[object Object],[object Object],Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
G raph X -Ray:  Fast Best-Effort Pattern Matching  in Large Attributed Graphs Hanghang Tong, Brian Gallagher,  Christos Faloutsos, Tina Eliassi-Rad KDD’07
Output Input Attributed Data Graph Query Graph Matching Subgraph Hadoop Summit '10 C. Faloutsos (CMU)
Effectiveness: star-query  Query  Result Hadoop Summit '10 C. Faloutsos (CMU)
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
OddBall: Spotting  A n o m a l i e s  in  Weighted Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University  School of Computer Science To appear in PAKDD 2010, Hyderabad, India
Main idea ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
What is an egonet? ego egonet C. Faloutsos (CMU) Hadoop Summit '10
Selected Features ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Near-Clique/Star Hadoop Summit '10 C. Faloutsos (CMU)
Near-Clique/Star C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
HADI for diameter estimation ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
[object Object],[object Object],???? ?? 19+? [Barabasi+]  (‘99, O(10 6 ) nodes) C. Faloutsos (CMU) Radius Count Hadoop Summit '10
[object Object],[object Object],???? C. Faloutsos (CMU) Radius Count Hadoop Summit '10 14 (dir.) ~7 (undir.) 19+? [Barabasi+]  (‘99, O(10 6 ) nodes)
[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10 Shape?
[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
C. Faloutsos (CMU) ,[object Object],[object Object],[object Object],Hadoop Summit '10
Radius Plot of  GCC  of YahooWeb. C. Faloutsos (CMU) Hadoop Summit '10
Running time -  Kronecker and Erdos-Renyi  Graphs with billions edges. details
Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
Generalized Iterated Matrix Vector Multiplication (GIMV) C. Faloutsos (CMU) PEGASUS: A Peta-Scale Graph Mining  System - Implementation and Observations .  U Kang, Charalampos E. Tsourakakis,  and Christos Faloutsos.  ( ICDM ) 2009, Miami, Florida, USA.  Best Application Paper (runner-up) .   Hadoop Summit '10
Generalized Iterated Matrix Vector Multiplication (GIMV) C. Faloutsos (CMU) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Matrix – vector Multiplication (iterated) Hadoop Summit '10 details
Example: GIM-V At Work ,[object Object],Size Count C. Faloutsos (CMU) Hadoop Summit '10
Example: GIM-V At Work ,[object Object],Size Count C. Faloutsos (CMU) Hadoop Summit '10 ~0.7B  singleton nodes
Example: GIM-V At Work ,[object Object],Size Count C. Faloutsos (CMU) Hadoop Summit '10
Example: GIM-V At Work ,[object Object],Size Count 300-size cmpt X 500. Why? 1100-size cmpt X 65. Why? C. Faloutsos (CMU) Hadoop Summit '10
Example: GIM-V At Work ,[object Object],Size Count suspicious financial-advice sites (not existing now) C. Faloutsos (CMU) Hadoop Summit '10
GIM-V At Work ,[object Object],[object Object],Stable tail slope after the gelling point C. Faloutsos (CMU) Hadoop Summit '10
Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
Triangles : Computations  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum (   i 3  ) (and, because of skewness, we only need  the top few eigenvalues! Mentioned already Hadoop Summit '10
Triangle Law: #1  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Mentioned already Hadoop Summit '10
Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
Visualization: ShiftR ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Other topics - part#1 - tools ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Tensors ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU) keyword 1990 Author
Tensors ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU) keyword 1991 1992 1990 Author
Tensors ,[object Object],~ + PARAFAC tensor decomposition (generalization of SVD) Hadoop Summit '10 C. Faloutsos (CMU) keyword 1991 1992 1990 Author
Other topics – part#2 - generators ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Kronecker Product – a Graph ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Other topics - part#3 – virus propagation ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
More info ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
OVERALL CONCLUSIONS – low level: ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
OVERALL CONCLUSIONS – high level ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Joint papers with LLNL ,[object Object],[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Joint papers with LLNL ,[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Joint papers with LLNL ,[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Project info ,[object Object],C. Faloutsos (CMU) Akoglu,  Leman Chau,  Polo Kang, U McGlohon,  Mary Tsourakakis,  Babis Tong,  Hanghang Prakash, Aditya Hadoop Summit '10 Thanks to: Yahoo (M45 + gifts + data) NSF, LLNL, CTA-INARC, IBM, SPRINT, INTEL, HP

Contenu connexe

Similaire à Mining Billion-node Graphs: Patterns, Generators and Tools__HadoopSummit2010

Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...
Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...
Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...huguk
 
Streamly: Concurrent Data Flow Programming
Streamly: Concurrent Data Flow ProgrammingStreamly: Concurrent Data Flow Programming
Streamly: Concurrent Data Flow ProgrammingHarendra Kumar
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsEnrico Daga
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" Joshua Bloom
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaUniversity of Washington
 
Stream Reasoning : Where We Got So Far
Stream Reasoning: Where We Got So FarStream Reasoning: Where We Got So Far
Stream Reasoning : Where We Got So FarEmanuele Della Valle
 
Continuous Deep Analytics
Continuous Deep AnalyticsContinuous Deep Analytics
Continuous Deep AnalyticsParis Carbone
 
Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World FosterIan Foster
 
Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and DataGuy Coates
 
Web Services Catalog
Web Services CatalogWeb Services Catalog
Web Services CatalogRudolf Husar
 
Self-Similarity in Complex Networks
Self-Similarity in Complex NetworksSelf-Similarity in Complex Networks
Self-Similarity in Complex Networksnorman_fahrer
 
Systems-of-Systems in our Future?
Systems-of-Systems in our Future?Systems-of-Systems in our Future?
Systems-of-Systems in our Future?rhlumsde
 
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...Robert Richards
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...Larry Smarr
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataPaco Nathan
 
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Folio3 Software
 

Similaire à Mining Billion-node Graphs: Patterns, Generators and Tools__HadoopSummit2010 (20)

Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...
Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...
Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...
 
Streamly: Concurrent Data Flow Programming
Streamly: Concurrent Data Flow ProgrammingStreamly: Concurrent Data Flow Programming
Streamly: Concurrent Data Flow Programming
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and Myria
 
Stream Reasoning : Where We Got So Far
Stream Reasoning: Where We Got So FarStream Reasoning: Where We Got So Far
Stream Reasoning : Where We Got So Far
 
Continuous Deep Analytics
Continuous Deep AnalyticsContinuous Deep Analytics
Continuous Deep Analytics
 
Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World Foster
 
Complex Models for Big Data
Complex Models for Big DataComplex Models for Big Data
Complex Models for Big Data
 
Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and Data
 
14 turing wics
14 turing wics14 turing wics
14 turing wics
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
Web Services Catalog
Web Services CatalogWeb Services Catalog
Web Services Catalog
 
Self-Similarity in Complex Networks
Self-Similarity in Complex NetworksSelf-Similarity in Complex Networks
Self-Similarity in Complex Networks
 
Systems-of-Systems in our Future?
Systems-of-Systems in our Future?Systems-of-Systems in our Future?
Systems-of-Systems in our Future?
 
Technology Disruption
Technology DisruptionTechnology Disruption
Technology Disruption
 
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
 
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
 

Plus de Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Plus de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Dernier

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Dernier (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Mining Billion-node Graphs: Patterns, Generators and Tools__HadoopSummit2010

  • 1. Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU
  • 2.
  • 3.
  • 4.
  • 5. Graphs - why should we care? C. Faloutsos (CMU) Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] Hadoop Summit '10
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Triangle Law: #S.3 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
  • 25. Triangle Law: #S.3 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
  • 26. Triangle Law: #S.4 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) SN Reuters Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~ n 1.6 triangles Hadoop Summit '10
  • 27. Triangle Law: Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? details Hadoop Summit '10
  • 28. Triangle Law: Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum (  i 3 ) (and, because of skewness, we only need the top few eigenvalues! details Hadoop Summit '10
  • 29. Triangle Law: Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) 1000x+ speed-up, >90% accuracy details Hadoop Summit '10
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. Bipartite Communities! magnified bipartite community patents from same inventor(s) cut-and-paste bibliography! C. Faloutsos (CMU) Hadoop Summit '10
  • 41.
  • 42.
  • 43.
  • 44. Observation W.1: Fortification C. Faloutsos (CMU) More donors, more $ ? $10 $5 Hadoop Summit '10 ‘ Reagan’ ‘ Clinton’ $7
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55. More on Time-evolving graphs C. Faloutsos (CMU) M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008 Hadoop Summit '10
  • 56.
  • 57.
  • 58.
  • 59. T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? lag: days after post # in links 1 2 3 @t @t + lag Hadoop Summit '10
  • 60. T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? POWER LAW! Exponent? # in links ( log ) 1 2 3 days after post ( log ) Hadoop Summit '10
  • 61.
  • 62.
  • 63.
  • 64. Ce nter- P iece S ubgraph Discovery [Tong+ KDD 06] Original Graph Q: Who is the most central node wrt the black nodes? (e.g., master-mind criminal, common advisor/collaborator, etc) Input C. Faloutsos (CMU) Hadoop Summit '10 B A C
  • 65. Ce nter- P iece S ubgraph Discovery [Tong+ KDD 06] Q: How to find hub for the query nodes? Input: original graph Output: CePS CePS Node C. Faloutsos (CMU) A: Combine proximity scores (RWR) Hadoop Summit '10 B A C B A C
  • 66.
  • 67.
  • 68.
  • 69. G raph X -Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad KDD’07
  • 70. Output Input Attributed Data Graph Query Graph Matching Subgraph Hadoop Summit '10 C. Faloutsos (CMU)
  • 71. Effectiveness: star-query Query Result Hadoop Summit '10 C. Faloutsos (CMU)
  • 72.
  • 73. OddBall: Spotting A n o m a l i e s in Weighted Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School of Computer Science To appear in PAKDD 2010, Hyderabad, India
  • 74.
  • 75. What is an egonet? ego egonet C. Faloutsos (CMU) Hadoop Summit '10
  • 76.
  • 77. Near-Clique/Star Hadoop Summit '10 C. Faloutsos (CMU)
  • 78. Near-Clique/Star C. Faloutsos (CMU) Hadoop Summit '10
  • 79.
  • 80. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87. Radius Plot of GCC of YahooWeb. C. Faloutsos (CMU) Hadoop Summit '10
  • 88. Running time - Kronecker and Erdos-Renyi Graphs with billions edges. details
  • 89. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
  • 90. Generalized Iterated Matrix Vector Multiplication (GIMV) C. Faloutsos (CMU) PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations . U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ( ICDM ) 2009, Miami, Florida, USA. Best Application Paper (runner-up) . Hadoop Summit '10
  • 91.
  • 92.
  • 93.
  • 94.
  • 95.
  • 96.
  • 97.
  • 98. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
  • 99. Triangles : Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum (  i 3 ) (and, because of skewness, we only need the top few eigenvalues! Mentioned already Hadoop Summit '10
  • 100. Triangle Law: #1 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Mentioned already Hadoop Summit '10
  • 101. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
  • 102.
  • 103. C. Faloutsos (CMU) Hadoop Summit '10
  • 104.
  • 105.
  • 106.
  • 107.
  • 108.
  • 109.
  • 110.
  • 111.
  • 112.
  • 113.
  • 114.
  • 115.
  • 116.
  • 117.
  • 118.
  • 119.
  • 120.
  • 121.
  • 122.
  • 123.
  • 124.
  • 125.
  • 126.
  • 127.
  • 128.

Notes de l'éditeur

  1. Faloutsos
  2. Faloutsos et al 06/30/10
  3. Faloutsos
  4. Faloutsos
  5. Faloutsos
  6. Faloutsos
  7. Faloutsos
  8. Faloutsos
  9. Faloutsos
  10. Faloutsos
  11. Faloutsos
  12. Faloutsos
  13. Faloutsos
  14. Faloutsos
  15. Faloutsos
  16. Faloutsos
  17. A = U Sigma U^T
  18. A = U Sigma U^T vec{u}_1 vec{u}_i
  19. Faloutsos
  20. Faloutsos
  21. Faloutsos
  22. Faloutsos
  23. Faloutsos
  24. Faloutsos
  25. Faloutsos
  26. Faloutsos
  27. Faloutsos
  28. Faloutsos Diameter first, DPL second Check diameter formulas As the network grows the distances between nodes slowly grow
  29. Faloutsos Diameter first, DPL second Check diameter formulas As the network grows the distances between nodes slowly grow
  30. Faloutsos
  31. Faloutsos
  32. Faloutsos
  33. Faloutsos
  34. Faloutsos
  35. Faloutsos
  36. Faloutsos
  37. SOME OLD RULES
  38. SOME OLD RULES
  39. Faloutsos et al 06/30/10
  40. Faloutsos et al 06/30/10
  41. Faloutsos
  42. Faloutsos et al 06/30/10
  43. Faloutsos et al 06/30/10
  44. Faloutsos
  45. Faloutsos
  46. Faloutsos
  47. lambda_1 Faloutsos
  48. Faloutsos
  49. Faloutsos
  50. Faloutsos
  51. Faloutsos
  52. Faloutsos
  53. Faloutsos
  54. Faloutsos
  55. Faloutsos et al 06/30/10