Graph Processing
  Applications
praveensripati@gmail.com

www.thecloudavenue.com

    @praveensripati
Agenda

Introduction to Graphs

     Representing graphs

     Different types of graphs

     Algorithms in graphs

What constitutes a graph application

     Graph databases (examples and how they work)

     Graph computing engines (examples and how they work)

Questions & Answers
What are/aren't Graphs in this context?

[Figure: two example images, one labelled YES and one labelled NO]
How is a graph represented?

[Figure: six numbered vertices (1 to 6) joined by edges, with one vertex and one edge labelled]

A collection of vertices connected to each other by edges, where both vertices and edges
can have properties. A vertex can be a person, place, account or any other item that needs
to be tracked.
A social graph

Questions this graph can answer:
- Who are Arun's friends?
- Whom should I recommend Sheetal to be friends with?

[Figure: six people as vertices: 1 Arun, 2 Tom, 3 Bob, 4 Deepak, 5 Prajval, 6 Sheetal. Edges are labelled Friend, Relative or Colleague.]

Vertex properties (example)   Name : Arun, Age : 25, Sex : M
Edge properties (example)     Relation : Colleague
Facebook Recruiting Competition

Want an interview @ Facebook?

The challenge is to recommend missing links in a social
network. Participants will be presented with an external
anonymized, directed social graph (no, not Facebook, keep
guessing) from which some edges have been deleted, and
asked to make ranked predictions for each user in the test set
of which other users they would want to follow.

What is Kaggle?
Kaggle is an innovative solution for
statistical/analytics outsourcing. We are the
leading platform for predictive modeling
competitions. Companies, governments and
researchers present datasets and problems - the
world's best data scientists then compete to
produce the best solutions. At the end of a
competition, the competition host pays prize
money in exchange for the intellectual property
behind the winning model.

http://www.kaggle.com/c/FacebookRecruiting
A spatial graph

Questions this graph can answer:
- What is the shortest distance between Bangalore and Calcutta?
- I would like to cover all the places, which is the shortest path?

[Figure: six cities as vertices: 1 Bangalore, 2 Mumbai, 3 Lucknow, 4 New Delhi, 5 Chennai, 6 Kolkata (Calcutta). Each edge is labelled with a distance, ranging from 250 km to 850 km.]

Vertex properties (example)   Name : Bangalore, Population : 25,00,000, Area : 35,000 sq km
Edge properties (example)     Distance : 700 km
How to represent a Graph for computing?

.... as an adjacency list for a sparse graph

1 -> 2,4,5
2 -> 3
3 -> 5
4 -> 3,6
5 ->
6 -> 5

.... as an adjacency matrix for a dense graph

       1     2    3     4     5    6
  1    0     1    0     1     1    0
  2    0     0    1     0     0    0
  3    0     0    0     0     1    0
  4    0     0    1     0     0    1
  5    0     0    0     0     0    0
  6    0     0    0     0     1    0

A graph with few edges is sparse; one with many edges is dense.

The web, with billions of pages, cannot practically be represented
as an adjacency matrix, because the matrix needs one cell for every
pair of pages.
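The two representations above can be sketched in a few lines of Python. This is only an illustration: the helper name `to_matrix` is made up for this example, and the graph is the six-vertex example from the slide.

```python
# Adjacency list of the six-vertex example graph from the slide.
adjacency_list = {
    1: [2, 4, 5],
    2: [3],
    3: [5],
    4: [3, 6],
    5: [],
    6: [5],
}


def to_matrix(adj):
    """Build a dense 0/1 matrix; row i, column j is 1 when the edge i -> j exists."""
    vertices = sorted(adj)
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for src, targets in adj.items():
        for dst in targets:
            matrix[index[src]][index[dst]] = 1
    return matrix


matrix = to_matrix(adjacency_list)

# The list stores one entry per edge; the matrix stores one cell per vertex pair.
edge_count = sum(len(targets) for targets in adjacency_list.values())
cell_count = len(matrix) ** 2

for row in matrix:
    print(row)
print(edge_count, "edges stored in the list,", cell_count, "cells stored in the matrix")
```

The list grows with the number of edges, while the matrix grows with the square of the number of vertices, which is why the list suits sparse graphs.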
Different Graphs

 Social graph (Facebook, LinkedIn etc)

 Spatial graph (Google Maps, MapQuest, FedEx etc)

 Web graph (PageRank, Recommendations etc)

 Computer network graph (Optimal network layout etc)

 Financial graph (Fraud detection, Currency Flow etc)

 Data representations (Lists etc)

 Chemistry (to represent genomes/molecules)

 And others
Some of the Graph Algorithms

    Shortest path (Finding the shortest path from A to B)

    Minimum Spanning Tree (Cheapest way to connect a set of objects so that
    every object is reachable from every other; used in internet, cable wiring etc)

    Graph center (placing a warehouse or hospital in a city, so that all the
    locations can be reached easily)

    Bipartite Matching (Matching in a dating site, job to employee and others)

    Planarity testing (checking whether a graph can be drawn without crossing
    edges, as in the case of circuit designs).

                      http://www.graph-magics.com/practic_use.php
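As one concrete case from the list above, the shortest path between two vertices of an unweighted graph can be found with a breadth-first search. The sketch below runs on the six-vertex example graph used earlier in the deck; the function name `shortest_path` is an assumption of this example, and a weighted graph such as the city map would need a different algorithm (for instance Dijkstra's).

```python
from collections import deque

# The six-vertex example graph as an adjacency list.
graph = {1: [2, 4, 5], 2: [3], 3: [5], 4: [3, 6], 5: [], 6: [5]}


def shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None if unreachable."""
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parents[vertex]
            return path[::-1]
        for neighbour in graph[vertex]:
            if neighbour not in parents:
                parents[neighbour] = vertex
                queue.append(neighbour)
    return None


print(shortest_path(graph, 1, 6))
print(shortest_path(graph, 5, 1))
```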
Graph Applications

Applications are built on top of two kinds of systems:

- Graph Databases
- Graph processing frameworks (Giraph, Hama)
How to store a Graph?

Option 1 : In a flat file as

       1- 4,5,6
       4- 2,5,6

Where vertex 1 is connected to vertices 4, 5 and 6, and so on.
(Simple, but not efficient and easy to maintain.)

Option 2 : In a relational database using referencing
tables or join tables.

Option 3 : Using a specialized database designed
exclusively for graphs.
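For Option 1, reading the flat file back into memory might look like the sketch below. It assumes exactly the `vertex- neighbour,neighbour` line format shown on the slide; the function name `parse_flat_file` is made up for this example.

```python
def parse_flat_file(text):
    """Parse lines like '1- 4,5,6' into a dict such as {1: [4, 5, 6]}."""
    graph = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        vertex, _, neighbours = line.partition("-")
        graph[int(vertex)] = [int(n) for n in neighbours.split(",") if n.strip()]
    return graph


sample = "1- 4,5,6\n4- 2,5,6\n"
graph = parse_flat_file(sample)
print(graph)
```

Every query has to parse the file first (or scan it line by line), which is one reason the flat file does not scale.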
Comparing Graph with Relational DB

Which one would you prefer for storing Graph data?

In a DB of 1,000,000 users, finding friends-of-friends
for 1,000 users at various depths.

     Depth     Execution time in seconds (MySQL)     Execution time in seconds (Neo4j)
     2         0.016                                 0.010
     3         30.267                                0.168
     4         1,543.505                             1.359
     5         Not finished in 1 hour                2.132

              http://www.neotechnology.com/2012/06/how-much-faster-is-a-graph-database-really/
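To make "friends-of-friends at various depths" concrete, here is a small in-memory sketch. It is not the benchmark code: the function name `friends_at_depth` and the friendship edges are invented for illustration, and only the people's names come from the social graph slide.

```python
def friends_at_depth(friends, person, depth):
    """Return the people whose shortest friendship distance from `person` is exactly `depth`."""
    seen = {person}
    frontier = {person}
    for _ in range(depth):
        next_frontier = set()
        for current in frontier:
            for friend in friends.get(current, ()):
                if friend not in seen:
                    seen.add(friend)
                    next_frontier.add(friend)
        frontier = next_frontier
    return frontier


# Hypothetical friendships (stored in both directions).
friends = {
    "Arun": ["Tom"],
    "Tom": ["Arun", "Bob"],
    "Bob": ["Tom", "Sheetal"],
    "Sheetal": ["Bob"],
}

print(friends_at_depth(friends, "Arun", 2))  # friends of friends
print(friends_at_depth(friends, "Arun", 3))  # one level deeper
```

Each extra level of depth is one more pass over the current frontier. In a relational database the same step is typically expressed as another self-join on the friendship table.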
So, what is a Graph DB?
A graph database is any storage system that
provides `index free adjacency`.

[Figure: the six-vertex example graph, each vertex annotated with its adjacent vertices: 1 -> 2, 4, 5; 2 -> 3; 3 -> 5; 4 -> 3, 6; 6 -> 5]

Every element (node or edge) has a direct pointer to its adjacent elements.

No index lookup : We can determine which vertices are adjacent to a given vertex
without looking up an index tree.
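The idea of index-free adjacency can be illustrated with plain objects that hold references to their neighbours. This is a toy sketch, not how any particular graph database lays out its storage; the `Node` class and the helper `names_two_hops_away` are assumptions of this example.

```python
class Node:
    """A vertex that stores direct references to its adjacent vertices."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct pointers to other Node objects

    def connect(self, other):
        self.neighbours.append(other)


# Build the six-vertex example graph. The dict is used only while loading.
nodes = {i: Node(i) for i in range(1, 7)}
edges = {1: [2, 4, 5], 2: [3], 3: [5], 4: [3, 6], 6: [5]}
for src, targets in edges.items():
    for dst in targets:
        nodes[src].connect(nodes[dst])


def names_two_hops_away(node):
    """Follow neighbour pointers twice; the `nodes` dict is never consulted."""
    return sorted({far.name for near in node.neighbours for far in near.neighbours})


print(names_two_hops_away(nodes[1]))
```

The traversal cost depends on the number of neighbours visited, not on the total number of vertices stored.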
So, what is a Graph DB? (.....)

Graph DB is the option when persisting graphs.
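So, what is a Graph DB? (.....)

Part of the NoSQL family

[Figure: NoSQL stores plotted by Data Size (vertical axis) against Data Complexity (horizontal axis), from largest data size and lowest complexity to smallest data size and highest complexity:]

- Key Value Store like Amazon Dynamo.
- Columnar Databases like Cassandra, HBase.
- Document Databases like MongoDB, CouchDB.
- Graph Databases like Neo4J.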
Graph DB Bindings (~JDBC API)
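// Neo4j 1.x embedded Java API, as used in the tutorial linked below.
// DB_PATH is a placeholder; RelTypes is an enum implementing RelationshipType.

//connect to the database
GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );

//begin transaction
Transaction tx = graphDb.beginTx();
try
{
    Node firstNode = graphDb.createNode();
    firstNode.setProperty( "message", "Hello, " );
    Node secondNode = graphDb.createNode();
    secondNode.setProperty( "message", "World!" );

    Relationship relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS );
    relationship.setProperty( "message", "brave Neo4j " );

    //mark the transaction as successful
    tx.success();
}
finally
{
    //end the transaction
    tx.finish();
}

//close the connection to the database
graphDb.shutdown();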


           http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded-hello-world.html
Graph Adhoc Query (~SQL)

START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof



 john                    fof
 Node[4]{name:"John"}    Node[2]{name:"Maria"}
 Node[4]{name:"John"}    Node[3]{name:"Steve"}




                  http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
Different Graph Databases

- FlockDB from Twitter
- Allegrograph
- GraphBase
- Infinite Graph from Objectivity

     http://en.wikipedia.org/wiki/Graph_database
What is a Graph Computing Engine?

[Figure: Algorithms, InputFormat and Input Location feed into the Graph Computing Engine, which writes to an OutputFormat and Output Location]

Graph engines come with some built-in graph
processing algorithms, but also provide an easy to use
API to build new algorithms and extend the framework.

                 http://incubator.apache.org/giraph/apidocs/index.html
                 http://incubator.apache.org/hama/docs/r0.3.0/api/index.html
Different Graph Computing Engines

Memory-based graphs (graph size < local machine RAM), like
     - jung.sourceforge.net
     - igraph.sourceforge.net
     - networkx.lanl.gov

Disk-based graphs (graph size < local hard disk size), like
       - Neo4j
       - Infinite Graph – objectivity.com
       - sparsity-technologies.com/dex

Cluster-based graphs (size depends on the cluster specs), like
       - Apache Hama
       - Apache Giraph
       - GoldenORB

The cluster-based engines are based on the BSP (Bulk Synchronous Parallel) model,
in the spirit of Google Pregel.
Bulk Synchronous Parallel

Some quick facts

• An alternative computing model to MapReduce (not all problems can be solved
  efficiently with MapReduce). Also, any MR algorithm can be simulated on BSP and
  vice versa.

• Developed by Leslie Valiant during the 1980s. Was resurrected by Google in the
  Pregel paper (extensively used for PageRank).

• Good for

  - Processing big data with complicated relationships, e.g. graphs and networks.
  - Iterative and recursive scientific computations
  - Continuous Event Processing (CEP)




         http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
                         http://arxiv.org/abs/1203.2081 – Comparing MR vs BSP
What is Bulk Synchronous Parallel?

[Figure: a computation divided into Super Step 1, Super Step 2 and Super Step 3]

            http://en.wikipedia.org/wiki/Bulk_synchronous_parallel/
    http://blog.octo.com/en/introduction-to-large-scale-graph-processing/
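In BSP, each superstep consists of local computation, message exchange and a barrier at which all messages are delivered together. The single-process sketch below imitates that cycle with the classic "propagate the maximum value" example from the Pregel paper. It is not the Hama or Giraph API, and nothing here actually runs in parallel; the function name `bsp_max_value` is made up for this example.

```python
def bsp_max_value(edges, values):
    """Propagate the maximum value to every vertex, in the vertex-centric style.

    edges:  vertex -> list of out-neighbours
    values: vertex -> initial number
    Returns (final values, number of supersteps run).
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in edges}
    for v, targets in edges.items():
        for t in targets:
            inbox[t].append(values[v])
    supersteps = 1

    # Keep running supersteps while any vertex has incoming messages.
    while any(inbox.values()):
        outbox = {v: [] for v in edges}
        for v, messages in inbox.items():
            # Local computation: adopt a larger value if one arrived.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                # Communication: tell the neighbours about the change.
                for t in edges[v]:
                    outbox[t].append(values[v])
        # Barrier: all messages become visible at once in the next superstep.
        inbox = outbox
        supersteps += 1
    return values, supersteps


edges = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
final_values, supersteps = bsp_max_value(edges, {"a": 3, "b": 6, "c": 2})
print(final_values, supersteps)
```

A vertex that receives no larger value sends nothing, so the computation stops once a superstep produces no messages.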
Hama vs Giraph
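Both are derived from Google Pregel **

[Figure: two software stacks sharing HDFS as the bottom layer]

Hama stack   (top to bottom):  Hama, BSP, HDFS
Giraph stack (top to bottom):  Giraph, BSP, MapReduce, HDFS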

** http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
Hama vs Giraph (.....)

Hama                                                 Giraph
Pure BSP engine.                                     Uses BSP, but the BSP API is not exposed.
Matrix, Graph, Network and other processing.         Just for Graph processing.
Jobs are run as a BSP Job on HDFS.                   Jobs are run as MapReduce on Hadoop.

Both of them are derived from the `Pregel : A System for Large-Scale Graph
Processing` paper published by Google. Both have been recently promoted from
Incubator to Apache Top Level Project.
Both of them have a few graph algorithms implemented and also provide a very easy
API to implement new Graph algorithms.




        ** http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
Page Rank in Hama

           The PageRank algorithm assigns a numerical
           weight to each element of a hyperlinked set of
           documents.

           bin/hama jar ../hama-0.4.0-examples.jar pagerank
           <input path> <output path> [damping factor]
           [epsilon error] [tasks]

           Input (a site followed by the sites it links to; \t stands for a tab)

           Site1\tSite2\tSite3
           Site2\tSite3
           Site3

           Output (each site with its rank)

           Site1 0.5
           Site2 1.3
           Site3 1.2




 http://wiki.apache.org/hama/PageRank
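For intuition, here is a minimal single-machine PageRank in Python that reads the same tab-separated input. It is a textbook power iteration, not Hama's implementation, so its numbers will not match the sample output above. The function names and the default damping factor of 0.85 are assumptions of this sketch; `damping` and `epsilon` mirror the two optional command-line arguments.

```python
def parse_adjacency(text):
    """Parse tab-separated lines: the first field is a page, the rest are its out-links."""
    links = {}
    for line in text.splitlines():
        if line.strip():
            page, *targets = line.split("\t")
            links[page] = targets
            for target in targets:
                links.setdefault(target, [])  # pages that only appear as link targets
    return links


def pagerank(links, damping=0.85, epsilon=1e-9, max_iterations=100):
    """Iterate until the total change in rank drops below epsilon."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iterations):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += damping * rank[p] / len(links[p])
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < epsilon:
            break
    return rank


ranks = pagerank(parse_adjacency("Site1\tSite2\tSite3\nSite2\tSite3\nSite3"))
for site, score in sorted(ranks.items()):
    print(site, round(score, 3))
```

In this sketch the ranks sum to 1, and Site3, which both other sites link to, ends up with the highest rank.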
What's next?
Deep dive into

       - Both Graph databases and frameworks, with a demo.
       - The Bulk Synchronous Parallel processing model.

Hadoop, Hive, Pig and others are too crowded. Graph frameworks and
databases are emerging and are an easy entry point for contributing to Apache.

I would suggest subscribing to the Apache mailing lists for these projects,
getting familiar with them and contributing.
Q&A
Graph Processing Applications @ HUG

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 

Graph Processing Applications @ HUG

  • 1. Graph Processing Applications praveensripati@gmail.com www.thecloudavenue.com @praveensripati
  • 2. Agenda: Introduction to Graphs (representing graphs, different types of graphs, algorithms in graphs); What constitutes a graph application (graph databases and graph computing engines, with examples and how they work); Questions & Answers
  • 3. What are/aren't Graphs in this context? YES NO
  • 4. How is a graph represented? [Figure: six numbered vertices (1 to 6) joined by edges, with one vertex and one edge labeled.] A collection of vertices connected to each other using edges, with both vertices and edges having properties. A vertex can be a person, place, account or any item which needs to be tracked.
  • 5. A social graph. Questions it can answer: Who are Arun's friends? Whom should I recommend Sheetal to be friends with? [Figure: people (Arun, Deepak, Bob, Sheetal, Tom, Prajval) as vertices, connected by edges labeled Friend, Relative or Colleague. Vertex properties, for example: Name: Arun, Age: 25, Sex: M. Edge properties, for example: Relation: Colleague.]
  • 6. Facebook Recruiting Competition. Want an interview @ Facebook? The challenge is to recommend missing links in a social network. Participants will be presented with an external anonymized, directed social graph (no, not Facebook, keep guessing) from which some edges have been deleted, and asked to make ranked predictions for each user in the test set of which other users they would want to follow. What is Kaggle? Kaggle is an innovative solution for statistical/analytics outsourcing. We are the leading platform for predictive modeling competitions. Companies, governments and researchers present datasets and problems - the world's best data scientists then compete to produce the best solutions. At the end of a competition, the competition host pays prize money in exchange for the intellectual property behind the winning model. http://www.kaggle.com/c/FacebookRecruiting
  • 7. A spatial graph. Questions it can answer: What is the shortest distance between Bangalore and Calcutta? I would like to cover all the places; which is the shortest path? [Figure: cities (New Delhi, Lucknow, Kolkata, Mumbai, Bangalore, Chennai) as vertices, connected by edges labeled with distances (250 km, 350 km, 450 km, 450 km, 600 km, 800 km, 850 km). Vertex properties, for example: Name: Bangalore, Population: 25,00,000, Area: 35,000 sq km. Edge properties, for example: Distance: 700 km.]
  • 8. How to represent a Graph for computing?
      As an adjacency list, for a sparse graph:
        1 -> 2, 4, 5
        2 -> 3
        3 -> 5
        4 -> 3, 6
        5 -> (no outgoing edges)
        6 -> 5
      As an adjacency matrix, for a dense graph (row = source vertex, column = target vertex):
            1 2 3 4 5 6
        1   0 1 0 1 1 0
        2   0 0 1 0 0 0
        3   0 0 0 0 1 0
        4   0 0 1 0 0 1
        5   0 0 0 0 0 0
        6   0 0 0 0 1 0
      A graph with few edges is sparse; one with many edges is dense. The web, with billions of pages, cannot practically be represented as an adjacency matrix.
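A minimal Java sketch of the two representations, built from the adjacency list of the six-vertex example graph. The class and method names are illustrative and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GraphRepresentations {
    // Directed edges of the six-vertex example graph, as (from, to) pairs.
    static final int[][] EDGES = {
        {1, 2}, {1, 4}, {1, 5}, {2, 3}, {3, 5}, {4, 3}, {4, 6}, {6, 5}
    };

    // Adjacency list: memory grows with the number of edges,
    // which is why it suits sparse graphs.
    static Map<Integer, List<Integer>> adjacencyList(int vertexCount, int[][] edges) {
        Map<Integer, List<Integer>> adj = new LinkedHashMap<>();
        for (int v = 1; v <= vertexCount; v++) {
            adj.put(v, new ArrayList<>());
        }
        for (int[] edge : edges) {
            adj.get(edge[0]).add(edge[1]);
        }
        return adj;
    }

    // Adjacency matrix: always vertexCount * vertexCount cells,
    // which only pays off when the graph is dense.
    static int[][] adjacencyMatrix(int vertexCount, int[][] edges) {
        int[][] matrix = new int[vertexCount][vertexCount];
        for (int[] edge : edges) {
            matrix[edge[0] - 1][edge[1] - 1] = 1; // vertices are numbered from 1
        }
        return matrix;
    }

    public static void main(String[] args) {
        System.out.println(adjacencyList(6, EDGES));
        System.out.println(Arrays.deepToString(adjacencyMatrix(6, EDGES)));
    }
}
```

The list stores 8 entries for this graph while the matrix stores 36 cells, and that gap widens quickly as the vertex count grows.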
  • 9. Different Graphs: Social graph (Facebook, LinkedIn etc.); Spatial graph (Google Maps, MapQuest, FedEx etc.); Web graph (PageRank, recommendations etc.); Computer network graph (optimal network layout etc.); Financial graph (fraud detection, currency flow etc.); Data representations (lists etc.); Chemistry (to represent genomes/molecules); and others.
  • 10. Some of the Graph Algorithms: Shortest path (finding the shortest path from A to B); Minimum spanning tree (the cheapest way to connect objects so that every object is connected to the others; can be used in internet and cable wiring etc.); Graph center (placing a warehouse or hospital in a city so that all locations can be reached easily); Bipartite matching (matching in a dating site, jobs to employees and others); Finding a planar graph (as in the case of circuit designs). http://www.graph-magics.com/practic_use.php
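Shortest path, the first algorithm in the list, reduces to a breadth-first search when edges carry no weights. The Java sketch below is one illustration under that assumption. A weighted graph such as the spatial graph would need something like Dijkstra's algorithm instead. The class name, method names and the reuse of the six-vertex example graph are choices made here, not part of the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class ShortestPathSketch {
    // The six-vertex directed example graph as an adjacency list.
    static Map<Integer, List<Integer>> exampleGraph() {
        return Map.of(
            1, List.of(2, 4, 5),
            2, List.of(3),
            3, List.of(5),
            4, List.of(3, 6),
            5, List.<Integer>of(),
            6, List.of(5));
    }

    // Breadth-first search: returns the vertices on a path with the fewest
    // edges from source to target, or an empty list if target is unreachable.
    static List<Integer> shortestPath(Map<Integer, List<Integer>> adj, int source, int target) {
        Map<Integer, Integer> parent = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        parent.put(source, source);
        queue.add(source);
        while (!queue.isEmpty()) {
            int current = queue.remove();
            if (current == target) {
                break;
            }
            for (int next : adj.getOrDefault(current, List.<Integer>of())) {
                if (!parent.containsKey(next)) { // first visit is the shortest one
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        if (!parent.containsKey(target)) {
            return List.of();
        }
        // Walk the parent links back from target to source.
        LinkedList<Integer> path = new LinkedList<>();
        for (int v = target; v != source; v = parent.get(v)) {
            path.addFirst(v);
        }
        path.addFirst(source);
        return path;
    }

    public static void main(String[] args) {
        System.out.println(shortestPath(exampleGraph(), 1, 6));
        System.out.println(shortestPath(exampleGraph(), 5, 1));
    }
}
```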
  • 11. Graph Applications. [Diagram: applications sit on top of graph databases and graph processing frameworks such as Hama and Giraph.]
  • 12. How to store a Graph?
      Option 1: In a flat file, as
        1- 4,5,6
        4- 2,5,6
      where vertex 1 is connected to vertices 4, 5 and 6, and so on. Simple and easy to maintain, but not efficient.
      Option 2: In a relational database, using referencing tables or join tables.
      Option 3: Using a specialized database designed only for graphs.
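To make Option 1 concrete, here is a hedged Java sketch that parses the `vertex- n1,n2,...` line format shown on the slide. The class and method names are hypothetical, and the parser assumes well-formed integer ids. The `incoming` helper hints at why the flat file is inefficient: answering "who points at this vertex?" means scanning every line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FlatFileGraph {
    // Parses lines such as "1- 4,5,6" into an adjacency list keyed by vertex id.
    static Map<Integer, List<Integer>> parse(List<String> lines) {
        Map<Integer, List<Integer>> adj = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("-", 2);
            int vertex = Integer.parseInt(parts[0].trim());
            List<Integer> neighbours = new ArrayList<>();
            if (parts.length > 1) {
                for (String token : parts[1].split(",")) {
                    if (!token.trim().isEmpty()) {
                        neighbours.add(Integer.parseInt(token.trim()));
                    }
                }
            }
            adj.put(vertex, neighbours);
        }
        return adj;
    }

    // Finding the vertices that point at a given vertex requires a full scan.
    static List<Integer> incoming(Map<Integer, List<Integer>> adj, int vertex) {
        List<Integer> sources = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : adj.entrySet()) {
            if (entry.getValue().contains(vertex)) {
                sources.add(entry.getKey());
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = parse(List.of("1- 4,5,6", "4- 2,5,6"));
        System.out.println(graph);
        System.out.println(incoming(graph, 5));
    }
}
```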
  • 13. Comparing Graph with Relational DB. Which one would you prefer for storing Graph data? In a DB of 1,000,000 users, finding friends-of-friends for 1,000 users at various depths:
        Depth   Execution time, MySQL (s)   Execution time, Neo4j (s)
        2       0.016                       0.010
        3       30.267                      0.168
        4       1,543.505                   1.359
        5       Not finished in 1 hour      2.132
      http://www.neotechnology.com/2012/06/how-much-faster-is-a-graph-database-really/
  • 14. So, what is a Graph DB? A graph database is any storage system that provides `index free adjacency`. Every element (node or edge) has a direct pointer to its adjacent element. No index lookup: we can determine which vertex is adjacent to which other vertex without looking up an index tree. [Figure: the six-vertex example graph, each vertex annotated with its adjacent vertices.]
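The idea of index-free adjacency can be sketched in plain Java: each node object holds direct references to its neighbours, so a friends-of-friends traversal follows pointers instead of consulting an index for every hop. This is only an in-memory illustration, not how any particular graph database is implemented. The names come from the social graph slide, but the friendships between them are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class IndexFreeAdjacencySketch {
    static final class Node {
        final String name;
        // Direct references to adjacent nodes: no index lookup per hop.
        final List<Node> friends = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node friend(Node other) {
            friends.add(other);
            return this;
        }
    }

    // Two pointer hops from the start node; cost depends only on the
    // neighbourhood visited, not on the total size of the graph.
    static Set<String> friendsOfFriends(Node start) {
        Set<String> names = new TreeSet<>();
        for (Node friend : start.friends) {
            for (Node friendOfFriend : friend.friends) {
                if (friendOfFriend != start) {
                    names.add(friendOfFriend.name);
                }
            }
        }
        return names;
    }

    // Hypothetical friendships, assumed only for this example.
    static Node exampleArun() {
        Node arun = new Node("Arun");
        Node bob = new Node("Bob");
        Node tom = new Node("Tom");
        Node sheetal = new Node("Sheetal");
        Node deepak = new Node("Deepak");
        arun.friend(bob).friend(tom);
        bob.friend(sheetal).friend(arun);
        tom.friend(deepak);
        return arun;
    }

    public static void main(String[] args) {
        System.out.println(friendsOfFriends(exampleArun()));
    }
}
```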
  • 15. So, what is a Graph DB? (.....) Graph DB is the option when persisting graphs.
  • 16. So, what is a Graph DB? (.....) Part of the NoSQL family. [Chart: data size plotted against data complexity.] Key-value stores like Amazon Dynamo; columnar databases like Cassandra, HBase; document databases like MongoDB, CouchDB; graph databases like Neo4j.
  • 17. Graph DB Bindings (~JDBC API)
        // connect to the database
        // begin transaction
        Node firstNode;
        Node secondNode;
        Relationship relationship;
        firstNode = graphDb.createNode();
        firstNode.setProperty( "message", "Hello, " );
        secondNode = graphDb.createNode();
        secondNode.setProperty( "message", "World!" );
        relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS );
        relationship.setProperty( "message", "brave Neo4j " );
        // end the transaction
        // close the connection to the database
      http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded-hello-world.html
  • 18. Graph Adhoc Query (~SQL)
        START john=node:node_auto_index(name = 'John')
        MATCH john-[:friend]->()-[:friend]->fof
        RETURN john, fof
      Result:
        john                    fof
        Node[4]{name:"John"}    Node[2]{name:"Maria"}
        Node[4]{name:"John"}    Node[3]{name:"Steve"}
      http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
  • 19. Different Graph Databases: FlockDB from Twitter; AllegroGraph; GraphBase; InfiniteGraph from Objectivity. http://en.wikipedia.org/wiki/Graph_database
  • 20. What is a Graph Computing Engine? [Diagram: an Input Location is read through an InputFormat into the Graph Computing Engine, which runs Algorithms and writes through an OutputFormat to an Output Location.] Graph engines come with some built-in graph processing algorithms, but also provide an easy to use API to build new algorithms and extend the framework. http://incubator.apache.org/giraph/apidocs/index.html http://incubator.apache.org/hama/docs/r0.3.0/api/index.html
  • 21. Different Graph Computing Engines
      Memory based graphs (graph size < local machine RAM), like:
        - jung.sourceforge.net
        - igraph.sourceforge.net
        - networkx.lanl.gov
      Disk based graphs (graph size < local hard disk size), like:
        - Neo4j
        - Infinite Graph – objectivity.com
        - sparsity-technologies.com/dex
      Cluster based graphs (depends on the cluster specs), like:
        - Apache Hama
        - Apache Giraph
        - GoldenORB
      The cluster based engines are based on the BSP (Bulk Synchronous Parallel) model, in the spirit of Google Pregel.
  • 22. Bulk Synchronous Parallel: some quick facts
        - An alternative computing model to MapReduce (not all problems can be solved efficiently with MapReduce). Also, any MR algorithm can be simulated on BSP and vice versa.
        - Developed by Leslie Valiant during the 1980s.
        - Resurrected by Google in the Pregel paper (extensively used for PageRank).
      Good for:
        - Processing big data with complicated relationships, e.g. graphs and networks.
        - Iterative and recursive scientific computations.
        - Continuous Event Processing (CEP).
      http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
      http://arxiv.org/abs/1203.2081 – Comparing MR vs BSP
  • 23. What is Bulk Synchronous Parallel? [Diagram: a computation advancing through Super Step 1, Super Step 2 and Super Step 3.] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel/ http://blog.octo.com/en/introduction-to-large-scale-graph-processing/
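The superstep structure can be illustrated with a small single-threaded Java simulation. In each superstep every vertex reads only the messages delivered to it, updates its own value and sends messages for the next superstep. Swapping the message boxes plays the role of the synchronisation barrier. The example propagates the maximum value through a graph and stops when no messages remain. It is a sketch of the model only, does not use the Hama or Giraph APIs, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BspMaxValueSketch {
    // neighbours[v] lists the vertices that vertex v sends messages to.
    static int[] propagateMax(int[] initial, int[][] neighbours) {
        int n = initial.length;
        int[] value = initial.clone();

        // Superstep 0: every vertex sends its own value to its neighbours.
        List<List<Integer>> inbox = emptyBoxes(n);
        for (int v = 0; v < n; v++) {
            for (int nb : neighbours[v]) {
                inbox.get(nb).add(value[v]);
            }
        }

        // Later supersteps: compute locally, send, then hit the barrier.
        while (hasMessages(inbox)) {
            List<List<Integer>> outbox = emptyBoxes(n);
            for (int v = 0; v < n; v++) {
                int best = value[v];
                for (int message : inbox.get(v)) {
                    best = Math.max(best, message);
                }
                if (best > value[v]) { // only changed vertices send again
                    value[v] = best;
                    for (int nb : neighbours[v]) {
                        outbox.get(nb).add(best);
                    }
                }
            }
            inbox = outbox; // barrier: messages become visible next superstep
        }
        return value;
    }

    static List<List<Integer>> emptyBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    static boolean hasMessages(List<List<Integer>> boxes) {
        for (List<Integer> box : boxes) {
            if (!box.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A chain 0 - 1 - 2 - 3 with edges in both directions.
        int[][] chain = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(Arrays.toString(propagateMax(new int[] {3, 6, 2, 1}, chain)));
    }
}
```

Because a vertex touches only its own value and its own inbox within a superstep, the order in which vertices are processed does not matter, which is what lets a real engine run them in parallel across a cluster.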
  • 24. Hama vs Giraph. [Diagram: Giraph and Hama are both derived from Google Pregel **. Giraph runs BSP on top of MapReduce, Hama runs BSP directly, and both sit on HDFS.] ** http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
  • 25. Hama vs Giraph (.....)
        Hama                                            Giraph
        Pure BSP engine.                                Uses BSP, but the BSP API is not exposed.
        Matrix, graph, network and other processing.    Just for graph processing.
        Jobs are run as a BSP job on HDFS.              Jobs are run as MapReduce on Hadoop.
      Both of them are derived from the `Pregel : A System for Large-Scale Graph Processing` paper published by Google. Both have been recently promoted from Incubator to Apache Top Level Project. Both of them have a few graph algorithms implemented and also provide a very easy API to implement new graph algorithms. ** http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
  • 26. Page Rank in Hama. The PageRank algorithm assigns a numerical weight to each element of a hyperlinked set of documents.
        bin/hama jar ../hama-0.4.0-examples.jar pagerank <input path> <output path> [damping factor] [epsilon error] [tasks]
      Input:
        Site1\tSite2\tSite3
        Site2\tSite3
        Site3
      Output:
        Site1 0.5
        Site2 1.3
        Site3 1.2
      http://wiki.apache.org/hama/PageRank
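For intuition about what such a job computes, here is a textbook power-iteration PageRank in plain Java. It is not Hama's implementation, so its numbers are scaled differently from the sample output on the slide: here the ranks sum to 1. The sketch assumes each input line names a site followed by the sites it links to, and that a site with no outgoing links spreads its rank evenly over all sites. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class PageRankSketch {
    // outLinks[p] lists the pages that page p links to (0-based indices).
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int page = 0; page < n; page++) {
                if (outLinks[page].length == 0) {
                    danglingMass += rank[page]; // no links: share with everyone
                    continue;
                }
                double share = rank[page] / outLinks[page].length;
                for (int target : outLinks[page]) {
                    next[target] += share;
                }
            }
            for (int page = 0; page < n; page++) {
                next[page] = (1.0 - damping) / n
                        + damping * (next[page] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Site1 -> Site2, Site3; Site2 -> Site3; Site3 has no outgoing links.
        int[][] links = {{1, 2}, {2}, {}};
        double[] rank = pageRank(links, 0.85, 50);
        for (int i = 0; i < rank.length; i++) {
            System.out.printf("Site%d %.4f%n", i + 1, rank[i]);
        }
    }
}
```

The damping factor and the iteration count correspond loosely to the optional `[damping factor]` and `[epsilon error]` arguments of the command line, except that this sketch runs a fixed number of iterations instead of stopping at an error threshold.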
  • 27. What's next? Deep dive into: both graph databases and frameworks, with a demo; the Bulk Synchronous Parallel processing model. Hadoop, Hive, Pig and others are too crowded. Graph frameworks and databases are emerging and are an easy entry point for contributing to Apache. I would suggest subscribing to the Apache mailing lists, getting familiar with the projects and contributing to them.
  • 28. Q&A