9. MapReduce : Finding Triangle
Problem: Enumerate every 3-cycle subgraph (triangle) in a given graph
10. MapReduce : Finding Triangle
• In the first map operation for enumerating triangles, the mapper records
each edge under whichever of its two endpoints has the lower degree.
• The key of the incoming records doesn’t matter.
12. MapReduce : Finding Triangle
• The second map for enumerating triangles brings together the edge
and open triad records.
• In the process, it rekeys the edge records so that both record types
are binned under the pair of vertices they connect (for an open triad,
its two end vertices).
13. MapReduce : Finding Triangle
• In the second reduce, each bin contains at most one edge record and some
number of triad records (perhaps none).
• For every combination of edge record and triad record in a bin, the reduce
emits a triangle record. The output key isn’t significant.
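The two map/reduce rounds described on these slides can be sketched in plain Python. This is a single-process illustration, not a Hadoop job: the function name `find_triangles` is hypothetical, vertex degrees are computed directly instead of by an earlier MapReduce pass, and ties between equal degrees are broken by vertex id (an assumption the slides do not state).

```python
from collections import defaultdict
from itertools import combinations


def find_triangles(edges):
    """Enumerate triangles with two simulated map/reduce rounds."""
    edges = {tuple(sorted(e)) for e in edges}

    # Assumption: degrees are computed up front rather than by a prior job.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def rank(vertex):
        # Lower degree first; vertex id breaks ties (assumed rule).
        return (degree[vertex], vertex)

    # Map 1: record each edge under its lower-degree endpoint.
    bins = defaultdict(list)
    for u, v in edges:
        bins[min(u, v, key=rank)].append((u, v))

    # Reduce 1: every pair of edges in a bin forms an open triad.
    triads = []
    for apex, bin_edges in bins.items():
        for e1, e2 in combinations(bin_edges, 2):
            end1 = e1[0] if e1[1] == apex else e1[1]
            end2 = e2[0] if e2[1] == apex else e2[1]
            triads.append((apex, end1, end2))

    # Map 2: bin edge records and triad records under a vertex pair.
    pair_bins = defaultdict(lambda: {"edges": [], "apexes": []})
    for u, v in edges:
        pair_bins[(u, v)]["edges"].append((u, v))
    for apex, end1, end2 in triads:
        pair_bins[tuple(sorted((end1, end2)))]["apexes"].append(apex)

    # Reduce 2: each (edge, triad) combination in a bin closes a triangle.
    triangles = []
    for (a, b), records in pair_bins.items():
        for _ in records["edges"]:
            for apex in records["apexes"]:
                triangles.append(tuple(sorted((apex, a, b))))
    return sorted(triangles)


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
    print(find_triangles(sample))
```

Binning each edge under its lower-ranked endpoint means each triangle produces exactly one open triad (at its lowest-ranked vertex), so no triangle is reported twice.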
14. MapReduce : Finding Connected Components
Problem: Find the connected components of a given graph
[Figure: example graph with vertices 1 to 6]
22. Apache Giraph
Graph Processing System
• In-memory Computation
• Inspired by Google Pregel
• Vertex-centric, high-level programming model
• Batch-oriented processing
• Based on Valiant's Bulk Synchronous Parallel (BSP) model
24. Apache Giraph Model
• Each vertex has
– a vertex identifier
– a variable
• Each directed edge has
– a source vertex identifier
– a target vertex identifier
– a variable
• Computation consists of
– input
– supersteps separated by global synchronization points
– algorithm termination
– output
• Every vertex computes in parallel, running the same user-defined
function.
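To make the superstep model concrete, here is a toy single-process simulation in Python that finds connected components by propagating the smallest vertex id. It is only a sketch of the vertex-centric idea: Giraph itself is a Java framework, and the function `connected_components` and its message-passing dictionaries are hypothetical stand-ins, not Giraph's API.

```python
from collections import defaultdict


def connected_components(vertices, edges, max_supersteps=100):
    """Label each vertex with the smallest vertex id in its component."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Each vertex's variable starts as its own identifier.
    value = {v: v for v in vertices}

    # Superstep 0: every vertex sends its value to all neighbors.
    inbox = defaultdict(list)
    for v in vertices:
        for n in neighbors[v]:
            inbox[n].append(value[v])

    supersteps = 1
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        # The same "compute" logic runs for every vertex that has messages.
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < value[v]:
                value[v] = smallest
                for n in neighbors[v]:
                    outbox[n].append(smallest)
            # Otherwise the vertex stays silent (it votes to halt).
        # Global synchronization point: messages are delivered only now.
        inbox = outbox
        supersteps += 1

    # Termination: no messages in flight, so every vertex has halted.
    return value


if __name__ == "__main__":
    print(connected_components(range(1, 7), [(1, 2), (2, 3), (4, 5)]))
```

Vertices that end with the same label belong to the same component; a vertex only reads its own value and its incoming messages, which is what lets a real system run the per-vertex step in parallel.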
34. Graph Databases
• A database that stores data as a graph of nodes and relationships
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step (moving
from a node to one of its neighbors) remains the same
• Indexes are used for lookups (finding the starting nodes)
• Optimized for traversing connected data
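A rough Python sketch of why a local step stays cheap: each node holds direct references to its neighbors, and an index is consulted only once to find the starting node. The `Node` class, the `index` dictionary, and the helper names are hypothetical illustrations, not how any particular graph database is implemented.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references to adjacent nodes


index = {}  # name -> Node, used only to find a starting point


def add_node(name):
    index[name] = Node(name)
    return index[name]


def connect(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)


def friends_of_friends(start_name):
    start = index[start_name]  # one index lookup
    found = set()
    # Each hop follows stored references; its cost depends on the
    # number of neighbors, not on the total number of nodes.
    for friend in start.neighbors:
        for candidate in friend.neighbors:
            if candidate is not start and candidate not in start.neighbors:
                found.add(candidate.name)
    return sorted(found)


alice, bob, carol, dave = (add_node(n) for n in ("alice", "bob", "carol", "dave"))
connect(alice, bob)
connect(bob, carol)
connect(carol, dave)

if __name__ == "__main__":
    print(friends_of_friends("alice"))
```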
35. Graph Databases: Model
[Figure: property graph in which each node and relationship carries
key-value properties (Key1 : Value 1, Key2 : Value 2)]
37. Neo4j
• Graph database from Neo Technology
• A schema-free labeled Property Graph Database +
Lucene Index
• Perfect for complex, highly connected data
• Reliable with real ACID Transactions
• Scalable: Billions of Nodes and Relationships, Scale
out with highly available Neo4j Cluster
• Server with REST API or Embeddable
• Declarative Query Language (Cypher)
38. Neo4j: Strengths & Weaknesses
Strengths
• Powerful data model
• Whiteboard friendly
• Fast for connected data
• Easy to query
Weaknesses
• Requires a conceptual shift (graph-like thinking)
39. Four Building Blocks
• Nodes
• Relationships
• Properties
• Labels
(:USER {Name: 'Apple', Age: 25}) -[:RELATIVE {Relation: 'Owner'}]- (:PET {Name: 'Mike', Animal: 'Dog'})
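As a minimal illustration of the four building blocks, the user/pet example from this slide can be written as two small Python data classes. The class names `Node` and `Relationship` and the direction of the relationship (user to pet) are assumptions made for the sketch; the slide does not show a direction.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # Labels
    properties: dict = field(default_factory=dict)   # Properties


@dataclass
class Relationship:
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)   # Properties


# Nodes, each with one label and a few properties
user = Node({"USER"}, {"Name": "Apple", "Age": 25})
pet = Node({"PET"}, {"Name": "Mike", "Animal": "Dog"})

# Relationship with a type and its own property (direction assumed)
owns = Relationship(user, pet, "RELATIVE", {"Relation": "Owner"})

if __name__ == "__main__":
    print(owns.start.properties["Name"], owns.type, owns.end.properties["Name"])
```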
40. SQL to Graph DB: Data Model Transformation
SQL | Graph DB
Table | Type of node (label)
Rows of table | Nodes
Columns of table | Node properties
Foreign keys, joins | Relationships
Serendio Proprietary and Confidential
41. SQL to Graph DB: Data Model Transformation
Table: Actor
Name | Movies Language
Rajnikant | Tamil
Maheshbabu | Telugu
Vijay | Tamil
Prabhas | Telugu
Table: Movie
Name | Lead Actor
Bahubali | Prabhas
Puli | Vijay
Shrimanthudu | Maheshbabu
Robot | Rajnikant
Graph:
(ACTOR {Name: Prabhas, Movie Language: Telugu}) -[LEAD_ACTOR]- (MOVIE {Name: Bahubali})
(ACTOR {Name: Rajnikant, Movie Language: Tamil}) -[LEAD_ACTOR]- (MOVIE {Name: Robot})
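The table-to-graph mapping can be sketched in Python using the two tables from this slide. The function `tables_to_graph` and its dictionary layout are hypothetical, and the direction of `LEAD_ACTOR` (actor to movie) is an assumption; the point is only that rows become nodes, columns become properties, and the foreign-key column becomes a relationship.

```python
actor_rows = [
    {"Name": "Rajnikant", "Movies Language": "Tamil"},
    {"Name": "Maheshbabu", "Movies Language": "Telugu"},
    {"Name": "Vijay", "Movies Language": "Tamil"},
    {"Name": "Prabhas", "Movies Language": "Telugu"},
]
movie_rows = [
    {"Name": "Bahubali", "Lead Actor": "Prabhas"},
    {"Name": "Puli", "Lead Actor": "Vijay"},
    {"Name": "Shrimanthudu", "Lead Actor": "Maheshbabu"},
    {"Name": "Robot", "Lead Actor": "Rajnikant"},
]


def tables_to_graph(actors, movies):
    nodes = {}
    relationships = []
    # Table -> label, row -> node, columns -> node properties
    for row in actors:
        nodes[("ACTOR", row["Name"])] = {"label": "ACTOR", "properties": dict(row)}
    for row in movies:
        movie_key = ("MOVIE", row["Name"])
        nodes[movie_key] = {"label": "MOVIE", "properties": {"Name": row["Name"]}}
        # Foreign key -> relationship (direction assumed: actor to movie)
        relationships.append((("ACTOR", row["Lead Actor"]), "LEAD_ACTOR", movie_key))
    return nodes, relationships


nodes, relationships = tables_to_graph(actor_rows, movie_rows)

if __name__ == "__main__":
    for start, rel_type, end in relationships:
        print(start[1], f"-[{rel_type}]->", end[1])
```

Note that the "Lead Actor" column disappears from the movie node: once it is a relationship, there is no join to compute at query time.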
42. How to query a Graph Database?
• Graph query languages
– Cypher
– Gremlin
44. Cypher Query Language
• Declarative
• SQL-inspired
• Pattern based
Ramesh -[FRIEND]-> Suresh
(Ramesh:PERSON)-[connect:FRIEND]->(Suresh:PERSON)
45. Cypher: Getting Started
Structure:
• Similar to SQL
• Most common clauses:
– MATCH: the graph pattern for matching
– WHERE: add constraints or filters
– RETURN: what to return
46. CRUD Operations
MATCH:
• MATCH (n) RETURN n
• MATCH (movie:Movie) RETURN movie
• MATCH (movie:Movie { title: 'Bahubali' }) RETURN movie
• MATCH (director { name:'Rajamouli' })--(movie) RETURN movie.title
• MATCH (raj:Person { name:'Rajamouli'})--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})-->(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})<--(movie:Movie) RETURN movie
• MATCH (raj:Person { name:'Rajamouli'})-[:DIRECTED]->(movie:Movie) RETURN movie
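What these MATCH patterns ask for can be imitated over a tiny in-memory graph in Python. This is only an illustration of label, property, relationship-type, and direction filtering; it is not how Neo4j evaluates Cypher, and the sample data (including the `ACTED_IN` relationship) and the helper names `node_matches` and `match` are hypothetical.

```python
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Rajamouli"}},
    2: {"labels": {"Person"}, "props": {"name": "Prabhas"}},
    3: {"labels": {"Movie"}, "props": {"title": "Bahubali"}},
}
# (start node id, relationship type, end node id)
rels = [(1, "DIRECTED", 3), (2, "ACTED_IN", 3)]


def node_matches(node, label=None, props=None):
    """True if the node has the label (if given) and all given properties."""
    if label is not None and label not in node["labels"]:
        return False
    return all(node["props"].get(k) == v for k, v in (props or {}).items())


def match(start=None, rel_type=None, end=None, directed=True):
    """Return (start_node, end_node) pairs that fit the pattern."""
    start, end = start or {}, end or {}
    results = []
    for s, t, e in rels:
        if rel_type is not None and t != rel_type:
            continue
        # "-->" checks one orientation, "--" checks both.
        candidates = [(s, e)] if directed else [(s, e), (e, s)]
        for a, b in candidates:
            if node_matches(nodes[a], **start) and node_matches(nodes[b], **end):
                results.append((nodes[a], nodes[b]))
    return results


# Roughly: MATCH (raj:Person {name:'Rajamouli'})-[:DIRECTED]->(movie:Movie)
#          RETURN movie.title
raj = {"label": "Person", "props": {"name": "Rajamouli"}}
titles = [m["props"]["title"] for _, m in match(raj, "DIRECTED", {"label": "Movie"})]

if __name__ == "__main__":
    print(titles)
```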
49. CRUD Operations
CREATE:
Node:
• CREATE (n)
• CREATE (n),(m)
• CREATE (n:Person)
• CREATE (n:Person:Swedish)
• CREATE (n:Person { name : 'Andres', title : 'Developer' })
• CREATE (a:Person { name : 'Roman' }) RETURN a
50. CRUD Operations
CREATE:
Relationships:
• MATCH (a:Person),(b:Person)
WHERE a.name = 'Roman' AND b.name = 'Andres'
CREATE (a)-[r:RELTYPE]->(b)
RETURN r
• MATCH (a:Person),(b:Person)
WHERE a.name = 'Roman' AND b.name = 'Andres'
CREATE (a)-[r:RELTYPE { name : a.name + '<->' + b.name }]->(b)
RETURN r
52. CRUD Operations
UPDATE:
Labels and properties:
• MATCH (n:Person { name : 'Andres' }) SET n :Person:Coder
• MATCH (n:Person { name : 'Andres', title : 'Developer' }) SET n.title = 'Mang'
53. CRUD Operations
DELETE:
• MATCH (n:Person)
WHERE n.name = 'Andres'
DELETE n
• MATCH (n { name: 'Andres' })-[r]-()
DELETE n, r
• MATCH (n:Person)
DELETE n
• MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
54. Functions
Predicates:
• ALL(identifier in collection WHERE predicate)
• ANY(identifier in collection WHERE predicate)
• NONE(identifier in collection WHERE predicate)
• SINGLE(identifier in collection WHERE predicate)
• EXISTS( pattern-or-property )
Scalar functions:
• LENGTH( collection/pattern expression )
• TYPE( relationship )
• ID( property-container )
• COALESCE( expression [, expression]* )
• HEAD( expression )
• LAST( expression )
• TIMESTAMP()
56. Use Case: Movie Recommendation*
Problem:
• We run an IMDb-style website.
• We have a dataset of movie ratings given by users.
• The problem is to generate, for each user, a list of movies to
recommend.
*http://neo4j.com/graphgist/a7c915c8-a3d6-43b9-8127-1836fecc6e2f
58. Use Case: Movie Recommendation
Solution:
• We find pairs of users who gave similar ratings to the movies that
both of them have watched.
• We then recommend to each user the movies they have not seen and
that similar users rated highly.
• Cosine similarity measures how similar two users are.
• k-Nearest Neighbors selects the most similar users.
60. Use Case: Movie Recommendation
Query: Add cosine similarity
MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
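The quantity stored on each SIMILARITY relationship can be reproduced in a few lines of Python, which may help when checking the query's output by hand. The function name `cosine_similarity` and the dictionary-of-ratings input are assumptions for this sketch; as in the query, only movies rated by both users contribute to the dot product and to both vector lengths.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both users have rated.

    ratings_a, ratings_b: dicts mapping movie name -> numeric rating.
    Returns None when there is no usable overlap.
    """
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return None
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    length_a = sqrt(sum(ratings_a[m] ** 2 for m in shared))
    length_b = sqrt(sum(ratings_b[m] ** 2 for m in shared))
    if length_a == 0 or length_b == 0:
        return None  # all-zero ratings: the similarity is undefined
    return dot / (length_a * length_b)


if __name__ == "__main__":
    # Hypothetical ratings, not taken from the dataset
    print(cosine_similarity({"A": 3, "B": 4}, {"A": 4, "B": 3, "C": 5}))
```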
62. Use Case: Movie Recommendation
Query: Find your nearest neighbors by similarity
MATCH (p1:Person {name:'Michael Sherman'})-[s:SIMILARITY]-(p2:Person)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity
63. Use Case: Movie Recommendation (Contd.)
Query: Final recommendations
MATCH (b:Person)-[r:RATED]->(m:Movie), (b)-[s:SIMILARITY]-(a:Person {name:'Michael Sherman'})
WHERE NOT((a)-[:RATED]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS reco
ORDER BY reco DESC
RETURN movie AS Movie, reco AS Recommendation
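The logic of this final query can be followed step by step in Python: for every movie the target user has not rated, take the ratings from the (up to) three most similar users who rated it and average them. The function `recommend`, its input dictionaries, and the sample data are hypothetical and stand in for the RATED and SIMILARITY relationships.

```python
def recommend(target, ratings, similarity, k=3):
    """Score unseen movies by the mean rating of the k most similar raters.

    ratings:    user -> {movie: rating}
    similarity: user -> {other_user: similarity score}
    """
    seen = ratings[target]
    per_movie = {}
    for other, sim in similarity[target].items():
        for movie, rating in ratings[other].items():
            if movie not in seen:  # WHERE NOT (a)-[:RATED]->(m)
                per_movie.setdefault(movie, []).append((sim, rating))

    scores = {}
    for movie, pairs in per_movie.items():
        # Most similar raters first, then keep the first k (COLLECT(...)[0..3]).
        pairs.sort(key=lambda pair: pair[0], reverse=True)
        top = [rating for _, rating in pairs[:k]]
        scores[movie] = sum(top) / len(top)

    # Highest recommendation score first (ORDER BY reco DESC).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample_ratings = {
        "me": {"A": 5},
        "u1": {"A": 5, "B": 4, "C": 2},
        "u2": {"A": 4, "B": 2},
        "u3": {"B": 5},
        "u4": {"B": 1},
    }
    sample_similarity = {"me": {"u1": 0.9, "u2": 0.8, "u3": 0.7, "u4": 0.1}}
    print(recommend("me", sample_ratings, sample_similarity))
```

Limiting each movie to its top three raters is what gives the query its k-Nearest Neighbors flavor: the low-similarity user `u4` in the sample does not affect the score for movie B.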
67. Conclusion
• Graphs are an important data model for many real-world
scenarios, because connected objects carry more information
than isolated ones.
• The de facto big data technologies are inefficient for solving
large-scale graph problems.
• Technologies designed to solve large-scale graph problems,
both in real time and offline, are available.
• These graph technologies are mature enough for production
use.