| © Copyright 2015 Hitachi Consulting1
Graph Analytics
Basic Theory and Applications
Khalid M. Salama, Ph.D.
Business Insights & Analytics
Hitachi Consulting UK
We Make it Happen. Better.
Outline
 Overview on Graphs
 Path Analytics
 Connectivity Analytics
 Community Analytics
 Centrality Analytics
 Pattern Matching
 Parallel Programming Model for Graphs
 Applied Graph Analytics
 Useful Resources
Introduction
Graph Analytics and Databases
Graph Analytics - “Built on the mathematics of graph theory, graph analytics
helps to understand, codify, and visualize relationships that exist between
objects in a given domain context, in order to uncover insights about the
structures and patterns of the objects' relationships.”
Graph Databases – “A NoSQL family of data stores that is optimized to
store, model, and process data in graph form, and to answer
graph-related queries efficiently.”
Graphs Overview
What is NOT a Graph?
Basic Concepts
These are NOT graphs!
These are charts!
What is a Graph?
Basic Concepts
In computing, a graph is an abstract data structure that represents a set of
objects and their relationships as vertices and edges, and supports a
number of graph-related operations.
 Objects (nodes): {A, B, C, D}
 Relationships (edges): {(D,B),(D,A),(B,C),(B,A),(C,A)}
 Operation: shortest path between D and A
What is a Graph?
Graph operation examples
 graph.GetNodes(<condition>)
 graph.GetEdges(<condition>)
 graph.AddNode(node)
 graph.AddEdge(node1,node2)
 graph.AddEdge(edge)
 graph.RemoveNode(node)
 graph.GetShortestPath(node1,node2)
 graph.Neighbours(node,level)
 graph.GetDistance(node1,node2)
 node.GetParents()
 node.GetChildren()
 node.GetAncestors(level)
 node.GetDescendants(level)
 node.IsAncestorTo(node2)
 node.IsDescendant(node2)
 node.AddParent(parentNode)
 node.AddChild(childNode)
 node.IsReachable(node2)
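A few of the operations listed above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class and method names are hypothetical Python counterparts of the slide's operation names, and the edges are assumed to be directed, using the example graph {(D,B),(D,A),(B,C),(B,A),(C,A)}.

```python
from collections import deque


class Graph:
    """Tiny directed graph stored as a dict of successor sets (illustrative only)."""

    def __init__(self):
        self.successors = {}

    def add_node(self, node):
        self.successors.setdefault(node, set())

    def add_edge(self, source, target):
        self.add_node(source)
        self.add_node(target)
        self.successors[source].add(target)

    def get_children(self, node):
        return sorted(self.successors[node])

    def get_parents(self, node):
        return sorted(n for n, outs in self.successors.items() if node in outs)

    def get_shortest_path(self, start, goal):
        """Breadth-first search; returns a list of nodes, or None if unreachable."""
        previous = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                while current is not None:
                    path.append(current)
                    current = previous[current]
                return path[::-1]
            for nxt in sorted(self.successors[current]):
                if nxt not in previous:
                    previous[nxt] = current
                    queue.append(nxt)
        return None

    def is_reachable(self, start, goal):
        return self.get_shortest_path(start, goal) is not None


graph = Graph()
for source, target in [("D", "B"), ("D", "A"), ("B", "C"), ("B", "A"), ("C", "A")]:
    graph.add_edge(source, target)

print(graph.get_shortest_path("D", "A"))
print(graph.get_parents("A"))
print(graph.is_reachable("A", "D"))
```

Breadth-first search is enough here because every edge counts as one step; a weighted graph would need a different algorithm.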
What is a Graph?
Real-world examples…
Social Media – Twitter
Biology – Biological Entities Networks
Geo IS – Smart Cities
Reasoning – Predictive Maintenance
 Identify groups (communities) and group interactions
 Find influencers in a community
 Extract topic interests
 Discover unknown relationships (gene/protein to disease, disease to disease, cure to disease, etc.)
 Exploratory Data Analysis & anomaly detection
 Coverage analysis
 Traffic flow, congestion estimation, routing
 Failure impact analysis
 Predict the next state given the current (and previous) state(s)
 Compute the probability of a sequence of events
Why Graphs?
Importance of graph data structures
 Intuitive Representation
 Efficient Data Processing
 Efficient Query/Analytics
Suitable for Relation/Interaction-Intensive Data Domains
Graph Types
Directionality and circulation
 Directed Graphs: state-transition models
 Directed Acyclic Graphs: dependency networks
 Undirected Graphs: connectivity networks
Simple Graph Representation
Adjacency Matrix
Directed graph (rows: From, columns: To)
     A  B  C
A    0  0  0
B    1  1  1
C    1  1  0
Weighted directed graph (rows: From, columns: To; entries are edge weights)
     A  B  C
A    0  0  0
B    3  4  2
C    1  5  0
Simple Graph Representation
Edge Table
FROM  TO  WEIGHT
B     A   ..
B     C   ..
B     B   ..
C     B   ..
C     A   ..
Useful in Relational Databases
Simple Graph Representation
Adjacency list
Node  IN   OUT
A     B,C  -
B     B,C  A,B,C
C     B    A,B
Useful in MapReduce
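The three representations carry the same information, which a short Python sketch can show by deriving the adjacency matrix and the adjacency list from the edge table. The function names are hypothetical and weights are left out; the edge rows are the ones from the Edge Table slide.

```python
nodes = ["A", "B", "C"]
# Edge table rows (FROM, TO) as listed on the Edge Table slide; weights omitted.
edge_table = [("B", "A"), ("B", "C"), ("B", "B"), ("C", "B"), ("C", "A")]


def to_adjacency_matrix(nodes, edges):
    """Rows are the 'From' node, columns are the 'To' node."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def to_adjacency_list(nodes, edges):
    """For each node, collect its incoming and outgoing neighbours."""
    adjacency = {name: {"in": [], "out": []} for name in nodes}
    for source, target in edges:
        adjacency[source]["out"].append(target)
        adjacency[target]["in"].append(source)
    for entry in adjacency.values():
        entry["in"].sort()
        entry["out"].sort()
    return adjacency


matrix = to_adjacency_matrix(nodes, edge_table)
adjacency = to_adjacency_list(nodes, edge_table)
for name, row in zip(nodes, matrix):
    print(name, row)
print(adjacency)
```

Note that the self-loop (B,B) places B in both the IN and the OUT list of B.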
Label Property Graph Model
Defining information-rich graphs
In a simple model, a graph consists of:
 A set of vertices (nodes)
 A set of edges (each connecting two nodes)
In the Label Property Graph Model, each element (vertex/edge) has:
 Unique Identifier
 Class (label)
 A set of Key/Value pairs (properties)
Example
 Vertex A – Label: Person; Name: Khalid Salama; Age: 31; Profession: Consultant
 Vertex B – Label: Post; Title: Graph Databases; Tags: [Big Data, NoSQL, Analytics]
 Vertex C – Label: Person; Name: Dishan
 Edge X (A,B) – Label: Posted; Datetime: 10-10-2016
 The diagram also shows edges labelled Likes and Follows
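As a rough illustration of the model, the sketch below stores vertices and edges as Python dataclasses, each with an identifier, a label, and a property dictionary. The class and helper names are hypothetical; the data comes from the example on the slides, limited to the one edge whose endpoints are stated explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    label: str
    source: str
    target: str
    properties: dict = field(default_factory=dict)


vertices = {
    "A": Vertex("A", "Person", {"Name": "Khalid Salama", "Age": 31, "Profession": "Consultant"}),
    "B": Vertex("B", "Post", {"Title": "Graph Databases", "Tags": ["Big Data", "NoSQL", "Analytics"]}),
    "C": Vertex("C", "Person", {"Name": "Dishan"}),
}
edges = [Edge("X", "Posted", "A", "B", {"Datetime": "10-10-2016"})]


def vertices_with_label(label):
    """Ids of all vertices carrying the given label."""
    return [v.id for v in vertices.values() if v.label == label]


def posts_by(person_id):
    """Titles of the posts reached from a person through a 'Posted' edge."""
    return [
        vertices[e.target].properties["Title"]
        for e in edges
        if e.label == "Posted" and e.source == person_id
    ]


print(vertices_with_label("Person"))
print(posts_by("A"))
```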
Types of Graph Analytics
Relationships Analytics
 Path Analytics & Traversing
 Connectivity Analytics
 Community Analytics
 Centrality Analytics
 Pattern Matching
Connectivity Analytics
Connectivity Analytics
Graph structural analysis
How big is the graph?
 Number of Vertices
 Number of Edges
 Degree Distribution
Volume – The number of edges can grow quadratically with the number of nodes
Velocity – How frequently a new vertex or edge is added to the graph
Degree
 In-degree of a vertex: number of edges pointing to the vertex (parents)
 Out-degree of a vertex: number of edges pointing out of the vertex (children)
 Degree of a vertex: number of neighbours of the vertex in an undirected graph
Connectivity Analytics
Graph structural analysis
Degree Histogram – describes the skewness of the degree distribution in a graph
[Figure: three histograms of the number of vertices against the degree of a vertex]
 It can be exponentially unlikely to find a vertex with a higher degree
 In some cases, it is more likely to find vertices with a high number of edges
 Or the distribution can be multi-modal
Connectivity Analytics
Graph structural analysis
Degree Histogram – Random vs Natural Graphs
[Figure: two histograms of the log-number of vertices against the degree of a vertex]
 Exponential Distribution: in random graphs, it is exponentially unlikely to find a vertex with a higher degree
 Zipf Distribution: in some cases, it is more likely to find vertices with a high number of edges
A vertex with a higher degree (more connections) is more likely to get a
new edge than a less connected vertex – Social Networks
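The degree measures and the degree histogram can be computed directly from an edge list. The Python sketch below is illustrative only: the edge list is hypothetical and the function names are not taken from any library.

```python
from collections import Counter

# Hypothetical edge list, used only for illustration.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def degrees(edges):
    """Degree of each vertex when the edges are read as undirected."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def in_out_degrees(edges):
    """In-degree and out-degree when the edges are read as (source, target)."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


def degree_histogram(edges):
    """Map each degree value to the number of vertices having that degree."""
    return dict(sorted(Counter(degrees(edges).values()).items()))


histogram = degree_histogram(edges)
in_degree, out_degree = in_out_degrees(edges)
print(histogram)
```

Plotting the histogram values on a logarithmic vertical axis gives the kind of view used to compare random and natural graphs.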
Connectivity Analytics
Graph structural analysis
 Highly connected nodes – nodes with high In/Out-Degree.
 Graph Robustness – how easy it is to break the graph by removing a few nodes/edges (Built-in Redundancy)
 Connectivity Coefficient: minimum number of nodes you need to remove to disconnect a graph (e.g. node B in the diagram)
- Useful in network fragility analysis and social media advertising
 Connectivity: X is reachable from Y OR Y is reachable from X
 Strong Connectivity: X is reachable from Y AND Y is reachable from X
- High-degree nodes make the network more vulnerable.
 Graph Comparison – how similar is graph G1 to G2?
 Number of nodes
 Number of edges
 Ratio of Nodes to Edges
 In/Out Degree Histogram
 Connectivity Coefficient
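The difference between connectivity and strong connectivity, as defined above, comes down to whether reachability holds in one direction or in both. A minimal Python sketch, assuming a hypothetical directed graph and hypothetical function names:

```python
def reachable_from(start, successors):
    """Return the set of nodes reachable from start (depth-first traversal)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_connected(x, y, successors):
    """X is reachable from Y OR Y is reachable from X."""
    return y in reachable_from(x, successors) or x in reachable_from(y, successors)


def is_strongly_connected(x, y, successors):
    """X is reachable from Y AND Y is reachable from X."""
    return y in reachable_from(x, successors) and x in reachable_from(y, successors)


# Hypothetical directed graph: X -> Y -> Z -> Y
successors = {"X": ["Y"], "Y": ["Z"], "Z": ["Y"]}

print(is_connected("X", "Z", successors))
print(is_strongly_connected("X", "Z", successors))
print(is_strongly_connected("Y", "Z", successors))
```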
Connectivity Analytics
Graph structural analysis
 Fully connected graph: Each node has edges to all the other nodes (usually an undirected graph)
 Can we find subgraphs, in a given graph, that are fully connected? (Cliques)
 Terminal node: A node with no outgoing edges
 Unreachable node: A node with no incoming edges
 Hubs vs. Authorities: high out-degree vs. high in-degree
 In the diagram, A is a hub node and C is an authority node
 E.g.: Social Networks: Talkers vs. Listeners
 E.g.: Web structure
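The node categories above can be read off an edge list with a few set operations. The following Python sketch uses a hypothetical directed graph and hypothetical helper names; the clique test treats an edge in either direction as a connection.

```python
from itertools import combinations

# Hypothetical directed graph for illustration.
nodes = {"A", "B", "C", "D"}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("B", "A")]


def terminal_nodes(nodes, edges):
    """Nodes with no outgoing edges."""
    sources = {source for source, _ in edges}
    return sorted(nodes - sources)


def unreachable_nodes(nodes, edges):
    """Nodes with no incoming edges."""
    targets = {target for _, target in edges}
    return sorted(nodes - targets)


def is_clique(candidate, edges):
    """True if every pair in candidate is linked by an edge in either direction."""
    undirected = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in undirected for pair in combinations(candidate, 2))


print(terminal_nodes(nodes, edges))
print(unreachable_nodes(nodes, edges))
print(is_clique(["A", "B", "C"], edges))
```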
Path Analytics
Path Analytics & Graph Traversing
Concepts and operations
 Path: An ordered sequence of edges between node x and node y (e.g. C → B → E → B → A)
 Cycle: A path where the start and the end nodes are the same (e.g. E → C → B → A → E)
 Trail: A path with no repeated edges (e.g. E → C → B → E → D → A)
 Tour: A cycle traversing all the nodes, each only once (e.g. D → A → E → C → B → D)
 Reachability: Can we reach node D from node C?
 Shortest path: minimum steps (edges) between two nodes
 Breadth-First Search
 Dijkstra's algorithm
 Best path (weighted graph): path that minimizes total weight
 Optimize a given function
 Satisfy given constraints
 Graph Diameter: The longest “shortest path” between two
(reachable) nodes (Distance Matrix) – Structural Analysis
Distance Matrix (Shortest Path Pairs)
   A   B   C   D   E
A  -   8   ∞   10  5
B  8   -   12  13  ∞
C 11 - 20 ∞
D  4   9   16  -   10
E  7   ∞   ∞   5   -
In this example (Directed Graph), the Graph Diameter is 20,
which is the longest shortest path (that is the one from C to D)
 Minimum Spanning Trees: edges that connect all the nodes with
no cycles and minimum weight.
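The best-path and diameter ideas can be sketched with Dijkstra's algorithm run from every node. This is an illustrative Python sketch under stated assumptions: the weighted graph below is hypothetical (its weights are not the ones in the slide's figure or distance matrix), and the function names are invented for the example.

```python
import heapq
import math

# Hypothetical weighted directed graph; the weights are not taken from the slide.
graph = {
    "A": {"B": 8, "E": 5},
    "B": {"C": 12},
    "C": {"A": 11},
    "D": {"A": 4},
    "E": {"D": 5},
}


def dijkstra(graph, source):
    """Shortest total weight from source to every node (inf if unreachable)."""
    distance = {node: math.inf for node in graph}
    distance[source] = 0
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distance[node]:
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < distance[neighbour]:
                distance[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return distance


def diameter(graph):
    """Longest finite shortest-path distance over all ordered node pairs."""
    longest = 0
    for source in graph:
        for target, dist in dijkstra(graph, source).items():
            if target != source and dist != math.inf:
                longest = max(longest, dist)
    return longest


print(dijkstra(graph, "A"))
print(diameter(graph))
```

Running Dijkstra once per node produces one row of the distance matrix at a time, so the diameter is simply the largest finite entry.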
Community Analytics
Community Analytics
Graph Clustering/Partitioning
A community is a dense subgraph (cluster) within a graph, whose nodes are more connected
to each other than to the nodes outside the cluster
 Cohesion – Connectivity “within” the cluster is high
 Separation – Connectivity “between” clusters is low
Analytical Questions
 Static – Discover communities
 Static – Describe interactions within a community
 Static – Describe interactions between communities
 Temporal – How did a community emerge/dissolve?
 Temporal – Which communities are stable?
 Temporal – Predict if a node will migrate to another community
Community Analytics
Graph Clustering/Partitioning
Finding Communities
Local Properties
 n-Clique (distance): largest subgraph in which the distance between every two nodes is <= n
 n-Clan (distance): an n-clique in which the largest distance between nodes, measured within the subgraph, is <= n
 k-Core (density): largest subgraph in which each node is connected to at least k nodes within the subgraph
Global Properties
Modularity –
 The fraction of the edges that fall within the given
subgraphs minus the expected such fraction if edges were
distributed at random
 Reflects the concentration of edges within subgraphs
compared with a random distribution of edges between
all nodes regardless of subgraphs.
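Modularity can be made concrete with a small calculation. The Python sketch below assumes an undirected, unweighted graph and uses the common formulation Q = sum over communities of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of the degrees of its nodes. The graph and the function name are hypothetical.

```python
def modularity(edges, communities):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # L_c: edges with both ends in community c
    degree_sum = [0] * len(communities)  # d_c: sum of node degrees in community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Hypothetical graph: two triangles joined by a single bridge edge (C, D).
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]

two_groups = modularity(edges, [{"A", "B", "C"}, {"D", "E", "F"}])
one_group = modularity(edges, [{"A", "B", "C", "D", "E", "F"}])
print(two_groups, one_group)
```

Splitting the graph along the bridge scores higher than keeping everything in one community, which scores zero.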
Centrality Analytics
Centrality Analytics
Vertex Importance Analysis
Network Centralization (graph-level measure) – Measure of the degree of variation of the centrality
scores amongst the nodes of the network
Connectivity Importance
 Simply, the degree of node X (in and out degrees).
- I.e., the queen bees in a community (used for target marketing)
Closeness Importance
 Average length of all its shortest paths, compared to the averages of
the other vertices (using Distance Matrix)
- I.e., from vertex X, you can reach most of the other vertices more quickly
Betweenness Importance
 The fraction of the shortest paths that X appears in.
- I.e., if X is important, then most of the (shortest) paths between any two
vertices in the graph pass through X (e.g. an important underground station).
Vulnerability
 Vertex X belongs to the minimum node set that, if removed from the
graph, disconnects the graph.
- Or, its removal will cause a high disruption in the network
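Two of these measures, connectivity (degree) and closeness, are simple enough to sketch in a few lines of Python. The graph is hypothetical and unweighted, the function names are invented for the example, and closeness is expressed here as the average hop distance to the other vertices, so a lower value means a more central vertex.

```python
from collections import deque

# Hypothetical undirected graph: a path A - B - C - D with a leaf E attached to B.
neighbours = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}


def degree_centrality(neighbours):
    """Connectivity importance: the degree of each vertex."""
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def hop_distances(source, neighbours):
    """Breadth-first search distances (in edges) from source."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


def average_distance(neighbours):
    """Average shortest-path length from each vertex to the vertices it can reach."""
    result = {}
    for node in neighbours:
        others = [d for target, d in hop_distances(node, neighbours).items() if target != node]
        result[node] = sum(others) / len(others) if others else float("inf")
    return result


degree = degree_centrality(neighbours)
closeness = average_distance(neighbours)
print(degree)
print(closeness)
```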
Centrality Analytics
Vertex Importance Analysis
PageRank - The importance (rank) of a vertex is computed from the ranks of
the vertices that link to it (a variant of Eigenvector Centrality).
 I.e., the importance of a given vertex is not only how well-connected it is,
it is also how well-connected its neighbours are.
 Including a damping factor: the further away a vertex is, the less
it contributes to the rank of the vertex
PageRank can be interpreted as the probability to visit a page…
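A simple power-iteration version of PageRank illustrates both points: a vertex inherits rank from the vertices linking to it, and the damping factor limits how far that influence travels. The Python sketch below is one common formulation, not necessarily the exact variant the slides have in mind; the link graph, the damping value of 0.85, and the iteration count are assumptions.

```python
def pagerank(successors, damping=0.85, iterations=50):
    """Power iteration; ranks sum to 1 and can be read as visit probabilities."""
    nodes = list(successors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in successors.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling vertex: spread its rank evenly over all vertices.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical link graph.
successors = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(successors)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```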
Pattern Matching
Pattern Matching
Graph Query
 Find the following patterns in a given graph
[Figure: small structural patterns over nodes X, Y, Z and A, B, C]
 Find the following patterns in a given Property Model graph
[Figure: pattern over W, X, Y, Z with MAN and WOMAN nodes connected by SIBLING and MARRIED edges]
[Figure: pattern over W, X, Y, Z with DRUG, DRUG, GENE and DISEASE nodes connected by INTERFERES, REGULATES and ASSOCIATED edges]
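Pattern matching over a labelled graph can be sketched as a small backtracking search that binds pattern variables to graph nodes. Everything in the Python example below is hypothetical: the data, the way the DRUG/GENE/DISEASE pattern is wired (the slide's figure is not recoverable from the text), and the function name. Real graph databases use far more efficient matching strategies.

```python
# Node labels and labelled edges (source, edge_label, target); all data is hypothetical.
node_label = {"d1": "DRUG", "d2": "DRUG", "g1": "GENE", "s1": "DISEASE"}
edges = [
    ("d1", "INTERFERES", "d2"),
    ("d2", "REGULATES", "g1"),
    ("g1", "ASSOCIATED", "s1"),
]

# Pattern variables with required labels, and required edges between them.
# The wiring of the pattern is an assumption made for this example.
pattern_nodes = {"W": "DRUG", "X": "DRUG", "Y": "GENE", "Z": "DISEASE"}
pattern_edges = [("W", "INTERFERES", "X"), ("X", "REGULATES", "Y"), ("Y", "ASSOCIATED", "Z")]


def match(pattern_nodes, pattern_edges, node_label, edges):
    """Return every binding of pattern variables to distinct graph nodes."""
    edge_set = set(edges)
    variables = list(pattern_nodes)
    results = []

    def consistent(binding):
        # Check each pattern edge whose two endpoints are already bound.
        for source, label, target in pattern_edges:
            if source in binding and target in binding:
                if (binding[source], label, binding[target]) not in edge_set:
                    return False
        return True

    def extend(binding):
        if len(binding) == len(variables):
            results.append(dict(binding))
            return
        variable = variables[len(binding)]
        for node, label in node_label.items():
            if label == pattern_nodes[variable] and node not in binding.values():
                binding[variable] = node
                if consistent(binding):
                    extend(binding)
                del binding[variable]

    extend({})
    return results


matches = match(pattern_nodes, pattern_edges, node_label, edges)
print(matches)
```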
Pattern Matching
Applications
Banking – Fraud Detection
Security – Threat Detection
Bioinformatics & Biochemistry – Association Analysis
Social Networks – Job/Candidate suggestion
GPS & Smart Cities – Traffic/Accident Analysis
Telecom – Targeted Campaigning
Parallel Programming Model
for Graphs
Parallel Programming Model for Graphs
Graph Processing
 Communication: Shared Memory vs. Message Passing
 Parallelism Type: Task vs. Data
 Parallel Processing (High Performance Computing) vs. Distributed Computing (Big Data Processing)
Parallel Programming Model for Graphs
Graph Processing
 Data Parallelism – Each compute node holds a subset of the graph's vertices.
 Message Passing – A vertex can communicate (send/receive a message) with another vertex
(possibly in another compute node) if it has an outgoing edge to it.
 Processing of vertices is performed in parallel – E.g., Bulk Synchronous Parallelism
(BSP)
E.g.: Find the shortest path between A and H, in parallel
[Figure: weighted graph with vertices A to H partitioned across Compute Nodes 1 to 4]
Parallel Programming Model for Graphs
Graph Processing
Pregel - A System for Large-Scale Graph Processing
 Published by Google
 Based on Bulk Synchronous Parallelism
 Each superstep consists of:
 Receive messages from parent nodes
 Compute
 Send messages to child nodes
 Pause & synchronize
 Example Application: PageRank
Graph Processing Tools:
 Giraph – HDFS, MapReduce, YARN (Java)
 GraphX – Spark, RDDs (Scala)
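The superstep cycle (receive, compute, send, synchronize) can be simulated on a single machine to show how a shortest-path computation proceeds in this model. The Python sketch below is only a sequential imitation of the idea: it does not use Pregel, Giraph, or GraphX, the weighted graph is hypothetical (its weights are not the ones in the slide's figure), and the function name is invented.

```python
import math

# Hypothetical weighted directed graph; weights are not taken from the slide.
out_edges = {
    "A": {"B": 5, "C": 3},
    "B": {"D": 4},
    "C": {"B": 1, "D": 7},
    "D": {"H": 2},
    "H": {},
}


def pregel_shortest_paths(out_edges, source):
    """Single-source shortest paths computed in vertex-centric supersteps."""
    distance = {vertex: math.inf for vertex in out_edges}
    inbox = {source: [0]}  # messages to be delivered in the next superstep
    supersteps = 0
    while inbox:  # halt when no messages are in flight
        outbox = {}
        # Each vertex with messages runs the same compute step; in a real
        # system these would execute in parallel across compute nodes.
        for vertex, messages in inbox.items():
            best = min(messages)  # receive
            if best < distance[vertex]:  # compute
                distance[vertex] = best
                for child, weight in out_edges[vertex].items():  # send
                    outbox.setdefault(child, []).append(best + weight)
        inbox = outbox  # synchronization barrier between supersteps
        supersteps += 1
    return distance, supersteps


distance, supersteps = pregel_shortest_paths(out_edges, "A")
print(distance, supersteps)
```

A vertex only sends messages when its distance improves, so the computation stops on its own once no shorter path is discovered.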
Parallel Programming Model for Graphs
Graph Processing - GraphX
class Graph[VD, ED] {
// Information about the Graph ===========================================
val numEdges: Long
val numVertices: Long
val inDegrees: VertexRDD[Int]
val outDegrees: VertexRDD[Int]
val degrees: VertexRDD[Int]
// Views of the graph as collections =====================================
val vertices: VertexRDD[VD]
val edges: EdgeRDD[ED]
val triplets: RDD[EdgeTriplet[VD, ED]]
// Functions for caching graphs ====================================
def persist(newLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
def cache(): Graph[VD, ED]
def unpersistVertices(blocking: Boolean = true): Graph[VD, ED]
// Change the partitioning heuristic =====================================
def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED]
// Transform vertex and edge attributes ======================================
def mapVertices[VD2](map: (VertexID, VD) => VD2): Graph[VD2, ED]
def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2]
def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
def mapTriplets[ED2](map: (PartitionID, Iterator[EdgeTriplet[VD, ED]]) => Iterator[ED2])
: Graph[VD, ED2]
def reverse: Graph[VD, ED]
def subgraph(epred: EdgeTriplet[VD,ED] => Boolean = (x => true),
vpred: (VertexID, VD) => Boolean = ((v, d) => true)): Graph[VD, ED]
// Modify the graph structure ===========================================================
def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
def groupEdges(merge: (ED, ED) => ED): Graph[VD, ED]
// Join RDDs with the graph ============================================================
def joinVertices[U](table: RDD[(VertexID, U)])(mapFunc: (VertexID, VD, U) => VD)
: Graph[VD, ED]
def outerJoinVertices[U, VD2](other: RDD[(VertexID, U)])
(mapFunc: (VertexID, VD, Option[U]) => VD2) : Graph[VD2, ED]
// Aggregate information about adjacent triplets ==============================================
def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexID]]
def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexID, VD)]]
def aggregateMessages[Msg: ClassTag](
sendMsg: EdgeContext[VD, ED, Msg] => Unit,
mergeMsg: (Msg, Msg) => Msg,
tripletFields: TripletFields = TripletFields.All)
: VertexRDD[Msg]
// Iterative graph-parallel computation =====================================================
def pregel[A](initialMsg: A, maxIterations: Int, activeDirection: EdgeDirection)(
vprog: (VertexID, VD, A) => VD,
sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexID,A)],
mergeMsg: (A, A) => A)
: Graph[VD, ED]
// Basic graph algorithms ===================================================================
def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double]
def connectedComponents(): Graph[VertexID, ED]
def triangleCount(): Graph[Int, ED]
def stronglyConnectedComponents(numIter: Int): Graph[VertexID, ED]
}
Applied Graph Analytics
Graph Databases
NoSQL Graph Stores
 Represent data in graph structures: Nodes and Edges.
 Nodes represent entities, Edges represent relationships
between entities.
 Relationships are directed; the semantics of the direction
is up to the application. E.g. “Married” is symmetric, “Owns” is not.
 Each Node/Edge has a set of Key/Value properties
 Each Node/Edge has a label (type of entity/relationship)
 Optimized to process graph-related queries and analytics.
 Example Tools
 Neo4j
 OrientDB
 Titan
 Apache Giraph
 Microsoft Graph Engine (Trinity)
Real-world Scenarios
 Social Networks
 Network and IT Operations
 Fraud Detection
 Digital Assets Management
Example graph
 Node labels: Person, Person, Car
 Id: 1 – Name: Khalid Salama; Age: 30; Email: Khalid.Salama@gmail.com
 Id: 2 – Name: Fatima Salama; Twitter: @fatbenamar
 Id: 3 – Model: Jaguar; Colour: Red
 Edge labels: Own, Drive, Owned by, Married
 Id: 100 – Since: 2014
 Id: 101 – Frequency: 2
 Id: 102 – Since: 2015
 Id: 103 – Licence No: 234
Graph Databases
NoSQL Graph Stores
O’REILLY - GRAPH DATABASES
Graph Databases
NoSQL Graph Stores
 Index-free adjacency: connected nodes physically “point” to each other in the database
 Any database that behaves like a graph DB: exposes a graph data model through CRUD operations
 Storage is designed and optimized to store, process, and query graph data structures
 Graphs are serialized into any database: relational, document, or object DBs
Applied Graph Analytics
Neo4j Graph Database
 Most Popular GraphDB (according to db-engines).
 Free Community Edition and Commercial Enterprise Edition.
 Native Graph Processing and Storage.
 Uses Cypher Query Language (CQL).
 Scalability (Redundancy and Load Balancing) with High Availability (HA) package.
 Read capacity of HA cluster increases linearly with the number of servers.
 Can commit 10K of writes per second while maintaining fully ACID transactions.
Applied Graph Analytics
Neo4j Graph Database
Create a Database
1. Create a folder in your file system (e.g. sample.graphdb)
2. Set the location of the database in .dblocation
3. Launch Neo4j
Applied Graph Analytics
Neo4j Graph Database
Create a node
create (<Id>:<label>{<Property>:"Value",…})
Example
create (p1:Person{name:"khalid", age:"31", gender:"male"})
Applied Graph Analytics
Neo4j Graph Database
Create an edge
create (<nodeId>)
-[<edgeId>:<label>{<Property>:"Value",…}]->
(<nodeId>)
Example
// p1 and p2 must be bound earlier in the same query (e.g. by a match clause)
create (p1)-[e1:follows{datetime:"2010-10-05"}]->(p2)
Applied Graph Analytics
Neo4j Graph Database
Retrieve nodes/edges
match (<pattern>) return (<objects>)
Example
match (p:Person) return p
match (p1)-[r]->(p2) return p1, p2, r
Applied Graph Analytics
Neo4j Graph Database
Update graph
match (<pattern>) merge (<objects>)
match (<pattern>) set <object>.property = value
Example
match (p:Person{name:"Khalid Salama"}) merge (p)-[:marriedTo]->(m:Person{name:"Fatima Zahra"})
match (p:Person) where p.name = "Khalid Salama" set p.job = "IT Manager"
Applied Graph Analytics
Neo4j Graph Database
Delete nodes/edges
match (<pattern>) delete (<objects>)
Example
match (n)-[e]-() delete n,e
Applied Graph Analytics
Neo4j Graph Database
Import csv Data to Neo4j
Source  Target  Distance
A       B       4
A       C       5
B       D       5
C       B       6
LOAD CSV WITH HEADERS FROM "file:///<filepath>.csv" AS line
MERGE (x:city{name:line.Source})
MERGE (y:city{name:line.Target})
MERGE (x)-[:To{distance:line.Distance}]->(y)
Applied Graph Analytics
Neo4j Graph Database
//Counting the number of nodes
match (n:Label)
return count(n)
//Counting the number of edges
match (n:Label)-[r]->()
return count(r)
//Finding leaf nodes:
match (n:Label)-[r:TO]->(m)
where not ((m)-->())
return m
//Finding root nodes:
match (m)-[r:TO]->(n:Label)
where not (()-->(m))
return m
//Finding triangles:
match (a)-[:TO]->(b)-[:TO]->(c)-[:TO]->(a)
return distinct a, b, c
//Finding 2nd neighbors of D:
match (a)-[:TO*..2]-(b)
where a.Name='D'
return distinct a, b
//Finding the types of a node:
match (n)
where n.Name = 'Egypt'
return labels(n)
//Finding the label of an edge:
match (n {Name: 'Egypt'})<-[r]-()
return distinct type(r)
//Finding all properties of a node:
match (n:Actor)
return * limit 20
//Finding loops:
match (n)-[r]->(n)
return n, r limit 10
//Finding multigraphs:
match (n)-[r1]->(m), (n)-[r2]-(m)
where r1 <> r2
return n, r1, r2, m limit 10
//Finding the induced subgraph given a set of nodes:
match (n)-[r:TO]-(m)
where n.Name in ['A', 'B', 'C', 'D', 'E'] and m.Name in ['A', 'B', 'C', 'D', 'E']
return n, r, m
Basic Queries
Applied Graph Analytics
Neo4j Graph Database
//Finding paths between specific nodes:
match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return p limit 1
//Finding the length between specific nodes:
match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return length(p) limit 1
//Finding a shortest path between specific nodes:
match p=shortestPath((a)-[:TO*]-(c))
where a.Name='A' and c.Name='P'
return p, length(p) limit 1
//All Shortest Paths with Path Conditions:
match p = allShortestPaths((source)-[r:TO*]->(destination))
where source.Name='A' and destination.Name = 'P' and
length(nodes(p)) > 5
return extract(n in nodes(p)| n.Name) as Path, length(p) as PathLength
//Diameter of the graph:
match (n:Label), (m:Label)
where n <> m
with n, m
match p=shortestPath((n)-[*]->(m))
return n.Name, m.Name, length(p)
order by length(p) desc limit 1
//Extracting and computing with node and properties:
match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return extract(n in nodes(p)|n.Name) as Path, length(p) as pathLength,
reduce(s=0, e in relationships(p)| s + toInt(e.dist)) as pathDist limit 1
//Graph not containing a selected node:
match (n)-[r:TO]->(m)
where n.Name <> 'D' and m.Name <> 'D'
return n, r, m
match (d {Name:'D'})-[:TO]-(b)<-[:TO]-(root)
where not((root)<--())
return (root)
//Graph not containing a selected neighborhood:
match (a {Name: 'F'})-[:TO*..2]-(b)
with collect(distinct b.Name) as MyList
match (n)-[r:TO]->(m)
where not(n.Name in MyList) and not (m.Name in MyList)
return distinct n, r, m
Path Analysis
Applied Graph Analytics
Neo4j Graph Database
// Find the outdegree of all nodes
match (n:Label)-[r]->()
return n.Name as Node, count(r) as Outdegree
order by Outdegree
union
match (a:Label)-[r]->(leaf)
where not((leaf)-->())
return leaf.Name as Node, 0 as Outdegree
// Find the indegree of all nodes
match (n:Label)<-[r]-()
return n.Name as Node, count(r) as Indegree
order by Indegree
union
match (a:Label)<-[r]-(root)
where not((root)<--())
return root.Name as Node, 0 as Indegree
//Find the degree of all nodes
match (n:Label)-[r]-()
return n.Name, count(distinct r) as degree
order by degree
// Find degree histogram of the graph
match (n:Label)-[r]-()
with n as nodes, count(distinct r) as degree
return degree, count(nodes) order by degree asc
//Save the degree of the node as a new node property
match (n:Label)-[r]-()
with n, count(distinct r) as degree
set n.deg = degree
return n.Name, n.deg
// Construct the Adjacency Matrix of the graph
match (n:Label), (m:Label)
return n.Name, m.Name,
case
when (n)-->(m) then 1
else 0
end as value
Connectivity Analysis
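The connectivity measures above can also be computed outside the database. The following Python sketch derives outdegree, indegree, total degree, a degree histogram and an adjacency matrix from an edge list. The edge list `EDGES` is a hypothetical example, not the graph from the slides.

```python
from collections import Counter

# Hypothetical directed edge list (source, target); not taken from the slides.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("C", "D")]
NODES = sorted({node for edge in EDGES for node in edge})

# Start every node at zero so nodes without outgoing (or incoming)
# edges still appear, which the Cypher versions handle with UNION.
out_degree = {node: 0 for node in NODES}
in_degree = {node: 0 for node in NODES}
for source, target in EDGES:
    out_degree[source] += 1
    in_degree[target] += 1

# Total degree and its histogram (degree value -> number of nodes).
degree = {node: in_degree[node] + out_degree[node] for node in NODES}
histogram = Counter(degree.values())

# Adjacency matrix: row n, column m is 1 when an edge n -> m exists.
edge_set = set(EDGES)
adjacency = [[1 if (n, m) in edge_set else 0 for m in NODES] for n in NODES]

print(out_degree)  # {'A': 2, 'B': 1, 'C': 2, 'D': 0}
print(in_degree)   # {'A': 1, 'B': 1, 'C': 2, 'D': 1}
for node, row in zip(NODES, adjacency):
    print(node, row)
```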
| © Copyright 2015 Hitachi Consulting77
Applied Graph Analytics
Neo4j Graph Database - Example
[Figure: example graph with five nodes labelled A, B, C, D and E]
| © Copyright 2015 Hitachi Consulting78
[Figure: nodes A, B, C, D and E connected by nine Following relationships]
Applied Graph Analytics
Neo4j Graph Database - Example
| © Copyright 2015 Hitachi Consulting79
[Figure: the same graph extended with post nodes P1, P2 and P3, each attached by a Posted relationship]
Applied Graph Analytics
Neo4j Graph Database - Example
| © Copyright 2015 Hitachi Consulting80
[Figure: the same graph with seven Likes relationships added alongside the Following and Posted relationships]
Applied Graph Analytics
Neo4j Graph Database - Example
| © Copyright 2015 Hitachi Consulting81
CREATE
(a:User{name:"Khalid Salama", grade:"Manager"}),
(b:User{name:"Paul Lineham", grade:"Senior Manager"}),
(c:User{name:"Vaughn Rees", grade:"Senior Manager"}),
(d:User{name:"Sutha Thiru", grade:"Director"}),
(e:User{name:"Mark Hill", grade:"VP"}),
(a)-[:Following{since:'2014'}]->(d),
(a)-[:Following{since:'2014'}]->(b),
(b)-[:Following{since:'2010'}]->(a),
(d)-[:Following{since:'2011', strength:"high"}]->(e),
(e)-[:Following{since:'2014'}]->(d),
(e)-[:Following{since:'2015'}]->(c),
(c)-[:Following]->(d),
(c)-[:Following{since:'2013', strength:"low"}]->(a),
(b)-[:Following]->(c),
(p1:Post{title:"post 1", lastupdate:"01/01/2016", tags:['sports','life style']}),
(p2:Post{title:"post 2", lastupdate:"03/05/2015"}),
(p3:Post{title:"post 3", lastupdate:"121/7/2015", tags:['economics','politics']}),
(a)-[:Posted]->(p1),
(d)-[:Posted]->(p2),
(c)-[:Posted]->(p3),
(b)-[:Liked]->(p1),
(c)-[:Liked]->(p1),
(a)-[:Liked]->(p2),
(b)-[:Liked]->(p2),
(e)-[:Liked]->(p2),
(a)-[:Liked]->(p3),
(e)-[:Liked]->(p3)
Applied Graph Analytics
Neo4j Graph Database - Example
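One way to sanity-check the CREATE statement is to count its relationships by type. The Python sketch below transcribes the relationships from the statement by hand, using the statement's own variable names. The list name `RELATIONSHIPS` and the tuple layout are illustrative assumptions, not part of Neo4j.

```python
from collections import Counter

# Relationships transcribed from the CREATE statement as
# (start variable, relationship type, end variable).
RELATIONSHIPS = [
    ("a", "Following", "d"), ("a", "Following", "b"), ("b", "Following", "a"),
    ("d", "Following", "e"), ("e", "Following", "d"), ("e", "Following", "c"),
    ("c", "Following", "d"), ("c", "Following", "a"), ("b", "Following", "c"),
    ("a", "Posted", "p1"), ("d", "Posted", "p2"), ("c", "Posted", "p3"),
    ("b", "Liked", "p1"), ("c", "Liked", "p1"), ("a", "Liked", "p2"),
    ("b", "Liked", "p2"), ("e", "Liked", "p2"), ("a", "Liked", "p3"),
    ("e", "Liked", "p3"),
]

# Number of relationships of each type.
type_counts = Counter(rel_type for _, rel_type, _ in RELATIONSHIPS)

# Number of Liked relationships arriving at each post.
likes_per_post = Counter(
    end for _, rel_type, end in RELATIONSHIPS if rel_type == "Liked"
)

print(dict(type_counts))     # {'Following': 9, 'Posted': 3, 'Liked': 7}
print(dict(likes_per_post))  # {'p1': 2, 'p2': 3, 'p3': 2}
```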
| © Copyright 2015 Hitachi Consulting82
//fetch one node
MATCH (u:User{name:"Khalid Salama"}) RETURN u
// fetch an attribute of a node
MATCH (u:User{name:"Khalid Salama"}) RETURN u.grade
// fetch nodes by conditions
MATCH (u:User{grade:"Senior Manager"}) RETURN u
// equivalent, using WHERE
MATCH (u:User)
WHERE u.grade = 'Senior Manager'
RETURN u
// match by regular expression
MATCH (u:User)
WHERE u.name =~ "Sutha.+" // also: STARTS WITH, ENDS WITH, CONTAINS, IN [...]
RETURN u
// match on a list property
MATCH ()-[r:Posted]->(p:Post)
WHERE 'sports' IN p.tags
RETURN p
// Whom is Khalid following?
MATCH (x:User{name:"Khalid Salama"})-[r:Following]->(y:User)
RETURN x,r,y
// Who is following Khalid?
MATCH (x:User{name:"Khalid Salama"})<-[r:Following]-(y:User)
RETURN x,r,y
// Update
MERGE (u:User { name:"Khalid Salama" })
SET u.practice = "Data Insights & Analytics"
RETURN u
// Get Count of Posts
MATCH (p:Post) RETURN COUNT(p)
// Get User Count By Grade
MATCH (u:User) RETURN u.grade, COUNT(u)
// Get User and Followers
MATCH (u:User)<-[:Following]-(f:User)
RETURN u.name AS User,COLLECT(f.name) AS followers,COUNT(f) AS Total
// Constraint
CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE
// Index
CREATE INDEX ON :User(grade)
// Get users following each other
MATCH (u1:User)-[:Following]->(u2:User)-[:Following]->(u1)
RETURN u1.name,u2.name
// Get users who like a post posted by one of their followers
MATCH (u:User)-[:Liked]->(p:Post)<-[:Posted]-(u2:User)-[:Following]->(u)
RETURN u,p,u2
// Get Following of Following
MATCH (u:User)-[:Following]->()-[:Following]->(u2:User)
RETURN u.name,COLLECT(DISTINCT u2.name)
// Get users within at most 3 steps of Paul
MATCH (u:User)-[:Following*..3]->(us:User{name:"Paul Lineham"})
RETURN u
// Shortest path
MATCH
(u1:User{name:"Mark Hill"}),
(u2:User{name:"Paul Lineham"}),
p=SHORTESTPATH((u1)-[:Following*..10]->(u2))
RETURN p
// Get nodes having a property
MATCH (p)
WHERE EXISTS(p.tags)
RETURN p
http://neo4j.com/docs/developer-manual/current/#cypher-query-lang
Applied Graph Analytics
Neo4j Graph Database - Example
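The expected results of a few of the queries above can be cross-checked by hand. The Python sketch below does this for the mutual-following query, the followers query and the shortest-path query, using the Following relationships copied from the CREATE statement. The names `FOLLOWING`, `mutual`, `followers` and `shortest_path` are hypothetical helpers and are not part of Neo4j.

```python
from collections import deque

# Following relationships copied from the CREATE statement (follower, followed).
FOLLOWING = [
    ("Khalid Salama", "Sutha Thiru"), ("Khalid Salama", "Paul Lineham"),
    ("Paul Lineham", "Khalid Salama"), ("Sutha Thiru", "Mark Hill"),
    ("Mark Hill", "Sutha Thiru"), ("Mark Hill", "Vaughn Rees"),
    ("Vaughn Rees", "Sutha Thiru"), ("Vaughn Rees", "Khalid Salama"),
    ("Paul Lineham", "Vaughn Rees"),
]
edges = set(FOLLOWING)

# Users following each other: both directions must be present.
# Each pair appears twice, once per direction, as in the Cypher result.
mutual = sorted((a, b) for a, b in edges if (b, a) in edges)

# Followers per user, similar to COLLECT(f.name) grouped by u.name.
followers = {}
for follower, followed in FOLLOWING:
    followers.setdefault(followed, []).append(follower)

def shortest_path(source, target):
    """Breadth-first search along the Following direction."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if user == target:
            path = []
            while user is not None:
                path.append(user)
                user = parents[user]
            return path[::-1]
        for a, b in FOLLOWING:
            if a == user and b not in parents:
                parents[b] = user
                queue.append(b)
    return None

print(mutual)
print(followers["Sutha Thiru"])  # ['Khalid Salama', 'Mark Hill', 'Vaughn Rees']
print(shortest_path("Mark Hill", "Paul Lineham"))
# ['Mark Hill', 'Vaughn Rees', 'Khalid Salama', 'Paul Lineham']
```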
| © Copyright 2015 Hitachi Consulting83
Useful Resources
• Coursera – Graph Analytics for Big Data
https://www.coursera.org/learn/big-data-graph-analytics/home/welcome
• Coursera – Data Manipulation at Scale (Lessons 21-24)
https://www.coursera.org/learn/data-manipulation/home/week/4
• Neo4j – Getting Started Tutorials
https://neo4j.com/developer/get-started
• Apache Spark – GraphX Documentation
http://spark.apache.org/docs/latest/graphx-programming-guide.html
| © Copyright 2015 Hitachi Consulting84
My Background
Applying Computational Intelligence in Data Mining
• Honorary Research Fellow, School of Computing, University of Kent.
• Ph.D. Computer Science, University of Kent, Canterbury, UK.
• M.Sc. Computer Science, The American University in Cairo, Egypt.
• 25+ published journal and conference papers, focusing on:
– classification rules induction,
– decision trees construction,
– Bayesian classification modelling,
– data reduction,
– instance-based learning,
– evolving neural networks, and
– data clustering
• Journals: Swarm Intelligence, Swarm & Evolutionary Computation,
Applied Soft Computing, and Memetic Computing.
• Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio,
ECTA, IEEE WCCI and INNS-BigData.
ResearchGate.org
| © Copyright 2015 Hitachi Consulting85
Thank you!

Graph Analytics

  • 1. | © Copyright 2015 Hitachi Consulting1 Graph Analytics Basic Theory and Applications Khalid M. Salama, Ph.D. Business Insights & Analytics Hitachi Consulting UK We Make it Happen. Better.
  • 2. | © Copyright 2015 Hitachi Consulting2 Outline  Overview on Graphs  Path Analytics  Connectivity Analytics  Community Analytics  Centrality Analytics  Pattern Matching  Parallel Programming Model for Graphs  Applied Graph Analytics  Useful Resources
  • 3. | © Copyright 2015 Hitachi Consulting3 Introduction Graph Analytics - “Built on the mathematics of graph theory, graph analytics help to understand, codify, and visualize relationships that exist between objects in a given domain context, in order to uncover insights about the structures and patterns of the objects relationships.” Graph Databases – “A NoSQL family of data stores that is optimized to store, model, and process data in a graphical form, as well as answering graph-related queries efficiently.” Graph Analytics and Databases
  • 4. | © Copyright 2015 Hitachi Consulting4 Graphs Overview
  • 5. | © Copyright 2015 Hitachi Consulting5 What is NOT a Graph? Basic Concepts These are NOT graphs! These are charts!
  • 9. What is a Graph? Basic Concepts. In computing, a graph is an abstract data structure that represents a set of objects and their relationships as vertices and edges, and supports a number of graph-related operations. Example graph on A, B, C, D. Objects (nodes): {A, B, C, D}. Relationships (edges): {(D,B), (D,A), (B,C), (B,A), (C,A)}. Operation: shortest path between D and A.
  • 10. What is a Graph? Graph operation examples. Graph-level: graph.GetNodes(<condition>); graph.GetEdges(<condition>); graph.AddNode(node); graph.AddEdge(node1,node2); graph.AddEdge(edge); graph.RemoveNode(node); graph.GetShortestPath(node1,node2); graph.Neighbours(node,level); graph.GetDistance(node1,node2). Node-level: node.GetParents(); node.GetChildren(); node.GetAncestors(level); node.GetDescendants(level); node.IsAncestorTo(node2); node.IsDescendant(node2); node.AddParent(parentNode); node.AddChild(childNode); node.IsReachable(node2).
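A few of the operations listed above can be sketched in Python. This is a minimal illustration, not the deck's API: the class and method names are hypothetical snake_case counterparts of the slide's operations, and the edge pairs from the earlier example are assumed to be directed (source, target).

```python
from collections import deque


class Graph:
    """Directed graph stored as an adjacency list (hypothetical minimal API)."""

    def __init__(self):
        self.out_edges = {}  # node -> set of child nodes

    def add_node(self, node):
        self.out_edges.setdefault(node, set())

    def add_edge(self, node1, node2):
        self.add_node(node1)
        self.add_node(node2)
        self.out_edges[node1].add(node2)

    def get_children(self, node):
        return set(self.out_edges[node])

    def get_parents(self, node):
        return {n for n, kids in self.out_edges.items() if node in kids}

    def get_shortest_path(self, start, goal):
        """Breadth-first search; returns a list of nodes, or None if unreachable."""
        previous = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                while current is not None:
                    path.append(current)
                    current = previous[current]
                return path[::-1]
            for child in sorted(self.out_edges[current]):
                if child not in previous:
                    previous[child] = current
                    queue.append(child)
        return None

    def is_reachable(self, start, goal):
        return self.get_shortest_path(start, goal) is not None


# Example graph from the "What is a Graph?" slide (edge direction assumed).
graph = Graph()
for source, target in [("D", "B"), ("D", "A"), ("B", "C"), ("B", "A"), ("C", "A")]:
    graph.add_edge(source, target)

print(graph.get_shortest_path("D", "A"))
print(graph.get_parents("A"))
```

Breadth-first search is enough here because every edge counts as one step; weighted graphs need a different algorithm.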
  • 11. What is a Graph? Real-world examples. Domains: Social Media (Twitter); Biology (Biological Entities Networks); Geo IS (Smart Cities); Reasoning (Predictive Maintenance). Use cases: identify groups (communities) and group interactions; find influencers in a community; extract topic interests; discover unknown relationships (gene/protein to disease, disease to disease, cure to disease, etc.); exploratory data analysis and anomaly detection; coverage analysis; traffic flow, congestion estimation, routing; failure impact analysis; predict the next state given the current (and previous) state(s); compute the probability of a sequence of events.
  • 12. Why Graphs? Importance of graph data structures: Intuitive Representation; Efficient Data Processing; Efficient Query/Analytics; Suitable for Relation/Interaction-Intensive Data Domains.
  • 13. Graph Types: directionality and circulation. Directed Graphs (state-transition models); Directed Acyclic Graphs (dependency networks); Undirected Graphs (connectivity networks). [Figures: one example graph on nodes A, B, C per type.]
  • 16. Simple Graph Representation: Adjacency Matrix (rows: From, columns: To) for a directed graph on nodes A, B, C.
        Unweighted:
            A  B  C
        A   0  0  0
        B   1  1  1
        C   1  1  0
        Weighted directed graph (entries are the edge weights 1 to 5 shown in the figure):
            A  B  C
        A   0  0  0
        B   3  4  2
        C   1  5  0
  • 17. Simple Graph Representation: Edge Table (useful in relational databases), for the same graph on nodes A, B, C.
        FROM  TO  WEIGHT
        B     A   ..
        B     C   ..
        B     B   ..
        C     B   ..
        C     A   ..
  • 18. Simple Graph Representation: Adjacency List (useful in MapReduce), for the same graph on nodes A, B, C.
        Node  IN    OUT
        A     B,C   -
        B     B,C   A,B,C
        C     B     A,B
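The three representations carry the same information, so one can be derived from another. As a small sketch, the following Python builds the adjacency matrix and the adjacency list from the edge table of the example graph; the variable names are illustrative only.

```python
# Edge table of the example graph (FROM, TO), as on the edge-table slide.
edges = [("B", "A"), ("B", "C"), ("B", "B"), ("C", "B"), ("C", "A")]
nodes = ["A", "B", "C"]
index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: row = FROM, column = TO.
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in edges:
    matrix[index[source]][index[target]] = 1

# Adjacency list: incoming and outgoing neighbours per node.
adjacency = {name: {"in": [], "out": []} for name in nodes}
for source, target in edges:
    adjacency[source]["out"].append(target)
    adjacency[target]["in"].append(source)

for name, row in zip(nodes, matrix):
    print(name, row)
print(adjacency)
```

The matrix costs space proportional to the square of the node count regardless of how many edges exist, which is why the edge table and adjacency list are preferred for sparse graphs.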
  • 25. Label Property Graph Model: defining information-rich graphs. In a simple model, a graph consists of: a set of vertices (nodes); a set of edges (each connecting two nodes). In the Label Property Graph Model, each element (vertex/edge) has: a unique identifier; a class (label); a set of key/value pairs (properties). Example: vertex A (Label: Person; Name: Khalid Salama; Age: 31; Profession: Consultant); vertex B (Label: Post; Title: Graph Databases; Tags: [Big Data, NoSQL, Analytics]); vertex C (Label: Person; Name: Dishan); edge X from A to B (Label: Posted; Datetime: 10-10-2016). The figure also shows Likes and Follows edges.
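One way to picture the model is as two record types that each carry an identifier, a label, and a property map. The Python below is a minimal sketch under assumptions: the `Vertex` and `Edge` classes and the `outgoing` helper are hypothetical, and the ids and directions of the Likes and Follows edges are guesses, since the transcript does not preserve them.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    label: str
    source: str
    target: str
    properties: dict = field(default_factory=dict)


vertices = {
    "A": Vertex("A", "Person", {"Name": "Khalid Salama", "Age": 31, "Profession": "Consultant"}),
    "B": Vertex("B", "Post", {"Title": "Graph Databases", "Tags": ["Big Data", "NoSQL", "Analytics"]}),
    "C": Vertex("C", "Person", {"Name": "Dishan"}),
}
edges = [
    Edge("X", "Posted", "A", "B", {"Datetime": "10-10-2016"}),
    Edge("Y", "Likes", "C", "B"),    # id and direction assumed
    Edge("Z", "Follows", "C", "A"),  # id and direction assumed
]


def outgoing(vertex_id, label):
    """Vertices reached from vertex_id over edges carrying the given label."""
    return [vertices[e.target] for e in edges if e.source == vertex_id and e.label == label]


print([v.properties["Title"] for v in outgoing("A", "Posted")])
```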
  • 26. Types of Graph Analytics
  • 27. Types of Graph Analytics (Relationships Analytics): Path Analytics & Traversing; Connectivity Analytics; Community Analytics; Centrality Analytics; Pattern Matching.
  • 28. Connectivity Analytics
  • 29. Connectivity Analytics: graph structural analysis. How big is the graph? Number of vertices; number of edges; degree distribution. Volume: the number of possible edges grows quadratically with the number of nodes. Velocity: how frequently a new vertex or edge is added to the graph. Degree: the in-degree of a vertex is the number of edges pointing to the vertex (parents); the out-degree of a vertex is the number of edges pointing out of the vertex (children); the degree of a vertex is the number of neighbours of the vertex in an undirected graph.
  • 30. Connectivity Analytics: graph structural analysis. Degree Histogram: describes the skewness of the degree distribution in a graph. [Figures: three histograms of number of vertices against degree of a vertex.] In the first, it is exponentially unlikely to find a vertex with a high degree. In the second, vertices with a high number of edges are the more common case. In the third, the distribution is multi-modal.
  • 31. Connectivity Analytics: graph structural analysis. Degree Histogram: random vs natural graphs. [Figures: two histograms of log-number of vertices against degree of a vertex.] Exponential distribution: in random graphs, it is exponentially unlikely to find a vertex with a high degree. Zipf distribution: in natural graphs such as social networks, a vertex with a higher degree (more connections) is more likely to get a new edge than less connected vertices, so vertices with a high number of edges are more likely to be found.
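The degree measures and the degree histogram can be computed directly from an edge list. The following Python is a small sketch that reuses the four-node example graph from the overview, with its edges assumed to be directed; for a directed graph the total degree is taken here as in-degree plus out-degree.

```python
from collections import Counter

# Directed edges (source, target) of the example graph on A, B, C, D.
edges = [("D", "B"), ("D", "A"), ("B", "C"), ("B", "A"), ("C", "A")]
nodes = sorted({node for edge in edges for node in edge})

in_degree = Counter(target for _, target in edges)   # edges pointing to a vertex
out_degree = Counter(source for source, _ in edges)  # edges pointing out of a vertex

# A Counter returns 0 for missing keys, so vertices without edges are handled.
degree = {node: in_degree[node] + out_degree[node] for node in nodes}

# Degree histogram: degree value -> number of vertices with that degree.
histogram = Counter(degree.values())

print(degree)
print(sorted(histogram.items()))
```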
  • 32. Connectivity Analytics: graph structural analysis. Highly connected nodes: nodes with a high in/out-degree; high-degree nodes make the network more vulnerable. Graph robustness: how easy it is to break the graph by removing a few nodes/edges (built-in redundancy). Connectivity coefficient: the minimum number of nodes that must be removed to disconnect the graph (e.g. node B in the figure); useful in network fragility analysis and social media advertising. Connectivity: X is reachable from Y OR Y is reachable from X. Strong connectivity: X is reachable from Y AND Y is reachable from X. Graph comparison (how similar is graph G1 to G2?): number of nodes; number of edges; ratio of nodes to edges; in/out-degree histogram; connectivity coefficient. [Figure: example graph on nodes A to G.]
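Reachability underlies both notions of connectivity, and the single-node case of the connectivity coefficient can be checked by brute force. The Python below is a sketch on small hypothetical graphs (not the slide's figure); the function names are illustrative, and every node is assumed to appear as a key of the adjacency dictionary.

```python
def reachable_from(adjacency, start, removed=frozenset()):
    """Nodes reachable from start, ignoring any node in `removed`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen and neighbour not in removed:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def is_strongly_connected(adjacency):
    """True if every node can reach every other node (directed graph)."""
    return all(reachable_from(adjacency, n) == set(adjacency) for n in adjacency)


def cut_vertices(undirected):
    """Nodes whose individual removal disconnects an undirected graph."""
    cuts = []
    for candidate in undirected:
        remaining = set(undirected) - {candidate}
        if remaining:
            start = next(iter(remaining))
            if reachable_from(undirected, start, removed={candidate}) != remaining:
                cuts.append(candidate)
    return cuts


cycle = {"X": ["Y"], "Y": ["Z"], "Z": ["X"]}   # directed 3-cycle
chain = {"X": ["Y"], "Y": []}                  # X reaches Y, not the reverse
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}  # undirected path A - B - C

print(is_strongly_connected(cycle), is_strongly_connected(chain))
print(cut_vertices(path))
```

The brute-force check reruns a traversal per node, which is fine for small graphs; dedicated articulation-point algorithms do the same job in a single traversal.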
  • 33. Connectivity Analytics: graph structural analysis. Fully connected graph: each node has edges to all the other nodes (usually an undirected graph). Can we find subgraphs of a given graph that are fully connected (cliques)? [Figure: example graph on nodes A to E.]
  • 36. Connectivity Analytics: graph structural analysis. Fully connected graph: each node has edges to all the other nodes. Terminal node: a node with no outgoing edges. Unreachable node: a node with no incoming edges. Hubs vs. authorities: high out-degree vs. high in-degree. In the figure, A is a hub node and C is an authority node. Examples: social networks (talkers vs. listeners); web structure. [Figure: example graph on nodes A to E.]
  • 37. Path Analytics
  • 38. Path Analytics & Graph Traversing: concepts and operations. Path: an ordered sequence of edges between node x and node y, e.g. C → B → E and B → A. [Figure: example graph on nodes A to E.]
  • 39. Path Analytics & Graph Traversing: concepts and operations. Cycle: a path where the start and the end nodes are the same, e.g. E → C → B → A → E.
  • 40. Path Analytics & Graph Traversing: concepts and operations. Trail: a path with no repeated edges, e.g. E → C → B → E → D → A.
  • 41. Path Analytics & Graph Traversing: concepts and operations. Tour: a cycle traversing all the nodes, each only once, e.g. D → A → E → C → B → D.
  • 43. Path Analytics & Graph Traversing: concepts and operations. Reachability: can we reach node D from node C? Shortest path: the minimum number of steps (edges) between two nodes. Algorithms: Breadth-First Search; Dijkstra's algorithm.
  • 44. Path Analytics & Graph Traversing: concepts and operations. Best path (weighted graph): the path that minimizes the total weight; more generally, a path that optimizes a given function and satisfies given constraints. [Figure: graph on nodes A to E with edge weights 10, 20, 3, 10, 5, 8, 6, 4.]
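For non-negative edge weights, the minimum-total-weight path is what Dijkstra's algorithm computes. The Python below is a minimal sketch; the weighted graph is hypothetical, because the transcript keeps the weights of the figure but not which edge each weight belongs to.

```python
import heapq


def dijkstra(weighted, source):
    """Minimum total weight from source to every reachable node.

    `weighted` maps node -> {neighbour: non-negative edge weight}.
    """
    distances = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue  # stale heap entry: a shorter route was already found
        for neighbour, weight in weighted.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return distances


# Hypothetical weighted directed graph (edge assignment assumed).
weighted = {
    "A": {"B": 10, "E": 5},
    "B": {"C": 3},
    "E": {"B": 4, "D": 6},
    "D": {"C": 8},
}
print(dijkstra(weighted, "A"))
```

In this example the direct edge A → B costs 10, but the route A → E → B costs 9, so the best path is not the one with the fewest edges.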
  • 45. Path Analytics & Graph Traversing: concepts and operations. Graph Diameter: the longest “shortest path” between two (reachable) nodes, read from the distance matrix; used in structural analysis. Distance matrix (shortest-path pairs), rows = from, columns in the order A, B, C, D, E:
        A:  -, 8, ∞, 10, 5
        B:  8, -, 12, 13, ∞
        C:  11, -, 20, ∞
        D:  4, 9, 16, -, 10
        E:  7, ∞, ∞, 5, -
    In this example (a directed graph), the graph diameter is 20, the longest shortest path, which is the one from C to D.
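A distance matrix of this kind can be produced with the Floyd–Warshall algorithm, after which the diameter is the largest finite entry. The Python below is a sketch on a small hypothetical graph rather than the slide's matrix; the function names are illustrative.

```python
INF = float("inf")


def all_pairs_shortest_paths(nodes, weights):
    """Floyd-Warshall: dist[u][v] is the minimum total weight from u to v.

    `weights` maps (source, target) -> weight of the directed edge.
    """
    dist = {
        u: {v: (0 if u == v else weights.get((u, v), INF)) for v in nodes}
        for u in nodes
    }
    for k in nodes:          # allow k as an intermediate node
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def diameter(dist):
    """Longest shortest path over all reachable pairs."""
    return max(d for row in dist.values() for d in row.values() if d != INF)


# Hypothetical directed cycle A -> B -> C -> A.
nodes = ["A", "B", "C"]
weights = {("A", "B"): 2, ("B", "C"): 3, ("C", "A"): 4}
dist = all_pairs_shortest_paths(nodes, weights)
print(dist)
print(diameter(dist))
```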
  • 46. Path Analytics & Graph Traversing: concepts and operations. Minimum Spanning Tree: a set of edges that connects all the nodes with no cycles and minimum total weight. [Figure: graph on nodes A to E with edge weights 10, 20, 3, 10, 5, 80, 6, 40.]
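One common way to find a minimum spanning tree is Kruskal's algorithm: take edges in order of increasing weight and keep each edge that joins two previously separate components. The Python below is a sketch for an undirected graph; it reuses the weights from the figure, but the assignment of weights to edges is assumed.

```python
def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm; weighted_edges is a list of (weight, u, v)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two components: no cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical undirected graph (weight-to-edge assignment assumed).
nodes = ["A", "B", "C", "D", "E"]
weighted_edges = [
    (10, "A", "B"), (20, "A", "D"), (3, "B", "C"), (10, "B", "E"),
    (5, "C", "E"), (80, "D", "E"), (6, "A", "E"), (40, "C", "D"),
]
tree = minimum_spanning_tree(nodes, weighted_edges)
print(tree, sum(weight for _, _, weight in tree))
```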
  • 47. Community Analytics
  • 48. Community Analytics: graph clustering/partitioning. A community is a dense subgraph (cluster) within a graph, whose nodes are more connected to each other than to the nodes outside the cluster. Cohesion: connectivity “within” the cluster is high. Separation: connectivity “between” clusters is low. Analytical questions. Static: discover communities; describe interaction within a community; describe interaction between communities. Temporal: how did a community emerge or dissolve? Which communities are stable? Predict whether a node will migrate to another community.
  • 49. Community Analytics: graph clustering/partitioning. Finding communities. Local properties: n-Clique (distance): the largest subgraph in which the distance between every two nodes is <= n, measured in the whole graph; n-Clan (distance): an n-clique in which the largest distance between nodes is <= n when measured within the subgraph itself; k-Core (density): the largest subgraph in which each node is connected to at least k nodes within the subgraph. Global properties: Modularity: the fraction of the edges that fall within the given subgraphs minus the expected such fraction if edges were distributed at random. It reflects the concentration of edges within subgraphs compared with a random distribution of edges between all nodes regardless of subgraphs.
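The modularity definition can be made concrete with a small calculation. The Python below is a sketch assuming an undirected, unweighted graph and the common formulation Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of the degrees of its nodes; the example graph is hypothetical.

```python
def modularity(edges, communities):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    community_of = {
        node: i for i, members in enumerate(communities) for node in members
    }
    internal = [0] * len(communities)    # L_c: edges inside each community
    degree_sum = [0] * len(communities)  # d_c: total degree of each community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Two triangles joined by a single bridge edge C - D.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
two_groups = modularity(edges, [{"A", "B", "C"}, {"D", "E", "F"}])
one_group = modularity(edges, [{"A", "B", "C", "D", "E", "F"}])
print(two_groups, one_group)
```

Splitting the graph into its two triangles scores higher than keeping everything in one community, which matches the cohesion and separation criteria above.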
  • 50. Centrality Analytics
  • 51. Centrality Analytics: vertex importance analysis. Network centralization (graph-level measure): a measure of the degree of variation of the centrality scores amongst the nodes of the network. Connectivity importance: simply, the degree of node X (in and out degrees), i.e. the queen bees in a community (used for target marketing). Closeness importance: the average length of all of X's shortest paths, compared to the averages of the other vertices (using the distance matrix), i.e. from vertex X you can reach most of the other vertices more quickly. Betweenness importance: the fraction of the shortest paths that X appears in, i.e. if X is important, then most of the (shortest) paths between any two vertices in the graph pass through X (an important underground station). Vulnerability: vertex X belongs to the minimum node set that, if removed from the graph, disconnects the graph; or, its removal will cause a high disruption in the network.
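Connectivity and closeness importance are straightforward to compute on an unweighted graph. The Python below is a sketch assuming a connected, undirected graph given as an adjacency dictionary; the star-shaped example and the function names are hypothetical. Closeness is expressed, as on the slide, as the average shortest-path length, so a lower value means a more central vertex.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Shortest-path length in edges from source to each reachable node (BFS)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def average_distance(adjacency, source):
    """Average shortest-path length from source to all other reachable nodes."""
    distances = hop_distances(adjacency, source)
    others = [d for node, d in distances.items() if node != source]
    return sum(others) / len(others)


# Star graph: H is linked to P, Q and R.
star = {"H": ["P", "Q", "R"], "P": ["H"], "Q": ["H"], "R": ["H"]}

degree = {node: len(neighbours) for node, neighbours in star.items()}
closeness = {node: average_distance(star, node) for node in star}
print(degree)
print(closeness)
```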
  • 52. Centrality Analytics: vertex importance analysis. PageRank: the importance (rank) of a vertex is computed from the total rank of all its adjacent vertices (a variant of Eigenvector Centrality). I.e., the importance of a given vertex is not only how well-connected it is, but also how well-connected its neighbours are. It includes a damping factor: the further away another vertex is, the less it contributes to the rank of the vertex. PageRank can be interpreted as the probability of visiting a page…
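The rank computation can be sketched as a power iteration. The Python below is a minimal illustration, not a reference implementation: the damping value 0.85, the fixed iteration count, the even redistribution of rank from nodes without out-links, and the example graph are all assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank; out_links maps node -> list of linked nodes."""
    nodes = list(out_links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the same small "random jump" share.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / len(nodes)
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(ranks)
```

The ranks sum to one, which is what allows them to be read as visit probabilities. Here C ends up ranked highest because both A and B link to it.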
  • 53. Pattern Matching
  • 54. Pattern Matching: graph query. Find the following patterns in a given graph. Find the following pattern in a given Property Model graph. [Figures: small structural patterns over nodes X, Y, Z, W and A, B, C; a family pattern with MAN and WOMAN nodes joined by SIBLING and MARRIED edges; a biomedical pattern with DRUG, DRUG, GENE and DISEASE nodes joined by INTERFERES, REGULATES and ASSOCIATED edges.]
  • 55. Pattern Matching: applications. Banking: fraud detection. Security: threat detection. Bioinformatics & Biochemistry: association analysis. Social Networks: job/candidate suggestion. GPS & Smart Cities: traffic/accident analysis. Telecom: targeted campaigning.
  • 56. Parallel Programming Model for Graphs
  • 57. Parallel Programming Model for Graphs: graph processing. Communication: Shared Memory; Message Passing. Parallelism type: Task; Data. Parallel Processing; Distributed Computing. High Performance Computing; Big Data Processing.
  • 59. Parallel Programming Model for Graphs: graph processing. Data parallelism: each compute node holds a subset of the graph's vertices. Message passing: a vertex can communicate (send/receive a message) with a vertex in another compute node if it has an outgoing edge to it. Processing of vertices is performed in parallel, e.g. Bulk Synchronous Parallelism (BSP). E.g.: find the shortest path between A and H in parallel. [Figure: weighted graph on nodes A to H partitioned across Compute Nodes 1 to 4, with edge weights 5, 3, 4, 2, 1, 3, 5, 4, 3, 1.]
  • 60. Parallel Programming Model for Graphs: graph processing. Pregel: a system for large-scale graph processing. Published by Google. Based on Bulk Synchronous Parallelism. Each superstep consists of: receive messages from parent nodes; compute; send messages to child nodes; pause and synchronize. Example application: PageRank. Graph processing tools: Giraph (HDFS, MapReduce, YARN; Java); GraphX (Spark, RDDs; Scala).
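The superstep cycle can be imitated in a single process to show the control flow. The Python below is a sketch, not Pregel, Giraph, or GraphX code: the function name, the message format, and the example graph are hypothetical, vertices are processed sequentially where a real system would run them in parallel, and edge weights are assumed to be positive so that the loop terminates.

```python
def pregel_shortest_paths(edges, source):
    """Single-source shortest paths in Pregel style.

    `edges` maps every vertex -> {child: positive edge weight}.
    Returns (distance per vertex, number of supersteps executed).
    """
    distance = {vertex: float("inf") for vertex in edges}
    inbox = {source: [0]}   # messages delivered at the start of superstep 0
    supersteps = 0
    while inbox:            # halt when no messages are in flight
        outbox = {}
        for vertex, messages in inbox.items():   # conceptually in parallel
            best = min(messages)                  # receive
            if best < distance[vertex]:           # compute
                distance[vertex] = best
                for child, weight in edges[vertex].items():   # send
                    outbox.setdefault(child, []).append(best + weight)
        inbox = outbox      # synchronisation barrier between supersteps
        supersteps += 1
    return distance, supersteps


# Hypothetical weighted directed graph.
edges = {
    "A": {"B": 5, "C": 3},
    "B": {"D": 4},
    "C": {"B": 1, "D": 7},
    "D": {},
}
distance, supersteps = pregel_shortest_paths(edges, "A")
print(distance, supersteps)
```

A vertex only sends messages when its own value improves, so activity dies out once no shorter route remains, and the computation stops at the first superstep with an empty inbox.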
  • 61. Parallel Programming Model for Graphs: graph processing with GraphX. Summary of the Graph operators:
      class Graph[VD, ED] {
        // Information about the graph
        val numEdges: Long
        val numVertices: Long
        val inDegrees: VertexRDD[Int]
        val outDegrees: VertexRDD[Int]
        val degrees: VertexRDD[Int]
        // Views of the graph as collections
        val vertices: VertexRDD[VD]
        val edges: EdgeRDD[ED]
        val triplets: RDD[EdgeTriplet[VD, ED]]
        // Functions for caching graphs
        def persist(newLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
        def cache(): Graph[VD, ED]
        def unpersistVertices(blocking: Boolean = true): Graph[VD, ED]
        // Change the partitioning heuristic
        def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED]
        // Transform vertex and edge attributes
        def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED]
        def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
        def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2]
        def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
        def mapTriplets[ED2](map: (PartitionID, Iterator[EdgeTriplet[VD, ED]]) => Iterator[ED2]): Graph[VD, ED2]
        // Modify the graph structure
        def reverse: Graph[VD, ED]
        def subgraph(
            epred: EdgeTriplet[VD, ED] => Boolean = (x => true),
            vpred: (VertexId, VD) => Boolean = ((v, d) => true)): Graph[VD, ED]
        def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
        def groupEdges(merge: (ED, ED) => ED): Graph[VD, ED]
        // Join RDDs with the graph
        def joinVertices[U](table: RDD[(VertexId, U)])(mapFunc: (VertexId, VD, U) => VD): Graph[VD, ED]
        def outerJoinVertices[U, VD2](other: RDD[(VertexId, U)])
            (mapFunc: (VertexId, VD, Option[U]) => VD2): Graph[VD2, ED]
        // Aggregate information about adjacent triplets
        def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexId]]
        def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexId, VD)]]
        def aggregateMessages[Msg: ClassTag](
            sendMsg: EdgeContext[VD, ED, Msg] => Unit,
            mergeMsg: (Msg, Msg) => Msg,
            tripletFields: TripletFields = TripletFields.All): VertexRDD[Msg]
        // Iterative graph-parallel computation
        def pregel[A](initialMsg: A, maxIterations: Int, activeDirection: EdgeDirection)(
            vprog: (VertexId, VD, A) => VD,
            sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
            mergeMsg: (A, A) => A): Graph[VD, ED]
        // Basic graph algorithms
        def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double]
        def connectedComponents(): Graph[VertexId, ED]
        def triangleCount(): Graph[Int, ED]
        def stronglyConnectedComponents(numIter: Int): Graph[VertexId, ED]
      }
  • 62. Applied Graph Analytics
  • 63. Graph Databases: NoSQL graph stores. Represent data in graph structures: nodes and edges. Nodes represent entities; edges represent relationships between entities. Relationships are directed; the semantics of the direction is up to the application, e.g. “Married” is symmetric, “Owns” is not. Each node/edge has a set of key/value properties. Each node/edge has a label (type of entity/relationship). Optimized to process graph-related queries and analytics. Example tools: Neo4j; OrientDB; Titan; Apache Giraph; Microsoft Graph Engine (Trinity). Real-world scenarios: social networks; network and IT operations; fraud detection; digital assets management. [Figure: example property graph. Nodes: Person (Id: 1, Name: Khalid Salama, Age: 30, Email: Khalid.Salama@gmail.com); Person (Id: 2, Name: Fatima Salama, Twitter: @fatbenamar); Car (Id: 3, Model: Jaguar, Colour: Red). Edge labels: Married, Own, Drive, Owned by. Edge properties: Id: 100, Since: 2014; Id: 101, Frequency: 2; Id: 102, Since: 2015; Id: 103, Licence No: 234.]
  • 64. Graph Databases: NoSQL graph stores. [Figure from O'Reilly, Graph Databases.]
  • 65. Graph Databases: NoSQL graph stores. Index-free adjacency: connected nodes physically “point” to each other in the database. Any database that behaves like a graph DB: exposes a graph data model through CRUD operations. Storage is designed and optimized to store, process, and query graph data structures. Graphs are serialized in any database: relational, document, or object DBs.
  • 66. Applied Graph Analytics: Neo4j graph database. Most popular graph DB (according to db-engines). Free Community Edition and commercial Enterprise Edition. Native graph processing and storage. Uses the Cypher Query Language (CQL). Scalability (redundancy and load balancing) with the High Availability (HA) package. Read capacity of an HA cluster increases linearly with the number of servers. Can commit 10K writes per second while maintaining fully ACID transactions.
  • 67. Applied Graph Analytics: Neo4j graph database. Create a database: 1. Create a folder in your file system (e.g. sample.graphdb). 2. Set the location of the database in .dblocation. 3. Launch Neo4j.
  • 68. Applied Graph Analytics: Neo4j graph database. Create a node:
      create (<Id>:<label>{<Property>:"Value",…})
    Example:
      create (p1:Person{name:"khalid", age:"31", gender:"male"})
  • 69. Applied Graph Analytics: Neo4j graph database. Create an edge:
      create ((<nodeId>)-[<edgeId>:<label>{<Property>:"Value",…}]->(<nodeId>))
    Example:
      create ((p1)-[e1:follows{datetime:"2010-10-05"}]->(p2))
  • 70. Applied Graph Analytics: Neo4j graph database. Retrieve nodes/edges:
      match (<pattern>) return (<objects>)
    Examples:
      match (p:Person) return p
      match (p1)-[r]->(p2) return p1, p2, r
  • 71. Applied Graph Analytics: Neo4j graph database. Update graph:
      match (<pattern>) merge (<objects>)
      match (<pattern>) set <object>.<property> = <value>
    Examples:
      match (p:Person{name:"khalid Salama"}) merge (p)-[:marriedTo]->(m:Person{name:"Fatima Zahra"})
      match (p:Person) where p.name = "khalid Salama" set p.job = "IT Manager"
  • 72. Applied Graph Analytics: Neo4j graph database. Delete nodes/edges:
      match (<pattern>) delete (<objects>)
    Example:
      match (n)-[e]-() delete n, e
  • 73. Applied Graph Analytics: Neo4j graph database. Import CSV data to Neo4j. Sample data:
      Source  Target  Distance
      A       B       4
      A       C       5
      B       D       5
      C       B       6
    Import statement:
      LOAD CSV WITH HEADERS FROM "<filepath>.csv" AS line
      MERGE (x:city{name:line.Source})
      MERGE (y:city{name:line.Target})
      MERGE (x)-[:To{distance:line.Distance}]->(y)
  • 74. Applied Graph Analytics: Neo4j graph database. Basic queries:
      // Counting the number of nodes
      match (n:Label) return count(n)
      // Counting the number of edges
      match (n:Label)-[r]->() return count(r)
      // Finding leaf nodes
      match (n:Label)-[r:TO]->(m) where not ((m)-->()) return m
      // Finding root nodes
      match (m)-[r:TO]->(n:Label) where not (()-->(m)) return m
      // Finding triangles
      match (a)-[:TO]->(b)-[:TO]->(c)-[:TO]->(a) return distinct a, b, c
      // Finding 2nd neighbors of D
      match (a)-[:TO*..2]-(b) where a.Name='D' return distinct a, b
      // Finding the types of a node
      match (n) where n.Name = 'Egypt' return labels(n)
      // Finding the label of an edge
      match (n {Name: 'Egypt'})<-[r]-() return distinct type(r)
      // Finding all properties of a node
      match (n:Actor) return * limit 20
      // Finding loops
      match (n)-[r]->(n) return n, r limit 10
      // Finding multigraphs
      match (n)-[r1]->(m), (n)-[r2]-(m) where r1 <> r2 return n, r1, r2, m limit 10
      // Finding the induced subgraph given a set of nodes
      match (n)-[r:TO]-(m) where n.Name in ['A', 'B', 'C', 'D', 'E'] and m.Name in ['A', 'B', 'C', 'D', 'E'] return n, r, m
  • 75. Applied Graph Analytics: Neo4j graph database. Path analysis:
      // Finding paths between specific nodes
      match p=(a)-[:TO*]-(c) where a.Name='H' and c.Name='P' return p limit 1
      // Finding the length between specific nodes
      match p=(a)-[:TO*]-(c) where a.Name='H' and c.Name='P' return length(p) limit 1
      // Finding a shortest path between specific nodes
      match p=shortestPath((a)-[:TO*]-(c)) where a.Name='A' and c.Name='P' return p, length(p) limit 1
      // All shortest paths with path conditions
      match p = allShortestPaths((source)-[r:TO*]->(destination)) where source.Name='A' and destination.Name = 'P' and length(nodes(p)) > 5 return extract(n in nodes(p) | n.Name) as Path, length(p) as PathLength
      // Diameter of the graph
      match (n:Label), (m:Label) where n <> m with n, m match p=shortestPath((n)-[*]->(m)) return n.Name, m.Name, length(p) order by length(p) desc limit 1
      // Extracting and computing with node and edge properties
      match p=(a)-[:TO*]-(c) where a.Name='H' and c.Name='P' return extract(n in nodes(p) | n.Name) as Path, length(p) as pathLength, reduce(s=0, e in relationships(p) | s + toInt(e.dist)) as pathDist limit 1
      // Graph not containing a selected node
      match (n)-[r:TO]->(m) where n.Name <> 'D' and m.Name <> 'D' return n, r, m
      match (d {Name:'D'})-[:TO]-(b)<-[:TO]-(root) where not((root)<--()) return (root)
      // Graph not containing a selected neighborhood
      match (a {Name: 'F'})-[:TO*..2]-(b) with collect(distinct b.Name) as MyList match (n)-[r:TO]->(m) where not (n.Name in MyList) and not (m.Name in MyList) return distinct n, r, m
  • 76. Applied Graph Analytics: Neo4j graph database. Connectivity analysis:
      // Find the outdegree of all nodes
      match (n:Label)-[r]->() return n.Name as Node, count(r) as Outdegree order by Outdegree union match (a:Label)-[r]->(leaf) where not((leaf)-->()) return leaf.Name as Node, 0 as Outdegree
      // Find the indegree of all nodes
      match (n:Label)<-[r]-() return n.Name as Node, count(r) as Indegree order by Indegree union match (a:Label)<-[r]-(root) where not((root)<--()) return root.Name as Node, 0 as Indegree
      // Find the degree of all nodes
      match (n:Label)-[r]-() return n.Name, count(distinct r) as degree order by degree
      // Find the degree histogram of the graph
      match (n:Label)-[r]-() with n as nodes, count(distinct r) as degree return degree, count(nodes) order by degree asc
      // Save the degree of the node as a new node property
      match (n:Label)-[r]-() with n, count(distinct r) as degree set n.deg = degree return n.Name, n.deg
      // Construct the adjacency matrix of the graph
      match (n:Label), (m:Label) return n.Name, m.Name, case when (n)-->(m) then 1 else 0 end as value
  • 80. | © Copyright 2015 Hitachi Consulting80 A B D EC P2P1 P3 Following Following Following Following Following FollowingFollowing Following Following Likes Likes Likes Likes Likes Likes LikesPosted Posted Posted Applied Graph Analytics Neo4j Graph Database - Example
  • 81. Applied Graph Analytics: Neo4j Graph Database - Example

      CREATE
        (a:User{name:"Khalid Salama", grade:"Manager"}),
        (b:User{name:"Paul Lineham", grade:"Senior Manager"}),
        (c:User{name:"Vaughn Rees", grade:"Senior Manager"}),
        (d:User{name:"Sutha Thiru", grade:"Director"}),
        (e:User{name:"Mark Hill", grade:"VP"}),
        (a)-[:Following{since:'2014'}]->(d),
        (a)-[:Following{since:'2014'}]->(b),
        (b)-[:Following{since:'2010'}]->(a),
        (d)-[:Following{since:'2011', strength:"high"}]->(e),
        (e)-[:Following{since:'2014'}]->(d),
        (e)-[:Following{since:'2015'}]->(c),
        (c)-[:Following]->(d),
        (c)-[:Following{since:'2013', strength:"low"}]->(a),
        (b)-[:Following]->(c),
        (p1:Post{title:"post 1", lastupdate:"01/01/2016", tags:['sports','life style']}),
        (p2:Post{title:"post 2", lastupdate:"03/05/2015"}),
        (p3:Post{title:"post 3", lastupdate:"121/7/2015", tags:['economics','politics']}),
        (a)-[:Posted]->(p1),
        (d)-[:Posted]->(p2),
        (c)-[:Posted]->(p3),
        (b)-[:Liked]->(p1),
        (c)-[:Liked]->(p1),
        (a)-[:Liked]->(p2),
        (b)-[:Liked]->(p2),
        (e)-[:Liked]->(p2),
        (a)-[:Liked]->(p3),
        (e)-[:Liked]->(p3)
  • 82. Applied Graph Analytics: Neo4j Graph Database - Example

      // Fetch one node
      MATCH (u:User{name:"Khalid Salama"}) RETURN u

      // Fetch an attribute of a node
      MATCH (u:User{name:"Khalid Salama"}) RETURN u.grade

      // Fetch nodes by condition (two equivalent forms)
      MATCH (u:User{grade:"Senior Manager"}) RETURN u
      MATCH (u:User) WHERE u.grade = 'Senior Manager' RETURN u

      // Fetch nodes by regular expression
      // (other predicates: STARTS WITH, ENDS WITH, CONTAINS, IN [...])
      MATCH (u:User) WHERE u.name =~ "Sutha.+" RETURN u

      // Fetch posts that carry a given tag
      MATCH ()-[r:Posted]->(p:Post) WHERE 'sports' IN p.tags RETURN p

      // Whom is Khalid following?
      MATCH (x:User{name:"Khalid Salama"})-[r:Following]->(y:User) RETURN x,r,y

      // Who is following Khalid?
      MATCH (x:User{name:"Khalid Salama"})<-[r:Following]-(y:User) RETURN x,r,y

      // Update (add or change a property)
      MERGE (u:User{name:"Khalid Salama"})
      SET u.practice = "Data Insights & Analytics"
      RETURN u

      // Get the count of posts
      MATCH (p:Post) RETURN COUNT(p)

      // Get the user count by grade
      MATCH (u:User) RETURN u.grade, COUNT(u)

      // Get each user and their followers
      MATCH (u:User)<-[:Following]-(f:User)
      RETURN u.name AS User, COLLECT(f.name) AS followers, COUNT(f) AS Total

      // Constraint
      CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE

      // Index
      CREATE INDEX ON :User(grade)

      // Get users following each other
      MATCH (u1:User)-[:Following]->(u2:User)-[:Following]->(u1)
      RETURN u1.name, u2.name

      // Get users who like a post posted by one of their followers
      MATCH (u:User)-[:Liked]->(p:Post)<-[:Posted]-(u2:User)-[:Following]->(u)
      RETURN u, p, u2

      // Get the following of following
      MATCH (u:User)-[:Following]->()-[:Following]->(u2:User)
      RETURN u.name, COLLECT(DISTINCT u2.name)

      // Get users at most 3 steps away from Paul
      MATCH (u:User)-[:Following*..3]->(us:User{name:"Paul Lineham"})
      RETURN u

      // Shortest path
      MATCH (u1:User{name:"Mark Hill"}), (u2:User{name:"Paul Lineham"}),
            p=SHORTESTPATH((u1)-[:Following*..10]->(u2))
      RETURN p

      // Get nodes having a property
      MATCH (p) WHERE EXISTS(p.tags) RETURN p

      Reference: http://neo4j.com/docs/developer-manual/current/#cypher-query-lang
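As a cross-check of what some of these pattern queries return, the sketch below rebuilds the example's relationships as plain Python tuples, copied from the CREATE statement on slide 81, and evaluates four of the queries by hand. The function names are hypothetical, and the code only illustrates the matching logic under that assumption; it does not reflect how Neo4j executes Cypher. The path search here is an unbounded breadth-first search, whereas the slide's query limits the path to 10 hops.

```python
from collections import deque

# Relationships copied from the CREATE statement of the example (slide 81).
FOLLOWING = [
    ("Khalid Salama", "Sutha Thiru"), ("Khalid Salama", "Paul Lineham"),
    ("Paul Lineham", "Khalid Salama"), ("Sutha Thiru", "Mark Hill"),
    ("Mark Hill", "Sutha Thiru"), ("Mark Hill", "Vaughn Rees"),
    ("Vaughn Rees", "Sutha Thiru"), ("Vaughn Rees", "Khalid Salama"),
    ("Paul Lineham", "Vaughn Rees"),
]
POSTED = [("Khalid Salama", "post 1"), ("Sutha Thiru", "post 2"),
          ("Vaughn Rees", "post 3")]
LIKED = [("Paul Lineham", "post 1"), ("Vaughn Rees", "post 1"),
         ("Khalid Salama", "post 2"), ("Paul Lineham", "post 2"),
         ("Mark Hill", "post 2"), ("Khalid Salama", "post 3"),
         ("Mark Hill", "post 3")]


def followers_of(following):
    """User -> sorted list of followers (the 'user and followers' query)."""
    result = {}
    for follower, followed in following:
        result.setdefault(followed, []).append(follower)
    return {user: sorted(names) for user, names in result.items()}


def mutual_follows(following):
    """Unordered pairs of users who follow each other."""
    edges = set(following)
    return sorted({tuple(sorted(pair)) for pair in edges
                   if (pair[1], pair[0]) in edges})


def liked_post_of_follower(following, posted, liked):
    """(liker, post, author) triples where the post's author follows the liker."""
    edges = set(following)
    author = {post: user for user, post in posted}
    return sorted((liker, post, author[post]) for liker, post in liked
                  if (author[post], liker) in edges)


def shortest_following_path(following, source, target):
    """Directed breadth-first search along Following relationships."""
    successors = {}
    for follower, followed in following:
        successors.setdefault(follower, []).append(followed)
    parents = {source: None}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if user == target:
            path = []
            while user is not None:
                path.append(user)
                user = parents[user]
            return path[::-1]
        for nxt in successors.get(user, []):
            if nxt not in parents:
                parents[nxt] = user
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(followers_of(FOLLOWING))
    print(mutual_follows(FOLLOWING))
    print(liked_post_of_follower(FOLLOWING, POSTED, LIKED))
    print(shortest_following_path(FOLLOWING, "Mark Hill", "Paul Lineham"))
```

Note that the Cypher query for users following each other returns every mutual pair twice, once in each order, while `mutual_follows` reports each pair once.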
  • 83. Useful Resources
      - Coursera: Graph Analytics for Big Data, https://www.coursera.org/learn/big-data-graph-analytics/home/welcome
      - Coursera: Data Manipulation at Scale (Lessons 21-24), https://www.coursera.org/learn/data-manipulation/home/week/4
      - Neo4j: Getting Started Tutorials, https://neo4j.com/developer/get-started
      - Apache Spark: GraphX Documentation, http://spark.apache.org/docs/latest/graphx-programming-guide.html
  • 84. My Background: Applying Computational Intelligence in Data Mining
      - Honorary Research Fellow, School of Computing, University of Kent.
      - Ph.D. Computer Science, University of Kent, Canterbury, UK.
      - M.Sc. Computer Science, The American University in Cairo, Egypt.
      - 25+ published journal and conference papers, focusing on: classification rules induction, decision trees construction, Bayesian classification modelling, data reduction, instance-based learning, evolving neural networks, and data clustering.
      - Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing.
      - Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio, ECTA, IEEE WCCI and INNS-BigData.
      - ResearchGate.org
  • 85. Thank you!