1. Large Graph Analysis in the GMine System
By Saurabh Jogalekar
TE C 51
Seminar Guide: Prof. S V Jagtap
2. Large Graph
A large graph is a graph with hundreds of thousands of nodes and millions of edges.
The friend lists, recommendations, likes and comments of a social network are a familiar example of a large graph.
Other examples include web graphs (web pages pointing to each other through hyperlinks), bipartite graphs, and computer communication graphs in which IP addresses send packets to other IP addresses.
3. Representing Graphs
The three techniques traditionally used for graph representation are:
1. Adjacency matrix
2. Adjacency list
3. Binary Decision Diagrams
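As a rough illustration of the first two techniques, the sketch below builds an adjacency matrix and an adjacency list for the same small undirected graph. The function names and the sample edge list are assumptions made for this example; binary decision diagrams are left out because they need a dedicated library.

```python
def adjacency_matrix(num_nodes, edges):
    """Dense representation: matrix[u][v] == 1 when u and v share an edge."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected graph: mirror the entry
    return matrix


def adjacency_list(num_nodes, edges):
    """Sparse representation: one neighbour list per node."""
    neighbours = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return neighbours


# Hypothetical 4-node graph used only for illustration.
sample_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
matrix = adjacency_matrix(4, sample_edges)
neighbours = adjacency_list(4, sample_edges)
```

The matrix always needs num_nodes x num_nodes cells, while the list grows with the number of edges, which is one reason dense representations become impractical for large graphs.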
4. Representing Large Graphs
• Representing large graphs is challenging because the huge number of nodes and edges reduces the overall visibility of the graph.
• The traditional representation methods therefore fail.
Example of a large graph
5. Large Graph Representation
• Another problem with representing large graphs is that acquiring (mining) the required nodes and edges takes several complex calculations.
• To overcome these hindrances, a graph summarization method called CEPS (CEntre-Piece Subgraph) is used.
6. GRAPH-TREE
• CEPS is applied on top of a Graph-Tree, a hierarchical representation of the graph made up of a SuperGraph, SuperNodes and SuperEdges.
• The Graph-Tree is formed as shown in the figure.
8. FILLING A GRAPH-TREE
Algorithm FillGraphTree(ptr)
• If ptr is a leaf, set ptr->filepath to the file of the corresponding subgraph
• Else:
    • For each child of ptr, call FillGraphTree(child)
    • Instantiate a SuperEdge for each pair of children: find the matches between the unresolved edges of each pair and store them in the SuperEdge
    • Use the external edges to determine ptr's open nodes
    • Propagate the unresolved external edges to the parent
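The steps of FillGraphTree can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the GMine implementation: the TreeNode class, its field names, the in-memory edge list and the generated file name are all hypothetical, and an edge counts as "unresolved" at a node when exactly one of its endpoints lies inside that node's coverage.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class TreeNode:  # hypothetical structure for this sketch
    name: str
    children: list = field(default_factory=list)
    nodes: set = field(default_factory=set)        # graph nodes of a leaf partition
    filepath: str = ""
    coverage: set = field(default_factory=set)     # all graph nodes below this node
    super_edges: dict = field(default_factory=dict)
    open_nodes: set = field(default_factory=set)


def fill_graph_tree(ptr, edges):
    """Fill ptr bottom-up and return the edges that leave ptr's coverage."""
    if not ptr.children:
        ptr.filepath = f"{ptr.name}.graph"  # assumed naming scheme
        ptr.coverage = set(ptr.nodes)
        candidates = set(edges)
    else:
        pending = {}
        for child in ptr.children:
            pending[child.name] = fill_graph_tree(child, edges)
            ptr.coverage |= child.coverage
        # An edge unresolved in both children of a pair connects that pair.
        for a, b in combinations(ptr.children, 2):
            matched = pending[a.name] & pending[b.name]
            if matched:
                ptr.super_edges[(a.name, b.name)] = matched
        candidates = set().union(*pending.values())
    # External edges: exactly one endpoint lies inside ptr's coverage.
    unresolved = {(u, v) for u, v in candidates
                  if (u in ptr.coverage) != (v in ptr.coverage)}
    ptr.open_nodes = {n for e in unresolved for n in e if n in ptr.coverage}
    return unresolved  # propagated to the parent by the recursive call


# Hypothetical path graph 1-2-3-4-5 split into three leaf partitions.
graph_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
leaf_a = TreeNode("A", nodes={1, 2})
leaf_b = TreeNode("B", nodes={3, 4})
leaf_c = TreeNode("C", nodes={5})
inner = TreeNode("X", children=[leaf_a, leaf_b])
root = TreeNode("root", children=[inner, leaf_c])
leftover = fill_graph_tree(root, graph_edges)
```

In this toy run the edge (2, 3) is resolved at X as a SuperEdge between A and B, while (4, 5) stays unresolved until the root, where it becomes the SuperEdge between X and C.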
9. SuperNodes and GraphNodes connectivity
• SuperNode connectivity between two SuperNodes is the set of edges whose source belongs to the coverage of the first SuperNode and whose target belongs to the coverage of the second SuperNode.
• Graph Node connectivity is the set of edges connecting a graph node to other graph nodes that are not part of the coverage of the SuperNode containing it.
• Both kinds of connectivity are used to reconstruct the graph from its hierarchical representation.
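The two definitions can be written down directly as set comprehensions. The sketch below assumes edges are (source, target) pairs and that a SuperNode's coverage is available as a plain set of graph nodes; the function names and sample data are illustrative only.

```python
def supernode_connectivity(edges, coverage_first, coverage_second):
    """Edges whose source is covered by the first SuperNode
    and whose target is covered by the second."""
    return {(u, v) for u, v in edges
            if u in coverage_first and v in coverage_second}


def graphnode_connectivity(edges, node, own_coverage):
    """Edges linking `node` to graph nodes outside the coverage
    of the SuperNode that contains `node`."""
    return {(u, v) for u, v in edges
            if (u == node and v not in own_coverage)
            or (v == node and u not in own_coverage)}


# Hypothetical data: two SuperNodes covering {1, 2} and {3, 4}.
sample_edges = [(1, 3), (2, 4), (3, 5), (1, 2)]
between = supernode_connectivity(sample_edges, {1, 2}, {3, 4})
around_3 = graphnode_connectivity(sample_edges, 3, {3, 4})
```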
10. Motivation behind CEPS
• Using a Graph-Tree and the hierarchical representation of a SuperGraph eases the problem of inspecting large graphs.
• However, the sub-graph reached this way often carries much more information than is required. CEPS is used to overcome this shortcoming.
11. CEPS
• A centre-piece subgraph contains the collection of paths connecting a subset of graph nodes of interest.
• CEPS helps interaction by significantly reducing the number of nodes and edges to be inspected.
• CEPS uses the Random Walk with Restart method to find the 'importance' score between two nodes.
12. GOODNESS SCORE
• The goodness score is calculated using the Random Walk with Restart method. A matrix A(i, j) is defined which stores the steady-state probability of each node j with respect to the query i.
[Figure: example graph with 13 nodes, each labelled with its steady-state score]
Individual Score Matrix

            Q1       Q2       Q3
Node 1      0.5767   0.0088   0.0088
Node 2      0.1235   0.0076   0.0076
Node 3      0.0283   0.0283   0.0283
Node 4      0.0076   0.1235   0.0076
Node 5      0.0088   0.5767   0.0088
Node 6      0.0076   0.0076   0.1235
Node 7      0.0088   0.0088   0.5767
Node 8      0.0333   0.0024   0.1260
Node 9      0.1260   0.0024   0.0333
Node 10     0.1260   0.0333   0.0024
Node 11     0.0333   0.1260   0.0024
Node 12     0.0024   0.1260   0.0333
Node 13     0.0024   0.0333   0.1260
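A Random Walk with Restart can be approximated by repeatedly applying one walk step followed by a jump back to the query node. The sketch below is a generic power-iteration version under stated assumptions: the graph is given as a symmetric 0/1 adjacency matrix, the restart probability of 0.5 is an arbitrary choice for this example, and the function name is hypothetical. It is not meant to reproduce the figures in the table above.

```python
def random_walk_with_restart(adj, query, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state probability of every node for a walker that, at each
    step, jumps back to `query` with probability `restart`."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    scores = [1.0 if i == query else 0.0 for i in range(n)]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            # Probability mass arriving at i from each neighbour j.
            walk = sum(adj[j][i] / degree[j] * scores[j]
                       for j in range(n) if degree[j])
            jump = restart if i == query else 0.0
            updated.append((1 - restart) * walk + jump)
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical 3-node path graph 0 - 1 - 2, queried from node 0.
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
row_for_query_0 = random_walk_with_restart(path_graph, query=0)
```

Running the function once per query node and stacking the results gives one row of A(i, j) per query.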
13. EXTRACT ALGORITHM
• The "EXTRACT" algorithm takes as input the weighted graph W, the importance scores of all nodes, and the budget b, and produces as output a small, unweighted, undirected graph H.
• It is performed using dynamic programming or a greedy method:
1. Initialize the output graph H to be empty
2. Let len be the maximum allowable path length
3. While H is not big enough:
    3.1. Pick a destination node pd
    3.2. For each active source node qi with respect to pd:
        3.2.1. Discover a key path P(qi, pd)
        3.2.2. Add P(qi, pd) to H
4. Output the final H
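The loop above can be sketched as a small greedy procedure. Several simplifications are assumptions of this sketch rather than part of the original algorithm: the budget is read as a maximum node count, destination nodes are ranked by the product of their per-query scores, every query node is treated as an active source, and the key path is found by an exhaustive search over simple paths instead of dynamic programming. All names are hypothetical.

```python
def best_path(graph, score, src, dst, max_len):
    """Highest-scoring simple path from src to dst with at most max_len edges."""
    best, best_total = None, float("-inf")
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            total = sum(score[n] for n in path)
            if total > best_total:
                best, best_total = path, total
            continue
        if len(path) - 1 < max_len:
            for nxt in graph[node]:
                if nxt not in path:
                    stack.append((nxt, path + [nxt]))
    return best


def extract(graph, scores, queries, budget, max_len):
    """Greedy sketch: returns the nodes and undirected edges of H."""
    combined = {n: 1.0 for n in graph}
    for q in queries:
        for n in graph:
            combined[n] *= scores[q][n]  # assumed way of combining scores
    h_nodes, h_edges = set(), set()
    for pd in sorted(graph, key=combined.get, reverse=True):
        if len(h_nodes) >= budget:       # H is big enough
            break
        if pd in h_nodes:
            continue
        for q in queries:
            path = best_path(graph, scores[q], q, pd, max_len)
            if path is None:
                continue
            h_nodes.update(path)
            h_edges.update(frozenset(e) for e in zip(path, path[1:]))
    return h_nodes, h_edges


# Hypothetical graph a - b - c with an extra node d attached to b.
toy_graph = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
toy_scores = {
    "a": {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1},
    "c": {"a": 0.1, "b": 0.3, "c": 0.5, "d": 0.1},
}
h_nodes, h_edges = extract(toy_graph, toy_scores, ["a", "c"], budget=3, max_len=2)
```

Because whole paths are added at once, H may end up slightly over the budget; a stricter version would check the budget before adding each path.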
14. GMINE SYSTEM
• GMine is a graph visualisation tool used for handling large graphs.
• The tool makes use of Graph-Trees to offer good and readable graph exploration.
• As the user interacts with the visualization, the system keeps track of the connectivity among communities of nodes at different levels of the partitioned graph.
• When the user changes the focus position on the tree structure, the system works on demand to calculate and present contextual information.
16. REFERENCES
• Jose F. Rodrigues Jr, Hanghang Tong, Jia-Yu Pan, Agma J.M. Traina, Caetano Traina Jr. and Christos Faloutsos, “Large Graph Analysis in the GMine System”, IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, January 2013
• Christos Faloutsos, Jose F. Rodrigues Jr, Hanghang Tong, Agma J.M. Traina, “GMine: A system for scalable, interactive, graph visualization and mining”, in IEEE/ACM International Conference, pages 1195–1198, Oconomowoc, Wisconsin, USA.
• Hanghang Tong, Christos Faloutsos, “Center Piece Subgraphs: Problem definition and fast solutions”, Carnegie-Mellon University, Research Track Paper, pages 404-414
• www.cmu.edu (Carnegie-Mellon University site)
• Jose F. Rodrigues Jr, Agma J.M. Traina, Caetano Traina Jr. Caio, Cesar Moreli, “GMine: Interactive browsing of large graphs”, Workshop On Information Visualization and Analysis In Social Networks – WIVA 2008