Graph Convolutional Networks (GCNs) have found multiple applications in graph-based machine learning. However, training GCNs on large graphs with billions of nodes and edges and rich node attributes consumes significant amounts of time and memory, making it impossible to train such GCNs on general-purpose commodity hardware; such use cases demand high-end servers with accelerators and ample memory. In this paper we implement memory-efficient GCN-based link prediction on top of a distributed graph database server called JasmineGraph. Our approach is based on federated training on partitioned graphs with multiple parallel workers. We conduct experiments with three real-world graph datasets: DBLP-V11, Reddit, and Twitter. We demonstrate that our approach produces optimal performance for a given hardware setting. JasmineGraph trained a GCN on the largest dataset, DBLP-V11 (>10GB), in 20 hours and 24 minutes for 5 training rounds of 3 epochs each by partitioning it into 16 partitions with 2 workers on a single server, while the conventional training method could not process it at all due to lack of memory. The second largest dataset, Reddit, took 9 hours 8 minutes to train conventionally, while JasmineGraph took only 3 hours and 11 minutes with 8 partitions and 4 workers on the same hardware, a 3x improvement. On the Twitter dataset, JasmineGraph delivered a 5x improvement (10 hours 31 minutes vs. 2 hours 6 minutes; 16 partitions, 16 workers).
1. Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Damitha Senevirathne, Isuru Wijesiri, Suchitha Dehigaspitiya, Miyuru Dayarathna, Sanath Jayasena, Toyotaro Suzumura
2020 IEEE International Conference on Big Data, Seventh International Workshop on High Performance Big Graph Data Management, Analysis, and Mining
University of Moratuwa, Sri Lanka
WSO2, Inc. USA
IBM T.J. Watson Research Center, USA
MIT-IBM Watson AI Lab, USA
Barcelona Supercomputing Center, Spain
6. Mining on Graphs
● Traditional graph mining focused on using graph properties only (e.g., PageRank, triangle count, degree distribution)
● Graph machine learning expands the horizons of mining on graph data
7. Graph-based Machine Learning
● Network embedding is a key part of graph-based machine learning
● Unsupervised learning of features generalizes the input for downstream machine learning tasks
● Early approaches such as node2vec are based on graph walks (see the sketch below)
● But they cannot incorporate node feature data
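To make the graph-walk idea concrete, here is a minimal DeepWalk-style uniform random-walk sketch; node2vec adds biased transition probabilities that are omitted here, and the function name and toy adjacency list are illustrative, not from the paper.

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=2, seed=0):
    """Generate uniform random walks over node ids. DeepWalk feeds such
    walks to a skip-gram model to learn node embeddings. `adj` maps a
    node id to its list of neighbour ids."""
    rng = random.Random(seed)
    walks = []
    for node in adj:
        for _ in range(walks_per_node):
            walk = [node]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# The walks see only node ids, never node features: exactly the
# limitation noted above.
print(random_walks({0: [1, 2], 1: [0], 2: [0, 1]}))
```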
8. Graph Convolutional Networks (GCNs)
● Learn embeddings using both features and graph structure
● Offer significantly better results in downstream machine learning tasks such as node classification, link prediction, and graph clustering
● Use the idea of aggregating neighbourhood information to incorporate structure into embeddings (see the sketch below)
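As an illustration of the aggregation idea, a minimal single-layer GCN propagation step in the style of Kipf and Welling follows; this is a sketch, not the paper's implementation, and all names are illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: normalise the adjacency matrix, aggregate
    each node's own and neighbour features, apply a learnable linear
    transform, then a ReLU non-linearity.
    A: (N, N) adjacency, H: (N, F_in) features, W: (F_in, F_out) weights."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^(-1/2) as a vector
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # aggregate, transform, ReLU
```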
11. Problem and Context
● Graph data is useful for many applications and offers much more contextual information for machine learning tasks
● Graphs are becoming too large to handle in memory with standard model training approaches, and impossible to train on commodity hardware
○ Millions of nodes and edges
○ Large amounts of node features
● How to conduct efficient model training on large graphs?
12. Proposed Solution and Contributions
● We propose a mechanism that partitions graphs and conducts distributed training on the partitions while ensuring memory efficiency by using an appropriate scheduling algorithm
● We provide a mechanism to train any graph machine learning model aimed at any task, such as node embedding, node classification, or link prediction
● We evaluate the above mechanism by implementing a GCN-based link prediction application for several graph-based use cases
13. Objectives
● Develop a generic graph machine learning mechanism on top of the distributed graph database system JasmineGraph1
○ Ensure good model performance as well as training time reduction
○ Ensure memory is utilized fully while eliminating overflow using scheduling
1. M. Dayarathna (2018), miyurud/jasminegraph, GitHub. [Online]. Available: https://github.com/miyurud/jasminegraph
15. Related Work
1. DeepWalk [25] and Node2Vec [10]: early node embedding methods
● Use only graph walks to capture node neighborhood information
● Do not utilize node features
2. GCN [4]: node embedding adapting convolution theory to graphs
● Learns a function to generate node embeddings by aggregating the target node's and its neighborhood's features

[4] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
[10] Aditya Grover and Jure Leskovec. 2016. Node2Vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 855–864. https://doi.org/10.1145/2939672.2939754
[25] B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14), pages 701–710, New York, NY, USA, 2014. ACM.
16. Related Work ctd.
3. PyTorch-BigGraph (PBG) [16]: distributed graph training mechanism
● Random node partitioning
● Shared file system
● GCNs not utilized
4. Euler [1]: distributed graph learning framework
● Trains models developed in TensorFlow on heterogeneous graphs
● But untested for large graphs like DBLP-V11
● Depends on an HDFS-based shared file system
5. JanusGraph [3], Acacia [7], and Trinity [23]: distributed graph databases
● Distributed processing of graphs
● But do not support graph machine learning

[1] Alibaba. 2019. Euler. URL: https://github.com/alibaba/euler
[3] Apache Software Foundation. 2020. JanusGraph. URL: https://janusgraph.org/
[7] M. Dayarathna and T. Suzumura. 2014. Towards Scalable Distributed Graph Database Engine for Hybrid Clouds. In 2014 5th International Workshop on Data-Intensive Computing in the Clouds. 1–8. https://doi.org/10.1109/DataCloud.2014.9
[16] Adam Lerer, Ledell Wu, Jiajun Shen, Timothée Lacroix, Luca Wehrstedt, Abhijit Bose, and Alexander Peysakhovich. 2019. PyTorch-BigGraph: A Large-scale Graph Embedding System. CoRR abs/1903.12287 (2019). arXiv:1903.12287 http://arxiv.org/abs/1903.12287
[23] Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A Distributed Graph Engine on a Memory Cloud. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13). Association for Computing Machinery, New York, NY, USA, 505–516. https://doi.org/10.1145/2463676.2467799
18. Overview of JasmineGraph
● Two main components: Master and Worker
● Communication protocols between master-worker and worker-worker have been designed
● Graphs are partitioned during the upload process using METIS (see the sketch below)
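To illustrate the partitioning step, a minimal sketch using the pymetis Python bindings for METIS follows; this is an assumption for illustration (JasmineGraph's actual partitioner runs inside its C++ upload path).

```python
import pymetis  # Python bindings for METIS (assumed installed: pip install pymetis)

# Toy graph as an adjacency list: node i -> list of neighbour ids
adjacency = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]

# Ask METIS for 2 balanced partitions that minimise the edge cut
n_cuts, membership = pymetis.part_graph(2, adjacency=adjacency)
print(n_cuts)      # number of edges crossing partition boundaries
print(membership)  # partition id assigned to each node
```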
21. JasmineGraph Architecture
● Python workers (client and server) sitting alongside the standard C++ workers run the ML processes
● Model updates are exchanged directly between Python workers
22. JasmineGraph Architecture ctd.
● Update sharing increases model accuracy while simultaneously increasing communication overheads
● However, it ultimately results in one simple graph ML model to be used in desired downstream tasks
23. Training Flow
● Training is conducted on partitions by distributed workers/clients
● After every training round, model updates are sent to the server, aggregated, and sent back to the workers/clients (see the aggregation sketch below)
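The server-side aggregation can be sketched as federated averaging; the exact aggregation rule used in the paper may differ, and the names here are illustrative.

```python
import numpy as np

def aggregate(client_weights, client_sizes):
    """Average each weight tensor across clients, weighted by the number
    of training examples each client holds (FedAvg-style).
    client_weights: per-client lists of numpy weight arrays,
    client_sizes: per-client training example counts."""
    total = float(sum(client_sizes))
    n_tensors = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_tensors)
    ]
```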
25. Partition Scheduling
● All graph partitions might not fit into memory at once
● Decides which partitions to train in parallel at a given moment
● Ensures that memory overflow is avoided
● Packs partitions into memory in a way that optimizes training time
● Uses a best-first-fit approach (see the sketch below)
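Below is one plausible reading of the best-first-fit scheduler, sketched under the assumption that per-partition memory footprints are known estimates; all names are illustrative.

```python
def schedule(partitions, mem_budget):
    """Greedy best-first-fit sketch: in each scheduling round, repeatedly
    take the largest pending partition that still fits in the remaining
    memory budget. `partitions` maps partition id -> estimated memory (MB)."""
    pending = dict(partitions)
    rounds = []
    while pending:
        remaining = mem_budget
        batch = []
        # largest-first keeps memory tightly packed per round
        for pid, size in sorted(pending.items(), key=lambda kv: -kv[1]):
            if size <= remaining:
                batch.append(pid)
                remaining -= size
        if not batch:  # a single partition exceeds the whole budget
            raise MemoryError("partition larger than memory budget")
        for pid in batch:
            del pending[pid]
        rounds.append(batch)
    return rounds

# Example: budget 50MB packs into two rounds: [['p0', 'p1'], ['p2', 'p3']]
print(schedule({"p0": 30, "p1": 20, "p2": 20, "p3": 10}, mem_budget=50))
```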
29. Training and Aggregation
● Assign global model weights to the client-initialized models
● Sample the graph for training
● Clients train in parallel based on the schedule
● After a training round, send the weights to the aggregator (see the round-driver sketch below)
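Putting these steps together, a hypothetical round driver might look as follows; `set_weights`, `train`, `get_weights`, and `num_examples` are assumed client methods, and `aggregate` is the averaging sketch shown earlier.

```python
def run_round(global_weights, clients, schedule, epochs=3):
    """One training round: push global weights to clients, train the
    scheduled partition batches, then aggregate the returned weights."""
    updates, sizes = [], []
    for batch in schedule:                 # batches chosen by the scheduler
        for cid in batch:                  # clients in a batch train in parallel
            client = clients[cid]
            client.set_weights(global_weights)
            client.train(epochs=epochs)    # 3 epochs per round, as in the experiments
            updates.append(client.get_weights())
            sizes.append(client.num_examples)
    return aggregate(updates, sizes)       # averaged weights for the next round
```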
31. Datasets

Dataset      | Vertices  | Edges      | No. of features | Edge file size (MB) | Feature file size (MB) | Size when training (GB)
DBLP-V11 (a) | 4,107,340 | 36,624,464 | 948             | 508                 | 9523                   | 107.5 (estimate)
Reddit (b)   | 232,965   | 11,606,919 | 602             | 145                 | 270                    | 3.84
Twitter (c)  | 81,306    | 1,768,149  | 1007            | 16                  | 157                    | 2.5
● Original sources:
a. https://www.aminer.org/citation
b. http://snap.stanford.edu/graphsage/
c. https://snap.stanford.edu/data/ego-Twitter.html
● Our prepared versions available at https://github.com/limetreeestate/graph-datasets
32. Datasets ctd.

Link prediction predicts whether there will be a link between two nodes based on attribute information and the observed existing link information.

Twitter: suggest new users to follow
● Nodes - Twitter users
● Edges (directed) - A user follows another
● Features - Twitter handles and hashtags used in the user node's tweets

Reddit: recommend content/posts that a user might find interesting
● Nodes - Reddit posts
● Edges - Two posts have common users
● Features - Extracted from the textual content of the post node
33. Datasets ctd.

DBLP-V11: suggest new papers that a researcher might find useful/interesting
● Nodes - Research papers
● Edges (directed) - One paper cites the other in its work
● Features - The field(s) of study that the paper node belongs to
34. Model
● Generate node embeddings for the two nodes of a potential link
● Generate the link/edge representation using the inner product
● Classify the potential link (see the sketch below)
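A minimal sketch of this scoring step; the names are illustrative, with the GCN assumed to have produced the node embeddings `z_u` and `z_v`.

```python
import numpy as np

def predict_link(z_u, z_v, threshold=0.5):
    """Score a potential link: inner product of the two node embeddings,
    squashed to a probability with a sigmoid, then thresholded."""
    score = 1.0 / (1.0 + np.exp(-np.dot(z_u, z_v)))
    return score >= threshold, score
```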
35. Experiment Environment

Processor: Intel® Xeon® CPU E7-4820 v3 @ 1.90GHz, 40 CPU cores (80 hardware threads via hyperthreading)
Main memory: 64GB RAM
Cache memory: 32KB L1 (d/i) cache, 256KB L2 cache, and 25600KB L3 cache
Storage: 1.8TB hard disk drive
Operating system: Ubuntu Linux 16.04 with Linux kernel 4.4.0-148-generic
36. Model Performance

The following numbers reflect how an unpartitioned, conventionally trained link prediction model performs on these datasets.

Dataset  | Accuracy | Recall | AUC    | F1     | Precision
Twitter  | 0.7887   | 0.9869 | 0.9576 | 0.8350 | 0.7233
Reddit   | 0.7174   | 0.9026 | 0.8037 | 0.7616 | 0.6587
DBLP-V11 | Cannot be trained in the conventional setting; crashes
37. Model performance (Twitter)

In the following table, the client count is equal to the number of partitions.

Partition count   | Accuracy | Recall | AUC    | F1 Score | Precision
1 (unpartitioned) | 0.7887   | 0.9869 | 0.9576 | 0.8350   | 0.7233
2                 | 0.7047   | 0.9831 | 0.9292 | 0.7700   | 0.6336
4                 | 0.6395   | 0.9730 | 0.8672 | 0.7306   | 0.5861
8                 | 0.6537   | 0.9844 | 0.8977 | 0.7412   | 0.5962
16                | 0.5936   | 0.9860 | 0.8441 | 0.7088   | 0.5538
39. Model performance (Reddit)

The client count is equal to the number of partitions in the following results.

Partition count   | Accuracy | Recall | AUC    | F1 Score | Precision
1 (unpartitioned) | 0.7174   | 0.9026 | 0.8037 | 0.7616   | 0.6587
2                 | 0.7020   | 0.9559 | 0.8458 | 0.7625   | 0.6344
4                 | 0.6836   | 0.9534 | 0.8201 | 0.7510   | 0.6202
41. Elapsed Training Times

The following tables contain results for the Twitter and Reddit datasets using 16 partitions, trained for 5 rounds with 3 epochs per round.

Twitter dataset:
Number of clients       | Elapsed time (seconds)
1 (unpartitioned graph) | 37908.31
2                       | 19575.20
4                       | 12922.13

Reddit dataset:
Number of clients       | Elapsed time (seconds)
1 (unpartitioned graph) | 32883.68
2                       | 22011.78
4                       | 15019.63
43. Implementation on Large Graphs (DBLP-V11)
● We were unable to train DBLP-V11 using conventional training
● Using the proposed solution (with scheduling) we were able to train DBLP-V11 using 16 partitions with 2 clients (20.5 hours)
● Due to memory growth in the system, we trained the DBLP-V11 dataset in two steps (3 training rounds and then 2, i.e., 15 epochs in total)

Dataset  | Accuracy | Recall  | AUC     | F1      | Precision
DBLP-V11 | 0.56529  | 0.99584 | 0.88943 | 0.69677 | 0.53630
45. Conclusion
● Conventional training schemes cannot handle training Graph Convolutional Networks (GCNs) on large graphs
● A distributed mechanism is needed to train GCNs on large graphs
46. Conclusion ctd.
● Can train any graph machine learning model for any task
○ We evaluated using an offline-developed model for link prediction
● Reduced training time by partitioning and scheduling
○ The DBLP-V11 dataset (>10GB) was trained for 15 epochs in 20 hours 24 minutes with 16 partitions and 2 workers, where conventional training could not process it at all
○ Reddit was trained in 3 hours 11 minutes (8 partitions, 4 workers); conventional training took 9 hours 8 minutes
● Future work
○ Horizontal scaling experiments
○ Secure collaborative graph machine learning between organizations