Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Damitha Senevirathne, Isuru Wijesiri, Suchitha Dehigaspitiya, Miyuru Dayarathna, Sanath Jayasena, Toyotaro Suzumura
2020 IEEE International Conference on Big Data, Seventh International Workshop on High Performance Big Graph Data Management, Analysis, and Mining
University of Moratuwa, Sri Lanka
WSO2, Inc., USA
IBM T.J. Watson Research Center, USA
MIT-IBM Watson AI Lab, USA
Barcelona Supercomputing Center, Spain
Introduction
2
Why graphs?
● Network/graph data encompasses numerous real-world scenarios
● Graphs are richer data structures compared to standard feature-based representations
3
Why graphs ctd.
Social graphs, Knowledge graphs
4
Why graphs ctd.
Patient interaction networks, Protein-protein interaction graphs
5
Mining on Graphs
● Traditional graph mining focused on using graph properties only (e.g., PageRank, triangle count, degree distribution)
● Graph machine learning expands the horizons of mining on graph data
6
Graph-based Machine Learning
● Network embedding is a key part of graph-based machine learning
● Unsupervised learning of features generalizes the input for downstream machine learning tasks
● Early approaches such as node2vec are based on graph walks
● But these cannot incorporate node feature data
7
Graph Convolutional Networks (GCNs)
● Learn embeddings using both node features and graph structure
● Offer significantly better results in downstream machine learning tasks such as node classification, link prediction, graph clustering, etc.
● Use the idea of aggregating neighbourhood information to incorporate structure into the embeddings (see the sketch below)
8
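To make the aggregation idea concrete, here is a minimal single-layer GCN sketch in Python (NumPy), using the standard symmetric normalization H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It illustrates the general technique only; it is not the implementation used in this work.

```python
# Minimal sketch of one GCN layer: aggregate neighbourhood features, then transform.
# Illustrative only; names, shapes, and the dense adjacency matrix are assumptions.
import numpy as np

def gcn_layer(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph-convolution step over a dense adjacency matrix."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                   # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # symmetric degree normalization
    agg = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features
    return np.maximum(agg @ weights, 0.0)     # linear transform + ReLU

# Toy usage: 4 nodes, 3 input features, 2 output embedding dimensions
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
H = np.random.rand(4, 3)
W = np.random.rand(3, 2)
embeddings = gcn_layer(adj, H, W)  # shape (4, 2)
```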
Presentation Outline
● Introduction
● Research Problem
● Proposed Solution
● Related Work
● Methodology
● Evaluation
● Conclusion
9
Research Problem
10
Problem and Context
● Graph data is useful for many applications and offers much more contextual information for machine learning tasks
● Graphs are becoming too large to hold in memory with standard model training approaches, and impossible to train on commodity hardware
○ Millions of nodes and edges
○ Large numbers of node features
● How can we conduct efficient model training on large graphs?
11
● We propose a mechanism that partitions graphs and conducts distributed training on the partitions, while ensuring memory efficiency by using an appropriate scheduling algorithm
● We provide a mechanism to train any graph machine learning model aimed at any task, such as node embedding, node classification, or link prediction
● We evaluate the above mechanism by implementing a GCN-based link prediction application for several graph-based use cases
Proposed Solution and Contributions
12
● Develop a generic graph machine learning mechanism on top of the distributed graph database system JasmineGraph1
○ Ensure good model performance as well as training time reduction
○ Ensure memory is utilized fully while eliminating overflow using scheduling
Objectives
1. M. Dayarathna (2018), miyurud/jasminegraph, GitHub. [Online]. Available:
https://github.com/miyurud/jasminegraph
13
Related Work
14
Related Work
1. DeepWalk [25] and Node2Vec [10]: early node embedding methods
   ● Use only graph walks to capture node neighborhood information
   ● Do not utilize node features
2. GCN [4]: node embedding adapting convolution theory to graphs
   ● Learns a function to generate node embeddings by aggregating the target node's and neighborhood features
15
[10] Aditya Grover and Jure Leskovec. 2016. Node2Vec: Scalable Feature Learning for Networks. In Proceedings of the 22Nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD ’16). ACM, New York, NY, USA, 855–864. https://doi.org/10.1145/2939672.2939754
[25] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD ’14, pages 701–710, New York, NY, USA, 2014. ACM
[4] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. In 2nd International Conference on Learning
Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014
Related Work ctd.
3. PyTorch-BigGraph (PBG) [16]: distributed graph training mechanism
   ● Random node partitioning
   ● Shared file system
   ● GCNs not utilized
4. Euler [1]: distributed graph learning framework
   ● Trains models developed in TensorFlow on heterogeneous graphs
   ● Untested for large graphs like DBLP-V11
   ● Depends on an HDFS-based shared file system
5. JanusGraph [3], Acacia [7] and Trinity [23]: distributed graph databases
   ● Distributed processing of graphs
   ● But do not support graph machine learning
16
[1] Alibaba. 2019. Euler. URL: https://github.com/alibaba/euler.
[3] Apache Software Foundation. 2020. JanusGraph. URL: https://janusgraph.org/.
[7] M. Dayarathna and T. Suzumura. 2014. Towards Scalable Distributed Graph Database
Engine for Hybrid Clouds. In 2014 5th International Workshop on Data-Intensive Computing
in the Clouds. 1–8. https://doi.org/10.1109/DataCloud.2014.9
[16] Adam Lerer, Ledell Wu, Jiajun Shen, Timothée Lacroix, Luca Wehrstedt, Abhijit Bose, and
Alexander Peysakhovich. 2019. PyTorch-BigGraph: A Large-scale Graph Embedding System.
CoRR abs/1903.12287 (2019). arXiv:1903.12287 http://arxiv.org/abs/1903.12287
[23] Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A Distributed Graph Engine on a
Memory Cloud. In Proceedings of the 2013 ACM SIGMOD International Conference on
Management of Data (SIGMOD ’13). Association for Computing Machinery, New York, NY,
USA, 505–516. https://doi.org/10.1145/2463676.2467799
Methodology
17
Overview of JasmineGraph
● Two main components: Master and Worker
● Communication protocols between master-worker and worker-worker have been designed
● Graphs are partitioned during the upload process using METIS (see the partitioning sketch below)
18
Graph partitioning (METIS) and Reconstruction
19
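The following is a hedged sketch of how METIS-style k-way partitioning can be invoked at upload time. It uses the pymetis Python bindings purely for illustration; JasmineGraph's own partitioning pipeline is implemented differently, so treat the data layout and variable names here as assumptions.

```python
# Illustrative k-way partitioning of a toy graph with METIS (via pymetis).
import pymetis

# Adjacency list of a small undirected graph: node i -> list of neighbour ids
adjacency = [
    [1, 2],     # node 0
    [0, 2, 3],  # node 1
    [0, 1],     # node 2
    [1, 4],     # node 3
    [3],        # node 4
]

n_partitions = 2
edge_cuts, membership = pymetis.part_graph(n_partitions, adjacency=adjacency)

# membership[i] is the partition id of node i; each partition (plus its cut
# edges, needed for reconstruction) would be stored with a different worker.
partitions = {p: [v for v, m in enumerate(membership) if m == p]
              for p in range(n_partitions)}
print(edge_cuts, partitions)
```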
Horizontal and Vertical scaling
20
JasmineGraph Architecture
● Python workers (client and server) sitting alongside the standard C++ workers run the ML processes
● Model updates are exchanged directly between the Python workers
21
JasmineGraph Architecture ctd.
● Update sharing increases model accuracy while simultaneously increasing communication overheads
● However, it ultimately results in one simple graph ML model to be used in the desired downstream tasks
22
Training Flow
● Training is conducted on partitions by the distributed workers/clients
● After every training round, model updates are sent to the server, aggregated, and sent back to the workers/clients (see the sketch below)
23
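A minimal sketch of this round-based flow, written as a server-side loop: broadcast the global weights, let each client train locally, then average the returned updates. The client object and its train_on_partition method are illustrative placeholders, not JasmineGraph APIs.

```python
# Sketch of the round-based training flow. The per-layer averaging mirrors a
# federated-averaging style aggregation; client objects are hypothetical.
import numpy as np

def average_weights(weight_sets):
    """Element-wise mean of each layer's weights across all clients."""
    return [np.mean(layer, axis=0) for layer in zip(*weight_sets)]

def run_training(clients, global_weights, rounds=5):
    for _ in range(rounds):
        # 1. Broadcast the current global model to every worker/client.
        # 2. Each client trains on its local partitions for a few epochs.
        local_weights = [client.train_on_partition(global_weights)
                         for client in clients]
        # 3. Aggregate the returned updates into the next global model.
        global_weights = average_weights(local_weights)
    return global_weights
```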
Memory estimation
● Estimate each partition's size in memory based on its number of nodes, edges and attributes (see the sketch below)
24
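A rough sketch of such an estimate: the partition footprint is modeled as a function of node count, feature width and edge count. The byte sizes and overhead factor below are assumptions for illustration, not the constants used in the system.

```python
# Rough memory-estimation sketch; all constants are illustrative assumptions.
def estimate_partition_mb(num_nodes, num_edges, num_features,
                          bytes_per_feature=4,   # e.g. float32 features
                          bytes_per_edge=8,      # e.g. two int32 endpoints
                          overhead=1.5):         # framework/runtime overhead
    feature_bytes = num_nodes * num_features * bytes_per_feature
    edge_bytes = num_edges * bytes_per_edge
    return overhead * (feature_bytes + edge_bytes) / (1024 ** 2)

# e.g. a partition with 500k nodes, 4.5M edges and 602 features per node
print(round(estimate_partition_mb(500_000, 4_500_000, 602)), "MB")
```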
Partition Scheduling
● All graph partitions might not fit into memory at once
● The scheduler decides which partitions to train in parallel at a given moment
● Ensures that memory overflow is avoided
● Packs partitions into memory in a way that optimizes training time
● Uses a best-first-fit approach (see the sketch below)
25
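A sketch of a best-first-fit style schedule under an assumed per-worker memory budget: repeatedly fill the budget with the largest partitions that still fit, producing batches that can be trained in parallel without overflow. This is an illustrative reconstruction, not JasmineGraph's exact algorithm.

```python
# Best-first-fit style scheduling sketch over estimated partition sizes.
def schedule_partitions(partition_sizes_mb, memory_budget_mb):
    """partition_sizes_mb: {partition_id: estimated size in MB}."""
    remaining = dict(partition_sizes_mb)
    schedule = []
    while remaining:
        batch, free = [], memory_budget_mb
        # Greedily pick the largest partition that still fits (best fit first).
        for pid, size in sorted(remaining.items(), key=lambda kv: -kv[1]):
            if size <= free:
                batch.append(pid)
                free -= size
        if not batch:  # a single partition exceeds the whole budget
            raise MemoryError("Partition larger than available memory budget")
        for pid in batch:
            del remaining[pid]
        schedule.append(batch)   # one batch = partitions trained in parallel
    return schedule

# Example: 6 partitions scheduled against a 4 GB budget
sizes = {0: 2100, 1: 1800, 2: 900, 3: 700, 4: 650, 5: 300}
print(schedule_partitions(sizes, 4096))  # -> [[0, 1], [2, 3, 4, 5]]
```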
Partition Scheduling ctd.
26
Partition Scheduling ctd.
27
Partition Scheduling ctd.
28
Training and Aggregation
● Assign the global model weights to the client-initialized models
● Sample the graph for training
● Clients train in parallel according to the schedule
● After each training round, send the weights to the aggregator (see the sketch below)
29
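A sketch of one client-side round following these steps, assuming a Keras-style model object (set_weights, get_weights and fit are standard Keras calls); the partition sampling helper and scheduled partition objects are hypothetical placeholders.

```python
# One client-side training round under the schedule (illustrative only).
def client_training_round(model, global_weights, scheduled_partitions,
                          epochs_per_round=3):
    # Start the round from the aggregated global model.
    model.set_weights(global_weights)
    for partition in scheduled_partitions:
        # Hypothetical helper: positive/negative edge sampling on the partition.
        x, adj, edge_labels = partition.sample_subgraph()
        model.fit([x, adj], edge_labels, epochs=epochs_per_round, verbose=0)
    # Return the local weights to the aggregator (the Python "server" worker).
    return model.get_weights()
```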
Evaluation
30
Datasets
Data Set       Vertices    Edges        No. of features   Edge file size (MB)   Feature file size (MB)   Size when training (MB)
DBLP-V11 (a)   4,107,340   36,624,464   948               508                   9,523                    2.5
Reddit (b)     232,965     11,606,919   602               145                   270                      3.84
Twitter (c)    81,306      1,768,149    1,007             16                    157                      107.5 (estimate)
31
● Original sources:
a. https://www.aminer.org/citation
b. http://snap.stanford.edu/graphsage/
c. https://snap.stanford.edu/data/ego-Twitter.html
● Our prepared versions available at https://github.com/limetreeestate/graph-datasets
Datasets ctd.
32
Twitter: Suggest new users to
follow
● Nodes - Twitter Users
● Edges (Directed) - User follows
another
● Features - Twitter handles and
hashtags used in user node’s tweets
Reddit: Recommend content/posts
that user might find interesting
● Nodes - Reddit posts
● Edges - Two posts are connected if they share common users
● Features - Extracted from textual
content of the post node
Link prediction predicts whether a link will exist between two nodes, based on their attribute information and the observed existing link information.
Datasets ctd.
33
DBLP-V11: Suggest new papers that a researcher might find
useful/interesting
● Nodes - Research papers
● Edges (Directed) - One paper cites the other in its work
● Features - The field(s) of study that the paper node belongs to
Model
● Generate node embeddings for the nodes of a potential link
● Generate the link/edge representation using the inner product
● Classify the potential link (see the sketch below)
34
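A minimal sketch of this link prediction head: take the GCN embeddings of the two endpoints, form the inner-product edge representation, and classify it with a sigmoid. The threshold and helper names are assumptions for illustration, not the paper's exact model.

```python
# Link scoring sketch on top of precomputed GCN node embeddings.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_link(embeddings: np.ndarray, u: int, v: int, threshold: float = 0.5):
    """embeddings: (num_nodes, dim) matrix produced by the GCN encoder."""
    score = float(embeddings[u] @ embeddings[v])  # inner-product edge representation
    prob = sigmoid(score)                         # probability that the link exists
    return prob, prob >= threshold

# Toy usage with random embeddings for 5 nodes
emb = np.random.randn(5, 16) * 0.1
print(predict_link(emb, 0, 3))
```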
Experiment Environment
Processor: Intel® Xeon® CPU E7-4820 v3 @ 1.90GHz, 40 CPU cores (80 hardware threads via hyperthreading)
Main memory: 64GB RAM
Cache memory: 32KB L1 (d/i) cache, 256KB L2 cache, and 25,600KB L3 cache
Storage: 1.8TB hard disk drive
Operating System: Ubuntu Linux 16.04 with Linux kernel 4.4.0-148-generic
35
Model Performance
36
The following numbers reflect how an unpartitioned, conventionally trained link prediction model performs on these datasets:
Dataset     Accuracy   Recall   AUC      F1       Precision
Twitter     0.7887     0.9869   0.9576   0.8350   0.7233
Reddit      0.7174     0.9026   0.8037   0.7616   0.6587
DBLP-V11    Cannot train in conventional setting, crashes
Model performance (Twitter)
37
For the following table, the client count is equal to the number of partitions:
Partition count     Accuracy   Recall   AUC      F1 Score   Precision
1 (unpartitioned)   0.7887     0.9869   0.9576   0.835      0.7233
2                   0.7047     0.9831   0.9292   0.77       0.6336
4                   0.6395     0.973    0.8672   0.7306     0.5861
8                   0.6537     0.9844   0.8977   0.7412     0.5962
16                  0.5936     0.986    0.8441   0.7088     0.5538
Model performance (Twitter) ctd.
38
Model performance (Reddit)
39
The client count is equal to the number of partitions in the following results:
Partition count     Accuracy   Recall   AUC      F1 Score   Precision
1 (unpartitioned)   0.7174     0.9026   0.8037   0.7616     0.6587
2                   0.702      0.9559   0.8458   0.7625     0.6344
4                   0.6836     0.9534   0.8201   0.751      0.6202
Model performance (Reddit) ctd.
40
Elapsed Training Times
41
The following tables contain results for the Twitter and Reddit datasets using 16 partitions, trained for 5 rounds with 3 epochs per round.

Twitter dataset:
Number of clients         Elapsed Time (seconds)
1 (unpartitioned graph)   37908.31
2                         19575.20
4                         12922.13

Reddit dataset:
Number of clients         Elapsed Time (seconds)
1 (unpartitioned graph)   32883.68
2                         22011.78
4                         15019.63
Elapsed Training Times ctd.
42
Implementation on Large Graphs (DBLP-V11)
● We were unable to train DBLP-V11 using conventional training
● Using the proposed solution (with scheduling), we were able to train DBLP-V11 with 16 partitions and 2 clients (20.5 hours)
● However, due to memory growth in the system, we trained the DBLP-V11 dataset in two steps (3 training rounds followed by 2, i.e. 15 epochs in total)
43
Dataset Accuracy Recall AUC F1 Precision
DBLP-V11 0.56529 0.99584 0.88943 0.69677 0.53630
Conclusion
44
Conclusion
45
● Conventional training schemes cannot handle training Graph Convolutional Networks (GCNs) on large graphs
● A distributed mechanism is needed to train GCNs on large graphs
Conclusion ctd.
46
● Can train any graph machine learning model for any task
○ We evaluated it using an offline-developed model for link prediction
● Reduced training time through partitioning and scheduling
○ The DBLP-V11 dataset (>10GB) was trained for 15 epochs in 20 hours 24 minutes with 16 partitions and 2 workers, whereas conventional training could not process it at all
○ Reddit was trained in 3 hours 11 minutes (8 partitions, 4 workers); conventional training took 9 hours 11 minutes
● Future work
○ Horizontal scaling experiments
○ Secure collaborative graph machine learning between organizations
THANK YOU
47