Brad Rees, Connected Data London, Oct 4th, 2019
cuGraph
Accelerating all your Graph Analytic Needs
RAPIDS cuGraph – Accelerating all your Graph needs

The relationships between data sets matter. Discovering, analyzing, and learning those relationships is central to expanding our understanding, and is a critical step toward being able to predict and act upon the data. Unfortunately, these are not always simple or quick tasks.

To help the analyst we introduce RAPIDS, a collection of open-source libraries, incubated by NVIDIA and focused on accelerating the complete end-to-end data science ecosystem. Graph analytics is a critical piece of the data science ecosystem for processing linked data, and RAPIDS is pleased to offer cuGraph as our accelerated graph library.

Simply accelerating algorithms addresses only a portion of the problem. To address the full problem space, RAPIDS cuGraph strives to be feature-rich, easy to use, and intuitive. Rather than limiting the solution to a single graph technology, cuGraph supports property graphs, knowledge graphs, hypergraphs, bipartite graphs, and basic directed and undirected graphs.

A Python API allows the data to be manipulated as a DataFrame, similar to and compatible with Pandas. Inputs and outputs are shared across the full RAPIDS suite, for example with the RAPIDS machine learning package, cuML.
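To make the DataFrame-style graph input concrete, here is a minimal pure-Python sketch of an edge list held as two parallel columns (`src` and `dst`, the column names used on the PageRank slide) and converted to CSR, a step the slides list as "Convert Data to CSR". The function name `coo_to_csr` and the plain `dict` standing in for a DataFrame are assumptions for illustration; this is not cuGraph's or cuDF's implementation and runs on the CPU.

```python
from typing import List, Tuple


def coo_to_csr(src: List[int], dst: List[int], num_vertices: int) -> Tuple[List[int], List[int]]:
    """Convert a COO edge list (parallel src/dst columns) to CSR offsets and indices."""
    # Count outgoing edges per source vertex, shifted by one slot.
    offsets = [0] * (num_vertices + 1)
    for s in src:
        offsets[s + 1] += 1
    # A prefix sum turns the counts into row start offsets.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Scatter each destination into its row, tracking the next free slot per row.
    next_slot = offsets[:-1].copy()
    indices = [0] * len(src)
    for s, d in zip(src, dst):
        indices[next_slot[s]] = d
        next_slot[s] += 1
    return offsets, indices


# Hypothetical edge table: a plain dict standing in for a DataFrame with 'src' and 'dst' columns.
edges = {"src": [0, 0, 1, 2, 2], "dst": [1, 2, 2, 0, 1]}
offsets, indices = coo_to_csr(edges["src"], edges["dst"], num_vertices=3)
print(offsets)  # [0, 2, 3, 5]
print(indices)  # [1, 2, 2, 0, 1]
```

The neighbors of vertex `v` are `indices[offsets[v]:offsets[v + 1]]`, which is why traversal and link-analysis algorithms often prefer CSR over a raw edge list.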

This talk presents an overview of RAPIDS and cuGraph. It discusses and shows examples of how to manipulate and analyze bipartite and property graphs, and shows how data can be shared with machine learning algorithms. The talk includes some performance and scalability metrics, then concludes with a preview of upcoming features, such as graph query language support, and the general RAPIDS roadmap.
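As a rough way to read the performance and scalability metrics, the sketch below derives two quantities from the figures on the "PageRank Performance" slides (slides 18 and 25 in the transcript): solver throughput per GPU and the share of total runtime spent in the solver. The derived numbers are my own arithmetic, not results stated in the talk, and the sketch assumes the "Just PageRank Solver" time covers the same 20 iterations as the "Run Pagerank" column.

```python
# Rows copied from the "PageRank Performance" slides:
# (vertices, edges, gpus, total_runtime_s, solver_only_s)
rows = [
    (50_000_000, 1_980_000_000, 3, 41.6, 3.66),
    (100_000_000, 4_000_000_000, 6, 57.4, 5.16),
    (200_000_000, 8_000_000_000, 12, 87.9, 8.65),
    (400_000_000, 16_000_000_000, 16, 154.1, 13.89),
]
ITERATIONS = 20  # assumption: the solver-only time spans all 20 iterations


def solver_edges_per_second_per_gpu(edges, gpus, solver_s, iterations=ITERATIONS):
    """Edge visits per second per GPU, counting one visit per edge per iteration."""
    return edges * iterations / solver_s / gpus


summary = []
for vertices, edges, gpus, total_s, solver_s in rows:
    rate = solver_edges_per_second_per_gpu(edges, gpus, solver_s)
    share = solver_s / total_s  # fraction of end-to-end time spent in the solver
    summary.append((edges, gpus, rate, share))
    print(f"{edges:>14,} edges on {gpus:>2} GPUs: "
          f"{rate / 1e9:.2f} G edge-visits/s/GPU, solver is {share:.0%} of total runtime")
```

In every row the solver accounts for a small fraction of the total, which matches the slides' breakdown showing that reading data and writing scores take most of the time.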

  1. Brad Rees, Connected Data London, Oct 4th, 2019. cuGraph: Accelerating all your Graph Analytic Needs
  2. Brad Rees. Name: Brad Rees. Works at: NVIDIA, Sr Manager, cuGraph Lead. Education: PhD, Community Detection in Social Networks. Experience: > 30 years, > 20 years; Cyber, SNA, Graph, Computer Science, HPC, Big Data.
  3. WE ARE CONNECTED. 7 degrees of Kevin Bacon. Duncan Watts & Steven Strogatz, "Collective dynamics of 'small-world' networks", 1998. And have always been connected: "The small-world problem", 1968, Stanley Milgram (social psychologist). 1929.
  4. CONNECTEDNESS CAPTURED AS A GRAPH. As well as associated information, knowledge, metadata, etc.
  5. AND THERE ARE A LOT OF GRAPH FRAMEWORKS, in lots of variations: Neo4j, TigerGraph, AnzoGraph, RedisGraph, Oracle, GraphX, Pegasus, Pregel, GraphLab, Giraph, Graphulo, PowerGraph, Galois, Ligra, Gunrock, GraphBLAS, Stinger, Hornet, cuGraph, NetworkX. Product names are the property of the owners.
  6. Why cuGraph? More generally, why RAPIDS? A) Graph is not an isolated function; it needs to be part of the complete data science process. And graphs are just cool.
  7. Speed, UX, and Iteration: The Way to Win at Data Science. Slide borrowed from Francois Chollet.
  8. Enter End-to-End Accelerated GPU Data Science: reduce data movement and keep all processing on the GPU. Stages: Data Preparation, Model Training, Visualization. Components on shared GPU memory: cuDF and cuIO (Analytics); cuML (Machine Learning); cuGraph (Graph Analytics); PyTorch, Chainer, MxNet (Deep Learning); cuXfilter <> pyViz (Visualization); Dask.
  9. ETL - the Backbone of Data Science. cuDF is a Python library: ● A Python library for manipulating GPU DataFrames following the Pandas API ● Python interface to CUDA C++ library with additional functionality ● Creating GPU DataFrames from Numpy arrays, Pandas DataFrames, and PyArrow Tables ● JIT compilation of User-Defined Functions (UDFs) using Numba ● String support
  10. Extraction is the Cornerstone of ETL: cuIO is born. • Follows the APIs of Pandas and provides >10x speedup • CSV Reader v0.2, CSV Writer v0.8 • Parquet Reader v0.7 • ORC Reader v0.7 • JSON Reader v0.8 • Avro Reader v0.9 • HDF5 Reader v0.10 • Key is GPU-accelerating both parsing and decompression wherever possible. Source: Apache Crail blog: SQL Performance: Part 1 - Input File Formats
  11. cuML Machine Learning: GPU-accelerated Scikit-Learn. Classification / Regression: Decision Trees / Random Forests, Linear Regression, Logistic Regression, K-Nearest Neighbors. Statistical Inference: Kalman Filtering, Bayesian Inference, Gaussian Mixture Models, Hidden Markov Models. Clustering: K-Means, DBSCAN, Spectral Clustering. Decomposition & Dimensionality Reduction: Principal Components, Singular Value Decomposition, UMAP, Spectral Embedding. Time Series Forecasting: ARIMA, Holt-Winters. Recommendations: Implicit Matrix Factorization. Cross Validation. Hyper-parameter Tuning. More to come! Benchmark label: 1x V100 vs 2x 20 core CPU.
  12. cuGraph: Accelerating your Graph needs
  13. GOALS AND BENEFITS OF CUGRAPH: Focus on Features and Ease-of-Use. • Seamless integration with cuDF and cuML: Python APIs accept and return cuDF DataFrames; allows for Property Graph • Features: extensive collection of algorithm, primitive, and utility functions**, with accelerated performance • Python API: multiple APIs (NetworkX, Pregel**, GraphBLAS**, Frontier**); Graph Query Language** • C/C++: full-featured C++ API. ** On roadmap.
  14. Graph Technology Stack. Layers: Python; Cython; C++ cuGraph Algorithms; Prims; CUDA Libraries; CUDA. Python side: Dask cuGraph, Dask cuDF, cuDF, Numpy. CUDA libraries: Thrust, Cub, cuSolver, cuSparse, cuRand. Also: Gunrock*, cuGraphBLAS, cuHornet. nvGRAPH has been open sourced and integrated into cuGraph. * Gunrock is from UC Davis. cuGraphBLAS projected release is 0.12.
  15. Bringing in leading researchers: leveraging the great work of others. cuGraph, Gunrock, Hornet, GraphBLAS, cuHornet, cuGraphBLAS. https://news.developer.nvidia.com/graph-technology-leaders-combine-forces-to-advance-graph-analytics/
  16. Algorithms (as of release 0.10): GPU-accelerated NetworkX. Community: Spectral Clustering (Balanced-Cut, Modularity Maximization), Louvain, Subgraph Extraction, Triangle Counting. Components: Weakly Connected Components, Strongly Connected Components. Link Analysis: Page Rank, Personal Page Rank, Katz. Link Prediction: Jaccard, Weighted Jaccard, Overlap Coefficient. Traversal: Single Source Shortest Path (SSSP), Breadth First Search (BFS). Also listed under Structure, Utilities, Query Language, and Multi-GPU: COO-to-CSR, Transpose, Renumbering, Symmetrize, Page Rank, OpenCypher: Find-Matches. More to come! Long list of additional algorithms to come.
  17. PageRank Speedup: cuGraph PageRank vs NetworkX PageRank. Code shown: G = cugraph.Graph(); G.add_edge_list(gdf['src'], gdf['dst'], None); df = cugraph.pagerank(G, alpha, max_iter, tol). https://github.com/rapidsai/notebooks-extended/tree/master/advanced/benchmarks/cugraph_benchmark Chart label: SciPy.
  18. PageRank Performance: HiBench Websearch benchmark. All times are in seconds.

      | Vertices | Edges | File Size (GB) | Number of GPUs | Read data and create DataFrame | Run PageRank (20 iterations) | Write Scores | TOTAL runtime |
      |---|---|---|---|---|---|---|---|
      | 50,000,000 | 1,980,000,000 | 34 | 3 | 28.6 | 6.8 | 6.2 | 41.6 |
      | 100,000,000 | 4,000,000,000 | 69 | 6 | 33.4 | 11.3 | 12.7 | 57.4 |
      | 200,000,000 | 8,000,000,000 | 146 | 12 | 36.8 | 24.4 | 26.7 | 87.9 |
      | 400,000,000 | 16,000,000,000 | 300 | 16 | 58.3 | 42.8 | 53.0 | 154.1 |

      Process: read data (parse CSV into DataFrame); run PageRank (convert data to CSR, setup, run PageRank solver, collect results and convert to a DataFrame); write scores.
  19. Faster Speeds, Real-World Benefits. Chart panels: cuIO/cuDF (Load and Data Preparation); cuML (XGBoost); End-to-End. Time in seconds (shorter is better). Legend: cuIO/cuDF (Load and Data Prep), Data Conversion, XGBoost. Benchmark: 200GB CSV dataset; data prep includes joins and variable transformations. CPU cluster configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark. DGX cluster configuration: 5x DGX-1 on InfiniBand network. Chart values: 8762, 6148, 3925, 3221, 322, 213. Non-Graph.
  20.
  21. Deploy RAPIDS Everywhere: focused on robust functionality, deployment, and user experience. Integration with major cloud providers. Both containers and cloud-specific machine instances. Support for enterprise and HPC orchestration layers. Cloud Dataproc, Azure Machine Learning.
  22. GRAPHISTRY (info@graphistry.com). 100X Investigations with Graphistry: visibility & workflows for handling modern enterprise data. Data Scientist Notebooks; Dev API for Embedding; Analyst Tool Suite; Automate Investigations. Virtual graph over graph and tabular APIs. GPU Visual Analytics: • 100X via GPUs: client<>cloud • Correlate w/ graph • Time, histograms, …
  23. Articles
  24. THANK YOU. Please give us a star on GitHub: https://github.com/rapidsai/cugraph Questions?
  25. PageRank Performance: HiBench Websearch benchmark. All times are in seconds.

      | Vertices | Edges | File Size (GB) | Number of GPUs | Read data and create DataFrame | Run PageRank (20 iterations) | Write Scores | TOTAL runtime |
      |---|---|---|---|---|---|---|---|
      | 50,000,000 | 1,980,000,000 | 34 | 3 | 28.6 | 6.8 | 6.2 | 41.6 |
      | 100,000,000 | 4,000,000,000 | 69 | 6 | 33.4 | 11.3 | 12.7 | 57.4 |
      | 200,000,000 | 8,000,000,000 | 146 | 12 | 36.8 | 24.4 | 26.7 | 87.9 |
      | 400,000,000 | 16,000,000,000 | 300 | 16 | 58.3 | 42.8 | 53.0 | 154.1 |

      | Vertices | Edges | Convert DataFrame to CSR | Just PageRank Solver |
      |---|---|---|---|
      | 50,000,000 | 1,980,000,000 | 2.4 | 3.66 |
      | 100,000,000 | 4,000,000,000 | 4.5 | 5.16 |
      | 200,000,000 | 8,000,000,000 | 9.6 | 8.65 |
      | 400,000,000 | 16,000,000,000 | 19.5 | 13.89 |

      Process: read data (parse CSV into DataFrame); run PageRank (convert data to CSR, setup, run PageRank solver, collect results and convert to a DataFrame); write scores.
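The PageRank call on slide 17 takes `alpha`, `max_iter`, and `tol` arguments. As a hedged illustration of what those parameters usually control, here is a minimal pure-Python power-iteration sketch over `src`/`dst` edge columns. It is a CPU toy, not cuGraph's implementation; the default values, the dangling-vertex handling, and the L1 convergence test are assumptions chosen for the example.

```python
from typing import List


def pagerank(src: List[int], dst: List[int], num_vertices: int,
             alpha: float = 0.85, max_iter: int = 100, tol: float = 1e-6) -> List[float]:
    """Power-iteration PageRank on a directed edge list.

    alpha is the damping factor, max_iter caps the number of iterations,
    and tol is the L1 change between iterations below which we stop.
    """
    out_degree = [0] * num_vertices
    for s in src:
        out_degree[s] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(max_iter):
        # Rank held by vertices with no outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        base = (1.0 - alpha) / num_vertices + alpha * dangling / num_vertices
        new_rank = [base] * num_vertices
        # Each vertex splits its damped rank evenly across its outgoing edges.
        for s, d in zip(src, dst):
            new_rank[d] += alpha * rank[s] / out_degree[s]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Tiny example graph: 0 -> 2, 1 -> 2, 2 -> 0, 2 -> 1
src = [0, 1, 2, 2]
dst = [2, 2, 0, 1]
scores = pagerank(src, dst, num_vertices=3)
print(scores)
```

Vertex 2 receives links from both other vertices, so it ends up with the highest score, while vertices 0 and 1 are symmetric and score the same.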