SlideShare une entreprise Scribd logo
1  sur  72
Télécharger pour lire hors ligne
Adam Thompson | adamt@nvidia.com | Senior Solutions Architect
Corey Nolet | cnolet@nvidia.com | Data Scientist and Senior Engineer
GPU Enabled Data Science
2
About Us
Adam Thompson @adamlikesai
Senior Solutions Architect at NVIDIA, Signals/RF SME
Research interests include applications of machine and deep learning to non-
traditional data types, HPC, and large system integration
Experience in programming GPUs for signal processing tasks since CUDA Toolkit 2
Roots for and
Corey Nolet @cjnolet
Data Scientist & Senior Engineer on the RAPIDS cuML team at NVIDIA
Focuses on building and scaling machine learning algorithms to support extreme
data loads at light-speed
Over a decade experience building massive-scale exploratory data science & real-
time analytics platforms for HPC environments in the defense industry
Working towards PhD in Computer Science, focused on unsupervised ML
3
1980 1990 2000 2010 2020
102
103
104
105
106
107
Single-threaded perf
1.5X per year
1.1X per year
GPU-Computing perf
1.5X per year 1000X
By 2025
REALITIES OF PERFORMANCE
GPU Performance Grows
4
Realities of Data
5
ML Workflow Stifles Innovation
It Requires Exploration and Iterations
All
Data
ETL
Manage Data
Structured
Data Store
Data
Preparation
Training
Model
Training
Visualization
Evaluate
Inference
Deploy
Accelerating just `Model Training` does have benefit but doesn’t address the whole problem
End-to-End acceleration is needed
Iterate …
Cross Validate …
Grid Search …
Iterate some more.
6
7
GPU
Adoption
Barriers
• Too much data movement
• Too many makeshift data formats
• Writing CUDA C/C++ is hard
• No Python API for data manipulation
Yes GPUs are fast but …
8
APP A
DATA MOVEMENT AND TRANSFORMATION
The bane of productivity and performance
CPU GPU
APP B
Read Data
Copy & Convert
Copy & Convert
Copy & Convert
Load Data
APP A GPU
Data
APP B
GPU
Data
APP A
APP B
9
APP A
DATA MOVEMENT AND TRANSFORMATION
What if we could keep data on the GPU?
APP B
Copy & Convert
Copy & Convert
Copy & Convert
APP A GPU
Data
APP B
GPU
Data
Read Data
Load Data
APP B
CPU GPU
APP A
10
Learning from APACHE ARROW
Source: Apache Arrow Home Page (https://arrow.apache.org/)
11
Data Processing Evolution
Faster Data Access Less Data Movement
25-100x Improvement
Less code
Language flexible
Primarily In-Memory
HDFS
Read
HDFS
Write
HDFS
Read
HDFS
Write
HDFS
ReadQuery ETL ML Train
HDFS
Read Query ETL ML Train
HDFS
Read
GPU
Read
Query
CPU
Write
GPU
Read
ETL
CPU
Write
GPU
Read
ML
Train
Arrow
Read
Query ETL
ML
Train
5-10x Improvement
More code
Language rigid
Substantially on GPU
50-100x Improvement
Same code
Language flexible
Primarily on GPU
RAPIDS
GPU/Spark In-Memory Processing
Hadoop Processing, Reading from disk
Spark In-Memory Processing
12
APPLICATIONS
SYSTEMS
ALGORITHMS
CUDA
ARCHITECTURE
• Learn what the data science community needs
• Use best practices and standards
• Build scalable systems and algorithms
• Test Applications and workflows
• Iterate
13
cuDF
Analytics
GPU Memory
Data Preparation VisualizationModel Training
cuML
Machine Learning
cuGraph
Graph Analytics
PyTorch & Chainer
Deep Learning
Kepler.GL
Visualization
RAPIDS Open Source SOFTWARE
14
RAPIDS: OPEN GPU DATA SCIENCE
cuDF, cuML, and cuGraph mimic well-known libraries
Data Preparation VisualizationModel Training
CUDA
PYTHON
DASK
DL
FRAMEWORKS
CUDNN
RAPIDS
CUMLCUDF CUGRAPH
APACHE ARROW
Pandas-like
ScikitLearn-like
NetworkX-like
15
Dask
What is Dask and why does RAPIDS use it for scaling out?
• Dask is a distributed computation scheduler
built to scale Python workloads from laptops to
supercomputer clusters.
• Extremely modular with scheduling, compute,
data transfer, and out-of-core handling all being
disjointed allowing us to plug in our own
implementations.
• Can easily run multiple Dask workers per node
to allow for an easier development model of
one worker per GPU regardless of single node
or multi node environment.
16
Dask
Scale up and out with cuDF
• Use cuDF primitives underneath in map-reduce style
operations with the same high level API
• Instead of using typical Dask data movement of
pickling objects and sending via TCP sockets, take
advantage of hardware advancements using a
communications framework called OpenUCX:
• For intranode data movement, utilize NVLink
and PCIe peer-to-peer communications
• For internode data movement, utilize GPU
RDMA over Infiniband and RoCE https://github.com/rapidsai/dask_gdf
17
Faster speeds, real world benefits
2,290
1,956
1,999
1,948
169
157
0 500 1,000 1,500 2,000 2,500
20 CPU Nodes
30 CPU Nodes
50 CPU Nodes
100 CPU Nodes
DGX-2
5x DGX-1
0 2,000 4,000 6,000 8,000 10,000
20 CPU Nodes
30 CPU Nodes
50 CPU Nodes
100 CPU Nodes
DGX-2
5x DGX-1
cuML — XGBoost
2,741
1,675
715
379
42
19
0 1,000 2,000 3,000
20 CPU Nodes
30 CPU Nodes
50 CPU Nodes
100 CPU Nodes
DGX-2
5x DGX-1
End-to-End
cuIO/cuDF —
Load and Data Preparation
Benchmark
200GB CSV dataset; Data preparation
includes joins, variable transformations.
CPU Cluster Configuration
CPU nodes (61 GiB of memory, 8 vCPUs,
64-bit platform), Apache Spark
DGX Cluster Configuration
5x DGX-1 on InfiniBand network
Time in seconds — Shorter is better
cuIO / cuDF (Load and Data Preparation) Data Conversion XGBoost
8,762
6,148
3,925
3,221
322
213
18
AI Libraries
Accelerating more of the AI ecosystem
Graph Analytics is fundamental to network analysis
Machine Learning is fundamental to prediction,
classification, clustering, anomaly detection and
recommendations.
Both can be accelerated with NVIDIA GPU
8x V100 20-90x faster than dual socket CPU
Decisions Trees
Random Forests
Linear Regressions
Logistics Regressions
K-Means
K-Nearest Neighbor
DBSCAN
Kalman Filtering
Principal Components
Single Value Decomposition
Bayesian Inferencing
PageRank
BFS
Jaccard Similarity
Single Source Shortest Path
Triangle Counting
Louvain Modularity
ARIMA
Holt-Winters
Machine Learning Graph Analytics
Time Series
XGBoost, Mortgage Dataset, 90x
3 Hours to 2 mins on 1 DGX-1
cuML & cuGraph
19
cuDF
20
RAPIDS: OPEN GPU DATA SCIENCE
Following the Pandas API for optimal code reuse
Data Preparation VisualizationModel Training
CUDA
PYTHON
DASK
DL
FRAMEWORKS
CUDNN
RAPIDS
CUMLCUDF CUGRAPH
APACHE ARROW
Pandas-like
import pandas as pd
import cudf
…
df = pd.read_csv('foo.csv', names=['index', 'A'], dtype=['date', 'float64'])
gdf = cudf.read_csv('foo.csv', names=['index', 'A'], dtype=['date', 'float64’])
…
# eventually majority of pandas features will be available in cudf
21
cuDF
GPU DataFrame library
• Apache Arrow data format
• Pandas-like API
• Unary and Binary Operations
• Joins / Merges
• GroupBys
• Filters
• User-Defined Functions (UDFs)
• Accelerated file readers
• Much, much more
22
CUDF
Today
CUDA With Python Bindings
• Low level library containing function
implementations and C/C++ API
• Importing/exporting Apache Arrow using the
CUDA IPC mechanism
• CUDA kernels to perform element-wise math
operations on GPU DataFrame columns
• CUDA sort, join, groupby, and reduction
operations on GPU DataFrames
• A Python library for manipulating GPU
DataFrames
• Python interface to CUDA C++ with additional
functionality
• Creating Apache Arrow from Numpy arrays,
Pandas DataFrames, and PyArrow Tables
• JIT compilation of User-Defined Functions
(UDFs) using Numba
23
GPU-Accelerated string functions with a Pandas-like API
cuString & NVSTring
• API and functionality is following Pandas: https://pandas.pydata.org/pandas-
docs/stable/api.html#string-handling
• lower()
• ~22x speedup
• find()
• ~40x speedup
• slice()
• ~100x speedup
0.00
100.00
200.00
300.00
400.00
500.00
600.00
700.00
800.00
lower() find(#) slice(1,15)
milliseconds
Pandas cudastrings
24
Next few months
GPU Dataframe
• Continue improving performance and functionality
• Single GPU
• Single node, multi GPU
• Multi node, multi GPU
• String Support
• Support for specific “string” dtype with GPU-accelerated functionality similar to Pandas
• Accelerated Data Loading
• File formats: CSV, Parquet, ORC – to start
25
cuGraph
26
CUGRAPH
GPU-Accelerated Graph Analytics Library
Coming Soon:
Full NVGraph Integration Q1 2019
27
Early cugraph benchmarks
NVGRAPH methods (not currently in cuGraph)
100 billion TEPS (single GPU)
Can reach 300 billion TEPS on MG (4 GPUs) with hand-tuning
PageRank (SG) on graph with 5.5m edges and 51k nodes with RAPIDS
CPU (GraphFrames, Spark) [576 GB memory, 384 vcores]: 172 seconds
GPU (cuGraph, V100) [32 GB memory, 5120 CUDA cores]: 671 ms
Speed-up of 256x
For large graphs, there are huge parallelization benefits
28
cuML
29
Problem
Data sizes continue to grow
30
Problem
Data sizes continue to grow
31
Problem
Data sizes continue to grow
min(variance)
min(bias)
32
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
33
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
34
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
35
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
36
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
Iterate.
37
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
Iterate. Cross Validate.
38
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
Iterate. Cross Validate & Grid Search.
39
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
Iterate. Cross Validate & Grid Search.
Iterate some more.
40
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
Iterate. Cross Validate & Grid Search.
Iterate some more.
Meet reasonable speed vs accuracy tradeoff
41
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
Iterate. Cross Validate & Grid Search.
Iterate some more.
Meet reasonable speed vs accuracy tradeoff
Time
Increases
42
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
Iterate. Cross Validate & Grid Search.
Iterate some more.
Meet reasonable speed vs accuracy tradeoff
Hours?
Time
Increases
43
Problem
Data sizes continue to grow
Histograms / Distributions
Dimension Reduction
Feature Selection
Remove Outliers
Sampling
Massive Dataset
Better to start with as much data as
possible and explore / preprocess to scale
to performance needs.
Iterate. Cross Validate & Grid Search.
Iterate some more.
Meet reasonable speed vs accuracy tradeoff
Hours? Days?
Time
Increases
44
“More data requires better approaches!”
~ Xavier Amatriain (CTO of Curai)
45
cuML API
Python
Algorithms
Primitives
GPU-accelerated machine learning at every layer
Scikit-learn-like interface for data scientists utilizing
cuDF & Numpy
C++ API for developers to utilize accelerated machine
learning algorithms.
Reusable building blocks for composing machine learning
algorithms.
46
Linear Algebra
Primitives
GPU-accelerated math optimized for feature matrices
Statistics
Sparse
Random
Distance / Metrics
Optimizers
More to come!
• Element-wise operations
• Matrix multiply
• Norms
• Eigen Decomposition
• SVD/RSVD
• Transpose
• QR Decomposition
47
Algorithms
GPU-accelerated Scikit-Learn
Classification / Regression
Statistical Inference
Clustering
Decomposition & Dimensionality Reduction
Timeseries Forecasting
Recommendations
Decision Trees / Random Forests
Linear Regression
Logistic Regression
K-Nearest Neighbors
Kalman Filtering
Bayesian Inference
Gaussian Mixture Models
Hidden Markov Models
K-Means
DBSCAN
Spectral Clustering
Principal Components
Singular Value Decomposition
UMAP
Spectral Embedding
ARIMA
Holt-Winters
Implicit Matrix Factorization
Cross Validation
More to come!
Hyper-parameter Tuning
48
HIGH-LEVEL APIs
CUDA/C++
Multi-Node & Multi-GPU Communications
ML Primitives
ML Algorithms
Python
Dask Multi-GPU ML
Scikit-Learn-Like
Host 2
GPU1 GPU3
GPU2 GPU4
Host 1
GPU1 GPU3
GPU2 GPU4
49
HIGH-LEVEL APIs
CUDA/C++
Multi-Node / Multi-GPU Communications
ML Primitives
ML Algorithms
Python
Dask Multi-GPU ML
Scikit-Learn-Like
Host 2
GPU1 GPU3
GPU2 GPU4
Host 1
GPU1 GPU3
GPU2 GPU4
Data Parallelism
Model Parallelism
50
ML Technology Stack
Python
Cython
cuML Algorithms
cuML Prims
CUDA Libraries
CUDA
C++
C++ C++
51
ML Technology Stack
Dask cuML
Dask cuDF
cuDF
Numpy
Thrust
Cub
cuSolver
nvGraph
CUTLASS
cuSparse
cuRand
cuBlas
Python
Cython
cuML Algorithms
cuML Prims
CUDA Libraries
CUDA
C++
C++ C++
52
Dimensionality Reduction
Code Example
import cudf
df = cudf.read_csv("iris/iris.data")
df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)
labels = df["class"]
df = df.drop(["class"], axis = 1).astype(np.float32)
Load CSV & Preprocess
53
Dimensionality Reduction
Code Example
import cudf
df = cudf.read_csv("iris/iris.data")
df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)
labels = df["class"]
df = df.drop(["class"], axis = 1).astype(np.float32)
sepal_len sepal_wid petal_len petal_wid
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
Load CSV & Preprocess
54
Dimensionality Reduction
Code Example
import cudf
df = cudf.read_csv("iris/iris.data")
df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)
labels = df["class"]
df = df.drop(["class"], axis = 1).astype(np.float32)
from cuml import PCA
pca = PCA(n_components=df.shape[1])
pca.fit(df)
print("Explained var %: %s': " % pca.explained_variance_ratio_)
first_two = sum(pca.explained_variance_ratio_[0:2])
print("nFirst two: %f" % first_two)
sepal_len sepal_wid petal_len petal_wid
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
Load CSV & Preprocess
Explore Variance
55
Dimensionality Reduction
Code Example
import cudf
df = cudf.read_csv("iris/iris.data")
df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)
labels = df["class"]
df = df.drop(["class"], axis = 1).astype(np.float32)
from cuml import PCA
pca = PCA(n_components=df.shape[1])
pca.fit(df)
print("Explained var %: %s': " % pca.explained_variance_ratio_)
first_two = sum(pca.explained_variance_ratio_[0:2])
print("nFirst two: %f" % first_two)
sepal_len sepal_wid petal_len petal_wid
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
Explained var %:
0 0.92461634
1 0.05301554
2 0.017185122
3 0.005183111
First two: 0.9776318781077862
Load CSV & Preprocess
Explore Variance
56
Dimensionality Reduction
pca = PCA(n_components = 2)
pca.fit(df)
X = pca.transform(df)
Code Example
Reduce Dimensions
57
Dimensionality Reduction
pca = PCA(n_components = 2)
pca.fit(df)
X = pca.transform(df)
Code Example
uniq = set(labels.values)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(uniq))]
for k, col in zip(uniq, colors):
c = X[l==k].as_gpu_matrix()
plt.plot(c[:,0], c[:,1], '.',
markerfacecolor=tuple(col))
Reduce Dimensions
Chart Rotated Data
58
Dimensionality Reduction
pca = PCA(n_components = 2)
pca.fit(df)
X = pca.transform(df)
Code Example
uniq = set(labels.values)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(uniq))]
for k, col in zip(uniq, colors):
c = X[l==k].as_gpu_matrix()
plt.plot(c[:,0], c[:,1], '.',
markerfacecolor=tuple(col))
Reduce Dimensions
Chart Rotated Data
59
Dimensionality Reduction
pca = PCA(n_components = 2)
pca.fit(df)
X = pca.transform(df)
Code Example
uniq = set(labels.values)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(uniq))]
for k, col in zip(uniq, colors):
c = X[l==k].as_gpu_matrix()
plt.plot(c[:,0], c[:,1], '.',
markerfacecolor=tuple(col))
Reduce Dimensions
Chart Rotated Data
60
Clustering
Code Example
X, y = make_moons(n_samples=int(1e2),
noise=0.05, random_state=0)
X_df = pd.DataFrame({'fea%d'%i: X[:, i]
for i in range(X.shape[1])})
X = cudf.DataFrame.from_pandas(X_df)
Load Moons Dataset and Preprocess
61
Clustering
Code Example
X, y = make_moons(n_samples=int(1e2),
noise=0.05, random_state=0)
X_df = pd.DataFrame({'fea%d'%i: X[:, i]
for i in range(X.shape[1])})
X = cudf.DataFrame.from_pandas(X_df)
Load Moons Dataset and Preprocess
62
Clustering
Code Example
X, y = make_moons(n_samples=int(1e2),
noise=0.05, random_state=0)
X_df = pd.DataFrame({'fea%d'%i: X[:, i]
for i in range(X.shape[1])})
X = cudf.DataFrame.from_pandas(X_df)
Load Moons Dataset and Preprocess
63
Clustering
from cuml import DBSCAN
dbscan = DBSCAN(eps = 0.3, min_samples = 5)
dbscan.fit(X)
y_hat = db.fit_predict(X)
Code Example
Find Clusters
X, y = make_moons(n_samples=int(1e2),
noise=0.05, random_state=0)
X_df = pd.DataFrame({'fea%d'%i: X[:, i]
for i in range(X.shape[1])})
X = cudf.DataFrame.from_pandas(X_df)
Load Moons Dataset and Preprocess
64
Clustering
from cuml import DBSCAN
dbscan = DBSCAN(eps = 0.3, min_samples = 5)
dbscan.fit(X)
y_hat = db.fit_predict(X)
Code Example
Find Clusters
X, y = make_moons(n_samples=int(1e2),
noise=0.05, random_state=0)
X_df = pd.DataFrame({'fea%d'%i: X[:, i]
for i in range(X.shape[1])})
X = cudf.DataFrame.from_pandas(X_df)
Load Moons Dataset and Preprocess
65
cuML
Benchmarks of initial algorithms
66
cuDF + XGBoost
DGX-2 vs Scale Out CPU Cluster
• Full end to end pipeline
• Leveraging Dask + cuDF
• Store each GPU results in sys mem then read back in
• Arrow to Dmatrix (CSR) for XGBoost
67
cuDF + XGBoost
Scale Out GPU Cluster vs DGX-2
0 50 100 150 200 250 300 350
5xDGX-1
DGX-2
Chart Title
ETL+CSV (s) ML Prep (s) ML (s)
• Full end to end pipeline
• Leveraging Dask for multi-node + cuDF
• Store each GPU results in sys mem then read back in
• Arrow to Dmatrix (CSR) for XGBoost
68
cuDF + XGBoost
Fully In- GPU Benchmarks
• Full end to end pipeline
• Leveraging Dask cuDF
• No Data Prep time all in memory
• Arrow to Dmatrix (CSR) for XGBoost
69
XGBoost
Multi-node, Multi-GPU Performance
70
End-to-End Benchmarks
2,290
1,956
1,999
1,948
169
157
0 500 1,000 1,500 2,000 2,500
20 CPU Nodes
30 CPU Nodes
50 CPU Nodes
100 CPU Nodes
DGX-2
5x DGX-1
0 2,000 4,000 6,000 8,000 10,000
20 CPU Nodes
30 CPU Nodes
50 CPU Nodes
100 CPU Nodes
DGX-2
5x DGX-1
2,741
1,675
715
379
42
19
0 1,000 2,000 3,000
20 CPU Nodes
30 CPU Nodes
50 CPU Nodes
100 CPU Nodes
DGX-2
5x DGX-1
End-to-End
cuIO/cuDF —
Load and Data Preparation
Benchmark
200GB CSV dataset; Data preparation
includes joins, variable transformations.
CPU Cluster Configuration
CPU nodes (61 GiB of memory, 8 vCPUs,
64-bit platform), Apache Spark
DGX Cluster Configuration
5x DGX-1 on InfiniBand network
cuIO / cuDF (Load and Data Preparation) Data Conversion XGBoost
cuML — XGBoost
Time in seconds - Shorter is better
71
• https://ngc.nvidia.com/registry/nvidia-
rapidsai-rapidsai
• https://hub.docker.com/r/rapidsai/rapidsai/
• https://github.com/rapidsai
• https://anaconda.org/rapidsai/
• WIP:
• https://pypi.org/project/cudf
• https://pypi.org/project/cuml
RAPIDS
How do I get the software?
RAPIDS – Open GPU-accelerated Data Science

Contenu connexe

Tendances

Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop GuideSimplilearn
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkLi Jin
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakeDatabricks
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lakeDaeMyung Kang
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Preferred Networks
 
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Brian Brazil
 
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層智啓 出川
 
Apache spark 소개 및 실습
Apache spark 소개 및 실습Apache spark 소개 및 실습
Apache spark 소개 및 실습동현 강
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitSpark Summit
 
Real time big data stream processing
Real time big data stream processing Real time big data stream processing
Real time big data stream processing Luay AL-Assadi
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Yongho Ha
 
[study] Long Text Generation via Adversarial Training with Leaked Information
[study] Long Text Generation via Adversarial Training with Leaked Information[study] Long Text Generation via Adversarial Training with Leaked Information
[study] Long Text Generation via Adversarial Training with Leaked InformationGyuhyeon Nam
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowDatabricks
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop AdoptionYARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop AdoptionDataWorks Summit
 

Tendances (20)

Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
RAPIDS 概要
RAPIDS 概要RAPIDS 概要
RAPIDS 概要
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySpark
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lake
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018
 
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
 
HDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and SupportabilityHDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and Supportability
 
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
2015年度GPGPU実践基礎工学 第13回 GPUのメモリ階層
 
Apache spark 소개 및 실습
Apache spark 소개 및 실습Apache spark 소개 및 실습
Apache spark 소개 및 실습
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
 
Real time big data stream processing
Real time big data stream processing Real time big data stream processing
Real time big data stream processing
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
 
Elk stack
Elk stackElk stack
Elk stack
 
[study] Long Text Generation via Adversarial Training with Leaked Information
[study] Long Text Generation via Adversarial Training with Leaked Information[study] Long Text Generation via Adversarial Training with Leaked Information
[study] Long Text Generation via Adversarial Training with Leaked Information
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop AdoptionYARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
 

Similaire à RAPIDS – Open GPU-accelerated Data Science

GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020John Zedlewski
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfDLow6
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...E-Commerce Brasil
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFKeith Kraus
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...Manish Harsh
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA Taiwan
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Databricks
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0Ganesan Narayanasamy
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUsiguazio
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Matej Misik
 
RAPIDS, GPUs & Python - AWS Community Day Melbourne
RAPIDS, GPUs & Python - AWS Community Day MelbourneRAPIDS, GPUs & Python - AWS Community Day Melbourne
RAPIDS, GPUs & Python - AWS Community Day MelbourneRay Hilton
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platforminside-BigData.com
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017Joshua Patterson
 
Introduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIntroduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIndrajit Poddar
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AITyrone Systems
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJANicolas Poggi
 

Similaire à RAPIDS – Open GPU-accelerated Data Science (20)

GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...End to End Machine Learning Open Source Solution Presented in Cisco Developer...
End to End Machine Learning Open Source Solution Presented in Cisco Developer...
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
RAPIDS, GPUs & Python - AWS Community Day Melbourne
RAPIDS, GPUs & Python - AWS Community Day MelbourneRAPIDS, GPUs & Python - AWS Community Day Melbourne
RAPIDS, GPUs & Python - AWS Community Day Melbourne
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017
 
Introduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI PlatformIntroduction to PowerAI - The Enterprise AI Platform
Introduction to PowerAI - The Enterprise AI Platform
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AI
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 

Plus de Data Works MD

Data Journalism at The Baltimore Banner
Data Journalism at The Baltimore BannerData Journalism at The Baltimore Banner
Data Journalism at The Baltimore BannerData Works MD
 
Jolt’s Picks - Machine Learning and Major League Baseball Hit Streaks
Jolt’s Picks - Machine Learning and Major League Baseball Hit StreaksJolt’s Picks - Machine Learning and Major League Baseball Hit Streaks
Jolt’s Picks - Machine Learning and Major League Baseball Hit StreaksData Works MD
 
Introducing DataWave
Introducing DataWaveIntroducing DataWave
Introducing DataWaveData Works MD
 
Malware Detection, Enabled by Machine Learning
Malware Detection, Enabled by Machine LearningMalware Detection, Enabled by Machine Learning
Malware Detection, Enabled by Machine LearningData Works MD
 
Using AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at ScaleUsing AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at ScaleData Works MD
 
A Day in the Life of a Data Journalist
A Day in the Life of a Data JournalistA Day in the Life of a Data Journalist
A Day in the Life of a Data JournalistData Works MD
 
Robotics and Machine Learning: Working with NVIDIA Jetson Kits
Robotics and Machine Learning: Working with NVIDIA Jetson KitsRobotics and Machine Learning: Working with NVIDIA Jetson Kits
Robotics and Machine Learning: Working with NVIDIA Jetson KitsData Works MD
 
Connect Data and Devices with Apache NiFi
Connect Data and Devices with Apache NiFiConnect Data and Devices with Apache NiFi
Connect Data and Devices with Apache NiFiData Works MD
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningData Works MD
 
Data in the City: Analytics and Civic Data in Baltimore
Data in the City: Analytics and Civic Data in BaltimoreData in the City: Analytics and Civic Data in Baltimore
Data in the City: Analytics and Civic Data in BaltimoreData Works MD
 
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...Data Works MD
 
Automated Software Requirements Labeling
Automated Software Requirements LabelingAutomated Software Requirements Labeling
Automated Software Requirements LabelingData Works MD
 
Introduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application InsightsIntroduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application InsightsData Works MD
 
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...Data Works MD
 
Two Algorithms for Weakly Supervised Denoising of EEG Data
Two Algorithms for Weakly Supervised Denoising of EEG DataTwo Algorithms for Weakly Supervised Denoising of EEG Data
Two Algorithms for Weakly Supervised Denoising of EEG DataData Works MD
 
Detecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph KernelDetecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph KernelData Works MD
 
Predictive Analytics and Neighborhood Health
Predictive Analytics and Neighborhood HealthPredictive Analytics and Neighborhood Health
Predictive Analytics and Neighborhood HealthData Works MD
 
Social Network Analysis Workshop
Social Network Analysis WorkshopSocial Network Analysis Workshop
Social Network Analysis WorkshopData Works MD
 

Plus de Data Works MD (18)

Data Journalism at The Baltimore Banner
Data Journalism at The Baltimore BannerData Journalism at The Baltimore Banner
Data Journalism at The Baltimore Banner
 
Jolt’s Picks - Machine Learning and Major League Baseball Hit Streaks
Jolt’s Picks - Machine Learning and Major League Baseball Hit StreaksJolt’s Picks - Machine Learning and Major League Baseball Hit Streaks
Jolt’s Picks - Machine Learning and Major League Baseball Hit Streaks
 
Introducing DataWave
Introducing DataWaveIntroducing DataWave
Introducing DataWave
 
Malware Detection, Enabled by Machine Learning
Malware Detection, Enabled by Machine LearningMalware Detection, Enabled by Machine Learning
Malware Detection, Enabled by Machine Learning
 
Using AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at ScaleUsing AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at Scale
 
A Day in the Life of a Data Journalist
A Day in the Life of a Data JournalistA Day in the Life of a Data Journalist
A Day in the Life of a Data Journalist
 
Robotics and Machine Learning: Working with NVIDIA Jetson Kits
Robotics and Machine Learning: Working with NVIDIA Jetson KitsRobotics and Machine Learning: Working with NVIDIA Jetson Kits
Robotics and Machine Learning: Working with NVIDIA Jetson Kits
 
Connect Data and Devices with Apache NiFi
Connect Data and Devices with Apache NiFiConnect Data and Devices with Apache NiFi
Connect Data and Devices with Apache NiFi
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Data in the City: Analytics and Civic Data in Baltimore
Data in the City: Analytics and Civic Data in BaltimoreData in the City: Analytics and Civic Data in Baltimore
Data in the City: Analytics and Civic Data in Baltimore
 
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
Exploring Correlation Between Sentiment of Environmental Tweets and the Stock...
 
Automated Software Requirements Labeling
Automated Software Requirements LabelingAutomated Software Requirements Labeling
Automated Software Requirements Labeling
 
Introduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application InsightsIntroduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application Insights
 
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...
An Asynchronous Distributed Deep Learning Based Intrusion Detection System fo...
 
Two Algorithms for Weakly Supervised Denoising of EEG Data
Two Algorithms for Weakly Supervised Denoising of EEG DataTwo Algorithms for Weakly Supervised Denoising of EEG Data
Two Algorithms for Weakly Supervised Denoising of EEG Data
 
Detecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph KernelDetecting Lateral Movement with a Compute-Intense Graph Kernel
Detecting Lateral Movement with a Compute-Intense Graph Kernel
 
Predictive Analytics and Neighborhood Health
Predictive Analytics and Neighborhood HealthPredictive Analytics and Neighborhood Health
Predictive Analytics and Neighborhood Health
 
Social Network Analysis Workshop
Social Network Analysis WorkshopSocial Network Analysis Workshop
Social Network Analysis Workshop
 

Dernier

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 

Dernier (20)

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 

RAPIDS – Open GPU-accelerated Data Science

  • 1. Adam Thompson | adamt@nvidia.com | Senior Solutions Architect Corey Nolet | cnolet@nvidia.com | Data Scientist and Senior Engineer GPU Enabled Data Science
  • 2. 2 About Us Adam Thompson @adamlikesai Senior Solutions Architect at NVIDIA, Signals/RF SME Research interests include applications of machine and deep learning to non- traditional data types, HPC, and large system integration Experience in programming GPUs for signal processing tasks since CUDA Toolkit 2 Roots for and Corey Nolet @cjnolet Data Scientist & Senior Engineer on the RAPIDS cuML team at NVIDIA Focuses on building and scaling machine learning algorithms to support extreme data loads at light-speed Over a decade experience building massive-scale exploratory data science & real- time analytics platforms for HPC environments in the defense industry Working towards PhD in Computer Science, focused on unsupervised ML
  • 3. 3 1980 1990 2000 2010 2020 102 103 104 105 106 107 Single-threaded perf 1.5X per year 1.1X per year GPU-Computing perf 1.5X per year 1000X By 2025 REALITIES OF PERFORMANCE GPU Performance Grows
  • 5. 5 ML Workflow Stifles Innovation It Requires Exploration and Iterations All Data ETL Manage Data Structured Data Store Data Preparation Training Model Training Visualization Evaluate Inference Deploy Accelerating just `Model Training` does have benefit but doesn’t address the whole problem End-to-End acceleration is needed Iterate … Cross Validate … Grid Search … Iterate some more.
  • 6. 6
  • 7. 7 GPU Adoption Barriers • Too much data movement • Too many makeshift data formats • Writing CUDA C/C++ is hard • No Python API for data manipulation Yes GPUs are fast but …
  • 8. 8 APP A DATA MOVEMENT AND TRANSFORMATION The bane of productivity and performance CPU GPU APP B Read Data Copy & Convert Copy & Convert Copy & Convert Load Data APP A GPU Data APP B GPU Data APP A APP B
  • 9. 9 APP A DATA MOVEMENT AND TRANSFORMATION What if we could keep data on the GPU? APP B Copy & Convert Copy & Convert Copy & Convert APP A GPU Data APP B GPU Data Read Data Load Data APP B CPU GPU APP A
  • 10. 10 Learning from APACHE ARROW Source: Apache Arrow Home Page (https://arrow.apache.org/)
  • 11. 11 Data Processing Evolution Faster Data Access Less Data Movement 25-100x Improvement Less code Language flexible Primarily In-Memory HDFS Read HDFS Write HDFS Read HDFS Write HDFS ReadQuery ETL ML Train HDFS Read Query ETL ML Train HDFS Read GPU Read Query CPU Write GPU Read ETL CPU Write GPU Read ML Train Arrow Read Query ETL ML Train 5-10x Improvement More code Language rigid Substantially on GPU 50-100x Improvement Same code Language flexible Primarily on GPU RAPIDS GPU/Spark In-Memory Processing Hadoop Processing, Reading from disk Spark In-Memory Processing
  • 12. 12 APPLICATIONS SYSTEMS ALGORITHMS CUDA ARCHITECTURE • Learn what the data science community needs • Use best practices and standards • Build scalable systems and algorithms • Test Applications and workflows • Iterate
  • 13. 13 cuDF Analytics GPU Memory Data Preparation VisualizationModel Training cuML Machine Learning cuGraph Graph Analytics PyTorch & Chainer Deep Learning Kepler.GL Visualization RAPIDS Open Source SOFTWARE
  • 14. 14 RAPIDS: OPEN GPU DATA SCIENCE cuDF, cuML, and cuGraph mimic well-known libraries Data Preparation VisualizationModel Training CUDA PYTHON DASK DL FRAMEWORKS CUDNN RAPIDS CUMLCUDF CUGRAPH APACHE ARROW Pandas-like ScikitLearn-like NetworkX-like
  • 15. 15 Dask What is Dask and why does RAPIDS use it for scaling out? • Dask is a distributed computation scheduler built to scale Python workloads from laptops to supercomputer clusters. • Extremely modular with scheduling, compute, data transfer, and out-of-core handling all being disjointed allowing us to plug in our own implementations. • Can easily run multiple Dask workers per node to allow for an easier development model of one worker per GPU regardless of single node or multi node environment.
  • 16. 16 Dask Scale up and out with cuDF • Use cuDF primitives underneath in map-reduce style operations with the same high level API • Instead of using typical Dask data movement of pickling objects and sending via TCP sockets, take advantage of hardware advancements using a communications framework called OpenUCX: • For intranode data movement, utilize NVLink and PCIe peer-to-peer communications • For internode data movement, utilize GPU RDMA over Infiniband and RoCE https://github.com/rapidsai/dask_gdf
  • 17. 17 Faster speeds, real world benefits 2,290 1,956 1,999 1,948 169 157 0 500 1,000 1,500 2,000 2,500 20 CPU Nodes 30 CPU Nodes 50 CPU Nodes 100 CPU Nodes DGX-2 5x DGX-1 0 2,000 4,000 6,000 8,000 10,000 20 CPU Nodes 30 CPU Nodes 50 CPU Nodes 100 CPU Nodes DGX-2 5x DGX-1 cuML — XGBoost 2,741 1,675 715 379 42 19 0 1,000 2,000 3,000 20 CPU Nodes 30 CPU Nodes 50 CPU Nodes 100 CPU Nodes DGX-2 5x DGX-1 End-to-End cuIO/cuDF — Load and Data Preparation Benchmark 200GB CSV dataset; Data preparation includes joins, variable transformations. CPU Cluster Configuration CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark DGX Cluster Configuration 5x DGX-1 on InfiniBand network Time in seconds — Shorter is better cuIO / cuDF (Load and Data Preparation) Data Conversion XGBoost 8,762 6,148 3,925 3,221 322 213
  • 18. 18 AI Libraries Accelerating more of the AI ecosystem Graph Analytics is fundamental to network analysis Machine Learning is fundamental to prediction, classification, clustering, anomaly detection and recommendations. Both can be accelerated with NVIDIA GPU 8x V100 20-90x faster than dual socket CPU Decisions Trees Random Forests Linear Regressions Logistics Regressions K-Means K-Nearest Neighbor DBSCAN Kalman Filtering Principal Components Single Value Decomposition Bayesian Inferencing PageRank BFS Jaccard Similarity Single Source Shortest Path Triangle Counting Louvain Modularity ARIMA Holt-Winters Machine Learning Graph Analytics Time Series XGBoost, Mortgage Dataset, 90x 3 Hours to 2 mins on 1 DGX-1 cuML & cuGraph
  • 20. 20 RAPIDS: OPEN GPU DATA SCIENCE Following the Pandas API for optimal code reuse Data Preparation VisualizationModel Training CUDA PYTHON DASK DL FRAMEWORKS CUDNN RAPIDS CUMLCUDF CUGRAPH APACHE ARROW Pandas-like import pandas as pd import cudf … df = pd.read_csv('foo.csv', names=['index', 'A'], dtype=['date', 'float64']) gdf = cudf.read_csv('foo.csv', names=['index', 'A'], dtype=['date', 'float64’]) … # eventually majority of pandas features will be available in cudf
  • 21. 21 cuDF GPU DataFrame library • Apache Arrow data format • Pandas-like API • Unary and Binary Operations • Joins / Merges • GroupBys • Filters • User-Defined Functions (UDFs) • Accelerated file readers • Much, much more
  • 22. 22 CUDF Today CUDA With Python Bindings • Low level library containing function implementations and C/C++ API • Importing/exporting Apache Arrow using the CUDA IPC mechanism • CUDA kernels to perform element-wise math operations on GPU DataFrame columns • CUDA sort, join, groupby, and reduction operations on GPU DataFrames • A Python library for manipulating GPU DataFrames • Python interface to CUDA C++ with additional functionality • Creating Apache Arrow from Numpy arrays, Pandas DataFrames, and PyArrow Tables • JIT compilation of User-Defined Functions (UDFs) using Numba
  • 23. 23 GPU-Accelerated string functions with a Pandas-like API cuString & NVSTring • API and functionality is following Pandas: https://pandas.pydata.org/pandas- docs/stable/api.html#string-handling • lower() • ~22x speedup • find() • ~40x speedup • slice() • ~100x speedup 0.00 100.00 200.00 300.00 400.00 500.00 600.00 700.00 800.00 lower() find(#) slice(1,15) milliseconds Pandas cudastrings
  • 24. 24 Next few months GPU Dataframe • Continue improving performance and functionality • Single GPU • Single node, multi GPU • Multi node, multi GPU • String Support • Support for specific “string” dtype with GPU-accelerated functionality similar to Pandas • Accelerated Data Loading • File formats: CSV, Parquet, ORC – to start
  • 26. 26 CUGRAPH GPU-Accelerated Graph Analytics Library Coming Soon: Full NVGraph Integration Q1 2019
  • 27. 27 Early cugraph benchmarks NVGRAPH methods (not currently in cuGraph) 100 billion TEPS (single GPU) Can reach 300 billion TEPS on MG (4 GPUs) with hand-tuning PageRank (SG) on graph with 5.5m edges and 51k nodes with RAPIDS CPU (GraphFrames, Spark) [576 GB memory, 384 vcores]: 172 seconds GPU (cuGraph, V100) [32 GB memory, 5120 CUDA cores]: 671 ms Speed-up of 256x For large graphs, there are huge parallelization benefits
  • 31. 31 Problem Data sizes continue to grow min(variance) min(bias)
  • 32. 32 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling
  • 33. 33 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Better to start with as much data as possible and explore / preprocess to scale to performance needs.
  • 34. 34 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs.
  • 35. 35 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs.
  • 36. 36 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate.
  • 37. 37 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate. Cross Validate.
  • 38. 38 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate. Cross Validate & Grid Search.
  • 39. 39 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate. Cross Validate & Grid Search. Iterate some more.
  • 40. 40 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate. Cross Validate & Grid Search. Iterate some more. Meet reasonable speed vs accuracy tradeoff
  • 41. 41 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate. Cross Validate & Grid Search. Iterate some more. Meet reasonable speed vs accuracy tradeoff Time Increases
  • 42. 42 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate. Cross Validate & Grid Search. Iterate some more. Meet reasonable speed vs accuracy tradeoff Hours? Time Increases
  • 43. 43 Problem Data sizes continue to grow Histograms / Distributions Dimension Reduction Feature Selection Remove Outliers Sampling Massive Dataset Better to start with as much data as possible and explore / preprocess to scale to performance needs. Iterate. Cross Validate & Grid Search. Iterate some more. Meet reasonable speed vs accuracy tradeoff Hours? Days? Time Increases
  • 44. 44 “More data requires better approaches!” ~ Xavier Amatriain (CTO of Curai)
  • 45. 45 cuML API Python Algorithms Primitives GPU-accelerated machine learning at every layer Scikit-learn-like interface for data scientists utilizing cuDF & Numpy C++ API for developers to utilize accelerated machine learning algorithms. Reusable building blocks for composing machine learning algorithms.
  • 46. 46 Linear Algebra Primitives GPU-accelerated math optimized for feature matrices Statistics Sparse Random Distance / Metrics Optimizers More to come! • Element-wise operations • Matrix multiply • Norms • Eigen Decomposition • SVD/RSVD • Transpose • QR Decomposition
  • 47. 47 Algorithms GPU-accelerated Scikit-Learn Classification / Regression Statistical Inference Clustering Decomposition & Dimensionality Reduction Timeseries Forecasting Recommendations Decision Trees / Random Forests Linear Regression Logistic Regression K-Nearest Neighbors Kalman Filtering Bayesian Inference Gaussian Mixture Models Hidden Markov Models K-Means DBSCAN Spectral Clustering Principal Components Singular Value Decomposition UMAP Spectral Embedding ARIMA Holt-Winters Implicit Matrix Factorization Cross Validation More to come! Hyper-parameter Tuning
  • 48. 48 HIGH-LEVEL APIs CUDA/C++ Multi-Node & Multi-GPU Communications ML Primitives ML Algorithms Python Dask Multi-GPU ML Scikit-Learn-Like Host 2 GPU1 GPU3 GPU2 GPU4 Host 1 GPU1 GPU3 GPU2 GPU4
  • 49. 49 HIGH-LEVEL APIs CUDA/C++ Multi-Node / Multi-GPU Communications ML Primitives ML Algorithms Python Dask Multi-GPU ML Scikit-Learn-Like Host 2 GPU1 GPU3 GPU2 GPU4 Host 1 GPU1 GPU3 GPU2 GPU4 Data Parallelism Model Parallelism
  • 50. 50 ML Technology Stack Python Cython cuML Algorithms cuML Prims CUDA Libraries CUDA C++ C++ C++
  • 51. 51 ML Technology Stack Dask cuML Dask cuDF cuDF Numpy Thrust Cub cuSolver nvGraph CUTLASS cuSparse cuRand cuBlas Python Cython cuML Algorithms cuML Prims CUDA Libraries CUDA C++ C++ C++
  • 52. 52 Dimensionality Reduction Code Example import cudf df = cudf.read_csv("iris/iris.data") df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class'] df.dropna(how="all", inplace=True) labels = df["class"] df = df.drop(["class"], axis = 1).astype(np.float32) Load CSV & Preprocess
  • 53. 53 Dimensionality Reduction Code Example import cudf df = cudf.read_csv("iris/iris.data") df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class'] df.dropna(how="all", inplace=True) labels = df["class"] df = df.drop(["class"], axis = 1).astype(np.float32) sepal_len sepal_wid petal_len petal_wid 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 Load CSV & Preprocess
  • 54. 54 Dimensionality Reduction Code Example import cudf df = cudf.read_csv("iris/iris.data") df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class'] df.dropna(how="all", inplace=True) labels = df["class"] df = df.drop(["class"], axis = 1).astype(np.float32) from cuml import PCA pca = PCA(n_components=df.shape[1]) pca.fit(df) print("Explained var %: %s': " % pca.explained_variance_ratio_) first_two = sum(pca.explained_variance_ratio_[0:2]) print("nFirst two: %f" % first_two) sepal_len sepal_wid petal_len petal_wid 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 Load CSV & Preprocess Explore Variance
  • 55. 55 Dimensionality Reduction Code Example import cudf df = cudf.read_csv("iris/iris.data") df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class'] df.dropna(how="all", inplace=True) labels = df["class"] df = df.drop(["class"], axis = 1).astype(np.float32) from cuml import PCA pca = PCA(n_components=df.shape[1]) pca.fit(df) print("Explained var %: %s': " % pca.explained_variance_ratio_) first_two = sum(pca.explained_variance_ratio_[0:2]) print("nFirst two: %f" % first_two) sepal_len sepal_wid petal_len petal_wid 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 Explained var %: 0 0.92461634 1 0.05301554 2 0.017185122 3 0.005183111 First two: 0.9776318781077862 Load CSV & Preprocess Explore Variance
  • 56. 56 Dimensionality Reduction pca = PCA(n_components = 2) pca.fit(df) X = pca.transform(df) Code Example Reduce Dimensions
  • 57. 57 Dimensionality Reduction pca = PCA(n_components = 2) pca.fit(df) X = pca.transform(df) Code Example uniq = set(labels.values) colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(uniq))] for k, col in zip(uniq, colors): c = X[l==k].as_gpu_matrix() plt.plot(c[:,0], c[:,1], '.', markerfacecolor=tuple(col)) Reduce Dimensions Chart Rotated Data
  • 58. 58 Dimensionality Reduction pca = PCA(n_components = 2) pca.fit(df) X = pca.transform(df) Code Example uniq = set(labels.values) colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(uniq))] for k, col in zip(uniq, colors): c = X[l==k].as_gpu_matrix() plt.plot(c[:,0], c[:,1], '.', markerfacecolor=tuple(col)) Reduce Dimensions Chart Rotated Data
  • 59. 59 Dimensionality Reduction pca = PCA(n_components = 2) pca.fit(df) X = pca.transform(df) Code Example uniq = set(labels.values) colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(uniq))] for k, col in zip(uniq, colors): c = X[l==k].as_gpu_matrix() plt.plot(c[:,0], c[:,1], '.', markerfacecolor=tuple(col)) Reduce Dimensions Chart Rotated Data
  • 60. 60 Clustering Code Example X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0) X_df = pd.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])}) X = cudf.DataFrame.from_pandas(X_df) Load Moons Dataset and Preprocess
  • 61. 61 Clustering Code Example X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0) X_df = pd.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])}) X = cudf.DataFrame.from_pandas(X_df) Load Moons Dataset and Preprocess
  • 62. 62 Clustering Code Example X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0) X_df = pd.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])}) X = cudf.DataFrame.from_pandas(X_df) Load Moons Dataset and Preprocess
  • 63. 63 Clustering from cuml import DBSCAN dbscan = DBSCAN(eps = 0.3, min_samples = 5) dbscan.fit(X) y_hat = db.fit_predict(X) Code Example Find Clusters X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0) X_df = pd.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])}) X = cudf.DataFrame.from_pandas(X_df) Load Moons Dataset and Preprocess
  • 64. 64 Clustering from cuml import DBSCAN dbscan = DBSCAN(eps = 0.3, min_samples = 5) dbscan.fit(X) y_hat = db.fit_predict(X) Code Example Find Clusters X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0) X_df = pd.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])}) X = cudf.DataFrame.from_pandas(X_df) Load Moons Dataset and Preprocess
  • 66. 66 cuDF + XGBoost DGX-2 vs Scale Out CPU Cluster • Full end to end pipeline • Leveraging Dask + cuDF • Store each GPU results in sys mem then read back in • Arrow to Dmatrix (CSR) for XGBoost
  • 67. 67 cuDF + XGBoost Scale Out GPU Cluster vs DGX-2 0 50 100 150 200 250 300 350 5xDGX-1 DGX-2 Chart Title ETL+CSV (s) ML Prep (s) ML (s) • Full end to end pipeline • Leveraging Dask for multi-node + cuDF • Store each GPU results in sys mem then read back in • Arrow to Dmatrix (CSR) for XGBoost
  • 68. 68 cuDF + XGBoost Fully In- GPU Benchmarks • Full end to end pipeline • Leveraging Dask cuDF • No Data Prep time all in memory • Arrow to Dmatrix (CSR) for XGBoost
  • 70. 70 End-to-End Benchmarks 2,290 1,956 1,999 1,948 169 157 0 500 1,000 1,500 2,000 2,500 20 CPU Nodes 30 CPU Nodes 50 CPU Nodes 100 CPU Nodes DGX-2 5x DGX-1 0 2,000 4,000 6,000 8,000 10,000 20 CPU Nodes 30 CPU Nodes 50 CPU Nodes 100 CPU Nodes DGX-2 5x DGX-1 2,741 1,675 715 379 42 19 0 1,000 2,000 3,000 20 CPU Nodes 30 CPU Nodes 50 CPU Nodes 100 CPU Nodes DGX-2 5x DGX-1 End-to-End cuIO/cuDF — Load and Data Preparation Benchmark 200GB CSV dataset; Data preparation includes joins, variable transformations. CPU Cluster Configuration CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark DGX Cluster Configuration 5x DGX-1 on InfiniBand network cuIO / cuDF (Load and Data Preparation) Data Conversion XGBoost cuML — XGBoost Time in seconds - Shorter is better
  • 71. 71 • https://ngc.nvidia.com/registry/nvidia- rapidsai-rapidsai • https://hub.docker.com/r/rapidsai/rapidsai/ • https://github.com/rapidsai • https://anaconda.org/rapidsai/ • WIP: • https://pypi.org/project/cudf • https://pypi.org/project/cuml RAPIDS How do I get the software?