SF Big Analytics Meetup
Accelerate Spark With RAPIDS For Cost Savings
Sameer Raheja, Sr. Dir, Engineering
June 22, 2023
221 Zettabytes of Data Generated by 2026
[Chart: zettabytes of data generated per year, 2021-2026, growing to 221 ZB in 2026; 2021-2026 CAGR of 21.2%]
Source: IDC, Global DataSphere Forecast, 2022-2026: Enterprise Organizations Driving Most of the Growth, May 2022
How to Deal With Data Growth?
• Increase compute capacity to meet growing data demands → Increase Cost
• Accept higher latency as data volumes grow → Increase Time
• Down-sample data to meet cost & SLA targets → Reduce Fidelity
Scaling ETL Processing With Apache Spark on GPUs
RAPIDS Accelerator for Apache Spark
[Timeline, 2000-2030: Hadoop, then Spark 2.0 on CPUs, then GPU-accelerated Spark 3.0, set against the growth in requirements for data processing]
Spark RAPIDS
Acceleration & Cost savings
NVIDIA Decision Support Benchmark
NVIDIA Decision Support (NDS) is our adaptation of the TPC-DS benchmark often used by Spark customers and providers. NDS consists of the same 100+ SQL queries as the industry-standard benchmark but has modified parts for execution scripts.
The NDS benchmark is derived from the TPC-DS benchmark and as such is not comparable to published TPC-DS results, as the NDS results do not comply with the TPC-DS Specification.
https://github.com/nvidia/spark-rapids-benchmarks
NDS Benchmark Environment on Google Cloud Dataproc
Parquet data at scale factor 3000

CPU Cluster (worker cost: $1.89/hr)
Nodes: 1 driver (n1-standard-16), 4 workers (n1-highmem-32)
RAM: 208 GB / node
Storage: Google Cloud Storage + 2 NVMe / node
Software: Dataproc 2.1, Spark 3.3.0, YARN

GPU Cluster (worker cost: $1.89/hr + 2 x $0.35/hr/GPU)
Nodes: 1 driver (n1-standard-16), 4 workers (n1-highmem-32)
RAM: 208 GB / node
Storage: Google Cloud Storage + 2 NVMe / node
Software: Dataproc 2.1, Spark 3.3.0, YARN, RAPIDS Spark plugin jar
GPU: 2 x T4 / node

Pricing: https://cloud.google.com/compute/vm-instance-pricing, us-central1, Feb 2023
NDS Benchmark Configuration
GCP Dataproc

Config                                            | Dataproc CPU    | Dataproc GPU                                           | Group
spark.driver.memory                               | 50gb            | 50gb                                                   | Resources
spark.driver.maxResultSize                        | 1gb (default)   | 2gb                                                    |
spark.executor.cores                              | 9               | 16                                                     |
spark.executor.memory                             | 16gb            | 16gb                                                   |
spark.executor.instances                          | 14              | 8                                                      |
spark.executor.memoryOverhead                     | 1.6gb (default) | 16gb                                                   |
spark.task.resource.gpu.amount                    | -               | 0.0625                                                 |
spark.scheduler.minRegisteredResourcesRatio       | 1.0             | 1.0                                                    | Scheduling
spark.locality.wait                               | 0               | 0                                                      |
spark.sql.shuffle.partitions                      | 128             | 200 (default)                                          | Shuffle
spark.sql.files.maxPartitionBytes                 | 128mb (default) | 2gb                                                    |
spark.shuffle.manager                             | -               | com.nvidia.spark.rapids.spark330.RapidsShuffleManager  |
spark.rapids.shuffle.multiThreaded.writer.threads | -               | 16                                                     |
spark.rapids.shuffle.multiThreaded.reader.threads | -               | 16                                                     |
spark.rapids.sql.batchSizeBytes                   | -               | 1gb                                                    | GPU specific
spark.rapids.sql.concurrentGpuTasks               | -               | 2                                                      |
spark.rapids.memory.host.spillStorageSize         | -               | 32gb                                                   |
spark.rapids.memory.pinnedPool.size               | -               | 8gb                                                    |
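To make the table concrete, here is a rough PySpark sketch of how the GPU-column settings could be applied when building a session. This is not the actual benchmark harness: spark.plugins is not listed in the table, and the RAPIDS plugin jar is assumed to already be on the classpath (for example via --jars or a Dataproc init action).

# Sketch only: applying the GPU-column settings from the table above to a PySpark session.
from pyspark.sql import SparkSession

gpu_confs = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",  # required to load the plugin; not in the table
    "spark.executor.cores": "16",
    "spark.executor.memory": "16gb",
    "spark.executor.memoryOverhead": "16gb",
    "spark.task.resource.gpu.amount": "0.0625",
    "spark.sql.files.maxPartitionBytes": "2gb",
    "spark.shuffle.manager": "com.nvidia.spark.rapids.spark330.RapidsShuffleManager",
    "spark.rapids.shuffle.multiThreaded.writer.threads": "16",
    "spark.rapids.shuffle.multiThreaded.reader.threads": "16",
    "spark.rapids.sql.batchSizeBytes": "1gb",
    "spark.rapids.sql.concurrentGpuTasks": "2",
    "spark.rapids.memory.host.spillStorageSize": "32gb",
    "spark.rapids.memory.pinnedPool.size": "8gb",
}

builder = SparkSession.builder.appName("nds-gpu-config-sketch")
for key, value in gpu_confs.items():
    builder = builder.config(key, value)
spark = builder.getOrCreate()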
NVIDIA Decision Support Benchmark 3TB
RAPIDS Spark release 23.02
5.7x speedup | 4.5x cost savings | 5x more efficient
[Charts: Time (min): CPU 176 vs. GPU 31; Power (watts): CPU 12,425 vs. GPU 2,479; Cost ($): CPU $32.62 vs. GPU $7.20]
The NDS benchmark is derived from the TPC-DS benchmark and as such is not comparable to published TPC-DS results, as the NDS results do not comply with the TPC-DS Specification.
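The headline multipliers follow directly from the chart values; a quick check in plain Python (the 5x efficiency figure appears to correspond to the power ratio):

# Values taken from the charts above (3 TB NDS run, CPU vs. GPU cluster).
cpu_minutes, gpu_minutes = 176, 31
cpu_cost, gpu_cost = 32.62, 7.20
cpu_power, gpu_power = 12425, 2479

print(f"Speedup:      {cpu_minutes / gpu_minutes:.1f}x")  # ~5.7x
print(f"Cost savings: {cpu_cost / gpu_cost:.1f}x")        # ~4.5x
print(f"Power ratio:  {cpu_power / gpu_power:.1f}x")      # ~5.0x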
Retail use case on Google Cloud Dataproc
Automate the ranking of thousands of online shelves
• 40 nodes: 2x speedup, 40% cost benefit
• 20 nodes: 1.6x speedup, 60% cost benefit
• 10 nodes: 1.2x speedup, 70% cost benefit
Cost Savings Across Clouds
NDS benchmark scale factor 3000

Platform                     | Cost Savings | CPU instances         | GPU instances
GCP Dataproc 2.1             | 78%          | n1-highmem-32 (4)     | n1-highmem-32 (4) + 2 x T4 / node
AWS EMR 6.9                  | 42%          | m6gd.8xlarge (8)      | g4dn.12xlarge (2), 4 x T4 / node
AWS Databricks Photon 10.4   | 39%          | m6gd.8xlarge (8)      | g5.8xlarge (4), 1 x A10 / node
Azure Databricks Photon 10.4 | 34%          | Standard_E16ds_v4 (8) | Standard_NC8as_T4_v3 (8), 1 x T4 / node

GCP pricing: https://cloud.google.com/compute/vm-instance-pricing, us-central1, Feb 2023
AWS pricing: https://aws.amazon.com/emr/pricing/, us-west-2, Feb 2023
Databricks AWS pricing: https://www.databricks.com/product/aws-pricing, Feb 2023
Databricks Azure pricing: https://azure.microsoft.com/en-us/pricing/details/databricks/#instance-type-support, Feb 2023
RAPIDS Accelerator Distribution Availability
Available now on:
• Apache Spark 3.x Community Release
• Databricks Machine Learning Runtime
• Amazon EMR
• Cloudera CDP
• Google Cloud Dataproc
• Microsoft Azure Synapse Analytics
Spark RAPIDS
Architecture Overview
Apache Spark
[Diagram: the Spark stack; language APIs (Python, SQL, R, Java, Scala) over Spark SQL, Spark ML, Spark Streaming, and GraphX, all built on Spark Core; a driver coordinates worker nodes through a cluster manager]

Apache Spark 3.x
Resource aware scheduling
[Diagram: the same Spark stack, highlighting the driver, cluster manager, and worker nodes involved in resource-aware scheduling]

Apache Spark 3.x
Columnar processing
[Diagram: the Spark stack, highlighting row processing versus column processing]

Apache Spark 3.x
RAPIDS Accelerator for Apache Spark
[Diagram: the Spark stack with the RAPIDS Spark plugin running alongside Spark's built-in processing on the worker nodes]
RAPIDS Accelerator for Apache Spark
Spark Plugin for GPU Acceleration
[Diagram: a Spark DataFrame / SQL application runs on Spark Core; the RAPIDS Spark plugin sits alongside Spark's built-in CPU processing and reaches the RAPIDS C++ libraries and CUDA through Java bindings]

if gpu_enabled(operation && datatype):
    RAPIDS Spark
else:
    Spark CPU
RAPIDS Accelerator for Apache Spark
Spark Plugin for GPU Acceleration
[Diagram: distributed scale-out Spark applications run on Apache Spark Core. The RAPIDS Accelerator for Apache Spark plugs into the Spark SQL / DataFrame API (if gpu_enabled(operation && datatype): RAPIDS Spark, else: Spark CPU) and into Spark Shuffle with a custom implementation of shuffle optimized for RDMA and GPU-to-GPU direct communication. Underneath sit the UCX libraries, the RAPIDS C++ libraries, and CUDA, reached through Java bindings.]
No Query Changes
• Add jars to classpath and set spark.plugins config
• Same SQL and DataFrame code
• Compatible with PySpark, SparkR, Koalas, and other DataFrame-based APIs
• Seamless fallback to CPU for unsupported operations

spark.sql("""
  SELECT
    o_orderpriority,
    count(*) AS order_count
  FROM
    orders
  WHERE
    o_orderdate >= DATE '1993-07-01'
    AND o_orderdate < DATE '1993-07-01' + interval '3' month
    AND EXISTS (
      SELECT *
      FROM lineitem
      WHERE
        l_orderkey = o_orderkey
        AND l_commitdate < l_receiptdate
    )
  GROUP BY
    o_orderpriority
  ORDER BY
    o_orderpriority
""").show()

Example: SQL query text processed directly by Spark using the spark.sql() API.
No changes are required to the query or any of the Spark code.
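As a rough illustration of the fallback bullet above, the sketch below sets two plugin configs to make fallback visible at runtime. It assumes the plugin jar is on the classpath and spark.plugins is set as described in the first bullet; spark.rapids.sql.explain and spark.rapids.sql.enabled are plugin settings not shown in this deck.

# Sketch: observing which operations stay on the CPU. Assumes the RAPIDS jar is on
# the classpath and spark.plugins=com.nvidia.spark.SQLPlugin is set on the session.
spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")  # log operators that fall back to CPU
spark.conf.set("spark.rapids.sql.enabled", "true")        # "false" forces everything onto the CPU

# The query above runs unchanged; any unsupported expression (for example, a UDF
# that is not GPU-enabled) is reported in the driver log and executed on the CPU.
spark.sql("SELECT o_orderpriority, count(*) AS order_count "
          "FROM orders GROUP BY o_orderpriority").show()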
Spark SQL & DataFrame Query Execution
[Diagram: a DataFrame becomes a logical plan, then a physical plan executing as RDD[InternalRow] on the CPU or, with the plugin, a GPU physical plan executing as RDD[ColumnarBatch]]

bar.groupBy(
    col("product_id"),
    col("ds"))
  .agg(
    (max(col("price")) - min(col("price"))).alias("range"))

SELECT product_id, ds,
  max(price) - min(price) AS range
FROM bar
GROUP BY product_id, ds
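A PySpark version of the same aggregation can be used to check whether the physical plan was translated to the GPU. This is only a sketch: the input path is hypothetical, and the operator name GpuHashAggregate is illustrative of what the plugin typically emits.

# Sketch: inspect the physical plan produced for the example above.
from pyspark.sql import functions as F

bar = spark.read.parquet("/data/bar")  # hypothetical input path

price_range = bar.groupBy("product_id", "ds").agg(
    (F.max("price") - F.min("price")).alias("range")
)
# With the RAPIDS plugin active, explain() shows GPU operators (names such as
# GpuHashAggregate); without it, the usual CPU HashAggregate plan appears.
price_range.explain()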
Spark SQL & DataFrame Query Execution
[Diagram: CPU physical plan vs. GPU physical plan for the same query. Both plans read a Parquet file, run a first-stage aggregate, shuffle the data (combining shuffle data), run a second-stage aggregate, and write a Parquet file; the GPU plan inserts "Convert to Row Format" steps where columnar data crosses the shuffle boundary.]
Task Level Parallelism vs. Data Level Parallelism
Data parallelism is cooperative. It uses a more consistent amount of memory and reduces the total data in memory for operations like sort, which results in less spilling.
[Diagram: on the CPU, four tasks each process their own partition of the data; on the GPU, two tasks cooperatively process the data]
Is this a silver bullet? No.
It does not help with:
• Small amounts of data
• Highly cache-coherent processing
• A slow distributed filesystem
• Slow local disks or network for shuffle
• CPU/GPU transitions on many rows
• User Defined Functions (UDFs) that do not run on the GPU
[Chart: approximate bandwidth in MB/s, log scale: spinning disk 160, SSD 550, 10GigE 1,250, NVMe 3,500, PCIe Gen3 12,288, PCIe Gen4 24,576, DDR4-3200 DIMM 25,600, typical PC RAM 46,080, NVLink 307,200, CPU cache 1,048,576]
But it can be amazing
Where the RAPIDS Accelerator excels:
• High-cardinality processing: joins, aggregations, sort
• Window operations, especially large windows
• Complicated processing
• Transcoding data formats: encoding Parquet and ORC, reading CSV
[Charts: 18x speedup on window operations (208 seconds on CPU vs. 11 seconds on GPU); 9x speedup on a large join (117 seconds on CPU vs. 12 seconds on GPU)]
https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31436/
Job Qualification
From Qualification to Tuning
pip install spark-rapids-user-tools

Qualification: understand the cost savings and acceleration potential of RAPIDS Spark
  spark_rapids_user_tools dataproc --cluster <cluster-name> qualification
  Output: list of apps recommended for RAPIDS Spark with estimated savings and speed-up

Bootstrap: provide optimized RAPIDS Spark configs based on the Dataproc GPU cluster shape
  spark_rapids_user_tools dataproc --cluster <cluster-name> bootstrap
  Output: updated Spark config settings on the Dataproc master node

Tuning: tune RAPIDS Spark configs based on an initial job run, leveraging Spark event logs
  spark_rapids_user_tools dataproc --cluster <cluster-name> profiler
  Output: recommended per-app RAPIDS Spark config settings
Workload Qualification Output
Sample tool output
+---------------------+----------------------+-----------------+-----------------+---------------+-----------------+
| App Name | Recommendation | Estimated GPU | Estimated GPU | App | Estimated GPU |
| | | Speedup | Duration(s) | Duration(s) | Savings(%) |
+---------------------+----------------------+-----------------+-----------------+---------------+-----------------+
| Customer App #1 | Strongly Recommended | 3.66 | 651.24 | 2384.32 | 64.04 |
| Sales App #1 | Strongly Recommended | 3.14 | 89.61 | 281.62 | 58.11 |
| Sales App #2 | Strongly Recommended | 3.12 | 300.39 | 939.21 | 57.89 |
| Customer App #2 | Strongly Recommended | 2.55 | 698.16 | 1783.65 | 48.47 |
+---------------------+----------------------+-----------------+-----------------+---------------+-----------------+
Report Summary:
------------------------------ ------
Total applications 4
RAPIDS candidates 4
Overall estimated speedup 3.10
Overall estimated cost savings 57.50%
------------------------------ ------
Upcoming
• Table layout formats
• Delta Lake, Apache Iceberg, Apache Hudi
• ARM support
• Tooling for Azure Databricks
• Improvements in performance, reliability
• File caching
More Information
• Spark RAPIDS Plugin Overview: https://www.nvidia.com/spark
• Spark RAPIDS User Docs: https://nvidia.github.io/spark-rapids
• GitHub: https://github.com/NVIDIA/spark-rapids/
• Contact us: spark-rapids-support@nvidia.com