Kazuaki Ishizaki
IBM Research – Tokyo
IBM Japan, Ltd., Tokyo Research Laboratory
Exploiting GPUs in Spark
Who am I?
• Kazuaki Ishizaki
• Research staff member at IBM Research – Tokyo
  – http://ibm.co/kiszk
• Research interests
  – Compiler optimizations, language runtimes, and parallel processing
• Has worked on the Java virtual machine and just-in-time compiler for over 20 years
  – From JDK 1.0 to Java SE 8
• Twitter: @kiszk
• Slideshare: http://www.slideshare.net/ishizaki
• GitHub: https://github.com/kiszk
Exploiting GPUs in Spark - Kazuaki Ishizaki
Agenda
• Motivation & Goal
• Introduction of GPUs
• Design & New Components
  – Binary columnar
  – GPU enabler
• Current Implementation
• Performance Experiment
  – Achieved a 3.15x speedup of a naïve logistic regression by using a GPU
• Future Direction in Spark 2.0 and Beyond
  – With Dataset (introduced in Spark 1.6)
• Conclusion
Want to Accelerate Computation-heavy Applications
• Motivation
  – Want to shorten the execution time of a long-running Spark application, which may be
    • Computation-heavy
    • Shuffle-heavy
    • I/O-heavy
• Goal
  – Accelerate a computation-heavy Spark application
• According to Reynold’s talk (p. 21), the CPU will become the bottleneck in Spark
Accelerate a Spark Application by GPUs
• Approach
  – Accelerate a Spark application by using GPUs effectively and transparently
    • Exploit the high performance of GPUs
    • Do not ask users to change their Spark programs
• New components
  – Binary columnar
  – GPU enabler
GPU Programming Model
• Five steps
  1. Allocate GPU device memory
  2. Copy data from CPU main memory to GPU device memory
  3. Launch a GPU kernel to be executed in parallel on the GPU cores
  4. Copy data back from GPU device memory to CPU main memory
  5. Free GPU device memory
• Usually, a programmer has to write these steps in CUDA or OpenCL
[Figure: a CPU (a dozen cores per socket) with main memory (up to 1 TB per socket) and a GPU (thousands of cores) with device memory (up to 12 GB); data are copied between the two over PCIe]
How We Can Run a Program Faster on a GPU
• Assign a lot of parallel computations to the cores
• Make memory accesses coalesced
  – An example
  – Column-oriented layout achieves better performance
    • This paper reports about a 3x performance improvement in GPU kernel execution of k-means over a row-oriented layout
[Figure: memory accesses to GPU device memory for Pt(x: Int, y: Int), assuming that 4 consecutive data elements can be coalesced by the GPU hardware. Column-oriented layout (x1 x2 x3 x4 y1 y2 y3 y4): load four Pt.x, then load four Pt.y, for 2 memory accesses. Row-oriented layout (x1 y1 x2 y2 x3 y3 x4 y4): Pt.x and Pt.y are each loaded in two parts, for 4 memory accesses.]
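The "2 vs. 4" count can be reproduced with a small model. The sketch below is only an illustration under the slide's assumption that 4 consecutive elements are served by one coalesced access; the object and method names (`CoalescingSketch`, `transactions`) are made up for this example, and real GPU hardware coalesces by byte-aligned segments rather than by element count.

```scala
// Counts memory accesses under the assumption that `width` consecutive
// elements are served by one coalesced access (a simplified model).
object CoalescingSketch {
  // Number of distinct aligned segments touched by one parallel load.
  def transactions(indices: Seq[Int], width: Int = 4): Int =
    indices.map(_ / width).distinct.size

  // Column-oriented layout: x1..xn followed by y1..yn.
  def columnOriented(n: Int): Int = {
    val loadX = (0 until n)            // x_i is stored at index i
    val loadY = (0 until n).map(_ + n) // y_i is stored at index n + i
    transactions(loadX) + transactions(loadY)
  }

  // Row-oriented layout: x1 y1 x2 y2 ...
  def rowOriented(n: Int): Int = {
    val loadX = (0 until n).map(_ * 2)     // x_i is stored at index 2i
    val loadY = (0 until n).map(_ * 2 + 1) // y_i is stored at index 2i + 1
    transactions(loadX) + transactions(loadY)
  }

  def main(args: Array[String]): Unit = {
    println(s"column-oriented: ${columnOriented(4)} accesses") // 2
    println(s"row-oriented:    ${rowOriented(4)} accesses")    // 4
  }
}
```

In the row-oriented case each load of four x (or y) values is strided across two segments, which is where the extra accesses come from.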
Design of GPU Exploitation
• Efficient
  – Reduce data copy overhead between CPU and GPU
  – Make memory accesses efficient on GPU
• Transparent
  – Map parallelism in a program into GPU native code
User’s Spark Program (Scala)
case class Pt(x: Int, y: Int)
val rdd1 = sc.parallelize(Array(
  Pt(1, 4), Pt(2, 5),
  Pt(3, 6), Pt(4, 7),
  Pt(5, 8), Pt(6, 9)), 3)
val rdd2 = rdd1.map(p => Pt(p.x * 2, p.y - 1))
// reduce() must return the element type, so project x before summing
val cnt = rdd2.map(p => p.x).reduce(
  (x1, x2) => x1 + x2)
[Figure: the GPU enabler translates the program into GPU native code. rdd1 holds x = 1, 2, 3, 4, 5, 6 and y = 4, 5, 6, 7, 8, 9 as binary columnar data on off-heap. The data are transferred from the CPU to the GPU, where the GPU kernel computes x * 2 and y - 1 for each block, producing rdd2 with x = 2, 4, 6, 8, 10, 12 and y = 3, 4, 5, 6, 7, 8. The GPU can exploit parallelism both among blocks in an RDD and within a block of an RDD.]
What Does Binary Columnar Do?
• Keep data in binary representation (not Java object representation)
• Keep data in column-oriented layout
• Keep data on off-heap or GPU device memory
Example
case class Pt(x: Int, y: Int)
Array(Pt(1, 4),
      Pt(2, 5))
[Figure: the example stored on off-heap. Columnar (column-oriented): 1 2 4 5. Row-oriented: 1 4 2 5.]
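The two layouts in the example can be sketched with a direct `java.nio.ByteBuffer`, which lives outside the Java heap. This is a minimal illustration, not the project's actual storage format: the names `BinaryLayoutSketch`, `pack`, and `contents` are hypothetical, and the native byte order is an assumption.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object BinaryLayoutSketch {
  case class Pt(x: Int, y: Int)

  // Packs the fields into one off-heap (direct) buffer, without object headers.
  def pack(points: Array[Pt], columnar: Boolean): ByteBuffer = {
    val buf = ByteBuffer
      .allocateDirect(points.length * 2 * java.lang.Integer.BYTES)
      .order(ByteOrder.nativeOrder())
    if (columnar) {
      points.foreach(p => buf.putInt(p.x)) // x column first
      points.foreach(p => buf.putInt(p.y)) // then y column
    } else {
      points.foreach { p => buf.putInt(p.x); buf.putInt(p.y) } // x and y interleaved
    }
    buf.flip()
    buf
  }

  // Reads the buffer back as Ints in storage order, without consuming `buf`.
  def contents(buf: ByteBuffer): Seq[Int] = {
    val view = buf.duplicate().order(buf.order())
    Seq.fill(view.remaining() / java.lang.Integer.BYTES)(view.getInt())
  }

  def main(args: Array[String]): Unit = {
    val points = Array(Pt(1, 4), Pt(2, 5))
    println(contents(pack(points, columnar = true)).mkString(" "))  // 1 2 4 5
    println(contents(pack(points, columnar = false)).mkString(" ")) // 1 4 2 5
  }
}
```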
Current RDD as Java Objects on Java Heap
case class Pt(x: Int, y: Int)
val rdd = sc.parallelize(Array(Pt(1, 4),
                               Pt(2, 5)))
[Figure: the Java heap holds two Pt objects, each consisting of an object header for the Java virtual machine followed by its fields (1 4 and 2 5).]
Current RDD
• Row-oriented layout
• Java object representation
• On Java heap
Binary Columnar RDD on Off-heap
case class Pt(x: Int, y: Int)
val rdd = sc.parallelize(Array(Pt(1, 4),
                               Pt(2, 5)))
[Figure: 1. the current RDD on the Java heap, as Pt objects with object headers for the Java virtual machine (1 4 and 2 5); 2. the binary columnar RDD on off-heap (1 2 4 5).]
1. Current RDD
• Row-oriented layout
• Java object representation
• On Java heap
2. Binary columnar RDD
• Column-oriented layout
• Binary representation
• On off-heap
Long Path from Current RDD to GPU
• Three steps to send data from an RDD to the GPU
  1. Java objects to column-oriented binary representation on the Java heap
     • From a Java object to binary representation
     • From a row-oriented format to columnar
  2. Binary representation on the Java heap to binary columnar on off-heap
     • Garbage collection may move objects on the Java heap during GPU-related operations
  3. Off-heap to GPU device memory
case class Pt(x: Int, y: Int)
val rdd = sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
rdd.map(…).reduce(…) // execute on GPU
[Figure: Pt objects on the Java heap (1 4 2 5) -> step 1 -> ByteBuffer on the Java heap (1 2 4 5) -> step 2 -> ByteBuffer on off-heap (1 2 4 5) -> step 3 -> GPU device memory (1 2 4 5)]
This thread on the dev mailing list also discusses the overhead of copying data between an RDD and a GPU
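Steps 1 and 2 of the long path can be sketched with two `java.nio.ByteBuffer` copies. This is a hedged illustration only: `LongPathSketch`, `toHeapColumnar`, and `toOffHeap` are hypothetical names, and step 3 is left out because it needs a GPU API.

```scala
import java.nio.ByteBuffer

object LongPathSketch {
  case class Pt(x: Int, y: Int)

  // Step 1: Java objects -> column-oriented binary data in an on-heap buffer.
  def toHeapColumnar(points: Array[Pt]): ByteBuffer = {
    val heap = ByteBuffer.allocate(points.length * 2 * java.lang.Integer.BYTES)
    points.foreach(p => heap.putInt(p.x)) // x column
    points.foreach(p => heap.putInt(p.y)) // y column
    heap.flip()
    heap
  }

  // Step 2: on-heap buffer -> off-heap (direct) buffer, which the
  // garbage collector does not relocate.
  def toOffHeap(heap: ByteBuffer): ByteBuffer = {
    val direct = ByteBuffer.allocateDirect(heap.remaining())
    direct.put(heap.duplicate()) // bulk byte copy; `heap` itself is not consumed
    direct.flip()
    direct
  }

  // Step 3 (off-heap -> GPU device memory) is not shown here.

  def main(args: Array[String]): Unit = {
    val heap = toHeapColumnar(Array(Pt(1, 4), Pt(2, 5)))
    val direct = toOffHeap(heap)
    println(s"step 1 buffer is direct: ${heap.isDirect}, step 2 buffer is direct: ${direct.isDirect}")
    println(Seq.fill(4)(direct.getInt()).mkString(" ")) // 1 2 4 5
  }
}
```

Each step is a full pass over the data, which is the overhead that keeping the RDD in binary columnar form on off-heap is meant to avoid.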
Short Path from Binary Columnar RDD to GPU
• An RDD with binary columnar can simply be copied to GPU device memory
case class Pt(x: Int, y: Int)
val rdd = sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
rdd.map(…).reduce(…) // execute on GPU
[Figure: the binary columnar data on off-heap (1 2 4 5) are copied directly to GPU device memory (1 2 4 5). The two Java heap stages of the long path (Pt objects and the on-heap binary representation) are eliminated.]
Can Execute map() in Parallel Using Binary Columnar
• Adjacent elements in a binary columnar RDD can be accessed in parallel
• The same type of operation (* or -) can be executed in parallel on the data loaded in parallel
case class Pt(x: Int, y: Int)
val rdd = sc.parallelize(Array(Pt(1, 4),
                               Pt(2, 5)))
val rdd1 = rdd.map(p => Pt(p.x * 2, p.y - 1))
[Figure: memory access order. Current RDD on the Java heap (1 4 2 5): order 1 2 3 4. Binary columnar RDD on off-heap (1 2 4 5): order 1 1 2 2.]
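To make the point concrete, the map() above can be rewritten as one uniform loop per column. The following is a plain-Scala sketch on the CPU, assuming a hypothetical `PtColumns` holder in place of the real binary columnar RDD; it only shows that the column-wise form computes the same result as the row-wise map.

```scala
object ColumnarMapSketch {
  case class Pt(x: Int, y: Int)

  // Column-oriented stand-in for a block of Pt values (hypothetical type).
  case class PtColumns(xs: Array[Int], ys: Array[Int])

  def toColumns(points: Array[Pt]): PtColumns =
    PtColumns(points.map(_.x), points.map(_.y))

  def toRows(c: PtColumns): Array[Pt] =
    c.xs.zip(c.ys).map { case (x, y) => Pt(x, y) }

  // Equivalent of map(p => Pt(p.x * 2, p.y - 1)): every element of a column
  // gets the same operation, so the elements are independent of each other.
  def mapColumns(c: PtColumns): PtColumns =
    PtColumns(c.xs.map(_ * 2), c.ys.map(_ - 1))

  def main(args: Array[String]): Unit = {
    val points = Array(Pt(1, 4), Pt(2, 5))
    val rowWise = points.map(p => Pt(p.x * 2, p.y - 1))
    val columnWise = toRows(mapColumns(toColumns(points)))
    println(columnWise.mkString(", "))        // Pt(2,3), Pt(4,4)
    println(rowWise.sameElements(columnWise)) // true
  }
}
```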
Advantages of Binary Columnar
• Can exploit the high performance of GPUs
• Can reduce the overhead of data copies between CPU and GPU
• Has a smaller memory footprint
• Can directly compute on data that are stored in columnar format, such as data from Apache Parquet
• Can exploit SIMD instructions on the CPU
What Does GPU Enabler Do?
• Copy data in a binary columnar RDD between CPU main memory and GPU device memory
• Launch GPU kernels
• Cache GPU native code for kernels
• Generate GPU native code from transformations and actions in a program
  – We have already productized the IBM Java just-in-time compiler, which generates GPU native code from a lambda expression in Java 8
Software Stack in Current Spark 2.0-SNAPSHOT
• RDD keeps data on the Java heap
[Figure: software stack: user’s Spark program, RDD API, and RDD data on the Java heap; an off-heap area is also shown]
Software Stack of GPU Exploitation
• Current RDD and binary columnar RDD co-exist
[Figure: software stack: user’s Spark program, RDD API, and GPU enabler; RDD data on the Java heap, Columnar data, and Columnar data on GPU device memory]
Current Implementation of Binary Columnar
• Works with RDD
• Converts from the current RDD to a binary columnar RDD and vice versa
  – Our current implementation eliminates conversion overhead between CPU and GPU within a task
Current Implementation of GPU Enabler
• Execute user-provided GPU kernels from map()/reduce() functions
  – GPU memory management and data copies are handled automatically
• Generate GPU native code for simple map()/reduce() methods
  – "spark.gpu.codegen=true" in spark-defaults.conf
// Simple map()/reduce() for which GPU native code can be generated
val rdd1 = sc.parallelize(1 to n, 2).convert(ColumnFormat) // rdd1 uses binary columnar RDD
val sum = rdd1.map(i => i * 2)
              .reduce((x, y) => (x + y))
// User-provided GPU kernel
// CUDA
__global__ void sample_map(int *inX, int *inY, int *outX, int *outY, long size) {
  long ix = threadIdx.x + blockIdx.x * blockDim.x;
  if (size <= ix) return; // threads beyond the end of the data do nothing
  outX[ix] = inX[ix] * 2;
  outY[ix] = inY[ix] - 1;
}
// Spark
val mapFunction = new CUDAFunction("sample_map", // CUDA method name
  Array("this.x", "this.y"), // input object has two fields
  Array("this.x", "this.y"), // output object has two fields
  this.getClass.getResource("/sample.ptx")) // ptx is generated by the CUDA compiler
val rdd1 = sc.parallelize(…).convert(ColumnFormat) // rdd1 uses binary columnar RDD
val rdd2 = rdd1.mapExtFunc(p => Pt(p.x * 2, p.y - 1), mapFunction)
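The index computation and bounds check in sample_map can be followed on the CPU. The sketch below is a sequential Scala emulation, not how the GPU enabler launches kernels: `KernelIndexSketch`, `launchSampleMap`, and the rounded-up grid size are assumptions made for illustration.

```scala
object KernelIndexSketch {
  // Sequential CPU stand-in for launching sample_map on a 1-D grid.
  def launchSampleMap(inX: Array[Int], inY: Array[Int], blockDim: Int): (Array[Int], Array[Int]) = {
    require(inX.length == inY.length, "both columns must have the same length")
    val size = inX.length
    val outX = new Array[Int](size)
    val outY = new Array[Int](size)
    // Round up so that every element is covered by some thread.
    val gridDim = (size + blockDim - 1) / blockDim
    for (blockIdx <- 0 until gridDim; threadIdx <- 0 until blockDim) {
      val ix = threadIdx + blockIdx * blockDim
      if (ix < size) { // same guard as `if (size <= ix) return;` in the kernel
        outX(ix) = inX(ix) * 2
        outY(ix) = inY(ix) - 1
      }
    }
    (outX, outY)
  }

  def main(args: Array[String]): Unit = {
    val (outX, outY) =
      launchSampleMap(Array(1, 2, 3, 4, 5, 6), Array(4, 5, 6, 7, 8, 9), blockDim = 4)
    println(outX.mkString(" ")) // 2 4 6 8 10 12
    println(outY.mkString(" ")) // 3 4 5 6 7 8
  }
}
```

With 6 elements and 4 threads per block, two blocks are launched and the last two threads fall outside the data, which is why the kernel needs the size check.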
How to Use the GPU Exploitation Version
• Easy to install with a one-liner and to run with a one-liner
  – On x86_64, Mac, and ppc64le with CUDA 7.0 or later and any JVM, such as IBM JDK or OpenJDK
• A run script for AWS EC2 is available, which supports spot instances
$ wget https://s3.amazonaws.com/spark-gpu-public/spark-gpu-latest-bin-hadoop2.4.tgz &&
  tar xf spark-gpu-latest-bin-hadoop2.4.tgz && cd spark-gpu
$ LD_LIBRARY_PATH=/usr/local/cuda/lib64 MASTER='local[2]' ./bin/run-example SparkGPULR 8 3200 32 5
numSlices=8, N=3200, D=32, ITERATIONS=5
On iteration 1
On iteration 2
On iteration 3
On iteration 4
On iteration 5
Elapsed time: 431 ms
$
Available at http://kiszk.github.io/spark-gpu/
• 3 contributors
• Private communications with other developers
Achieved 3.15x Performance Improvement by GPU
• Ran a naïve implementation of logistic regression
• Achieved a 3.15x performance improvement in logistic regression over the run without a GPU, on a 16-core Ivy Bridge box with an NVIDIA K40 GPU card
  – We have room to improve performance
Details are available at https://github.com/kiszk/spark-gpu/wiki/Benchmark
Program parameters
  N=1,000,000 (# of points), D=400 (# of features), ITERATIONS=5
  Slices=128 (without GPU), 16 (with GPU)
  MASTER=local[8] (without and with GPU)
Hardware and software
  Machine: nx360 M4, 2 sockets 8-core Intel Xeon E5-2667 3.3GHz, 256GB memory, one NVIDIA K40m card
  OS: RedHat 6.6, CUDA: 7.0
Comparisons among DataFrame, Dataset, and RDD
• DataFrame (with relational operations) and Dataset (with lambda functions) use Catalyst and a row-oriented data representation on off-heap
case class Pt(x: Int, y: Int)
val d = Array(Pt(1, 4), Pt(2, 5))

                    | DataFrame (v1.3-)                 | Dataset (v1.6-)                   | RDD (v0.5-)
Frontend API        | val df = d.toDF(…)                | val ds = d.toDS()                 | val rdd = sc.parallelize(d)
                    | df.filter("x>1").count()          | ds.filter(p => p.x>1).count()     | rdd.filter(p => p.x>1).count()
Data                | Row-oriented, off-heap            | Row-oriented, off-heap            | Row-oriented, Java heap
Backend computation | Catalyst, generated Java bytecode | Catalyst, generated Java bytecode | Java bytecode in Spark program and runtime
Design Concepts of Dataset and GPU Exploitation
• Keep data in binary representation (binary columnar also does this)
• Keep data on off-heap (binary columnar also does this)
• Take advantage of the Catalyst optimizer (the GPU enabler could use it)
Comparison of data representations
  Dataset: row-oriented, on off-heap
    case class Pt(x: Int, y: Int)
    val ds = Seq(Pt(1, 4), Pt(2, 5)).toDS()
  Binary columnar RDD: columnar, on off-heap
    case class Pt(x: Int, y: Int)
    sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
How can we apply binary columnar and GPU enabler to Dataset?
Components in GPU Exploitation
• Binary columnar
  – Columnar
    • In-memory storage keeps data in binary representation on off-heap or GPU memory
    • BinaryEncoder converts data between a Java object and binary format
    • ColumnEncoder arranges a set of data elements in column-oriented layout
  – Memory manager
    • Manages off-heap and GPU memory
    • Columnar cache manages persistency of in-memory storage
• GPU enabler
  – GPU kernel launcher
    • Launches kernels with data copy
    • Caches GPU binaries for kernels
  – GPU code generator
    • Generates GPU code from a Spark program
[Figure: component diagram. Columnar: Column Encoder, Binary Encoder, in-memory storage. Memory Manager: columnar cache, off-heap memory, GPU memory. GPU enabler: GPU kernel launcher, GPU code generator, pre-compiled libraries for GPU.]
Software Stack in Spark 2.0 and Beyond
• Dataset will become the primary data structure for computation
• Dataset keeps data in UnsafeRow on off-heap
[Figure: software stack with the following components: user’s Spark program, DataFrame, Dataset, Catalyst, Tungsten, logical optimizer, CPU code generator, and UnsafeRow on off-heap]
Columnar with Dataset
• Keep data in UnsafeRow or Columnar on off-heap, or in Columnar on GPU device memory
[Figure: the Spark 2.0 software stack (user’s Spark program, DataFrame, Dataset, Catalyst, Tungsten, logical optimizer, CPU code generator) extended with a memory manager, Columnar next to UnsafeRow on off-heap, and Columnar on GPU device memory]
Two Approaches for Binary Columnar with Dataset
• Binary columnar as a first-class citizen
  – Better end-to-end performance in a job, without conversion
  – Needs more changes to the existing source code
• Binary columnar as a cache in a task
  – Incurs the overhead of representation conversions between two tasks at a shuffle
  – Needs fewer changes to the existing source code
val ds1 = d.toDS()
val ds2 = ds1.map(…)
val ds3 = ds2.map(…)
val ds11 = ds3.groupBy(…)
val ds12 = ds11.map(…)
[Figure: the chain of Datasets is split into task1 and task2 by a shuffle, comparing binary columnar as a first-class citizen with binary columnar as a cache]
GPU Support in Tungsten
• According to Reynold’s talk (p. 25), the Tungsten backend has a plan to enable GPU exploitation
GPU Enabler in Catalyst
• Place the GPU kernel launcher and GPU code generator into Catalyst
[Figure: the software stack with Columnar (user’s Spark program, DataFrame, Dataset, Catalyst, Tungsten, logical optimizer, memory manager, CPU code generator, UnsafeRow and Columnar on off-heap, Columnar on GPU device memory), now with the GPU code generator and GPU kernel launcher added]
Future Direction
• Refactor the current implementation to make it decomposable
  – Some components currently live in a single Scala file
• Make pull requests for each component
  – To support columnar Dataset
  – To exploit GPUs
Roadmap for pull requests
[Figure: roadmap for pull requests. Items shown: binary encoder, column encoder, in-memory storage, memory manager, cache manager, CPU code generator for Columnar (as a cache in a task, and as a first-class citizen), and multiple backend support. They are mapped onto the component diagram (Columnar with Column Encoder, Binary Encoder, and in-memory storage; Memory Manager with columnar cache, off-heap memory, and GPU memory; GPU kernel launcher; GPU code generator) and onto off-heap and Catalyst.]
Takeaway
• Accelerate a Spark application by using GPUs effectively and transparently
• Devised two new components
  – Binary columnar to alleviate overhead for GPU exploitation
  – GPU enabler to manage GPU kernel execution from a Spark program
    • Call pre-compiled libraries for GPU
    • Generate GPU native code at runtime
• Available at http://kiszk.github.io/spark-gpu/

Component       | Initial design (Spark 1.3-1.5)               | Current status (Spark 2.0-SNAPSHOT)          | Future (Spark 2.x)
Binary columnar | with RDD                                     | with RDD                                     | with Dataset
GPU enabler     | launch GPU kernels, generate GPU native code | launch GPU kernels, generate GPU native code | in Catalyst

We appreciate any feedback and contributions

More Related Content

What's hot

Introduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemIntroduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemBojan Babic
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with PythonGokhan Atil
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDatabricks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkSamy Dindane
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark Aakashdata
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slidesDat Tran
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 

What's hot (20)

Introduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemIntroduction to Apache Spark Ecosystem
Introduction to Apache Spark Ecosystem
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark
SparkSpark
Spark
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slides
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 

Similar to Exploiting GPUs in Spark

Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in SparkKazuaki Ishizaki
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaKazuaki Ishizaki
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storageKohei KaiGai
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Databricks
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Databricks
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
HPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark verticaHPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark verticaJack Gudenkauf
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorDatabricks
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25thSneha Challa
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDAJungsoo Nam
 
An Overview of Apache Spark
An Overview of Apache SparkAn Overview of Apache Spark
An Overview of Apache SparkYasoda Jayaweera
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...AMD Developer Central
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda enKohei KaiGai
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 

Similar to Exploiting GPUs in Spark (20)

Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
HPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark verticaHPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark vertica
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
An Overview of Apache Spark
An Overview of Apache SparkAn Overview of Apache Spark
An Overview of Apache Spark
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 

More from Kazuaki Ishizaki

20230105_TITECH_lecture_ishizaki_public.pdf
20230105_TITECH_lecture_ishizaki_public.pdf20230105_TITECH_lecture_ishizaki_public.pdf
Exploiting GPUs in Spark

  ‱ 1. Kazuaki Ishizaki, IBM Research – Tokyo (IBM Japan, Ltd., Tokyo Research Laboratory): Exploiting GPUs in Spark
  ‱ 2. Who am I?
      - Kazuaki Ishizaki
      - Research staff member at IBM Research – Tokyo
        - http://ibm.co/kiszk
      - Research interests
        - Compiler optimizations, language runtimes, and parallel processing
      - Worked on the Java virtual machine and just-in-time compiler for over 20 years
        - From JDK 1.0 to Java SE 8
      - Twitter: @kiszk
      - SlideShare: http://www.slideshare.net/ishizaki
      - GitHub: https://github.com/kiszk
  ‱ 3. Agenda
      - Motivation & Goal
      - Introduction of GPUs
      - Design & New Components
        - Binary columnar
        - GPU enabler
      - Current Implementation
      - Performance Experiment
        - Achieved a 3.15x performance improvement on a naĂŻve logistic regression by using a GPU
      - Future Direction in Spark 2.0 and beyond
        - With Dataset (introduced in Spark 1.6)
      - Conclusion
  ‱ 4. Want to Accelerate Computation-heavy Applications
      - Motivation
        - Want to shorten the execution time of a long-running Spark application, which may be
          - Computation-heavy
          - Shuffle-heavy
          - I/O-heavy
      - Goal
        - Accelerate a computation-heavy Spark application
          - According to Reynold's talk (p. 21), the CPU will become the bottleneck in Spark
  ‱ 5. Accelerate a Spark Application by GPUs
      - Approach
        - Accelerate a Spark application by using GPUs effectively and transparently
          - Exploit the high performance of GPUs
          - Do not ask users to change their Spark programs
      - New components
        - Binary columnar
        - GPU enabler
  ‱ 6. Motivation & Goal; Introduction of GPUs; Design & New Components; Current Implementation; Performance Experiment; Future Direction in Spark 2.0 and beyond; Conclusion
  ‱ 7. GPU Programming Model
      - Five steps
        1. Allocate GPU device memory
        2. Copy data from CPU main memory to GPU device memory
        3. Launch a GPU kernel to be executed in parallel on the GPU cores
        4. Copy data back from GPU device memory to CPU main memory
        5. Free GPU device memory
      - Usually, a programmer has to write these steps in CUDA or OpenCL
      [Figure: a CPU (a dozen cores per socket, main memory up to 1 TB per socket) and a GPU (thousands of cores, device memory up to 12 GB), with data copied between them over PCIe]
  ‱ 8. How We Can Run a Program Faster on a GPU
      - Assign a lot of parallel computations to the cores
      - Make memory accesses coalesced
        - An example: a column-oriented layout achieves better performance
          - This paper reports about a 3x performance improvement in GPU kernel execution of kmeans over a row-oriented layout
      [Figure: for Pt(x: Int, y: Int), assuming that 4 consecutive data elements can be coalesced by the GPU hardware, the column-oriented layout (x1 x2 x3 x4 y1 y2 y3 y4) needs 2 memory accesses to GPU device memory (load four Pt.x, load four Pt.y), whereas the row-oriented layout needs 4 (load Pt.x and load Pt.y, twice each)]
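The "2 vs. 4 memory accesses" count on this slide can be reproduced with a small sketch. It uses the slide's assumption that 4 consecutive data elements are served by one access, and it additionally assumes that an access covers one aligned 4-element segment. The names `Coalescing`, `transactions`, `rowOriented`, and `columnOriented` are illustrative only and are not part of the spark-gpu project.

```scala
object Coalescing {
  // Number of aligned `width`-element segments touched by one parallel load.
  // Assumption: each touched segment costs one memory access.
  def transactions(addresses: Seq[Int], width: Int = 4): Int =
    addresses.map(_ / width).distinct.size

  // Row-oriented layout: x1 y1 x2 y2 ... so Pt.x sits at even element offsets.
  def rowOriented(n: Int): Int = {
    val loadX = (0 until n).map(i => 2 * i)
    val loadY = (0 until n).map(i => 2 * i + 1)
    transactions(loadX) + transactions(loadY)
  }

  // Column-oriented layout: x1 .. xn y1 .. yn so each field is contiguous.
  def columnOriented(n: Int): Int = {
    val loadX = 0 until n
    val loadY = (0 until n).map(i => n + i)
    transactions(loadX) + transactions(loadY)
  }
}
```

With four points, `Coalescing.rowOriented(4)` counts four accesses and `Coalescing.columnOriented(4)` counts two under these assumptions.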
  ‱ 9. Motivation & Goal; Introduction of GPUs; Design & New Components; Current Implementation; Performance Experiment; Future Direction in Spark 2.0 and beyond; Conclusion
  ‱ 10. Design of GPU Exploitation
      - Efficient
        - Reduce data copy overhead between CPU and GPU
        - Make memory accesses efficient on GPU
      - Transparent
        - Map parallelism in a program into GPU native code
      User's Spark program (Scala):
        case class Pt(x: Int, y: Int)
        val rdd1 = sc.parallelize(Array(
          Pt(1, 4), Pt(2, 5), Pt(3, 6),
          Pt(4, 7), Pt(5, 8), Pt(6, 9)), 3)
        val rdd2 = rdd1.map(p => Pt(p.x*2, p.y-1))
        // reduce needs operands and result of the same type, so sum the x field
        val cnt = rdd2.map(p => p.x).reduce((x1, x2) => x1 + x2)
      [Figure: the GPU enabler translates the program to GPU native code. Each block of rdd1 is kept as binary columnar data (x and y columns) off-heap on the CPU, transferred to the GPU, processed by a GPU kernel (x*2, y-1), and returned as a block of rdd2. The GPU can exploit parallelism both among blocks in an RDD and within a block of an RDD.]
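The two levels of parallelism named on this slide (among blocks and within a block) can be sketched with plain Scala collections. This is a CPU-only illustration with no Spark or GPU involved; `BlockParallelism`, `mapBlock`, and `sumX` are hypothetical names, and splitting the six points into three blocks of two mirrors the `3` passed to `sc.parallelize` on the slide.

```scala
object BlockParallelism {
  case class Pt(x: Int, y: Int)

  // The six points from the slide, split into 3 blocks of 2 elements each.
  val blocks: Seq[Seq[Pt]] =
    Seq(Pt(1, 4), Pt(2, 5), Pt(3, 6), Pt(4, 7), Pt(5, 8), Pt(6, 9)).grouped(2).toSeq

  // map: each element is independent of the others, so every element of
  // every block could be processed by its own GPU thread.
  def mapBlock(block: Seq[Pt]): Seq[Pt] = block.map(p => Pt(p.x * 2, p.y - 1))

  // reduce: sum x inside each block first, then combine the partial sums.
  def sumX(bs: Seq[Seq[Pt]]): Int = bs.map(b => mapBlock(b).map(_.x).sum).sum
}
```

For the slide's data, the per-block partial sums of the doubled x values are 6, 14, and 22, so `BlockParallelism.sumX(BlockParallelism.blocks)` evaluates to 42.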
  ‱ 11. What Does Binary Columnar Do?
      - Keep data in a binary representation (not a Java object representation)
      - Keep data in a column-oriented layout
      - Keep data off-heap or in GPU device memory
      Example:
        case class Pt(x: Int, y: Int)
        Array(Pt(1, 4), Pt(2, 5))
      [Figure: the same two points stored off-heap in a row-oriented layout and in a columnar (column-oriented) layout]
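A minimal sketch of the three properties above, using only `java.nio.ByteBuffer` from the JDK: the points are stored as raw 4-byte integers (binary), all x values precede all y values (column-oriented), and the buffer is allocated with `allocateDirect` (off-heap). The object and method names are assumptions made for illustration; the actual spark-gpu storage format is not shown on the slides.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object BinaryColumnar {
  case class Pt(x: Int, y: Int)

  private val IntBytes = 4

  // Layout: x(0) .. x(n-1) followed by y(0) .. y(n-1), each a 4-byte int.
  def toColumnar(points: Array[Pt]): ByteBuffer = {
    val n = points.length
    val buf = ByteBuffer.allocateDirect(2 * n * IntBytes).order(ByteOrder.nativeOrder())
    for (i <- 0 until n) {
      buf.putInt(i * IntBytes, points(i).x)       // x column
      buf.putInt((n + i) * IntBytes, points(i).y) // y column
    }
    buf
  }

  // Rebuild Java objects from the columnar buffer.
  def fromColumnar(buf: ByteBuffer, n: Int): Array[Pt] =
    Array.tabulate(n)(i => Pt(buf.getInt(i * IntBytes), buf.getInt((n + i) * IntBytes)))
}
```

For `Array(Pt(1, 4), Pt(2, 5))` the buffer holds the integers 1, 2, 4, 5 in that order, whereas a row-oriented layout would hold 1, 4, 2, 5.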
  ‱ 12. Current RDD as Java Objects on the Java Heap
        case class Pt(x: Int, y: Int)
        rdd = sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
      [Figure: the current RDD holds two Pt objects on the Java heap, each with an object header for the Java virtual machine followed by its fields (1 4 and 2 5)]
      - Current RDD
        - Row-oriented layout
        - Java object representation
        - On the Java heap
  ‱ 13. Binary Columnar RDD Off-heap
        case class Pt(x: Int, y: Int)
        rdd = sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
      [Figure: the current RDD as Pt objects with object headers for the Java virtual machine on the Java heap, next to the binary columnar RDD holding the same values off-heap]
      - Current RDD
        - Row-oriented layout
        - Java object representation
        - On the Java heap
      - Binary columnar RDD
        - Column-oriented layout
        - Binary representation
        - Off-heap
  ‱ 14. Long Path from Current RDD to GPU
      - Three steps to send data from an RDD to the GPU
        1. Java objects to a column-oriented binary representation on the Java heap
          - From a Java object to a binary representation
          - From a row-oriented format to a columnar one
        2. Binary representation on the Java heap to binary columnar off-heap
          - Garbage collection may move objects on the Java heap during GPU-related operations
        3. Off-heap to GPU device memory
        case class Pt(x: Int, y: Int)
        rdd = sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
        rdd.map(
).reduce(
) // execute on GPU
      [Figure: Pt objects on the Java heap -> ByteBuffer on the Java heap -> ByteBuffer off-heap -> GPU device memory, labeled with steps 1 to 3]
      - This thread on the dev mailing list also discusses the overhead of copying data between an RDD and the GPU
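The first two steps of this path can be sketched on the JVM alone. The sketch below assumes heap and direct `java.nio.ByteBuffer`s stand in for the two intermediate representations; `LongPath`, `toHeapColumnar`, and `toOffHeap` are hypothetical names. Step 3 needs a CUDA binding and is deliberately left out.

```scala
import java.nio.ByteBuffer

object LongPath {
  case class Pt(x: Int, y: Int)

  // Step 1: Java objects (row-oriented) -> columnar binary data on the Java heap.
  def toHeapColumnar(points: Array[Pt]): ByteBuffer = {
    val buf = ByteBuffer.allocate(points.length * 2 * 4)
    points.foreach(p => buf.putInt(p.x)) // x column first
    points.foreach(p => buf.putInt(p.y)) // then y column
    buf.flip()                           // make the written bytes readable
    buf
  }

  // Step 2: copy the heap buffer off-heap, where the garbage collector
  // cannot move it while a device transfer is in flight.
  def toOffHeap(heap: ByteBuffer): ByteBuffer = {
    val direct = ByteBuffer.allocateDirect(heap.remaining())
    direct.put(heap.duplicate()) // duplicate() leaves the source position untouched
    direct.flip()
    direct
  }

  // Step 3 (off-heap -> GPU device memory) is omitted in this sketch.
}
```

Each step is a full pass over the data, which is the overhead the binary columnar RDD is meant to avoid by keeping data off-heap in columnar form from the start.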
  ‱ 15. Short Path from Binary Columnar RDD to GPU
      - An RDD with binary columnar can simply be copied to GPU device memory
        case class Pt(x: Int, y: Int)
        rdd = sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
        rdd.map(
).reduce(
) // execute on GPU
      [Figure: with a binary columnar RDD the path is off-heap -> GPU device memory; the two conversion steps on the Java heap from the long path are eliminated]
  ‱ 16. Can Execute map() in Parallel Using Binary Columnar
      - Adjacent elements in a binary columnar RDD can be accessed in parallel
      - The same type of operation (* or -) can be executed in parallel on data that is loaded in parallel
        case class Pt(x: Int, y: Int)
        rdd = sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
        rdd1 = rdd.map(p => Pt(p.x*2, p.y-1))
      [Figure: memory access order. The current RDD on the Java heap is accessed one element at a time (1, 2, 3, 4); the binary columnar RDD off-heap is accessed a column at a time (1, 1, 2, 2)]
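As a CPU-side sketch of why the columnar form suits this map(): once x and y are separate arrays, `p.x*2` and `p.y-1` each become one uniform operation over a contiguous column, which is the shape that GPU threads or SIMD lanes handle well. `ColumnarMap`, `Columns`, and `mapColumns` are illustrative names, and plain `Array[Int]` columns stand in for the off-heap storage.

```scala
object ColumnarMap {
  // One array per field instead of one object per point.
  final case class Columns(x: Array[Int], y: Array[Int])

  // Equivalent of rdd.map(p => Pt(p.x*2, p.y-1)) applied column by column.
  def mapColumns(in: Columns): Columns =
    Columns(in.x.map(_ * 2), in.y.map(_ - 1))
}
```

For the columns x = (1, 2) and y = (4, 5) from the slide, the result columns are x = (2, 4) and y = (3, 4).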
  ‱ 17. Advantages of Binary Columnar
      - Can exploit the high performance of GPUs
      - Can reduce the overhead of data copy between CPU and GPU
      - Has a smaller memory footprint
      - Can compute directly on columnar data read from Apache Parquet
      - Can exploit SIMD instructions on the CPU
  ‱ 18. What Does GPU Enabler Do?
      - Copy data in a binary columnar RDD between CPU main memory and GPU device memory
      - Launch GPU kernels
      - Cache GPU native code for kernels
      - Generate GPU native code from transformations and actions in a program
        - We have already productized the IBM Java just-in-time compiler, which generates GPU native code from a lambda expression in Java 8
  ‱ 19. Motivation & Goal; Introduction of GPUs; Design & New Components; Current Implementation; Performance Experiment; Future Direction in Spark 2.0 and beyond; Conclusion
  ‱ 20. Software Stack in Current Spark 2.0-SNAPSHOT
      - An RDD keeps its data on the Java heap
      [Figure: user's Spark program on top of the RDD API, with RDD data on the Java heap]
  ‱ 21. Software Stack of GPU Exploitation
      - The current RDD and the binary columnar RDD co-exist
      [Figure: user's Spark program on top of the RDD API, with RDD data on the Java heap, columnar data off-heap, and the GPU enabler connected to columnar data in GPU device memory]
  ‱ 22. Current Implementation of Binary Columnar
      - Works with RDD
      - Converts from the current RDD to a binary columnar RDD and vice versa
        - Our current implementation eliminates conversion overhead between CPU and GPU in a task
  ‱ 23. Current Implementation of GPU Enabler
      - Execute user-provided GPU kernels from map()/reduce() functions
        - GPU memory management and data copy are handled automatically
      - Generate GPU native code for simple map()/reduce() methods
        - "spark.gpu.codegen=true" in spark-defaults.conf
      Generated GPU code:
        rdd1 = sc.parallelize(1 to n, 2).convert(ColumnFormat) // rdd1 uses binary columnar RDD
        sum  = rdd1.map(i => i * 2)
                   .reduce((x, y) => (x + y))
      User-provided kernel:
        // CUDA
        __global__ void sample_map(int *inX, int *inY, int *outX, int *outY, long size) {
          long ix = threadIdx.x + blockIdx.x * blockDim.x;
          if (size <= ix) return;   // threads beyond the end of the data do nothing
          outX[ix] = inX[ix] * 2;
          outY[ix] = inY[ix] - 1;
        }
        // Spark
        mapFunction = new CUDAFunction("sample_map",     // CUDA method name
          Array("this.x", "this.y"),                     // input object has two fields
          Array("this.x", "this.y"),                     // output object has two fields
          this.getClass.getResource("/sample.ptx"))      // ptx is generated by the CUDA compiler
        rdd1 = sc.parallelize(
).convert(ColumnFormat)  // rdd1 uses binary columnar RDD
        rdd2 = rdd1.mapExtFunc(p => Pt(p.x*2, p.y-1), mapFunction)
  ‱ 24. How to Use the GPU Exploitation Version
      - Easy to install with a one-liner and to run with a one-liner
        - On x86_64, Mac, and ppc64le with CUDA 7.0 or later, with any JVM such as IBM JDK or OpenJDK
      - A run script for AWS EC2, which supports spot instances, is available
        $ wget https://s3.amazonaws.com/spark-gpu-public/spark-gpu-latest-bin-hadoop2.4.tgz && tar xf spark-gpu-latest-bin-hadoop2.4.tgz && cd spark-gpu
        $ LD_LIBRARY_PATH=/usr/local/cuda/lib64 MASTER='local[2]' ./bin/run-example SparkGPULR 8 3200 32 5
        

        numSlices=8, N=3200, D=32, ITERATIONS=5
        On iteration 1
        On iteration 2
        On iteration 3
        On iteration 4
        On iteration 5
        Elapsed time: 431 ms
        $
      - Available at http://kiszk.github.io/spark-gpu/
        - 3 contributors
        - Private communications with other developers
  ‱ 25. Achieved 3.15x Performance Improvement by GPU
      - Ran a naĂŻve implementation of logistic regression
      - Achieved a 3.15x performance improvement for logistic regression compared with running without a GPU, on a 16-core Ivy Bridge machine with an NVIDIA K40 GPU card
        - There is still room to improve performance
      - Details are available at https://github.com/kiszk/spark-gpu/wiki/Benchmark
      - Program parameters
        - N=1,000,000 (# of points), D=400 (# of features), ITERATIONS=5
        - Slices=128 (without GPU), 16 (with GPU)
        - MASTER=local[8] (without and with GPU)
      - Hardware and software
        - Machine: nx360 M4, 2 sockets of 8-core Intel Xeon E5-2667 3.3GHz, 256GB memory, one NVIDIA K40m card
        - OS: RedHat 6.6, CUDA: 7.0
  ‱ 26. Motivation & Goal; Introduction of GPUs; Design & New Components; Current Implementation; Performance Experiment; Future Direction in Spark 2.0 and beyond; Conclusion
  ‱ 27. Comparisons among DataFrame, Dataset, and RDD
      - DataFrame (with relational operations) and Dataset (with lambda functions) use Catalyst and a row-oriented data representation off-heap
        case class Pt(x: Int, y: Int)
        d = Array(Pt(1, 4), Pt(2, 5))
      - DataFrame (v1.3-)
        - Frontend API: df = d.toDF(
); df.filter("x>1").count()
        - Backend computation: Catalyst, generated Java bytecode
        - Data: row-oriented, off-heap
      - Dataset (v1.6-)
        - Frontend API: ds = d.toDS(); ds.filter(p => p.x>1).count()
        - Backend computation: Catalyst, generated Java bytecode
        - Data: row-oriented, off-heap
      - RDD (v0.5-)
        - Frontend API: rdd = sc.parallelize(d); rdd.filter(p => p.x>1).count()
        - Backend computation: Java bytecode in the Spark program and runtime
        - Data: row-oriented, on the Java heap
  ‱ 28. Design Concepts of Dataset and GPU Exploitation
      - Keep data in a binary representation
      - Keep data off-heap
      - Take advantage of the Catalyst optimizer
      [Figure annotations on the list above: "Binary columnar also does" and "GPU enabler could use"]
      - Comparison of data representations
        - Dataset: row-oriented, off-heap
            case class Pt(x: Int, y: Int)
            ds = Seq(Pt(1, 4), Pt(2, 5)).toDS()
        - Binary columnar RDD: columnar, off-heap
            case class Pt(x: Int, y: Int)
            sc.parallelize(Array(Pt(1, 4), Pt(2, 5)))
      - How can we apply binary columnar and GPU enabler to Dataset?
  ‱ 29. Components in GPU Exploitation
      - Binary columnar
        - Columnar
          - In-memory storage keeps data in a binary representation off-heap or in GPU memory
          - BinaryEncoder converts data between a Java object and the binary format
          - ColumnEncoder arranges a set of data elements in a column-oriented layout
        - Memory Manager
          - Manages off-heap and GPU memory
          - Columnar cache manages the persistency of in-memory storage
      - GPU enabler
        - GPU kernel launcher
          - Launches kernels with data copy
          - Caches GPU binaries for kernels
        - GPU code generator
          - Generates GPU code from a Spark program
      [Figure: component diagram showing Columnar (Column Encoder, Binary Encoder, In-memory storage), Memory Manager (Columnar cache, GPU memory, Off-heap memory), GPU kernel launcher, GPU code generator, and pre-compiled libraries for GPU]
  ‱ 30. Software Stack in Spark 2.0 and Beyond
      - Dataset will become the primary data structure for computation
      - Dataset keeps data in UnsafeRow off-heap
      [Figure: user's Spark program on top of DataFrame and Dataset; Catalyst (logical optimizer, CPU code generator) and Tungsten below; UnsafeRow data off-heap]
  ‱ 31. Columnar with Dataset
      - Keep data in UnsafeRow or Columnar off-heap, or in Columnar in GPU device memory
      [Figure: the Spark 2.0 stack (user's Spark program, DataFrame, Dataset, Catalyst with logical optimizer and CPU code generator, Tungsten) extended with a memory manager, Columnar data next to UnsafeRow off-heap, and Columnar data in GPU device memory]
  ‱ 32. Two Approaches for Binary Columnar with Dataset
      - Binary columnar as a first-class citizen
        - Better end-to-end performance in a job, without conversion
        - Needs more changes to the existing source code
      - Binary columnar as a cache in a task
        - Incurs the overhead of representation conversions between two tasks at a shuffle
        - Needs fewer changes to the existing source code
        ds1 = d.toDS()
        ds2 = ds1.map(
)
        ds3 = ds2.map(
)
        ds11 = ds3.groupby(
)
        ds12 = ds11.map(
)
      [Figure: ds1 to ds3 run in task1 and ds11 to ds12 run in task2, separated by a shuffle; "as a first-class citizen" spans both tasks, while "as a cache" applies within each task]
  ‱ 33. GPU Support in Tungsten
      - According to Reynold's talk (p. 25), the Tungsten backend has a plan to enable GPU exploitation
  ‱ 34. GPU Enabler in Catalyst
      - Place the GPU kernel launcher and GPU code generator into Catalyst
      [Figure: the stack from slide 31 with a GPU code generator and GPU kernel launcher added to Catalyst alongside the logical optimizer, memory manager, and CPU code generator]
  ‱ 35. Future Direction
      - Refactor to make the current implementation decomposable
        - Some components exist in one Scala file
      - Make pull requests for each component
        - To support columnar Dataset
        - To exploit GPUs
      [Figure: roadmap for pull requests, covering the Columnar components (binary encoder, column encoder, in-memory storage), the memory manager and cache manager, columnar as a cache in a task and as a first-class citizen, multiple backend support, the CPU code generator for Columnar, the GPU kernel launcher, and the GPU code generator]
  ‱ 36. Takeaway
      - Accelerate a Spark application by using GPUs effectively and transparently
      - Devised two new components
        - Binary columnar to alleviate overhead for GPU exploitation
        - GPU enabler to manage GPU kernel execution from a Spark program
          - Call pre-compiled libraries for GPU
          - Generate GPU native code at runtime
      - Available at http://kiszk.github.io/spark-gpu/
      - Status by component
        - Binary columnar: with RDD in the initial design (Spark 1.3-1.5); with RDD in the current status (Spark 2.0-SNAPSHOT); with Dataset in the future (Spark 2.x)
        - GPU enabler: launch GPU kernels and generate GPU native code in the initial design (Spark 1.3-1.5); the same in the current status (Spark 2.0-SNAPSHOT); in Catalyst in the future (Spark 2.x)
      - We appreciate any feedback and contributions