GPU support in Spark accelerates applications by offloading compute-intensive tasks to GPUs, but production deployments face challenges such as low resource utilization and overload when scheduling mixed GPU and CPU workloads. The presentation proposes recognizing GPU tasks so the DAG can be optimized and new GPU stages inserted, along with policies for prioritizing and allocating GPU and CPU resources independently through multi-dimensional scheduling. In evaluation, the ALS Spark example achieves speedups on GPUs, and IBM Spectrum Conductor provides a Spark-centric shared service with fine-grained resource scheduling that reduces wait times and improves utilization across shared GPU and CPU resources.
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
1. GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
Yonggang Hu, IBM, Distinguished Engineer
Junfeng Liu, IBM, Architect
2. About us
• Yonggang Hu
Distinguished Engineer, IBM. Chief Architect, IBM Spectrum Computing.
Previously Vice President and Application Architect at JPMorgan Chase.
Has worked on distributed computing, grid, cloud, and big data for the past 20 years.
• Junfeng Liu
Architect, IBM Spectrum Computing, focusing on big data platform design and implementation, and on delivering solutions to key customers.
3. Agenda
• GPU and Spark integration motivation
• The challenges of production deployments
• Solutions
• Demo
• IBM® Spectrum Conductor with Spark
4. Spark and GPU
• Spark apps are CPU intensive and need to handle more data and bigger models
• Machine Learning: predictive analytics, logistic regression, ALS, k-means, etc.
• Graph Analytics (GraphX): security, fraud detection, social network analytics
• Video/Speech Analytics: object recognition, dialog
• Financial Risk Analytics: market simulation, credit risk; home-grown apps and apps from Murex and Misys
• The synergy runs in two directions: GPU-enable Spark apps, and Spark-enable existing GPU apps
5. Various ways to integrate Spark and GPUs
• Use GPUs to accelerate Spark libraries and operations without changing interfaces or the programming model
– IBM open source project: https://github.com/IBMSparkGPU
– Nvidia GTC 2016 talks:
• http://on-demand.gputechconf.com/gtc/2016/presentation/s6280-rajesh-bordawekar-accelerating-spark.pdf
• http://on-demand.gputechconf.com/gtc/2016/video/S6211.html
• Automatically generate CUDA code from Spark Java source code
– Nvidia GTC 2016 talk: http://on-demand.gputechconf.com/gtc/2016/presentation/s6346-kazuaki-ishizaki-gpu-programming-java-programmers.pdf
• Integrate Spark with GPU-enabled applications and systems, e.g., Spark integrated with Caffe, TensorFlow, and customer applications (a sketch follows below)
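To make the third pattern concrete, here is a minimal sketch, assuming a hypothetical native library exposed to the JVM through a JNI wrapper. GpuKernel, libgpukernel, and dotBatch are illustrative names, not a real product API:

import org.apache.spark.sql.SparkSession

// Hypothetical JNI binding to a native CUDA/OpenCL library. The library
// name and the dotBatch signature are illustrative only.
object GpuKernel {
  System.loadLibrary("gpukernel") // resolves libgpukernel.so on the executor
  @native def dotBatch(xs: Array[Double], ys: Array[Double]): Array[Double]
}

object GpuIntegrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("gpu-integration-sketch").getOrCreate()
    val pairs = spark.sparkContext
      .parallelize(1 to 1000000)
      .map(i => (i.toDouble, 2.0 * i))

    // One native call per partition amortizes the host-to-device transfer,
    // which is why mapPartitions is preferred over map for GPU offload.
    val products = pairs.mapPartitions { it =>
      val (xs, ys) = it.toArray.unzip
      GpuKernel.dotBatch(xs, ys).iterator
    }
    println(products.sum())
    spark.stop()
  }
}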
6. Production challenges
• However:
– Identifying GPU execution vs. CPU execution in the DAG
– Data preparation for GPU execution
– Low resource utilization for CPU, GPU, or both
• Cannot assume all compute hosts are identical and have GPU resources available
• GPUs are far more expensive!
– Overload and contention when running mixed GPU and CPU workloads
– Long tails and GPU/CPU task failover
– Monitoring and management
[Diagram: a DAG with Stage 1 and Stage 2 (reduceByKey, collect) plus an inserted GPU stage, scheduled across a GPU group and a CPU group]
7. A typical example
• Personalized medicine: adverse drug reaction workload
– 30x faster learning speed and a 4x end-to-end speedup
– Need to fully utilize both GPU and CPU resources to get the economic benefit; a 4x end-to-end speedup alone is not enough
Source: http://on-demand.gputechconf.com/gtc/2016/presentation/s6280-rajesh-bordawekar-accelerating-spark.pdf
8. Scheduling granularity
• Scheduling at the application level
– Mesos and YARN tag GPU machines with a label
– The application is scheduled on GPU hosts based on its resource requirements
– Such coarse-grained scheduling leads to low CPU/GPU utilization
• Scheduling at the DAG level
– Needs fine-grained sharing of GPU resources rather than reserving entire GPU machines
– Identify GPU operations
– Optimize the DAG tree by decoupling GPU operations from CPU operations and inserting new GPU stages (sketched below)
– Reduces GPU wait time, enables sharing of GPUs among jobs, and therefore improves overall GPU utilization
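A minimal sketch of the DAG-level idea: forcing a stage boundary (here with repartition) decouples the GPU work into its own stage, so only that stage's tasks need GPU hosts. GpuScorer and the input path are illustrative stand-ins, not the talk's actual mechanism:

import org.apache.spark.sql.SparkSession

// Stand-in for a native CUDA call; illustrative only.
object GpuScorer {
  def score(batch: Array[Double]): Array[Double] = batch.map(_ * 2.0)
}

object DagDecouplingSketch {
  def main(args: Array[String]): Unit = {
    val sc = SparkSession.builder.appName("dag-decoupling-sketch").getOrCreate().sparkContext
    val numGpuSlots = 4 // assumed number of concurrently schedulable GPU tasks

    val scored = sc.textFile("hdfs:///data/records") // placeholder input path
      .map(_.toDouble)           // CPU stage: parsing and preparation
      .repartition(numGpuSlots)  // shuffle forces a new stage to begin here
      .mapPartitions { it =>     // GPU stage: only these tasks need GPU hosts
        GpuScorer.score(it.toArray).iterator
      }
    println(scored.count())
  }
}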
9. GPU task recognition
• GPU and CPU tasks are mixed together
• Need to separate the workloads for scheduling control
[Diagram: GPUFunction() calls a Python-C/C++ wrapper, which invokes a native function implemented in CUDA/OpenCL inside a GPU library]
10. GPU task recognition
• Mark the GPU workload by DAG operation (a toy traversal is sketched below)
– Walk the DAG tree to identify stages with GPU requirements
– Optimize the distribution by inserting a GPU stage
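The traversal can be pictured with a toy model. These case classes are illustrative, not Spark's internal scheduler types:

// Toy model of a DAG stage; needsGpu would be derived from recognizing a
// GPU operation (e.g., a native-wrapper call) inside the stage's tasks.
final case class Stage(id: Int, needsGpu: Boolean, parents: List[Stage])

object GpuStageMarker {
  // Walk the DAG tree and collect the ids of stages that require a GPU.
  def gpuStages(root: Stage): Set[Int] = {
    val fromParents = root.parents.flatMap(p => gpuStages(p)).toSet
    if (root.needsGpu) fromParents + root.id else fromParents
  }
}

// Example: CPU stage -> GPU stage -> CPU stage.
object GpuStageMarkerExample extends App {
  val s1 = Stage(1, needsGpu = false, Nil)
  val s2 = Stage(2, needsGpu = true, List(s1))
  val s3 = Stage(3, needsGpu = false, List(s2))
  assert(GpuStageMarker.gpuStages(s3) == Set(2))
}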
11. Policies
• The resource manager needs to identify GPU hosts and manage them alongside CPU resources
• Prioritization policy: share GPU resources among applications
• Allocation policy: control GPU and CPU allocation independently via multi-dimensional scheduling (see the sketch below)
• Fine-grained policy: schedule tasks according to the GPU-optimized DAG plan
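A minimal sketch of what "multi-dimensional" means here, assuming each host advertises independent CPU-slot and GPU-slot capacities (the types are illustrative):

final case class Capacity(cpuSlots: Int, gpuSlots: Int)
final case class Demand(cpuSlots: Int, gpuSlots: Int)

object MultiDimScheduler {
  // Each dimension is checked independently, so a host's idle CPUs can be
  // given to CPU-only tasks even while all of its GPUs are busy.
  def fits(free: Capacity, d: Demand): Boolean =
    d.cpuSlots <= free.cpuSlots && d.gpuSlots <= free.gpuSlots

  def allocate(free: Capacity, d: Demand): Option[Capacity] =
    if (fits(free, d))
      Some(Capacity(free.cpuSlots - d.cpuSlots, free.gpuSlots - d.gpuSlots))
    else None
}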
12. Adaptive scheduling
• CPU and GPU tasks are convertible in many applications
• Scheduling needs an adaptive capability
– If GPUs are available, run a portion of the tasks on GPU
– Otherwise, run the remaining tasks on CPU

dstInBlocks.join(merged).mapValues { v =>
  // ... prepare the value for computation ...
  if (useGPU) {
    loadGPULib()  // load the native GPU library on this executor
    callGPU()     // GPU implementation of the operation
  } else {
    // CPU version of the same operation
  }
}
14. Efficiency considerations
• Do we need to wait for a GPU resource if a CPU is available?
• Do we need to rerun CPU tasks on a GPU if the tasks on CPU are long-tailed? (one possible heuristic is sketched below)
• Do we need failover across resource types?
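One possible answer to the long-tail question, sketched as a heuristic. This is an assumption, not a policy stated in the talk: speculatively relaunch a CPU task on a GPU once it has run well past the median task duration.

object LongTailPolicy {
  // Relaunch on GPU only when a GPU is free and the task is a clear
  // straggler relative to its peers; slowdownFactor is a tunable guess.
  def shouldRerunOnGpu(elapsedMs: Long, medianTaskMs: Long,
                       gpuAvailable: Boolean, slowdownFactor: Double = 3.0): Boolean =
    gpuAvailable && elapsedMs > slowdownFactor * medianTaskMs
}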
15. Defer scheduling
• Traditional defer scheduling
– Waits for data locality
– Cache, host, rack
• Resource-based defer scheduling (see the sketch below)
– Worthwhile if the GPU can greatly speed up task execution
– And the wait time is acceptable
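A minimal sketch of the resource-based defer decision, assuming estimates of the task's CPU runtime, its GPU speedup, and the expected wait for a free GPU (all parameters are illustrative):

object DeferScheduling {
  // Defer (wait for a GPU) only when waiting plus GPU execution still beats
  // running immediately on a CPU, and the wait stays within an acceptable bound.
  def waitForGpu(cpuRuntimeMs: Long, gpuSpeedup: Double,
                 expectedWaitMs: Long, maxWaitMs: Long): Boolean = {
    val gpuRuntimeMs = (cpuRuntimeMs / gpuSpeedup).toLong
    expectedWaitMs <= maxWaitMs && expectedWaitMs + gpuRuntimeMs < cpuRuntimeMs
  }
}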
16. Evaluation
• Machine Learning: ALS on GPU
– IBM Spark-on-GPU project
– Various ML implementations on GPU
– https://github.com/IBMSparkGPU
• MovieLensALS example
– Spark example, run without modification
– Binary compatible with Spark applications
– Data source: http://grouplens.org/datasets/movielens/
18. Future work
• Global optimization
– Consider the cost of the additional shuffle stage
– Consider data locality of CPU and GPU stages
– Add a time dimension to MDS (multi-dimensional scheduling)
– Optimize global DAG tree execution
– Use historical data to optimize future executions, e.g., future iterations
19. Building a Spark-Centric Shared Service with IBM Spectrum Conductor
1. Improve Time to Results
Run Spark natively on a shared infrastructure without a dependency on Hadoop. Reduce application wait time, improving time to results.
2. Increase Resource Utilization
Fine-grained, dynamic allocation of resources maximizes the efficiency of Spark instances sharing a common resource pool. Multi-tenant, multi-framework support. Eliminates cluster sprawl.
3. Reduce Administration Costs
Proven architecture at extreme scale, with enterprise-class workload management, multi-version support for Spark, monitoring, reporting, and security capabilities.
4. End-to-End Enterprise Class Solution
• IBM STC Spark distribution
• IBM Platform resource orchestrator / session scheduler, application service manager
• IBM Spectrum Scale FPO
http://www-03.ibm.com/systems/spectrum-computing/products/conductor/
20. IBM Spectrum Conductor with Spark
• IBM Bluemix Spark cloud service in production, with thousands of users and tenants
• A third-party audited benchmark indicated significant performance, throughput, and SLA advantages: https://ibm.biz/Bd4Tyw
21. IBM Spectrum Conductor with Spark
• Monitoring and reporting with the Elastic (ELK) stack
– Integrated Elasticsearch, Logstash, and Kibana for customizable monitoring
• Built-in monitoring metrics
– Across Spark instance groups
– Across Spark applications within a Spark instance group
– Within a Spark application
• Built-in monitoring inside the Zeppelin notebook
23. THANK YOU.
Visit us at the IBM booth in the expo
www.ibm.com/spectrum-conductor
yhu@ca.ibm.com
jfliu@ca.ibm.com
Editor's notes
The synergy comes from two directions.
There are many existing GPU applications, such as deep learning apps (Caffe, Torch) and commercial or home-grown apps doing risk calculation. They can leverage Spark for data distribution and for parallelizing computation across multiple GPU cards.
On the other hand, it is well known that many Spark apps are compute intensive: within each Spark task there are many parallelizable elements. Instead of running them on CPUs, running them on GPUs greatly improves performance. For example, we have implemented some MLlib functions on GPU and achieved great speedups. Also, if each node does more work, we can reduce communication and network traffic.
IBM is actively working on and contributing to all three areas.
Rajesh gave a talk on our work to accelerate Spark MLlib using GPUs without changing the Spark interface. Wei Tan's presentation demonstrated how to use GPUs to accelerate ALS.
Our Japan and Canada teams worked on this thread. They had a talk at GTC 2016 and showed how to program GPUs in pure Java using the new parallel streaming API and JIT CUDA code generation.
How to effectively map Spark partitions to GPU nodes, then make sure the data will fit into GPU memory and move the necessary data to the GPU.
Data might need to be shuffled, and certain data formats (columnar format) are more suitable for GPU access.
The Spark memory manager needs to consider GPU memory as well.
Spark is not GPU aware when it creates the DAG, distributes partitions, or manages memory. There is no way to monitor and control it.
How many GPUs are there? How are the GPUs being used? What is their utilization?
Builds an LBFGS-based logistic regression model to classify drug pairs.