CPU performance has scaled well in recent years through more complex microarchitectures, wider execution pipelines, more cores per processor, and higher frequencies. Accelerators, however, deliver more computational power and higher throughput at lower cost in their dedicated domains, which is driving their adoption in Spark. Yet when accelerators are integrated into Spark, a common outcome is a large performance promise from micro-benchmarks followed by little actual speedup. One reason is the cost of transferring data between the JVM and the accelerator; another is that the accelerator lacks information about how it is used in Spark. In this research, we investigate Apache Arrow based data frames as a unified way to share and transfer data between the CPU and accelerators, and we make the hardware and software stack data-frame aware by design. In this way we integrate Spark and accelerator designs seamlessly and get close to the promised performance.
2. Apache Arrow* Based Unified Data Exchange
Binwei Yang, Intel
Carson Wang, Intel
#UnifiedAnalytics #SparkAISummit
3. Me
• 13 years of experience in performance analysis
• Software -> CPU simulator -> Spark
• Joined the Intel Spark team in Aug. 2018
• A “layman” of Apache Spark
4. Pursuit of Performance Is Endless
• Intel® 2nd Gen Xeon® Scalable Processors
• Intel® Optane™ DC persistent memory
• Intel® FPGA
• Software optimization
9. Overhead of Offload
[Diagram: on the CPU side, Internal Row data is converted to an FPGA batch; on the FPGA side, FPGA DMA RX moves the batch in, the FPGA engine processes it, and FPGA DMA TX moves results back to be converted to Internal Row again. The two overheads are the format conversion and the data movement.]
10. Optimize – Unified Format
[Diagram: with a unified data format on both CPU and FPGA, batches flow through FPGA DMA RX, the FPGA engine, and FPGA DMA TX with no conversion step.]
• With a unified format, the FPGA is easy to debug
• The FPGA library can be shared with all other projects
11. Optimize – Double Buffer
[Diagram: two DMA channels, FPGA DMA RX1 and FPGA DMA RX2, alternately fill unified-format buffers so the FPGA engine never waits on a transfer.]
12. Optimize – Double Buffer
[Timeline diagram: column transfers Col1, Col2, Col3, … overlap with engine runs Eng 1, Eng 2, Eng 3, … over time.]
• A columnar data format is friendly to most accelerators
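The double-buffering idea above can be sketched in plain Python (a toy model, not FPGA code: a thread stands in for the DMA RX channel, and a bounded queue for the two in-flight buffers):

```python
import threading
import queue

def double_buffered_pipeline(batches, transfer, compute):
    """Overlap data transfer with compute using two in-flight buffers.

    `transfer` models FPGA DMA RX (host -> device copy); `compute`
    models the FPGA engine. While the engine works on batch N, the
    DMA thread fills the second buffer with batch N+1.
    """
    ready = queue.Queue(maxsize=2)  # at most two buffers in flight
    results = []

    def rx():
        for b in batches:
            ready.put(transfer(b))  # DMA RX into a free buffer slot
        ready.put(None)             # end-of-stream marker

    t = threading.Thread(target=rx)
    t.start()
    while True:
        buf = ready.get()
        if buf is None:
            break
        results.append(compute(buf))  # engine runs while RX continues
    t.join()
    return results

# Toy usage: "transfer" copies a batch, "compute" sums it.
out = double_buffered_pipeline([[1, 2], [3, 4]], list, sum)
print(out)  # [3, 7]
```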
13. Do We Fully Utilize CPU?
df.agg(F.sum('a_float')).show()
perf stat -e fp_arith_inst_retired.128b_packed_single -A -a sleep 1
CPU0 0 fp_arith_inst_retired.128b_packed_single
CPU1 0 fp_arith_inst_retired.128b_packed_single
CPU2 0 fp_arith_inst_retired.128b_packed_single
…
14. Add AVX Support
• We need
– A columnar data format
– A native LLVM SQL engine
• Take advantage of other highly optimized libraries
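Why a columnar format matters for such libraries can be shown with a small sketch (assuming `numpy`; the data is invented for illustration): summing a field of row objects touches every row one by one, while the same field stored as one contiguous column can be handed straight to a vectorized, SIMD-capable kernel:

```python
import numpy as np

# Row-oriented data: summing one field walks every row object.
rows = [{"a_float": float(i), "b": i % 3} for i in range(1000)]
row_sum = sum(r["a_float"] for r in rows)

# Columnar data: the same field is one contiguous float32 buffer,
# so a vectorized kernel like np.sum can use packed SIMD over it.
col = np.array([r["a_float"] for r in rows], dtype=np.float32)
col_sum = float(np.sum(col))

print(row_sum, col_sum)  # 499500.0 499500.0
```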
15. Recap
• A standard columnar data format
– Easy to debug
– Shared by all projects
• Implement a series of Tungsten backends
16. Apache Arrow* Is the Answer
• Apache Arrow* is the best choice
• A standard data frame format
– For the native Tungsten backend
– For all accelerators offloading the Spark SQL engine
*Other names and brands may be claimed as the property of others.
17. Plug and Play Backend
[Diagram: a data frame physical plan (op1 → op2 → op3 → op4, including a Python UDF) runs on pluggable Tungsten backends: JVM, LLVM/AVX, accelerators (ACC1, ACC2), and Intel Python over off-heap data, with data frames flowing between operators.]
18. Make Use of Intel Optane DC Persistent Memory
[Diagram: the same pluggable-backend plan (op1 → op2 → op3 → op4) with the off-heap data frames held in Intel Optane DC persistent memory.]
19. Make Use of Intel Optane DC Persistent Memory
[Diagram: the plan resumes at op2 → op3 → op4, with op2 reading its shuffle input from the off-heap data frame kept in persistent memory.]
22. Connect Other ML/AI Frameworks
• The proposal of SPARK-24579
• No extra data format conversion
23. Call to Action
• Share your comments on SPARK-27396, created by Robert
• Follow our work at https://github.com/Intel-bigdata
• Let’s bring Spark’s performance to a higher level
24. Don’t forget to rate and review the sessions. Search Spark + AI Summit.