Spark Shuffle Deep Dive
Bo Yang
Content
• Overview
• Major Classes
• Shuffle Writer
• Spark Serializer
• Shuffle Reader
• External Shuffle Service
• Suggestions
Shuffle Overview
Mapper 1: Orange 3, Apple 2, Peach 5, Pear 1
Mapper 2: Peach 3, Banana 2, Grape 5
Reducer 1: Apple 2, Peach 8, Pear 1
Reducer 2: Grape 5, Orange 3
Reducer 3: Banana 2
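The flow above can be reproduced with a short Python sketch. It assumes the reducers sum the values per key (which is what Peach 5 + 3 = 8 suggests) and partitions keys the way Spark's HashPartitioner does for strings: the non-negative remainder of Java's String.hashCode divided by the number of reducers. Because the figure's key-to-reducer placement is illustrative, the reducer each fruit lands on here may differ from the slide; only the per-key totals are guaranteed to match.

```python
from collections import defaultdict


def java_string_hash(s: str) -> int:
    """Java's String.hashCode for ASCII text: h = 31*h + char, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def partition_for(key: str, num_partitions: int) -> int:
    # Python's % with a positive divisor is never negative (like nonNegativeMod).
    return java_string_hash(key) % num_partitions


def shuffle(mappers, num_reducers):
    # Map side: route every record to the reducer that owns its key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for records in mappers:
        for key, value in records:
            buckets[partition_for(key, num_reducers)][key].append(value)
    # Reduce side: each reducer sums the values it received for each key.
    return [{k: sum(vs) for k, vs in bucket.items()} for bucket in buckets]


mappers = [
    [("Orange", 3), ("Apple", 2), ("Peach", 5), ("Pear", 1)],
    [("Peach", 3), ("Banana", 2), ("Grape", 5)],
]
reducers = shuffle(mappers, num_reducers=3)
for i, out in enumerate(reducers, start=1):
    print(f"Reducer {i}: {out}")
```

Every key ends up on exactly one reducer, which is the property that makes the per-key sum correct.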
High Level Abstraction
• Pluggable Interface: ShuffleManager
• registerShuffle(…)
• getWriter(…)
• getReader(…)
• Configurable: spark.shuffle.manager=xxx
• Mapper: ShuffleWriter
• write(records: Iterator)
• Reducer: ShuffleReader
• read(): Iterator
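To make the three roles concrete, here is a toy in-memory model in Python. The method names follow the slide (registerShuffle, getWriter, getReader, write, read) in snake_case. InMemoryShuffleManager and its partitioner argument are hypothetical. They illustrate the contract only and are not Spark's actual signatures.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]


class ShuffleWriter(ABC):
    @abstractmethod
    def write(self, records: Iterable[Record]) -> None: ...


class ShuffleReader(ABC):
    @abstractmethod
    def read(self) -> Iterator[Record]: ...


class ShuffleManager(ABC):
    @abstractmethod
    def register_shuffle(self, shuffle_id: int, num_partitions: int,
                         partitioner: Callable[[str], int]) -> None: ...

    @abstractmethod
    def get_writer(self, shuffle_id: int, map_id: int) -> ShuffleWriter: ...

    @abstractmethod
    def get_reader(self, shuffle_id: int, partition: int) -> ShuffleReader: ...


class InMemoryShuffleManager(ShuffleManager):
    """Hypothetical manager that keeps map output in a dict instead of shuffle files."""

    def __init__(self) -> None:
        self._shuffles: Dict[int, Tuple[int, Callable[[str], int]]] = {}
        # (shuffle_id, map_id) -> one list of records per reduce partition
        self._outputs: Dict[Tuple[int, int], List[List[Record]]] = {}

    def register_shuffle(self, shuffle_id, num_partitions, partitioner):
        self._shuffles[shuffle_id] = (num_partitions, partitioner)

    def get_writer(self, shuffle_id, map_id):
        manager = self
        num_partitions, partitioner = self._shuffles[shuffle_id]

        class _Writer(ShuffleWriter):
            def write(self, records):
                parts: List[List[Record]] = [[] for _ in range(num_partitions)]
                for key, value in records:
                    parts[partitioner(key)].append((key, value))
                manager._outputs[(shuffle_id, map_id)] = parts

        return _Writer()

    def get_reader(self, shuffle_id, partition):
        manager = self

        class _Reader(ShuffleReader):
            def read(self):
                # Pull this partition from every map output of the shuffle, in map order.
                for (sid, _), parts in sorted(manager._outputs.items()):
                    if sid == shuffle_id:
                        yield from parts[partition]

        return _Reader()


mgr = InMemoryShuffleManager()
mgr.register_shuffle(0, num_partitions=2, partitioner=lambda k: len(k) % 2)
mgr.get_writer(0, map_id=0).write([("Apple", 2), ("Pear", 1)])
mgr.get_writer(0, map_id=1).write([("Peach", 3), ("Fig", 4)])
print(list(mgr.get_reader(0, partition=1).read()))
```

Swapping the manager changes how map output is stored and fetched without changing the mapper or reducer code. That is the point of making spark.shuffle.manager pluggable.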
Implementations
• SortShuffleManager (extends ShuffleManager)
• Three Writers (optimized for different scenarios)
• SortShuffleWriter: uses ExternalSorter
• BypassMergeSortShuffleWriter: no sorter
• UnsafeShuffleWriter: uses ShuffleExternalSorter
• One Reader
• BlockStoreShuffleReader, uses
• ExternalAppendOnlyMap
• ExternalSorter (if ordering)
Writer Output Example (Shuffle Files)
[Figure: Mapper 1 and Mapper 2 each write one Data File (Partition 1, Partition 2, Partition 3 stored back to back) and one Index File (Offset 1, Offset 2, Offset 3); Reducer 1, Reducer 2 and Reducer 3 each read their own partition from every mapper's Data File]
Number of Partitions == Number of Reducers
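A sketch of this layout in Python, with files modeled as in-memory bytes and records already serialized. It assumes the index file holds num_partitions + 1 big-endian 64-bit offsets starting at 0, which to my knowledge is how Spark's IndexShuffleBlockResolver writes it. Under that assumption, partition i occupies bytes [offset[i], offset[i+1]) of the data file.

```python
import struct
from typing import List, Tuple


def write_map_output(partitions: List[bytes]) -> Tuple[bytes, bytes]:
    """Concatenate partitions into one data file and build the matching index file."""
    data = b"".join(partitions)
    offsets = [0]
    for part in partitions:
        offsets.append(offsets[-1] + len(part))
    index = struct.pack(f">{len(offsets)}q", *offsets)  # big-endian int64 offsets
    return data, index


def read_partition(data: bytes, index: bytes, reduce_id: int) -> bytes:
    """Reducer side: read two adjacent offsets from the index, then slice the data file."""
    start, end = struct.unpack_from(">2q", index, 8 * reduce_id)
    return data[start:end]


# Mapper 1 from the overview: partition 1 -> Apple, Peach, Pear; partition 2 -> Orange; partition 3 empty.
data, index = write_map_output([b"Apple2Peach5Pear1", b"Orange3", b""])
for reduce_id in range(3):
    print(reduce_id, read_partition(data, index, reduce_id))
```

Because each mapper writes exactly two files no matter how many reducers exist, the total file count is 2 × mappers rather than mappers × reducers.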
Three Shuffle Writers
• Different Writer Algorithms
• SortShuffleWriter
• BypassMergeSortShuffleWriter
• UnsafeShuffleWriter
• Used in different situations (optimizations)
• Things to consider
• Reduce total number of files
• Reduce serialization/deserialization when possible
When Is Each Writer Used?
• Small number of partitions (and no map side combine)?
---> BypassMergeSortShuffleWriter
• Able to sort records in serialized form?
---> UnsafeShuffleWriter
• Otherwise
---> SortShuffleWriter
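The decision above can be sketched as a small function. It follows my reading of SortShuffleManager.registerShuffle. Bypass needs no map-side combine and at most spark.shuffle.sort.bypassMergeThreshold partitions (default 200). The serialized (Unsafe) path needs a serializer that supports relocation, no map-side aggregation, and at most 2^24 partitions. The parameter names are hypothetical simplifications of the ShuffleDependency fields, and the exact aggregation check has varied across Spark versions.

```python
MAX_SERIALIZED_PARTITIONS = 1 << 24  # partition ID must fit in 24 bits of the record pointer


def choose_writer(num_partitions: int, map_side_combine: bool,
                  serializer_relocatable: bool, bypass_threshold: int = 200) -> str:
    # 1. Few partitions and nothing to combine: write one file per partition, no sorting.
    if not map_side_combine and num_partitions <= bypass_threshold:
        return "BypassMergeSortShuffleWriter"
    # 2. Records can be sorted as raw bytes: serialize once, sort pointers.
    if (serializer_relocatable and not map_side_combine
            and num_partitions <= MAX_SERIALIZED_PARTITIONS):
        return "UnsafeShuffleWriter"
    # 3. Everything else falls back to the general-purpose sort-based writer.
    return "SortShuffleWriter"


print(choose_writer(100, map_side_combine=False, serializer_relocatable=False))
print(choose_writer(1000, map_side_combine=False, serializer_relocatable=True))
print(choose_writer(1000, map_side_combine=True, serializer_relocatable=True))
```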
BypassMergeSortShuffleWriter
One file for each partition, then merge them
[Figure: Mapper -> BypassMergeSortShuffleWriter -> write -> Temp File: Partition 0, Temp File: Partition 1, …, Temp File: Partition X -> merge -> Data File + Index File]
BypassMergeSortShuffleWriter (cont’d)
Used when
• No map side combine
• Number of partitions <= spark.shuffle.sort.bypassMergeThreshold (default 200)
Pros
• Simple
Cons
• 1 to 1 mapping between temp file and partition
• Many temp files
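A minimal sketch of the bypass strategy in Python. Temp files are modeled as per-partition io.BytesIO buffers instead of disk files. serialize is a hypothetical stand-in for the real serializer, using an invented record format. Each record is serialized once into its partition's buffer, and the merge is a plain concatenation in partition order. That is why no sort is needed, but one open buffer per partition is.

```python
import io
import struct
from typing import Callable, Iterable, List, Tuple


def serialize(key: str, value: int) -> bytes:
    """Hypothetical record format: 2-byte key length, UTF-8 key, 4-byte signed value."""
    k = key.encode("utf-8")
    return struct.pack(">H", len(k)) + k + struct.pack(">i", value)


def bypass_merge_write(records: Iterable[Tuple[str, int]], num_partitions: int,
                       partitioner: Callable[[str], int]) -> Tuple[bytes, List[int]]:
    # One "temp file" per partition; records are appended in arrival order, never sorted.
    temp_files = [io.BytesIO() for _ in range(num_partitions)]
    for key, value in records:
        temp_files[partitioner(key)].write(serialize(key, value))
    # Merge: concatenate temp files in partition order and remember where each one starts.
    data, offsets = io.BytesIO(), [0]
    for tmp in temp_files:
        data.write(tmp.getvalue())
        offsets.append(data.tell())
    return data.getvalue(), offsets


records = [("Orange", 3), ("Apple", 2), ("Peach", 5), ("Pear", 1)]
data, offsets = bypass_merge_write(records, 3, partitioner=lambda k: len(k) % 3)
print(offsets)
```

With thousands of partitions, the same loop would need thousands of open files and write buffers at once. The threshold exists to limit that cost.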
SortShuffleWriter
• Why sort?
• Sort records by partition ID, so that records of the same partition are contiguous
• Reduce number of files: the number of spill files depends on memory pressure, not on the number of partitions
• Buffer (in memory):
• PartitionedAppendOnlyMap (when there is map side combine)
• PartitionedPairBuffer (when there is no map side combine)
[Figure: Mapper -> SortShuffleWriter -> ExternalSorter Buffer -> Spill File (Sorted), …, Spill File (Sorted) -> merge -> Data File + Index File]
SortShuffleWriter (cont’d)
Used when
• Map side combine, or
• Many partitions and a serializer that does not support record relocation
Pros
• Flexible: supports all shuffle situations
Cons
• Serializes/deserializes records multiple times (when spilling and again when merging spill files)
Internal configs that control spill behavior
(inside Spillable.scala):
spark.shuffle.spill.initialMemoryThreshold
spark.shuffle.spill.numElementsForceSpillThreshold
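The spill-and-merge idea can be sketched in a few lines of Python. This is a simplification, not ExternalSorter. The buffer is a plain list of (partition_id, key, value). spill_threshold is a hypothetical element count that stands in for Spark's memory-based limits and is loosely analogous to spark.shuffle.spill.numElementsForceSpillThreshold. Each run is sorted by partition ID only, and heapq.merge combines the sorted runs into one partition-ordered stream. Map-side combine is omitted.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Entry = Tuple[int, str, int]  # (partition_id, key, value)


def sort_shuffle_write(records: Iterable[Tuple[str, int]],
                       partitioner: Callable[[str], int],
                       spill_threshold: int) -> Tuple[List[Entry], int]:
    runs: List[List[Entry]] = []
    buffer: List[Entry] = []
    for key, value in records:
        buffer.append((partitioner(key), key, value))
        if len(buffer) >= spill_threshold:
            # "Spill": sort the buffer by partition ID (stable) and set the run aside.
            runs.append(sorted(buffer, key=lambda e: e[0]))
            buffer = []
    if buffer:  # whatever is still in memory becomes the last sorted run
        runs.append(sorted(buffer, key=lambda e: e[0]))
    # Merge the sorted runs; ties keep run order, so records stay grouped by partition.
    merged = list(heapq.merge(*runs, key=lambda e: e[0]))
    return merged, len(runs)


records = [("Orange", 3), ("Apple", 2), ("Peach", 5), ("Pear", 1), ("Grape", 5)]
merged, num_runs = sort_shuffle_write(records, lambda k: len(k) % 3, spill_threshold=2)
print(num_runs, merged)
```

The number of runs grows with the data volume relative to the buffer size, not with the partition count. This is the contrast with the bypass writer.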
UnsafeShuffleWriter
• Each record is serialized once, then stored in memory pages
• 8-byte record pointer (packs partition ID + memory page + offset within the page)
• All record pointers are stored in a long array
• Sorting the long array of pointers sorts the records by partition
• Small memory footprint
• Better fit for the CPU cache
• Sorter class: ShuffleExternalSorter
[Figure: serialized records stored in Memory Page 1, Memory Page 2, …, Memory Page xxx; their 8-byte pointers (Record 1, Record 2, …) are stored and sorted as an array]
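The 8-byte pointer can be illustrated with bit packing. To my knowledge, Spark's PackedRecordPointer uses 24 bits for the partition ID, 13 bits for the memory page number, and 27 bits for the offset within the page, with the partition ID in the high bits. The 24-bit field is also why the serialized path is limited to 2^24 partitions. The Python functions below are a hypothetical re-creation of that encoding, not Spark code. Python ints are unbounded and non-negative here, so sorting the packed values directly groups them by partition.

```python
PARTITION_BITS, PAGE_BITS, OFFSET_BITS = 24, 13, 27


def _check(name: str, value: int, bits: int) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def pack(partition_id: int, page: int, offset: int) -> int:
    """Pack into one 64-bit value laid out as [partition | page | offset]."""
    _check("partition_id", partition_id, PARTITION_BITS)
    _check("page", page, PAGE_BITS)
    _check("offset", offset, OFFSET_BITS)
    return (partition_id << (PAGE_BITS + OFFSET_BITS)) | (page << OFFSET_BITS) | offset


def unpack(pointer: int) -> tuple:
    offset = pointer & ((1 << OFFSET_BITS) - 1)
    page = (pointer >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    partition_id = pointer >> (PAGE_BITS + OFFSET_BITS)
    return partition_id, page, offset


# Three records in arrival order, spread over two memory pages.
pointers = [pack(2, 0, 0), pack(0, 0, 64), pack(1, 1, 16)]
pointers.sort()  # only the 8-byte values move; the record bytes stay where they are
print([unpack(p) for p in pointers])
```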
UnsafeShuffleWriter (cont’d)
Used when
• Serializer supports record relocation
• No aggregator
• Number of partitions <= 16777216 (2^24)
Pros
• Single serialization; no deserialization/serialization when merging spill files
• Sorting is CPU cache friendly
Cons
• Not usable with the default serializer (JavaSerializer); usable with KryoSerializer
Serializer: JavaSerializer
• Default serializer in Spark
• spark.serializer=org.apache.spark.serializer.JavaSerializer
• Uses object references in the serialized stream
• Writes a back-reference instead of the whole object for a repeated (same) object
• Does not support record relocation
• Records cannot be moved within the serialized stream because of object references
• Pros: can serialize in all situations
• Cons: poor performance
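Why back-references prevent relocation can be shown with Python's pickle. It is used here only as an analogy for Java serialization, not as a model of its format. A single Pickler keeps its memo across dump() calls, so a repeated object is written as a reference to an earlier record. Those bytes cannot be decoded, or moved, on their own.

```python
import io
import pickle

shared = {"fruit": "Peach", "count": 5}

# One stream, one Pickler: the memo persists across dump() calls,
# much like an ObjectOutputStream remembering objects it already wrote.
buf = io.BytesIO()
pickler = pickle.Pickler(buf, protocol=4)
pickler.dump(shared)
first_end = buf.tell()
pickler.dump(shared)  # second record: only a back-reference to the first one
second_record = buf.getvalue()[first_end:]


def decodes_alone(record: bytes) -> bool:
    try:
        pickle.loads(record)
        return True
    except (KeyError, pickle.UnpicklingError):
        return False


print(len(second_record), decodes_alone(second_record))

# Self-contained records (a fresh memo per record) can be moved or reordered freely.
independent = [pickle.dumps(shared, protocol=4) for _ in range(2)]
print(all(decodes_alone(r) for r in independent))
```

The shared-memo stream is smaller, but a sorter cannot reorder its records as raw bytes. That is the trade-off that rules JavaSerializer out of the UnsafeShuffleWriter path.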
Serializer: KryoSerializer
• Uses the Kryo library
• Object references do not cross record boundaries (Kryo's auto-reset clears them after each top-level object)
• Supports record relocation
• Because each serialized record is independent, with no references to other records
• Classes should be registered explicitly; otherwise Kryo writes the fully qualified class name with each serialized object
• Pros: good performance for common classes and registered classes (see KryoSerializer.scala)
• Cons: poor performance for custom classes that are not registered, so register them explicitly
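The cost of unregistered classes can be illustrated with a toy encoder. This is not Kryo's wire format. The class tag is modeled as a one-byte marker followed either by a small registration ID or by the fully qualified class name. The class name com.example.shuffle.FruitCount is hypothetical. Each record is encoded on its own, so it stays relocatable.

```python
import struct
from typing import Dict, Optional


def encode_record(class_name: str, payload: bytes,
                  registry: Optional[Dict[str, int]] = None) -> bytes:
    """Toy encoding: class tag + 4-byte payload length + payload (not Kryo's real format)."""
    if registry and class_name in registry:
        tag = b"\x01" + struct.pack(">B", registry[class_name])  # registered: tiny numeric ID
    else:
        name = class_name.encode("utf-8")
        tag = b"\x00" + struct.pack(">H", len(name)) + name  # unregistered: full class name
    return tag + struct.pack(">I", len(payload)) + payload


payload = struct.pack(">i", 8)            # e.g. the value of ("Peach", 8)
cls = "com.example.shuffle.FruitCount"    # hypothetical user class
unregistered = sum(len(encode_record(cls, payload)) for _ in range(1000))
registered = sum(len(encode_record(cls, payload, {cls: 42})) for _ in range(1000))
print(unregistered, registered)
```

In Spark itself, classes are registered with SparkConf.registerKryoClasses (Scala/Java) or the spark.kryo.classesToRegister setting.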
Shuffle Reader: BlockStoreShuffleReader
[Figure: the Reducer (BlockStoreShuffleReader) reads its partition from the Data File of Mapper 1 and Mapper 2, located via Offset 1 to Offset 3 in each Index File; records pass through an Aggregator (ExternalAppendOnlyMap, which uses HashComparator and spills to Spill Files) to an Iterator; if ordering by key, an ExternalSorter produces the final Iterator]
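A sketch of the reduce side under simplifying assumptions. Map outputs are in-memory lists (one list of records per reduce partition) rather than fetched blocks, and records are already deserialized. A Python dict stands in for ExternalAppendOnlyMap, with no spilling. The optional sort mirrors the ExternalSorter step, which only runs when ordering by key is required. The function shuffle_read and its arguments are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Record = Tuple[str, int]
MapOutput = List[List[Record]]  # one list of records per reduce partition


def shuffle_read(map_outputs: List[MapOutput], reduce_id: int,
                 merge_value: Optional[Callable[[int, int], int]] = None,
                 sort_by_key: bool = False) -> Iterator[Record]:
    # 1. Fetch this reducer's block from every mapper's output.
    fetched = (rec for output in map_outputs for rec in output[reduce_id])
    # 2. Aggregate (the ExternalAppendOnlyMap role), only if an aggregator is defined.
    if merge_value is not None:
        combined: Dict[str, int] = {}
        for key, value in fetched:
            combined[key] = merge_value(combined[key], value) if key in combined else value
        fetched = iter(combined.items())
    # 3. Sort by key (the ExternalSorter role), only if ordering is required.
    if sort_by_key:
        fetched = iter(sorted(fetched, key=lambda kv: kv[0]))
    return fetched


mapper1 = [[("Apple", 2), ("Peach", 5), ("Pear", 1)], [("Orange", 3)], []]
mapper2 = [[("Peach", 3)], [("Grape", 5)], [("Banana", 2)]]
print(list(shuffle_read([mapper1, mapper2], 0, merge_value=lambda a, b: a + b, sort_by_key=True)))
```

With the overview's data, reducer 0 reproduces "Reducer 1: Apple 2, Peach 8, Pear 1".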
External Shuffle Service
• YarnShuffleService / MesosExternalShuffleService
• YarnShuffleService runs inside the YARN NodeManager as an AuxiliaryService
• Runs on each machine in the YARN/Mesos cluster
• Reads shuffle files from local disk and streams them to reducers
• Uses a file name convention to locate shuffle files
(ExternalShuffleBlockResolver)
• "shuffle_" + shuffleId + "_" + mapId + "_0.index"
• "shuffle_" + shuffleId + "_" + mapId + "_0.data"
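The naming convention can be written as a small helper. The sketch also shows how, to my knowledge, Spark spreads block files over an executor's local directories. A non-negative Java hashCode of the file name picks the local directory, and then a two-hex-digit subdirectory (spark.diskStore.subDirectories, default 64). Treat the directory logic as an illustrative sketch of that behavior, not a verified port. The local_dirs paths are made up.

```python
import os

NOOP_REDUCE_ID = 0  # the trailing "_0" in both file names


def shuffle_file_names(shuffle_id: int, map_id: int):
    base = f"shuffle_{shuffle_id}_{map_id}_{NOOP_REDUCE_ID}"
    return base + ".index", base + ".data"


def java_string_hash(s: str) -> int:
    """Java's String.hashCode for ASCII names, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def non_negative_hash(s: str) -> int:
    h = java_string_hash(s)
    return 0 if h == -(1 << 31) else abs(h)  # abs(Integer.MIN_VALUE) would overflow in Java


def locate(filename: str, local_dirs, sub_dirs_per_local_dir: int = 64) -> str:
    h = non_negative_hash(filename)
    local_dir = local_dirs[h % len(local_dirs)]
    sub_dir = (h // len(local_dirs)) % sub_dirs_per_local_dir
    return os.path.join(local_dir, f"{sub_dir:02x}", filename)


index_name, data_name = shuffle_file_names(shuffle_id=3, map_id=7)
dirs = ["/tmp/blockmgr-a", "/tmp/blockmgr-b"]  # made-up executor local dirs
print(locate(index_name, dirs))
print(locate(data_name, dirs))
```

Because the location is a pure function of the file name and the configured directories, an external service that knows the executor's local dirs can find the files without asking the executor.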
Suggestions / Takeaway
• Shuffle is expensive; avoid unnecessary shuffles
• Shuffle vs Cache (Dataset.persist(…))
• Shuffle files provide the full data set for the next stage
• Caching may be unnecessary after a shuffle (unless you want cached replicas)
• Use KryoSerializer if possible
• Tune the relevant configs
• spark.shuffle.sort.bypassMergeThreshold
• spark.shuffle.spill.initialMemoryThreshold
• spark.shuffle.spill.numElementsForceSpillThreshold
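The tips above translate into a handful of settings. The helper below only renders a dict into spark-submit --conf arguments, so it runs without Spark. The keys are the ones named in this deck plus spark.serializer. The numeric values are placeholders to adjust per workload, not recommendations, and app.jar is a made-up application name. The two spark.shuffle.spill.* keys are internal settings and may change between Spark versions.

```python
from typing import Dict, List

shuffle_tuning: Dict[str, str] = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Partition counts at or below this value (with no map-side combine) use the bypass writer.
    "spark.shuffle.sort.bypassMergeThreshold": "200",
    # Internal spill knobs read in Spillable.scala; placeholder values.
    "spark.shuffle.spill.initialMemoryThreshold": str(5 * 1024 * 1024),
    "spark.shuffle.spill.numElementsForceSpillThreshold": str(10_000_000),
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as `--conf key=value` pairs for spark-submit."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(["spark-submit", *to_submit_args(shuffle_tuning), "app.jar"]))
```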

Contenu connexe

Tendances

Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...Databricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Flash for Apache Spark Shuffle with Cosco
Flash for Apache Spark Shuffle with CoscoFlash for Apache Spark Shuffle with Cosco
Flash for Apache Spark Shuffle with CoscoDatabricks
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache SparkDatabricks
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationDatabricks
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Spark Summit
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookSpark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookDatabricks
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 

Tendances (20)

Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Flash for Apache Spark Shuffle with Cosco
Flash for Apache Spark Shuffle with CoscoFlash for Apache Spark Shuffle with Cosco
Flash for Apache Spark Shuffle with Cosco
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookSpark SQL Join Improvement at Facebook
Spark SQL Join Improvement at Facebook
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 

Similaire à Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark

NSLogger - Cocoaheads Paris Presentation - English
NSLogger - Cocoaheads Paris Presentation - EnglishNSLogger - Cocoaheads Paris Presentation - English
NSLogger - Cocoaheads Paris Presentation - EnglishFlorent Pillet
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaTed Dunning
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDBMongoDB
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
PigHive presentation and hive impor.pptx
PigHive presentation and hive impor.pptxPigHive presentation and hive impor.pptx
PigHive presentation and hive impor.pptxRahul Borate
 
Elasticsearch Data Analyses
Elasticsearch Data AnalysesElasticsearch Data Analyses
Elasticsearch Data AnalysesAlaa Elhadba
 
Azure storage deep dive
Azure storage deep diveAzure storage deep dive
Azure storage deep diveYves Goeleven
 
azure track -04- azure storage deep dive
azure track -04- azure storage deep diveazure track -04- azure storage deep dive
azure track -04- azure storage deep diveITProceed
 
Hibernate ORM: Tips, Tricks, and Performance Techniques
Hibernate ORM: Tips, Tricks, and Performance TechniquesHibernate ORM: Tips, Tricks, and Performance Techniques
Hibernate ORM: Tips, Tricks, and Performance TechniquesBrett Meyer
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?DoiT International
 
Driver development – memory management
Driver development – memory managementDriver development – memory management
Driver development – memory managementVandana Salve
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalogmarkgrover
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...DataStax Academy
 
AOS Lab 10: File system -- Inodes and beyond
AOS Lab 10: File system -- Inodes and beyondAOS Lab 10: File system -- Inodes and beyond
AOS Lab 10: File system -- Inodes and beyondZubair Nabi
 
Bigdata and Hadoop
 Bigdata and Hadoop Bigdata and Hadoop
Bigdata and HadoopGirish L
 

Similaire à Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark (20)

NSLogger - Cocoaheads Paris Presentation - English
NSLogger - Cocoaheads Paris Presentation - EnglishNSLogger - Cocoaheads Paris Presentation - English
NSLogger - Cocoaheads Paris Presentation - English
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with Katta
 
Logstash
LogstashLogstash
Logstash
 
PigHive.pptx
PigHive.pptxPigHive.pptx
PigHive.pptx
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
PigHive.pptx
PigHive.pptxPigHive.pptx
PigHive.pptx
 
PigHive presentation and hive impor.pptx
PigHive presentation and hive impor.pptxPigHive presentation and hive impor.pptx
PigHive presentation and hive impor.pptx
 
Elasticsearch Data Analyses
Elasticsearch Data AnalysesElasticsearch Data Analyses
Elasticsearch Data Analyses
 
Azure storage deep dive
Azure storage deep diveAzure storage deep dive
Azure storage deep dive
 
azure track -04- azure storage deep dive
azure track -04- azure storage deep diveazure track -04- azure storage deep dive
azure track -04- azure storage deep dive
 
Hibernate ORM: Tips, Tricks, and Performance Techniques
Hibernate ORM: Tips, Tricks, and Performance TechniquesHibernate ORM: Tips, Tricks, and Performance Techniques
Hibernate ORM: Tips, Tricks, and Performance Techniques
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?
 
Driver development – memory management
Driver development – memory managementDriver development – memory management
Driver development – memory management
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalog
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
 
AOS Lab 10: File system -- Inodes and beyond
AOS Lab 10: File system -- Inodes and beyondAOS Lab 10: File system -- Inodes and beyond
AOS Lab 10: File system -- Inodes and beyond
 
Bigdata and Hadoop
 Bigdata and Hadoop Bigdata and Hadoop
Bigdata and Hadoop
 
Buffer overflow
Buffer overflowBuffer overflow
Buffer overflow
 

Dernier

AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEselvakumar948
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 

Dernier (20)

AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 

Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark

  • 1. Spark Shuffle Deep Dive Bo Yang
  • 2. Content • Overview • Major Classes • Shuffle Writer • Spark Serializer • Shuffle Reader • External Shuffle Service • Suggestions
  • 3. Shuffle Overview Mapper 1 Orange 3 Apple 2 Peach 5 Pear 1 Mapper 2 Peach 3 Banana 2 Grape 5 Reducer 1 Apple 2 Peach 8 Pear 1 Reducer 2 Grape 5 Orange 3 Reducer 3 Banana 2
  • 4. High Level Abstraction • Pluggable Interface: ShuffleManager • registerShuffle(…) • getWriter(…) • getReader(…) • Configurable: spark.shuffle.manager=xxx • Mapper: ShuffleWriter • write(records: Iterator) • Reducer: ShuffleReader • read(): Iterator
  • 5. Implementations • SortShuffleManager (extends ShuffleManager) • Three Writers (optimized for different scenarios) • SortShuffleWriter: uses ExternalSorter • BypassMergeSortShuffleWriter: no sorter • UnsafeShuffleWriter: uses ShuffleExternalSorter • One Reader • BlockStoreShuffleReader, uses • ExternalAppendOnlyMap • ExternalSorter (if ordering)
  • 6. Writer Output Example (Shuffle Files) Mapper 1 Data File Index File Reducer 1 Reducer 2 Reducer 3 Offset 1 Partition 1 Partition 2 Partition 3 Offset 2 Offset 3 Mapper 2 Data File Index File Offset 1 Partition 1 Partition 2 Partition 3 Offset 2 Offset 3 Number of Partitions == Number of Reducers
  • 7. Three Shuffle Writers • Different Writer Algorithms • SortShuffleWriter • BypassMergeSortShuffleWriter • UnsafeShuffleWriter • Used in different situations (optimizations) • Things to consider • Reduce total number of files • Reduce serialization/deserialization when possible
  • 8. When Is Each Writer Used? • Small number of partitions? ---> BypassMergeSortShuffleWriter • Able to sort records in serialized form? ---> UnsafeShuffleWriter • Otherwise ---> SortShuffleWriter
  • 9. BypassMergeSortShuffleWriter • One file for each partition, then merge them [Diagram: the mapper's BypassMergeSortShuffleWriter writes records into Temp File: Partition 0, Temp File: Partition 1, …, Temp File: Partition X, then merges the temp files into one data file plus an index file.]
  • 10. BypassMergeSortShuffleWriter (cont’d) Used when • No map-side combine • Number of partitions <= spark.shuffle.sort.bypassMergeThreshold (default 200) Pros • Simple Cons • One temp file per partition (1-to-1 mapping) • Many temp files
  • 11. SortShuffleWriter • Why sort? • Sort records by partition ID, to separate the records of different partitions • Reduce the number of files: the number of spill files is typically smaller than the number of partitions • Buffer (in memory): • PartitionedAppendOnlyMap (when there is map-side combine) • PartitionedPairBuffer (when there is no map-side combine) [Diagram: the mapper's SortShuffleWriter passes records to an ExternalSorter, whose in-memory buffer is written out as sorted spill files; the spill files are then merged into one data file plus an index file.]
  • 12. SortShuffleWriter (cont’d) Used when • There is map-side combine, or • There are many partitions and the serializer does not support record relocation Pros • Flexible: supports all shuffle situations Cons • Records are serialized/deserialized multiple times Internal configurations that control spill behavior (in Spillable.scala): spark.shuffle.spill.initialMemoryThreshold, spark.shuffle.spill.numElementsForceSpillThreshold
  • 13. UnsafeShuffleWriter • Each record is serialized once, then stored in memory pages • 8-byte record pointer (points to a memory page + offset) • All record pointers are stored in a long array • Sort the record pointers (the long array) instead of the records • Small memory footprint • Better fit for the CPU cache • Sorter class: ShuffleExternalSorter [Diagram: serialized records sit in Memory Page 1, Memory Page 2, …; their 8-byte pointers (Record 1, Record 2, …) are stored and sorted as an array.]
  • 14. UnsafeShuffleWriter (cont’d) Used when • Serializer supports record relocation • No aggregator Pros • Single serialization, no deserialization/serialization for merging spill files • Sorting is CPU cache friendly Cons • Not supported when using default serializer (JavaSerializer), supported when using KryoSerializer
  • 15. Serializer: JavaSerializer • Default serializer in Spark • spark.serializer=org.apache.spark.serializer.JavaSerializer • Uses object references in the serialized stream • Writes a reference instead of the whole object when the same object repeats • Does not support record relocation • Records cannot be moved within the serialized stream because of those references • Pros: supports serialization in all situations • Cons: poor performance
  • 16. Serializer: KryoSerializer • Uses the Kryo library • Does not use object references across records in the serialized stream • Supports record relocation • Because there are no cross-record references, each serialized object is independent • Classes should be registered explicitly; otherwise the fully qualified class name is written with each serialized object • Pros: good performance for common classes and registered classes (see KryoSerializer.scala) • Cons: poor performance for custom classes that are not registered, so register them explicitly
  • 17. Shuffle Reader: BlockStoreShuffleReader [Diagram: the reducer's BlockStoreShuffleReader uses the offsets (Offset 1, Offset 2, Offset 3) in each mapper's index file to read its partition from that mapper's data file (Mapper 1, Mapper 2). With an Aggregator, records are combined in an ExternalAppendOnlyMap, which can spill to spill files and uses a HashComparator, and the result is returned as an Iterator. If ordering by key is required, the records also pass through an ExternalSorter, which returns the final Iterator.]
  • 18. External Shuffle Service • YarnShuffleService / MesosExternalShuffleService • YarnShuffleService: runs inside the YARN NodeManager as an AuxiliaryService • Runs on each machine in the YARN/Mesos cluster • Reads shuffle files from local disk and streams them to reducers • Uses a file-name convention to locate shuffle files (ExternalShuffleBlockResolver) • "shuffle_" + shuffleId + "_" + mapId + "_0.index" • "shuffle_" + shuffleId + "_" + mapId + "_0.data"
  • 19. Suggestions / Takeaway • Shuffle is expensive; avoid unnecessary shuffles • Shuffle vs. cache (Dataset.persist(…)) • Shuffle files provide the full data set for the next stage's execution • Caching may not be necessary when there is a shuffle (unless you want cached replicas) • Use KryoSerializer if possible • Tune the relevant configurations • spark.shuffle.sort.bypassMergeThreshold • spark.shuffle.spill.initialMemoryThreshold • spark.shuffle.spill.numElementsForceSpillThreshold
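A minimal Python sketch of the shuffle overview on slide 3: each (key, value) record is routed to a reducer by hashing its key, and each reducer sums the values it receives. It approximates Spark's default HashPartitioner by reproducing Java's String.hashCode followed by a non-negative modulo. The helper names (java_string_hash, partition_for, shuffle_and_sum) are hypothetical, and the reducer each fruit lands on will generally differ from the slide, whose assignment is illustrative.

```python
from collections import defaultdict

def java_string_hash(s: str) -> int:
    """Java's String.hashCode (h = 31*h + c) as a signed 32-bit int; exact for BMP characters."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def partition_for(key: str, num_partitions: int) -> int:
    """Hash partitioning; Python's % is already non-negative for a positive modulus."""
    return java_string_hash(key) % num_partitions

def shuffle_and_sum(mapper_outputs, num_reducers):
    """Route every (key, value) record to a reducer by key hash, then sum values per key."""
    reducer_inputs = [defaultdict(int) for _ in range(num_reducers)]
    for records in mapper_outputs:
        for key, value in records:
            reducer_inputs[partition_for(key, num_reducers)][key] += value
    return [dict(totals) for totals in reducer_inputs]

mapper1 = [("Orange", 3), ("Apple", 2), ("Peach", 5), ("Pear", 1)]
mapper2 = [("Peach", 3), ("Banana", 2), ("Grape", 5)]
reducers = shuffle_and_sum([mapper1, mapper2], num_reducers=3)
for i, totals in enumerate(reducers, start=1):
    print(f"Reducer {i}: {totals}")
```

Whatever the placement, both Peach records hash to the same reducer, which is what lets it produce Peach 8.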
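The pluggable interface on slide 4 can be sketched as three abstract Python classes. The snake_case method names mirror registerShuffle, getWriter, getReader, write and read from the slide, but the signatures are simplified and hypothetical (Spark's real methods take handles and task contexts). InMemoryShuffleManager is a toy that keeps map output in a dict rather than in shuffle files.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

Record = Tuple[Any, Any]

class ShuffleWriter(ABC):
    @abstractmethod
    def write(self, records: Iterator[Record]) -> None:
        """Mapper side: consume this map task's records."""

class ShuffleReader(ABC):
    @abstractmethod
    def read(self) -> Iterator[Record]:
        """Reducer side: return the records of one partition."""

class ShuffleManager(ABC):
    @abstractmethod
    def register_shuffle(self, shuffle_id: int, num_partitions: int) -> None: ...

    @abstractmethod
    def get_writer(self, shuffle_id: int, map_id: int) -> ShuffleWriter: ...

    @abstractmethod
    def get_reader(self, shuffle_id: int, partition: int) -> ShuffleReader: ...

class _InMemoryWriter(ShuffleWriter):
    def __init__(self, blocks: Dict[Tuple[int, int], List[Record]], shuffle_id: int, num_partitions: int):
        self.blocks, self.shuffle_id, self.num_partitions = blocks, shuffle_id, num_partitions

    def write(self, records):
        for key, value in records:
            # hash() of a str is stable within one Python process, which is all this toy needs
            partition = hash(key) % self.num_partitions
            self.blocks[(self.shuffle_id, partition)].append((key, value))

class _InMemoryReader(ShuffleReader):
    def __init__(self, records: List[Record]):
        self.records = records

    def read(self):
        return iter(self.records)

class InMemoryShuffleManager(ShuffleManager):
    """Toy implementation: map output lives in a dict keyed by (shuffle_id, partition)."""

    def __init__(self):
        self.num_partitions: Dict[int, int] = {}
        self.blocks: Dict[Tuple[int, int], List[Record]] = defaultdict(list)

    def register_shuffle(self, shuffle_id, num_partitions):
        self.num_partitions[shuffle_id] = num_partitions

    def get_writer(self, shuffle_id, map_id):
        # map_id is unused here; a file-based implementation would name its output after it
        return _InMemoryWriter(self.blocks, shuffle_id, self.num_partitions[shuffle_id])

    def get_reader(self, shuffle_id, partition):
        return _InMemoryReader(self.blocks[(shuffle_id, partition)])

manager = InMemoryShuffleManager()
manager.register_shuffle(shuffle_id=0, num_partitions=2)
manager.get_writer(0, map_id=0).write(iter([("a", 1), ("b", 2)]))
manager.get_writer(0, map_id=1).write(iter([("a", 3)]))
all_records = [r for p in range(2) for r in manager.get_reader(0, p).read()]
print(sorted(all_records))  # [('a', 1), ('a', 3), ('b', 2)]
```

Swapping InMemoryShuffleManager for another subclass changes how shuffle data is stored without touching the mapper or reducer code, which is the point of making the manager configurable.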
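To make the shuffle files on slide 6 concrete, here is a small Python sketch of one mapper's output: partition payloads are concatenated into a single data blob, and the index records where each partition starts, so a reducer can slice out its own segment. The encoding (numPartitions + 1 big-endian 64-bit offsets starting at 0) is modeled on how we understand Spark's IndexShuffleBlockResolver to write index files; treat it as an assumption, and treat write_map_output and read_partition as hypothetical names.

```python
import struct
from typing import List, Tuple

def write_map_output(partitions: List[bytes]) -> Tuple[bytes, bytes]:
    """Concatenate partition payloads; the index holds len(partitions) + 1 offsets, starting at 0."""
    data = b"".join(partitions)
    offsets = [0]
    for block in partitions:
        offsets.append(offsets[-1] + len(block))
    # ">q" = big-endian signed 64-bit integer, one per offset
    index = b"".join(struct.pack(">q", off) for off in offsets)
    return data, index

def read_partition(data: bytes, index: bytes, reduce_id: int) -> bytes:
    """A reducer reads two adjacent offsets and slices its segment out of the data file."""
    start, end = struct.unpack_from(">qq", index, 8 * reduce_id)
    return data[start:end]

# One mapper, three reducers: partition 1 happens to be empty
data, index = write_map_output([b"Apple 2;Pear 1;", b"", b"Banana 2;"])
for reduce_id in range(3):
    print(reduce_id, read_partition(data, index, reduce_id))
```

Storing one more offset than there are partitions means every partition, including the last and any empty one, is described by the same "start, end" pair lookup.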
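The decision on slide 8 can be written as a small Python function. It follows our understanding of SortShuffleManager.registerShuffle: bypass when there is no map-side combine and the partition count is at most spark.shuffle.sort.bypassMergeThreshold (default 200); serialized (unsafe) mode when the serializer supports relocation, there is no map-side combine and there are at most 2^24 partitions; otherwise the sort-based writer. The function, enum and parameter names are hypothetical, and slide 14's "no aggregator" condition is modeled here as "no map-side combine".

```python
from enum import Enum

class Writer(Enum):
    BYPASS_MERGE_SORT = "BypassMergeSortShuffleWriter"
    UNSAFE = "UnsafeShuffleWriter"
    SORT = "SortShuffleWriter"

DEFAULT_BYPASS_MERGE_THRESHOLD = 200        # default of spark.shuffle.sort.bypassMergeThreshold
MAX_SERIALIZED_MODE_PARTITIONS = 1 << 24    # assumed limit: partition id must fit in 24 bits

def choose_writer(map_side_combine: bool,
                  num_partitions: int,
                  serializer_supports_relocation: bool,
                  bypass_merge_threshold: int = DEFAULT_BYPASS_MERGE_THRESHOLD) -> Writer:
    # 1. Few partitions and nothing to combine: one file per partition, then concatenate.
    if not map_side_combine and num_partitions <= bypass_merge_threshold:
        return Writer.BYPASS_MERGE_SORT
    # 2. Records can be sorted while still serialized: sort 8-byte pointers instead of objects.
    if (serializer_supports_relocation and not map_side_combine
            and num_partitions <= MAX_SERIALIZED_MODE_PARTITIONS):
        return Writer.UNSAFE
    # 3. Everything else, including any map-side combine.
    return Writer.SORT

print(choose_writer(False, 100, False).value)   # BypassMergeSortShuffleWriter
print(choose_writer(False, 1000, True).value)   # UnsafeShuffleWriter
print(choose_writer(False, 1000, False).value)  # SortShuffleWriter
print(choose_writer(True, 10, True).value)      # SortShuffleWriter
```

The order of the checks matters: a shuffle with few partitions takes the bypass path even when the serializer supports relocation.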
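A Python sketch of the BypassMergeSortShuffleWriter flow from slides 9 and 10: every record is appended to the temp file of its partition, and the temp files are then concatenated, without any sorting, into one data file while the index offsets are recorded. The tab-separated text serialization, the file names and the length-based partitioner in the usage example are hypothetical stand-ins; the point is the 1-to-1 mapping between temp files and partitions, which is why this writer is only chosen when the partition count is small.

```python
import os
import shutil
import struct
import tempfile

def bypass_merge_sort_write(records, num_partitions, partitioner, out_dir):
    """One temp file per partition, then concatenate them into one data file plus an index file."""
    temp_paths = [os.path.join(out_dir, f"temp_{p}.part") for p in range(num_partitions)]
    temp_files = [open(path, "wb") for path in temp_paths]  # num_partitions open files at once
    try:
        for key, value in records:
            line = f"{key}\t{value}\n".encode("utf-8")  # toy serialization
            temp_files[partitioner(key)].write(line)
    finally:
        for f in temp_files:
            f.close()

    data_path = os.path.join(out_dir, "map.data")
    offsets = [0]
    with open(data_path, "wb") as data_file:
        for path in temp_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, data_file)  # "merge" is plain concatenation
            offsets.append(data_file.tell())
            os.remove(path)
    index_path = os.path.join(out_dir, "map.index")
    with open(index_path, "wb") as index_file:
        index_file.write(b"".join(struct.pack(">q", o) for o in offsets))
    return data_path, index_path, offsets

records = [("Orange", 3), ("Apple", 2), ("Peach", 5), ("Pear", 1)]
with tempfile.TemporaryDirectory() as out_dir:
    data_path, index_path, offsets = bypass_merge_sort_write(
        records, num_partitions=3, partitioner=lambda key: len(key) % 3, out_dir=out_dir)
    with open(data_path, "rb") as f:
        data = f.read()
print(offsets)  # [0, 9, 16, 32]
```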
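The sort-and-spill idea on slide 11 can be sketched in Python without disk I/O: records are buffered, each full buffer is sorted by partition id and "spilled" (here, kept as an in-memory list standing in for a spill file), and the sorted spills are merged with heapq.merge so that each partition's records come out contiguous. The fixed max_buffer_records threshold is a hypothetical simplification of Spark's memory-based spill decision, and the sketch models only the no-map-side-combine case (the PartitionedPairBuffer path).

```python
import heapq
from itertools import groupby
from operator import itemgetter

def sort_shuffle_write(records, partitioner, max_buffer_records):
    """Buffer records, spill runs sorted by partition id, then merge the runs in partition order."""
    spills = []   # each spill: list of (partition_id, key, value) sorted by partition_id
    buffer = []
    for key, value in records:
        buffer.append((partitioner(key), key, value))
        if len(buffer) >= max_buffer_records:
            spills.append(sorted(buffer, key=itemgetter(0)))
            buffer = []
    if buffer:
        spills.append(sorted(buffer, key=itemgetter(0)))
    # k-way merge of the sorted spills; ties keep the order of the spills
    merged = heapq.merge(*spills, key=itemgetter(0))
    output = {}
    for pid, group in groupby(merged, key=itemgetter(0)):
        output[pid] = [(k, v) for _, k, v in group]
    return output, len(spills)

records = [("Orange", 3), ("Apple", 2), ("Peach", 5), ("Pear", 1),
           ("Peach", 3), ("Banana", 2), ("Grape", 5)]
output, num_spills = sort_shuffle_write(records, partitioner=lambda key: len(key) % 3,
                                        max_buffer_records=3)
print(num_spills)  # 3
for pid, recs in output.items():
    print(pid, recs)
```

Because the merged stream is ordered by partition id, a single pass can write one data file and compute the index offsets, as in the bypass writer.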
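The pointer trick on slide 13 can be illustrated in Python by packing a partition id, a memory-page number and an in-page offset into one 64-bit integer and sorting those integers. Because the partition id occupies the highest bits, sorting the packed values groups records by partition without touching the serialized bytes. The 24/13/27-bit split is an assumption modeled on our understanding of Spark's PackedRecordPointer (it would also account for a 2^24 partition limit in this mode), and pack_pointer and unpack_pointer are hypothetical names.

```python
from array import array

PARTITION_BITS, PAGE_BITS, OFFSET_BITS = 24, 13, 27  # assumed layout; 24 + 13 + 27 = 64 bits

def pack_pointer(partition_id: int, page_number: int, offset_in_page: int) -> int:
    """Pack three fields into one 64-bit value, partition id in the most significant bits."""
    if not (0 <= partition_id < (1 << PARTITION_BITS)):
        raise ValueError("partition id out of range")
    if not (0 <= page_number < (1 << PAGE_BITS)):
        raise ValueError("page number out of range")
    if not (0 <= offset_in_page < (1 << OFFSET_BITS)):
        raise ValueError("offset out of range")
    return (partition_id << (PAGE_BITS + OFFSET_BITS)) | (page_number << OFFSET_BITS) | offset_in_page

def unpack_pointer(pointer: int):
    partition_id = pointer >> (PAGE_BITS + OFFSET_BITS)
    page_number = (pointer >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    offset_in_page = pointer & ((1 << OFFSET_BITS) - 1)
    return partition_id, page_number, offset_in_page

# Hypothetical (partition id, page, offset) locations of four serialized records
locations = [(2, 0, 0), (0, 0, 64), (1, 1, 0), (0, 1, 128)]
# 'Q' = unsigned long long, 8 bytes per pointer on common platforms
pointers = array("Q", sorted(pack_pointer(*loc) for loc in locations))
print([unpack_pointer(p)[0] for p in pointers])  # [0, 0, 1, 2]
```

Sorting a flat array of integers touches far less memory than sorting record objects, which is the cache-friendliness argument on the slide.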
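The relocation issue on slides 15 and 16 can be reproduced in Python with pickle, used here purely as a stand-in for the JVM serializers. A single Pickler remembers objects it has already written and emits a back-reference when the same object appears again, much like the references JavaSerializer writes, so that record's bytes cannot be decoded (or moved) on their own. Clearing the memo after every record, analogous to each Kryo-serialized object being independent, makes every record segment self-contained. serialize_stream and relocatable are hypothetical helpers.

```python
import io
import pickle

def serialize_stream(records, reset_between_records: bool):
    """Write records to one stream with a single Pickler; return the bytes and record boundaries."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=4)
    boundaries = [0]
    for record in records:
        pickler.dump(record)
        if reset_between_records:
            pickler.clear_memo()  # forget earlier objects, so no back-references are emitted
        boundaries.append(buf.tell())
    return buf.getvalue(), boundaries

def relocatable(records, reset_between_records: bool) -> bool:
    """Can each record's byte segment be deserialized on its own (i.e. moved independently)?"""
    data, bounds = serialize_stream(records, reset_between_records)
    try:
        for start, end in zip(bounds, bounds[1:]):
            pickle.loads(data[start:end])
        return True
    except (KeyError, pickle.UnpicklingError):
        return False  # a segment referred back to an object stored in an earlier segment

shared = ["Peach", 5]
records = [shared, ["Apple", 2], shared]  # the same object appears twice
print(relocatable(records, reset_between_records=False))  # False
print(relocatable(records, reset_between_records=True))   # True
```

The trade-off mirrors the slides: back-references make repeated objects cheaper to write, but they tie records to their position in the stream.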
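The reduce-side flow on slide 17, minus fetching and spilling, can be sketched in Python: the records fetched from every mapper's segment are chained together, combined per key when there is an aggregator (a plain dict standing in for ExternalAppendOnlyMap), and sorted by key only when ordering is requested (sorted standing in for ExternalSorter). The shuffle_read function and its parameters are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, Iterable, Iterator, Optional, Tuple

Record = Tuple[Any, Any]

def shuffle_read(fetched_blocks: Iterable[Iterable[Record]],
                 combine: Optional[Callable[[Any, Any], Any]] = None,
                 sort_by_key: bool = False) -> Iterator[Record]:
    # One block per mapper: this reducer's segment of that mapper's data file.
    records: Iterator[Record] = (rec for block in fetched_blocks for rec in block)
    if combine is not None:
        combined: Dict[Any, Any] = {}  # in memory only; the real map can spill to disk
        for key, value in records:
            combined[key] = combine(combined[key], value) if key in combined else value
        records = iter(combined.items())
    if sort_by_key:
        records = iter(sorted(records, key=lambda kv: kv[0]))
    return records

# Reducer 1's segments from the two mappers on slide 3
from_mapper1 = [("Apple", 2), ("Peach", 5), ("Pear", 1)]
from_mapper2 = [("Peach", 3)]
print(list(shuffle_read([from_mapper1, from_mapper2], combine=operator.add, sort_by_key=True)))
# [('Apple', 2), ('Peach', 8), ('Pear', 1)]
```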
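Slide 18 says the external shuffle service finds shuffle files purely by name. The Python sketch below builds those names and then maps a name to a directory by hashing it, which is how we understand Spark's disk block manager to spread files across local directories and their sub-directories. The hashing details, the default of 64 sub-directories per local directory, the example directory paths and the helper names are assumptions for illustration; the trailing 0 in the file name is kept exactly as on the slide.

```python
import os
from typing import List, Tuple

def java_string_hash(s: str) -> int:
    """Java's String.hashCode as a signed 32-bit int (exact for BMP characters)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def non_negative_hash(s: str) -> int:
    h = java_string_hash(s)
    return abs(h) if h != -(1 << 31) else 0  # abs(Int.MinValue) would overflow on the JVM

def shuffle_file_names(shuffle_id: int, map_id: int) -> Tuple[str, str]:
    base = f"shuffle_{shuffle_id}_{map_id}_0"
    return base + ".data", base + ".index"

def locate(filename: str, local_dirs: List[str], sub_dirs_per_local_dir: int = 64) -> str:
    """Hash the file name to pick a local dir, then a two-hex-digit sub-dir inside it."""
    h = non_negative_hash(filename)
    dir_id = h % len(local_dirs)
    sub_dir_id = (h // len(local_dirs)) % sub_dirs_per_local_dir
    return os.path.join(local_dirs[dir_id], f"{sub_dir_id:02x}", filename)

data_name, index_name = shuffle_file_names(shuffle_id=3, map_id=7)
print(data_name, index_name)  # shuffle_3_7_0.data shuffle_3_7_0.index
print(locate(data_name, ["/tmp/spark-local-a", "/tmp/spark-local-b"]))
```

Because the location is a pure function of the file name and the configured directories, a separate process such as the shuffle service can find the file without asking the executor that wrote it.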

Editor's notes

  1. ExternalAppendOnlyMap ExternalSorter (if ordering)