A Deeper Understanding of Spark’s Internals
Aaron Davidson
07/01/2014
This Talk
•  Goal: Understand how Spark runs, with a focus on performance
•  Major core components:
– Execution Model
– The Shuffle
– Caching
Why understand internals?
Goal: Find number of distinct names per “first letter”

sc.textFile("hdfs:/names")
.map(name => (name.charAt(0), name))
.groupByKey()
.mapValues(names => names.toSet.size)
.collect()

Walking the example data through the pipeline:
Input:              Andy, Pat, Ahir
After map():        (A, Andy), (P, Pat), (A, Ahir)
After groupByKey(): (A, [Ahir, Andy]), (P, [Pat])
After mapValues():  (A, Set(Ahir, Andy).size), (P, Set(Pat).size) = (A, 2), (P, 1)
After collect():    res0 = [(A, 2), (P, 1)]
Spark Execution Model
1.  Create DAG of RDDs to represent
computation
2.  Create logical execution plan for DAG
3.  Schedule and execute individual tasks
Step 1: Create RDDs
sc.textFile("hdfs:/names")
map(name => (name.charAt(0), name))
groupByKey()
mapValues(names => names.toSet.size)
collect()
Step 1: Create RDDs
HadoopRDD
map()
groupBy()
mapValues()
collect()
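
A quick way to see the chain of RDDs Spark builds here is toDebugString, part of the public RDD API (a sketch; the exact output format depends on the Spark version):

// Build the RDD chain; nothing executes until an action like collect() runs.
val perLetter = sc.textFile("hdfs:/names")
  .map(name => (name.charAt(0), name))
  .groupByKey()
  .mapValues(names => names.toSet.size)

// Print the lineage; indentation changes mark shuffle (stage) boundaries.
println(perLetter.toDebugString)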
Step 2: Create execution plan
•  Pipeline as much as possible
•  Split into “stages” based on need to reorganize data

Stage 1: HadoopRDD → map()
Stage 2: groupBy() → mapValues() → collect()
(the stage boundary is the shuffle introduced by groupBy())

[Diagram: the example data flowing through both stages:
Andy, Pat, Ahir → (A, Andy), (P, Pat), (A, Ahir) → shuffle →
(A, [Ahir, Andy]), (P, [Pat]) → (A, 2), (P, 1) → res0 = [(A, 2), (P, 1)]]
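
The Stage 1 / Stage 2 split corresponds to the shuffle dependency that groupByKey() introduces. As a rough check, assuming a Spark 1.x-era RDD API:

import org.apache.spark.ShuffleDependency

val pairs = sc.textFile("hdfs:/names").map(name => (name.charAt(0), name))
val grouped = pairs.groupByKey()

// The grouped RDD's parent dependency is a ShuffleDependency, which is
// exactly what forces the scheduler to cut a new stage here.
println(grouped.dependencies.head.isInstanceOf[ShuffleDependency[_, _, _]])  // expected: true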
Step 3: Schedule tasks
•  Split each stage into tasks
•  A task is data + computation
•  Execute all tasks within a stage before moving on
Step 3: Schedule tasks
A task = computation + data. For Stage 1 the computation is
HadoopRDD → map() and the data is one HDFS block:
Task 0: hdfs:/names/0.gz
Task 1: hdfs:/names/1.gz
Task 2: hdfs:/names/2.gz
Task 3: hdfs:/names/3.gz
…
Step 3: Schedule tasks
[Diagram: a timeline of Stage 1 tasks (HadoopRDD → map()) running on three
executors, each holding local HDFS blocks drawn from /names/0.gz … /names/3.gz.
Tasks are launched on executors that already store the block they read, and a
new task starts on an executor as soon as its previous task finishes.]
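
The scheduler tries to run each task on a node that already stores that task's HDFS block. preferredLocations is part of the public RDD API and shows the hosts Spark would favor for each partition (a sketch; output depends on your cluster):

val names = sc.textFile("hdfs:/names")

// For every input partition, list the hosts holding its HDFS block,
// i.e. where the corresponding task would preferably be scheduled.
names.partitions.foreach { p =>
  println(s"partition ${p.index}: " + names.preferredLocations(p).mkString(", "))
}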
The Shuffle
Stage 1
Stage 2
HadoopRDD
map()
groupBy()
mapValues()
collect()
The Shuffle
Stage 1 → Stage 2
•  Redistributes data among partitions
•  Hash keys into buckets
•  Optimizations:
– Avoided when possible, if data is already properly partitioned
– Partial aggregation reduces data movement (see the sketch below)
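
A minimal sketch of both optimizations using the standard RDD API: reduceByKey combines values within each map task before anything crosses the network, and a shuffle is skipped entirely when the data already carries the same partitioner:

import org.apache.spark.HashPartitioner

val pairs = sc.textFile("hdfs:/names").map(name => (name.charAt(0), 1))

// Partial aggregation: per-letter counts are merged map-side before the shuffle.
val counts = pairs.reduceByKey(_ + _)

// Shuffle avoidance: once hash-partitioned, a reduceByKey that reuses the
// same partitioner can aggregate in place with no further shuffle.
val prePartitioned = pairs.partitionBy(new HashPartitioner(6)).cache()
val countsNoShuffle = prePartitioned.reduceByKey(new HashPartitioner(6), _ + _)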
The Shuffle
Stage 1 → Disk → Stage 2
•  Pull-based, not push-based
•  Write intermediate files to disk
Execution of a groupBy()
•  Build hash map within each partition
•  Note: Can spill across keys, but a single
key-value pair must fit in memory
A => [Arsalan, Aaron, Andrew, Andrew, Andy, Ahir, Ali, …],
E => [Erin, Earl, Ed, …]
…
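
This is not Spark's shuffle implementation, just a conceptual sketch of "build a hash map within each partition": every value for a key is appended to one in-memory buffer, which is why a single huge key can sink a task even though Spark can spill across keys.

import scala.collection.mutable

// Conceptual only: roughly the shape of the per-partition grouping.
val groupedWithinPartition = sc.textFile("hdfs:/names")
  .map(name => (name.charAt(0), name))
  .mapPartitions { iter =>
    val buckets = mutable.HashMap.empty[Char, mutable.ArrayBuffer[String]]
    for ((letter, name) <- iter)
      buckets.getOrElseUpdate(letter, mutable.ArrayBuffer.empty[String]) += name
    buckets.iterator  // one (key, all values for that key) entry per key in this partition
  }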
Done!
Stage 1
Stage 2
HadoopRDD
map()
groupBy()
mapValues()
collect()
What went wrong?
•  Too few partitions to get good concurrency
•  Large per-key groupBy()
•  Shipped all data across the cluster
Common issue checklist
1.  Ensure enough partitions for concurrency
2.  Minimize memory consumption (esp. of
sorting and large keys in groupBys)
3.  Minimize amount of data shuffled
4.  Know the standard library
1 & 2 are about tuning number of partitions!
Importance of Partition Tuning
•  Main issue: too few partitions
–  Less concurrency
–  More susceptible to data skew
–  Increased memory pressure for groupBy,
reduceByKey, sortByKey, etc.
•  Secondary issue: too many partitions
•  Need “reasonable number” of partitions
–  Commonly between 100 and 10,000 partitions
–  Lower bound: At least ~2x number of cores in
cluster
–  Upper bound: Ensure tasks take at least 100ms
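
A few knobs for inspecting and tuning the partition count (a sketch; the numbers are illustrative, not recommendations):

val names = sc.textFile("hdfs:/names")
println(names.partitions.length)         // how many partitions you actually got

val wider = names.repartition(600)       // full shuffle, increases parallelism
val fewer = wider.coalesce(100)          // shrinks without a full shuffle

// Most shuffle operators also accept the target partition count directly:
val counts = names.map(n => (n.charAt(0), 1)).reduceByKey(_ + _, 600)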
Memory Problems
•  Symptoms:
–  Inexplicably bad performance
–  Inexplicable executor/machine failures (can indicate too many shuffle files, too)
•  Diagnosis:
–  Set spark.executor.extraJavaOptions to include 
•  -XX:+PrintGCDetails
•  -XX:+HeapDumpOnOutOfMemoryError
–  Check dmesg for oom-killer logs
•  Resolution:
–  Increase spark.executor.memory
–  Increase number of partitions
–  Re-evaluate program structure (!)
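
The same diagnosis and resolution settings expressed in code (a sketch; the property keys are standard Spark configuration, the values are illustrative, and depending on how you launch the job some of them, such as executor memory, may need to be passed at spark-submit time instead):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("names")
  .set("spark.executor.memory", "4g")    // resolution: more executor memory
  .set("spark.executor.extraJavaOptions",
    "-XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError")  // diagnosis: GC logs + heap dumps
val sc = new SparkContext(conf)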
Fixing our mistakes
sc.textFile("hdfs:/names")
.map(name => (name.charAt(0), name))
.groupByKey()
.mapValues { names => names.toSet.size }
.collect()
1.  Ensure enough partitions for
concurrency
2.  Minimize memory consumption (esp. of
large groupBys and sorting)
3.  Minimize data shuffle
4.  Know the standard library
Fixing our mistakes
sc.textFile("hdfs:/names")
.repartition(6)
.map(name => (name.charAt(0), name))
.groupByKey()
.mapValues { names => names.toSet.size }
.collect()
1.  Ensure enough partitions for
concurrency
2.  Minimize memory consumption (esp. of
large groupBys and sorting)
3.  Minimize data shuffle
4.  Know the standard library
Fixing our mistakes
sc.textFile("hdfs:/names")
.repartition(6)
.distinct()
.map(name => (name.charAt(0), name))
.groupByKey()
.mapValues { names => names.toSet.size }
.collect()
1.  Ensure enough partitions for
concurrency
2.  Minimize memory consumption (esp. of
large groupBys and sorting)
3.  Minimize data shuffle
4.  Know the standard library
Fixing our mistakes
sc.textFile("hdfs:/names")
.repartition(6)
.distinct()
.map(name => (name.charAt(0), name))
.groupByKey()
.mapValues { names => names.size }
.collect()
1.  Ensure enough partitions for
concurrency
2.  Minimize memory consumption (esp. of
large groupBys and sorting)
3.  Minimize data shuffle
4.  Know the standard library
Fixing our mistakes
sc.textFile("hdfs:/names")
.distinct(numPartitions = 6)
.map(name => (name.charAt(0), name))
.groupByKey()
.mapValues { names => names.size }
.collect()
1.  Ensure enough partitions for
concurrency
2.  Minimize memory consumption (esp. of
large groupBys and sorting)
3.  Minimize data shuffle
4.  Know the standard library
Fixing our mistakes
sc.textFile("hdfs:/names")
.distinct(numPartitions = 6)
.map(name => (name.charAt(0), 1))
.reduceByKey(_ + _)
.collect()
1.  Ensure enough partitions for
concurrency
2.  Minimize memory consumption (esp. of
large groupBys and sorting)
3.  Minimize data shuffle
4.  Know the standard library
Fixing our mistakes
sc.textFile("hdfs:/names")
.distinct(numPartitions = 6)
.map(name => (name.charAt(0), 1))
.reduceByKey(_ + _)
.collect()
Original:
sc.textFile("hdfs:/names")
.map(name => (name.charAt(0), name))
.groupByKey()
.mapValues { names => names.toSet.size }
.collect()
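
Putting the final version together as a small standalone application (a sketch: the object name is a placeholder, the path matches the toy example, and master/deploy settings are assumed to come from spark-submit):

import org.apache.spark.{SparkConf, SparkContext}

object DistinctNamesPerLetter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("distinct-names-per-letter"))

    val result = sc.textFile("hdfs:/names")
      .distinct(numPartitions = 6)        // dedupe names (one shuffle of the raw names)
      .map(name => (name.charAt(0), 1))   // ship only (letter, 1), not whole names
      .reduceByKey(_ + _)                 // partially aggregated before its (tiny) shuffle
      .collect()

    result.foreach { case (letter, count) => println(s"$letter: $count") }
    sc.stop()
  }
}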
Questions?
