Spark Deep Dive
Corey Nolet
Tetra Concepts
Design Philosophies
●
Akka
●
Remote actors model
●
Designing for scalability
●
Distributed / concurrent processing
●
Across threads, processes, machines
●
Scala
●
Functional / closure-based
●
Lazy-evaluated
●
Immutable
●
Type inference
●
Terse but safe
Hadoop-based
●
Integration with HDFS
●
Preserves data locality
●
Shuffles for all-to-all communications
●
Integrates natively with resource negotiators
like YARN
●
Can use existing Hadoop input/output
formats
New concepts for the
community
●
Dependency graph instead of
map/combine/reduce
●
Dependencies can be narrow or wide depending on
communication model
●
Reprocessing partitions instead of restarting entire
tasks
●
Dataset appears like a local collection but
actions cause distributed computation
●
Memory can be used to cache data for reuse
across different transformations & actions.
RDD
●
API similar to Scala's collections API
●
Provides lazy transformations like map() and flatMap();
actions like reduce() and collect() trigger the computation
●
Transformation lineage is tracked
●
Partitions can be rebuilt in the case of failure
●
Broken up into partitions, each of which is
processed by a task
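The split between lazy transformations and eager actions can be seen in miniature with a plain Scala Iterator, whose map is also deferred until something consumes it. This is only a local analogy, not Spark code, and the names LazinessSketch, evaluated, and run are made up for illustration.

```scala
object LazinessSketch {
  // Counts how many elements the mapping function has actually processed.
  var evaluated = 0

  // Returns (calls made before the "action", calls made after it, result).
  def run(): (Int, Int, Int) = {
    evaluated = 0
    // "Transformation": Iterator.map only records the function to apply.
    val doubled = Iterator(1, 2, 3).map { x => evaluated += 1; x * 2 }
    val before = evaluated
    // "Action": summing consumes the iterator and forces the computation.
    val total = doubled.sum
    (before, evaluated, total)
  }
}
```

Calling LazinessSketch.run() shows that no element is mapped until the sum is requested, much as an RDD's map() does nothing until an action such as collect() runs.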
Jobs, Stages, Tasks
●
SparkContext can be a long-running object and
we can submit many jobs to it.
●
Job: the work triggered by an action on an RDD, covering
the transformations that lead up to it
●
Stage: a group of transformations that can run without a
shuffle; stages are scheduled on the executors.
●
Tasks: The actual closures executing on
executors, one per partition, to process stages.
What are partitions?
●
Chunks of data that make up an RDD.
●
Distributed across the cluster and control
parallelism of processing
●
Often start in a job from an input format
●
Similar to input splits in MapReduce
●
Number can change throughout the stages
that make up a job
●
Default can be set using
spark.default.parallelism
Partitions
Partition Locality
●
Can carry a set of “preferred locations” on
which tasks should be scheduled.
●
Like splits in MapReduce
●
Locality level is lowered when a task waits too
long for a slot at the preferred level
●
Process, Node, Rack, Any, or No Pref
●
Process is most preferred
●
Wait times set through
spark.locality.wait.[process,node,rack]
Partition Sizes
●
Can be changed manually using
rdd.coalesce()
●
Low overhead in deserializing tasks to
process partitions
●
Unlike MapReduce, many small partitions
are recommended over few large ones.
●
Generally 2-3 partitions per core
●
Tasks can be small enough to run in 200ms
and still be efficient.
Changing Partition Sizes
// shuffle is optional and defaults to false
rdd.coalesce(numPartitions, shuffle)
// always shuffles
rdd.repartition(numPartitions)
Changing Partition Sizes
rdd.repartition(numParts)
actually calls
rdd.coalesce(numParts, true)
Coalesce (not shuffled)
●
Results in narrow dependency
●
For reducing number of partitions
●
Drastic decrease (e.g. 1000 → 1) usually
benefits more from shuffling, which keeps the
upstream work parallel
●
Final number of parts will never be greater
than specified amount
●
Could be less if the number of parent parts
is less
Coalesce (not shuffled)
●
Groups parent partitions so that each final partition
maps to roughly the same number of parents
●
When parents have locality information:
●
Attempts to group parent partitions on their
local nodes
●
When parents don't have locality information:
●
Create groups by chunking parents that are
close in the array of partitions
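A minimal sketch of the no-locality case, assuming the goal is simply to split the parent partition indices into contiguous, near-equal ranges. It is not Spark's actual coalescer; CoalesceSketch and chunkParents are hypothetical names.

```scala
object CoalesceSketch {
  // Groups `numParents` parent partition indices into at most `numTarget`
  // contiguous ranges, so neighbouring parents land in the same group.
  def chunkParents(numParents: Int, numTarget: Int): Seq[Seq[Int]] = {
    require(numParents >= 0 && numTarget > 0)
    // Never produce more groups than there are parents.
    val groups = math.min(numParents, numTarget)
    (0 until groups).map { i =>
      // Long arithmetic avoids overflow for large partition counts.
      val start = ((i.toLong * numParents) / groups).toInt
      val end = (((i + 1).toLong * numParents) / groups).toInt
      (start until end).toVector
    }
  }
}
```

For ten parents and three target partitions this yields the groups 0-2, 3-5, and 6-9; asking for five partitions from two parents yields only two groups, matching the point that the result can be smaller than requested.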
Coalesce (shuffled)
●
Results in wide dependency
●
Allows number of partitions to be increased
at the expense of a shuffle.
●
Evens out distribution of data using a hash
partitioner.
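As a rough illustration of how hash partitioning spreads records, the sketch below assigns each key to its hashCode modulo the partition count, kept non-negative. It assumes nothing about how Spark's shuffled coalesce() chooses its keys; HashPartitionSketch, partitionFor, and distribution are illustrative names.

```scala
object HashPartitionSketch {
  // Maps a key to a partition index in [0, numPartitions), keeping the
  // result non-negative even when hashCode is negative.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Counts how many keys land in each partition.
  def distribution(keys: Seq[Any], numPartitions: Int): Map[Int, Int] =
    keys.groupBy(partitionFor(_, numPartitions)).map { case (p, ks) => p -> ks.size }
}
```

With the integers 0 to 99 and four partitions, each partition receives 25 keys. Real keys with skewed hash codes will not spread this evenly.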
Memory in Spark
Executor Memory
●
Divided among cache and processing
●
60% used for cached objects
spark.storage.memoryFraction=0.6
spark.storage.safetyFraction=0.9
●
20% used for shuffles
spark.shuffle.memoryFraction=0.2
spark.shuffle.safetyFraction=0.8
●
What's left over is for task execution
●
Usable memory is defined as follows:
(max memory allocated to JVM – overhead memory
used in the JVM) * memoryFraction * safetyFraction.
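The formula above can be turned into a small calculation. The sketch below is plain arithmetic, not a Spark API; ExecutorMemorySketch, usableMb, and the sample figures in the note that follows are assumptions chosen for illustration.

```scala
object ExecutorMemorySketch {
  // Usable memory for one region (storage or shuffle), in megabytes:
  // (JVM max memory - JVM overhead) * memoryFraction * safetyFraction
  def usableMb(jvmMaxMb: Long, overheadMb: Long,
               memoryFraction: Double, safetyFraction: Double): Long =
    ((jvmMaxMb - overheadMb) * memoryFraction * safetyFraction).toLong
}
```

For example, with 10,000 MB of JVM memory, 1,000 MB of overhead, and fractions of 0.6 and 0.9, the storage region comes to roughly 4,860 MB; with 0.2 and 0.8 the shuffle region comes to roughly 1,440 MB.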
Executor Memory
●
High JVM overhead can significantly reduce amount of
memory available for caching, shuffles, and task
execution.
●
Default overhead allocated for YARN executors used to
be 7% of executor memory. Raised to 10% in Spark 1.3
●
Dependent on choices of data structures and
overhead of classes used.
●
spark.yarn.executor.memoryOverhead
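As a worked example of the percentages above, the sketch below computes the overhead reserved for an executor as a fraction of its memory, with a configurable floor. The floor parameter and the names YarnOverheadSketch and overheadMb are assumptions for illustration; the slides give only the 7% and 10% figures.

```scala
object YarnOverheadSketch {
  // Overhead reserved on top of executor memory: a fraction of that
  // memory, but never less than floorMb (a hypothetical parameter).
  def overheadMb(executorMemoryMb: Long, fraction: Double, floorMb: Long): Long =
    math.max(floorMb, (executorMemoryMb * fraction).toLong)
}
```

For an 8192 MB executor and no floor, the 7% rule reserves 573 MB and the 10% rule reserves 819 MB.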
RDD Caching
●
Useful when multiple downstream
transformations depend on a single upstream
RDD
val rdd1 = inputRdd.map(..)..saveAsTextFile(..)
val rdd2 = inputRdd.map(..)..saveAsTextFile(..)
●
Done through rdd.persist()
●
LRU eviction of memory cached RDDs when
memory is full (automatic cleanup)
●
Can be manually evicted using
rdd.unpersist()
RDD Caching
●
Deserialized / Raw
●
Generally faster
●
No cost of serializing data
●
Larger data sets put pressure on the garbage
collector
●
Serialized
●
Can take 2x-4x less memory
●
Can be slower to process than raw data while the
garbage collector is keeping up, due to deserialization cost
Storage Levels
●
MEMORY_ONLY
●
MEMORY_AND_DISK
●
MEMORY_ONLY_SER
●
MEMORY_AND_DISK_SER
●
DISK_ONLY
●
MEMORY_ONLY_SER_2
●
MEMORY_AND_DISK_SER_2
●
MEMORY_ONLY_2
●
MEMORY_AND_DISK_2
●
OFF_HEAP
Tachyon
●
Uses a Ramdisk, or in-memory file system,
to expose HDFS API.
●
Asynchronously writes to HDFS
●
Allows off-heap caching to put less pressure
on garbage collector
●
Data can be shared by multiple executors
●
Cached data is not lost when an executor
dies
●
Still experimental as of Spark 1.4.0
Project Tungsten
●
Designs for three major optimizations to Spark
●
One of them provides off-heap memory
management to lower object overheads and
bypass garbage collection.
●
Another provides cache-aware data structures
that can minimize memory lookups
●
https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html
Shared Memory
●
Broadcast variables
●
Read-only memory cached on each executor
and shared across tasks
●
Can be used like distributed cache in
MapReduce to share large lookup tables
across tasks
●
Accumulators
●
Can be used like counters in MapReduce
●
Can also apply any generic associative (and
commutative) operation.
Broadcast & Accumulators
// Using a broadcast variable
val valueToWrap = "fubar"
val broadcastVal = sc.broadcast(valueToWrap)
…
rdd.filter(_ == broadcastVal.value)
// Using an accumulator (tasks add to it; the driver reads accumulator.value)
val accumulator = sc.accumulator(0)
rdd.map(it => {
  accumulator += 1 // runs only once an action evaluates this map
  it
})
Serialization in Spark
●
Two different types
●
Closures
●
Data
Closures
●
Scala can be a little confusing
●
Functions vs Methods
●
Objects vs Classes
●
Closure is just an anonymous
implementation of a FunctionX trait in
Scala
●
Closure will always contain a reference to its
outer object
●
Any objects used inside the closure will be serialized
Functions vs. Methods
class MyClass {
  // compiles down to a Java method
  def myMethod(): Unit = {}
}
class MyClass {
  // compiles to an implementation of a FunctionX trait (here Function0)
  val myFunction: () => Unit = () => {}
}
Methods can also be coerced into functions, allowing
them to be passed around like functions.
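The coercion of methods into functions can be sketched as follows. This is a minimal example, assuming a Scala version that eta-expands a method when a function type is expected; CoercionSketch, twice, twiceFn, and applyAll are illustrative names.

```scala
object CoercionSketch {
  // A method: compiled as an ordinary JVM method on this object.
  def twice(x: Int): Int = x * 2

  // Given an expected function type, the compiler wraps the method
  // in a Function1 instance (eta-expansion).
  val twiceFn: Int => Int = twice

  // The resulting value can be passed like any other function.
  def applyAll(xs: List[Int]): List[Int] = xs.map(twiceFn)
}
```

Here twiceFn is an object that can be stored, passed, and serialized separately, whereas twice only exists as a member of CoercionSketch.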
Objects vs. Classes
object MyObject {
// compiles to static member of MyObject
val myVal: Boolean = true
// compiles to Java static method
def myMethod(): Unit = {}
}
class MyClass {
// compiles to instance value
val myVal: Boolean = true
// compiles to method on MyClass
def myMethod(): Unit = {}
}
Closure Serialization
●
The primary way code makes it from the
driver to executors
●
No more extends Mapper/Reducer
●
Closures can be shipped at runtime
●
Currently only supports Java serialization
●
Closure cleaner attempts to prune unused
references of the object graph
●
Can still use unnecessary memory if not careful
Closure Cleaner
class MyProcessor {
def process(rdd: RDD[String]) {
rdd.filter(_ == "good")
...
}
}
The filter() closure's reference to outer class
MyProcessor gets pruned by the ClosureCleaner
because it is not used.
Closure Cleaner
class MyProcessor(
filterWord: String
) {
def process(rdd: RDD[String]) {
rdd.filter(_ == filterWord)
...
}
}
The closure reads filterWord from the instance, so the whole
MyProcessor object must be serialized. It doesn't extend
Serializable, so execution will fail.
Closure Cleaner
object MyProcessor{
val filterWord = ...
def process(
rdd: RDD[String]
) {
rdd.filter(_ == filterWord)
...
}
}
process() compiles to a Java static method so
only filter()'s closure gets serialized.
Closure Cleaner
class MyProcessor(
filterWord: String
) {
def process(rdd: RDD[String]) {
val filterWord2 = filterWord
rdd.filter(_ == filterWord2)
...
}
}
The filter() closure serializes because filterWord2
has separated the value from the instance of
MyProcessor
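The difference between the failing and the working variants can be reproduced without Spark by pushing a closure through Java serialization directly. This is a sketch under the assumption that the Scala compiler in use emits serializable function literals (as recent versions do); it does not involve the ClosureCleaner, and FieldCapture, LocalCopy, and javaSerializable are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureSketch {
  // Not Serializable, like MyProcessor in the slides.
  class FieldCapture(filterWord: String) {
    // Reading the constructor field drags `this` into the closure.
    def predicate: String => Boolean = _ == filterWord
  }

  class LocalCopy(filterWord: String) {
    def predicate: String => Boolean = {
      // Copying to a local val means only the String is captured.
      val word = filterWord
      _ == word
    }
  }

  // Attempts Java serialization and reports whether it succeeded.
  def javaSerializable(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(value) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Passing new FieldCapture("good").predicate to javaSerializable fails because the enclosing instance is part of the closure, while the LocalCopy variant serializes because only the String travels with it.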
Data Serialization
●
Kryo & Java both supported
●
Kryo is faster and more compact than Java
●
spark.serializer =
org.apache.spark.serializer.KryoSerializer
●
Kryo works best when classes are registered up front;
unregistered classes store their full class name with each object
●
Native Scala classes are supported
●
Serialization errors will not be noticed until
the data leaves the JVM
●
Used both in memory and on disk
Shuffling in Spark
Shuffling
●
Required for all-to-all communications
●
reduceByKey(), aggregateByKey(),
sortByKey(), etc…
●
Always a bottleneck
●
Network & Disk IO
●
Serialization
●
Compression
●
Receiving lots of attention.
Spark vs MapReduce
●
Unlike MapReduce, the reduce phase does not
overlap with the map phase
●
Spark reducers pull shuffle data from
mappers
●
MapReduce does push in a concurrent copy
stage
●
Map and Reduce tasks all run on same
executor JVMs
●
MapReduce uses different JVMs for these
tasks
First there was a hash-based shuffle...
●
Originally required M * R intermediate files
(that is, # of mappers * # of reducers)
●
Concurrently opened files are C * R (# of
cores * # of reducers)
●
Enabling shuffle spilling created even more
temporary files.
●
Many random writes/reads meant reducers spent
most of their CPU time waiting on disk
I/O
Original hash-based shuffle
Then they consolidated
files...
●
Introduced an extra merge phase
●
All map tasks running on the same core write to
the same set of files in tandem
●
File consolidation reduces number of files to C *
R
●
Each reducer fetches a smaller number of files
●
Still bad for high numbers of reducers
●
Concurrently opened files are still C * R
●
spark.shuffle.consolidateFiles=true
Consolidated files
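To get a feel for the file counts quoted for the two hash-based variants, here is the arithmetic as a small sketch. ShuffleFileCountSketch and its method names are illustrative, and the sample numbers in the note that follows are made up.

```scala
object ShuffleFileCountSketch {
  // Original hash-based shuffle: every map task writes one file per reducer.
  def hashShuffleFiles(mapTasks: Long, reducers: Long): Long = mapTasks * reducers

  // Consolidated shuffle: map tasks sharing a core reuse one file per reducer.
  def consolidatedFiles(cores: Long, reducers: Long): Long = cores * reducers
}
```

With 1,000 map tasks, 500 reducers, and 16 cores, the original scheme needs 500,000 files while consolidation needs 8,000. The count still grows linearly with the number of reducers, which is why consolidation remains a poor fit for very wide shuffles.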
And along came sort-based
Shuffle
1) Records are sorted in memory by partition ID and merged into a single
file per map task, along with an index file
●
With a map-side combine, buckets are sorted by key & partition and run
through the combiner
●
Otherwise, they are sorted by partition only
2) Ranges of buckets in each file are served to reducers upon request
3) The segments are merged together on the reducer
4) Records are deserialized and passed through the all-to-all function (e.g.
aggregateByKey(), reduceByKey()) to complete the stage
●
In the case of sortByKey() and other ordered functions, the
partitions are sorted before being run through the all-to-all
function.
5) When there are <= 200 reducers and no sort or aggregation is needed,
the hash-based path is used instead
●
spark.shuffle.sort.bypassMergeThreshold
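The numbered steps above can be modelled in a few lines of plain Scala. This is a toy, in-memory sketch of the idea (sort by partition ID, keep an index of offsets, serve ranges, then merge and reduce), not Spark's implementation; SortShuffleSketch and all of its members are hypothetical, and summing values stands in for the all-to-all function.

```scala
object SortShuffleSketch {
  // Output of one map task: records ordered by partition ID, plus an index
  // where offsets(p) until offsets(p + 1) is the slice for partition p.
  final case class MapOutput(records: Vector[(String, Int)], offsets: Vector[Int])

  // Non-negative hash partitioning, standing in for the real partitioner.
  def partitionOf(key: String, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Step 1: sort by partition ID and record where each partition starts.
  def write(records: Seq[(String, Int)], numPartitions: Int): MapOutput = {
    val sorted = records.sortBy { case (k, _) => partitionOf(k, numPartitions) }.toVector
    val counts = Array.fill(numPartitions)(0)
    sorted.foreach { case (k, _) => counts(partitionOf(k, numPartitions)) += 1 }
    MapOutput(sorted, counts.scanLeft(0)(_ + _).toVector)
  }

  // Step 2: serve one reducer the contiguous range for its partition.
  def fetch(output: MapOutput, partition: Int): Vector[(String, Int)] =
    output.records.slice(output.offsets(partition), output.offsets(partition + 1))

  // Steps 3-4: merge segments from all map outputs and reduce by key.
  def reduce(outputs: Seq[MapOutput], partition: Int): Map[String, Int] =
    outputs.flatMap(fetch(_, partition))
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2).sum }
}
```

Each map output is a single sorted sequence plus an offsets index, so a reducer needs only one contiguous range per map output instead of a dedicated file.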
Shuffle Evolution
●
Shuffle write consolidation in 0.9
●
Pluggable shuffle managers in Spark 1.0
●
Hash-based (pre-1.2)
●
Sort-based (introduced in 1.1, default in 1.2+)
●
NettyBlockTransferService introduced in 1.2 for
transferring shuffle “blocks”
●
External shuffle service introduced in 1.2
●
In 1.5+, Community is working on tiered merge
strategy.
Shuffle Durability
●
Failure of an executor will lose shuffle files unless the auxiliary
shuffle service is configured on the YARN
NodeManager.
SparkConf:
spark.shuffle.service.enabled = true
In yarn-site.xml, add spark_shuffle to:
yarn.nodemanager.aux-services
In yarn-site.xml, add:
yarn.nodemanager.aux-services.spark_shuffle.class =
org.apache.spark.network.yarn.YarnShuffleService
Perhaps we could establish
some best practices
●
Consider the parallelism at each stage of your
jobs based on your data and number of cores.
●
Executor memory should be fine-tuned for
expected cache and shuffle sizes.
●
Minimize footprint of closures
●
Use broadcasts for large values
●
Use Kryo to serialize data
●
Know your communication patterns (one-to-all,
all-to-all, etc.) and optimize accordingly
●
Use the auxiliary shuffle service
Shuffle Optimization
●
A couple of properties that affect shuffle
performance
●
spark.akka.threads
●
spark.reducer.maxMbInFlight
●
By default, shuffles will use only 20% of the
memory allocated to executor
●
Increase spark.shuffle.memoryFraction
at expense of
spark.storage.memoryFraction
Questions?
corey@tetraconcepts.com

Spark Deep Dive

  • 1. Spark Deep Dive Corey Nolet Tetra Concepts
  • 2. Design Philosophies ● Akka ● Remote actors model ● Designing for scalability ● Distributed / concurrent processing ● Across threads, processes, machines ● Scala ● Functional / closure-based ● Lazy-evaluated ● Immutable ● Type inference ● Terse but safe
  • 3. Hadoop-based ● Integration with HDFS ● Preserves data locality ● Shuffles for all-to-all communications ● Integrates natively with resource negotiators like YARN ● Can use existing Hadoop input/output formats
  • 4.
  • 5. New concepts for the community ● Dependency graph instead of map/combine/reduce ● Can be narrow or wide depending on communication model ● Reprocessing partitions instead of restarting entire tasks ● Dataset appears like a local collection but actions cause distributed computation ● Memory can be used to cache data for reuse across different transformations & actions.
  • 6.
  • 7. RDD ● API Similar to Scala's collection's API ● Provides lazy functions like map(), flatMap(), reduce(), collect(), etc… ● Transformation lineage is tracked ● Partitions can be rebuilt in the case of failure ● Broken up into partitions that get scheduled on tasks
  • 8. Jobs, Stages, Tasks ● SparkContext can be a long-running object and we can submit many jobs to it. ● Job: a sequence of transformations and actions on an RDD ● Stage: a specific transformation or action on an RDD that gets scheduled on the executors. ● Tasks: The actual closures executing on executors to process stages.
  • 9. What are partitions? ● Chunks of data that make up an RDD. ● Distributed across the cluster and control parallelism of processing ● Often start in a job from an input format ● Similar to input splits in MapReduce ● Number can change throughout the stages that make up a job ● Default can be set using spark.default.parallelism
  • 11. Partition Locality ● Can carry a set of “preferred locations” for which tasks should be scheduled. ● Like splits in MapReduce ● Locality levels lowered when tasks become too busy ● Process, Node, Rack, Any, or No Pref ● Process is most preferred ● Set through spark.locality.wait.[process,node,rack]
  • 12. Partition Sizes ● Can be changed manually using rdd.coalesce() ● Low overhead in deserializing tasks to process partitions ● Unlike MapReduce, many small partitions are recommended over few large ones. ● Generally 2-3 per core ● Tasks can be small enough to run in 200ms and still be efficient.
  • 15. Coalesce (not shuffled) ● Results in narrow dependency ● For reducing number of partitions ● Drastic decrease (e.g. 1000→ 1) usually benefits more from shuffling ● Final number of parts will never be greater than specified amount ● Could be less if the number of parent parts is less
  • 16. Coalesce (not shuffled) ● Groups final partitions so they map to the same number of parent partitions ● When parents have locality information: ● Attempts to group parent partitions on their local nodes ● When parents don't have locality information: ● Create groups by chunking parents that are close in the array of partitions
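The no-locality case above, where parents that are close in the partition array are chunked together, can be sketched in plain Scala. This is only an illustration of the idea, not Spark's implementation; `CoalesceSketch` and `group` are hypothetical names, and the even-chunking arithmetic is an assumption about how "close in the array" is decided.

```scala
object CoalesceSketch {
  // Chunk the parent partition indices 0 until numParents into at most
  // `target` contiguous groups (assumed strategy when no locality info exists).
  def group(numParents: Int, target: Int): Vector[Vector[Int]] = {
    // Never produce more groups than there are parents.
    val groups = math.min(numParents, target)
    (0 until groups).map { i =>
      val start = ((i.toLong * numParents) / groups).toInt
      val end = (((i.toLong + 1) * numParents) / groups).toInt
      (start until end).toVector
    }.toVector
  }
}
```

Asking for 3 groups from 10 parents gives chunks of 3, 3 and 4 adjacent parents, and asking for 5 groups from 2 parents gives only 2, matching the "could be less" bullet on the previous slide.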
  • 17. Coalesce (shuffled) ● Results in wide dependency ● Allows number of partitions to be increased at the expense of a shuffle. ● Evens out distribution of data using a hash partitioner.
  • 19. Executor Memory ● Divided among cache and processing ● 60% used for cached objects spark.storage.memoryFraction=0.6 spark.storage.safetyFraction=0.9 ● 20% used for shuffles spark.shuffle.memoryFraction=0.2 spark.shuffle.safetyFraction=0.8 ● What's left over is for task execution ● Usable memory is defined as follows: (max memory allocated to JVM - overhead memory used in the JVM) * memoryFraction * safetyFraction.
  • 20. Executor Memory ● High JVM overhead can significantly reduce amount of memory available for caching, shuffles, and task execution. ● Default amount allocated for YARN executors used to be 7%. Raised to 10% in 1.3 ● Dependent on choices of data structures and overhead of classes used. ● spark.yarn.executor.memoryOverhead
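As a worked example of the usable-memory formula on the Executor Memory slides, the sketch below applies it to a hypothetical executor. The heap size (8192 MB) and JVM overhead (192 MB) are made-up numbers chosen for easy arithmetic; the fractions are the storage and shuffle values quoted on the slide, written as decimals.

```scala
object ExecutorMemorySketch {
  // (max JVM memory - JVM overhead) * memoryFraction * safetyFraction, in MB.
  def usableMb(jvmMaxMb: Double, overheadMb: Double,
               memoryFraction: Double, safetyFraction: Double): Double =
    (jvmMaxMb - overheadMb) * memoryFraction * safetyFraction

  val jvmMaxMb = 8192.0   // hypothetical executor heap
  val overheadMb = 192.0  // hypothetical JVM overhead

  // Storage: 8000 * 0.6 * 0.9, roughly 4320 MB for cached objects.
  val storageMb: Double = usableMb(jvmMaxMb, overheadMb, 0.6, 0.9)
  // Shuffle: 8000 * 0.2 * 0.8, roughly 1280 MB for shuffle buffers.
  val shuffleMb: Double = usableMb(jvmMaxMb, overheadMb, 0.2, 0.8)
}
```

Under these assumptions only about 70% of the heap is spoken for by cache and shuffle together, which is why a larger JVM overhead directly shrinks both pools.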
  • 21. RDD Caching ● Useful when multiple downstream transformations depend on a single upstream RDD val rdd1 = inputRdd.map(..)..saveAsTextFile(..) val rdd2 = inputRdd.map(..)..saveAsTextFile(..) ● Done through rdd.persist() ● LRU eviction of memory cached RDDs when memory is full (automatic cleanup) ● Can be manually evicted using rdd.unpersist()
  • 22. RDD Caching ● Deserialized / Raw ● Generally faster ● No cost of serializing data ● Larger data sets put pressure on the garbage collector ● Serialized ● Can take 2x to 4x less memory ● Can be slower to process than raw data as long as the garbage collector is running efficiently
  • 24. Tachyon ● Uses a Ramdisk, or in-memory file system, to expose HDFS API. ● Asynchronously writes to HDFS ● Allows off-heap caching to put less pressure on garbage collector ● Data can be shared by multiple executors ● Cached data is not lost when an executor dies ● Still experimental as of Spark 1.4.0
  • 25. Project Tungsten ● Designs for three major optimizations to Spark ● One of them provides off-heap memory management to lower object overheads and bypass garbage collection. ● Another provides cache-aware data structures that can minimize memory lookups ● https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html
  • 26. Shared Memory ● Broadcast variables ● Read-only memory cached on each executor and shared across tasks ● Can be used like distributed cache in MapReduce to share large lookup tables across tasks ● Accumulators ● Can be used like counters in MapReduce ● Can also perform any generic associative operation.
  • 27. Broadcast & Accumulators // Using broadcast variable val valueToWrap = "fubar" val broadcastVal = sc.broadcast(valueToWrap) … rdd.filter(_ == broadcastVal.value) // Using accumulators val accumulator = sc.accumulator(0) rdd.map(it => { accumulator += 1 it })
  • 28. Serialization in Spark ● Two different types ● Closures ● Data
  • 29. Closures ● Scala can be a little confusing ● Functions vs Methods ● Objects vs Classes ● Closure is just an anonymous implementation of the FunctionX class in Scala ● Closure will always contain a reference to its outer object ● Any objects used inside the closure will be serialized
  • 30. Functions vs. Methods class MyClass { // compiles down to Java method def myMethod(): Unit = {} } class MyClass { // compiles to impl of FunctionX trait val myFunction: () => Unit = () => {} } Methods can also be coerced into functions, allowing them to be passed around like functions.
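The coercion of a method into a function mentioned above can be shown with a small, Spark-free sketch. All names here (`EtaExpansionSketch`, `isGood`, `isGoodFn`) are hypothetical and exist only for illustration.

```scala
object EtaExpansionSketch {
  // A method: compiles to a plain JVM method on this object.
  def isGood(s: String): Boolean = s == "good"

  // A function value: an instance of Function1[String, Boolean].
  // The expected type makes the compiler coerce the method into a function.
  val isGoodFn: String => Boolean = isGood

  // The method can also be passed directly where a function is expected.
  val kept: List[String] = List("good", "bad", "good").filter(isGood)
}
```

The distinction matters for the closure slides that follow, because it is the function object, not the method, that has to be serialized and shipped.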
  • 31. Objects vs. Classes object MyObject { // compiles to static member of MyObject val myVal: Boolean = true // compiles to Java static method def myMethod(): Unit = {} } class MyClass { // compiles to instance value val myVal: Boolean = true // compiles to method on MyClass def myMethod(): Unit = {} }
  • 32. Closure Serialization ● The primary way code makes it from the driver to executors ● No more extends Mapper/Reducer ● Closures can be shipped at runtime ● Currently only supports Java serialization ● Closure cleaner attempts to prune unused references of the object graph ● Can still use unnecessary memory if not careful
  • 33. Closure Cleaner class MyProcessor { def process(rdd: RDD[String]) { rdd.filter(_ == "good") ... } } The filter() closure's reference to outer class MyProcessor gets pruned by the ClosureCleaner because it is not used.
  • 34. Closure Cleaner class MyProcessor( filterWord: String ) { def process(rdd: RDD[String]) { rdd.filter(_ == filterWord) ... } } Whole class gets serialized but doesn't extend Serializable. Execution will fail.
  • 35. Closure Cleaner object MyProcessor{ val filterWord = ... def process( rdd: RDD[String] ) { rdd.filter(_ == filterWord) ... } } process() compiles to a Java static method so only filter()'s closure gets serialized.
  • 36. Closure Cleaner class MyProcessor( filterWord: String ) { def process(rdd: RDD[String]) { val filterWord2 = filterWord rdd.filter(_ == filterWord2) ... } } The filter() closure serializes because filterWord2 has separated the value from the instance of MyProcessor
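The difference between the failing and the working variant in the Closure Cleaner slides can be reproduced without Spark by pushing closures through plain Java serialization. This is a minimal sketch that assumes a Scala version whose function literals are serializable (2.12 or later); it does not invoke Spark's ClosureCleaner, and `Processor`, `capturesOuter`, `capturesLocal` and `javaSerializable` are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureSerializationSketch {
  // Deliberately NOT Serializable, like MyProcessor on the slides.
  class Processor(filterWord: String) {
    // Reads the field directly, so the closure captures `this`.
    def capturesOuter: String => Boolean = _ == filterWord

    // Copies the field into a local val first, so only the String is captured.
    def capturesLocal: String => Boolean = {
      val word = filterWord
      _ == word
    }
  }

  // True if Java serialization accepts the object graph rooted at `obj`.
  def javaSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Calling `javaSerializable` on `capturesOuter` of a `new Processor("good")` is expected to fail because the whole non-serializable instance is dragged along, while `capturesLocal` only carries a `String`.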
  • 37. Data Serialization ● Kryo & Java both supported ● Kryo is faster and more compact than Java ● spark.serializer = org.apache.spark.serializer.KryoSerializer ● Kryo requires object serializers to be registered ● Native Scala classes are supported ● Serialization errors will not be noticed until the data leaves the JVM ● Used both in memory and on disk
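A configuration fragment along the lines of the Data Serialization slide might look as follows. It is a sketch that assumes Spark 1.2 or later on the classpath (for `SparkConf.registerKryoClasses`); `PageView`, `KryoConfSketch` and the application name are hypothetical.

```scala
import org.apache.spark.SparkConf

// Hypothetical record type, used only for illustration.
case class PageView(url: String, userId: Long)

object KryoConfSketch {
  def kryoConf(): SparkConf =
    new SparkConf()
      .setAppName("kryo-sketch") // hypothetical application name
      // Switch data serialization from Java to Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Register the classes that will be shuffled or cached.
      .registerKryoClasses(Array[Class[_]](classOf[PageView]))
}
```

Because serialization errors only surface when data leaves the JVM, it can help to exercise a shuffle or a serialized cache in a small test run before relying on a new registration.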
  • 39. Shuffling ● Required for all-to-all communications ● reduceByKey(), aggregateByKey(), sortByKey(), etc… ● Always a bottleneck ● Network & Disk IO ● Serialization ● Compression ● Receiving lots of attention.
  • 40. Spark vs MapReduce ● Unlike in MapReduce, the Reduce phase does not overlap with the Map phase ● Spark reducers pull shuffle data from mappers ● MapReduce does push in a concurrent copy stage ● Map and Reduce tasks all run on same executor JVMs ● MapReduce uses different JVMs for these tasks
  • 41. First there was a hash-based shuffle... ● Originally required M * R number of intermediate files (that is, # of mappers * # of reducers) ● Concurrently opened files are C * R (# of cores * # of reducers) ● Enabling shuffle spilling created even more temporary files. ● Many random writes/reads caused CPU time spent in reduces to mainly wait on disk I/O
  • 43. Then they consolidated files... ● Introduced an extra merge phase ● All map tasks running on the same core write to the same set of files in tandem ● File consolidation reduces number of files to C * R ● Each reducer fetches a smaller number of files ● Still bad for high numbers of reducers ● Concurrently opened files are still C * R ● spark.shuffle.consolidateFiles=true
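To make the M * R versus C * R file counts concrete, here is a small arithmetic sketch. The cluster numbers in the comments (1000 map tasks, 500 reducers, 16 cores) are hypothetical, as are the object and method names.

```scala
object ShuffleFileCountSketch {
  // Original hash-based shuffle: one file per (mapper, reducer) pair.
  def hashShuffleFiles(mappers: Long, reducers: Long): Long = mappers * reducers

  // With consolidation: one file per (core, reducer) pair.
  def consolidatedFiles(cores: Long, reducers: Long): Long = cores * reducers

  // Hypothetical job: 1000 map tasks, 500 reducers, 16 cores.
  val before: Long = hashShuffleFiles(1000L, 500L) // 500000 files
  val after: Long = consolidatedFiles(16L, 500L)   // 8000 files
}
```

Consolidation removes the dependence on the number of map tasks, but the count still grows linearly with the number of reducers, which is the weakness the slide points out.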
  • 45. And along came sort-based Shuffle 1) Records sorted in memory by partition ID and merged into a single file for each core along with an index file ● If map-side combine, buckets sorted by key & partition and run through combiner ● Otherwise, sorted only by partition 2) Ranges of buckets in each file served to reducers upon request 3) Each segment is merged together on the reducer 4) Records deserialized and passed through all-to-all function (e.g. aggregateByKey(), reduceByKey()) to complete the stage ● In the case of sortByKey() and other ordered functions, the partitions are sorted before being run through the all-to-all function. 5) When <= 200 reducers and no sort or aggregation is needed, hash-based is used instead ● spark.shuffle.sort.bypassMergeThreshold
  • 46. And along came a sort-based Shuffle
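Steps 1 and 2 of the sort-based shuffle (sort by partition ID, write one file plus an index, serve ranges to reducers) can be modelled in memory with plain Scala. This is a toy sketch, not Spark's shuffle writer: the "file" is a `Vector`, the "index" holds record offsets rather than byte offsets, and every name is hypothetical.

```scala
object SortShuffleSketch {
  // Hash partitioning with a non-negative modulus.
  def partitionOf(key: String, numReducers: Int): Int = {
    val mod = key.hashCode % numReducers
    if (mod < 0) mod + numReducers else mod
  }

  // Map side: sort records by partition ID and build an index where
  // index(p) is the offset of the first record of partition p and
  // index(numReducers) is the total record count.
  def write(records: Seq[(String, Int)],
            numReducers: Int): (Vector[(String, Int)], Vector[Int]) = {
    val sorted = records.sortBy { case (k, _) => partitionOf(k, numReducers) }.toVector
    val counts = Array.fill(numReducers)(0)
    sorted.foreach { case (k, _) => counts(partitionOf(k, numReducers)) += 1 }
    val index = counts.scanLeft(0)(_ + _).toVector
    (sorted, index)
  }

  // Reduce side: fetch only the contiguous range that belongs to one reducer.
  def read(file: Vector[(String, Int)], index: Vector[Int],
           reducer: Int): Vector[(String, Int)] =
    file.slice(index(reducer), index(reducer + 1))
}
```

The point of the layout is that each map output is a single sequentially written file, and a reducer needs just two index entries to locate its range.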
  • 47. Shuffle Evolution ● Shuffle write consolidation in 0.9 ● Pluggable shuffle managers in Spark 1.0 ● Hash-based (default pre-1.2) ● Sort-based (introduced in 1.1, default in 1.2+) ● NettyBlockTransferService introduced in 1.2 for transferring shuffle "blocks" ● External shuffle service introduced in 1.2 ● In 1.5+, the community is working on a tiered merge strategy.
  • 48. Shuffle Durability ● Failure of an executor will lose shuffle files unless Aux Shuffle Service is configured on the YARN NodeManager. SparkConf: spark.shuffle.service.enabled = true yarn-site.xml add spark_shuffle to: yarn.nodemanager.aux-services yarn-site.xml add: yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
  • 49. Perhaps we could establish some best practices ● Consider the parallelism at each stage of your jobs based on your data and number of cores. ● Executor memory should be fine-tuned for expected cache and shuffle sizes. ● Minimize footprint of closures ● Use broadcasts for large values ● Use Kryo to serialize data ● Know your communication patterns (one-to-all, all-to-all, etc..) and optimize accordingly ● Use aux-shuffle service
  • 50. Shuffle Optimization ● A couple properties that affect shuffle performance ● spark.akka.threads ● spark.reducer.maxMbInFlight ● By default, shuffles will use only 20% of the memory allocated to executor ● Increase spark.shuffle.memoryFraction at expense of spark.storage.memoryFraction

Editor's notes

  1. I think a great place to start would be with the core design philosophies that the architecture was designed around. First, we have Akka. This is a framework that, similar to many other distributed frameworks today, has you thinking about breaking your problems into their atomic particles first so that those particles can be designed on a single node and then scaled out as needed to run on clusters of machines. At its heart, it focuses on Actors who know how to intercept, process, and create new messages that are sent to other Actors. Actors are nothing more than objects which get serialized and deployed onto nodes and then start doing their magic, processing messages they choose to accept. Then there's Scala, which is the backbone that supports the expressive nature of the Akka framework. I could go on for hours about why Scala is such a useful language for processing data but I'll opt for the basic bullet points since we're going to be strapped for time. Scala blends together functional and imperative programming on the JVM by using Java objects to define closures, or first-class functions that remember any variables available in the environment in which they were created. Similar to Java, Scala has a rich collections API. Unlike Java, however, Scala promotes immutability and uses structural sharing to create new lightweight objects out of older objects when doing many operations that would normally require mutating state. Of course it does have mutable objects for those who need them, but it recommends sticking with immutability. When operations like add, remove, concatenate, etc. are performed on many of the collections objects, they need not be copied; instead the inner Similar to Java 8's streams API, and Guava's Iterables, Scala has
  2. It turns out, this was no coincidence. In 2008, Matei Zaharia wrote Hadoop's FairScheduler while working for Cloudera. While getting his PhD at UC Berkeley, he created Apache Spark and Apache Mesos. Clearly this guy is no stranger to large-scale scheduling of computations on distributed systems. He is one of the co-founders of Databricks and is currently their CTO while also an assistant professor of Computer Science at MIT. We love this guy, even though many of us don't know who he is.