Best Practices for Enabling Speculative Execution on Large Scale Platforms
Ron Hu, Venkata Sowrirajan
LinkedIn
Agenda
▪ Motivation
▪ Enhancements
▪ Configuration
▪ Metrics and analysis
▪ User guidance
▪ Future work
Speculative Execution
• A stage consists of many parallel tasks, and the stage can finish only as fast as its slowest task.
• If one task in a stage is running very slowly, the Spark driver launches a speculation task (a copy of it) on a different host.
• Whichever copy finishes first, the regular task or the speculation task, is used. The slower one is killed.
[Figure: timeline in which a speculative task is launched for a slow task; one copy succeeds (✅) and the other does not (❌)]
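The race described above can be sketched as a small timing model. This is an illustrative simplification, not Spark's scheduler code: it assumes every task's finish time is known in advance, and that a speculative copy, if launched, finishes at its launch time plus its own run time.

```python
from typing import Optional, Sequence


def task_finish_time(regular_end: float,
                     spec_launch: Optional[float] = None,
                     spec_duration: Optional[float] = None) -> float:
    """Return when a task's result becomes available.

    If a speculative copy was launched, whichever copy finishes first wins;
    in Spark the other copy would then be killed.
    """
    if spec_launch is None or spec_duration is None:
        return regular_end
    return min(regular_end, spec_launch + spec_duration)


def stage_finish_time(task_ends: Sequence[float]) -> float:
    """A stage completes only when its slowest task completes."""
    return max(task_ends)


# Hypothetical stage (times in seconds): three normal tasks and one straggler.
normal = [task_finish_time(t) for t in (40.0, 42.0, 45.0)]
straggler = task_finish_time(300.0, spec_launch=180.0, spec_duration=45.0)

print(stage_finish_time(normal + [300.0]))      # 300.0 without speculation
print(stage_finish_time(normal + [straggler]))  # 225.0 with speculation
```

Even a speculative copy launched late shortens the stage here, because the stage time is governed by the straggler alone.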
Default Speculation Parameters
Configuration parameter | Default value | Meaning
spark.speculation | false | If set to "true", performs speculative execution of tasks.
spark.speculation.interval | 100ms | How often Spark checks for tasks to speculate.
spark.speculation.multiplier | 1.5 | How many times slower than the median a task must be to be considered for speculation.
spark.speculation.quantile | 0.75 | Fraction of tasks that must be complete before speculation is enabled for a particular stage.
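The sketch below models how the quantile and multiplier parameters interact when the driver looks for speculation candidates. It is a simplified approximation written for illustration, not Spark's TaskSetManager code: it ignores spark.speculation.interval (how often the check runs), data locality, and tasks that already have a speculative copy, and exact upstream behavior varies between Spark versions. The function name is hypothetical.

```python
import math
import statistics
from typing import Dict, List


def speculatable_tasks(num_tasks: int,
                       finished_durations: List[float],
                       running_elapsed: Dict[int, float],
                       quantile: float = 0.75,
                       multiplier: float = 1.5) -> List[int]:
    """Return IDs of running tasks that this simplified model would speculate.

    num_tasks:          total number of tasks in the stage
    finished_durations: run times (s) of tasks that already succeeded
    running_elapsed:    task id -> seconds the task has been running so far
    """
    # Candidates are considered only after `quantile` of the tasks have finished.
    min_finished = max(math.floor(quantile * num_tasks), 1)
    if len(finished_durations) < min_finished:
        return []
    # A running task qualifies once it has run `multiplier` times longer
    # than the median duration of the finished tasks.
    threshold = multiplier * statistics.median(finished_durations)
    return sorted(tid for tid, elapsed in running_elapsed.items() if elapsed > threshold)


durations = [10.0, 11.0, 12.0, 12.0, 13.0, 13.0, 14.0, 15.0]  # 8 of 10 tasks done
running = {8: 17.0, 9: 40.0}
print(speculatable_tasks(10, durations, running))  # [9]
print(speculatable_tasks(10, durations, running, quantile=0.90, multiplier=4.0))  # []
```

With the upstream defaults, task 9 exceeds 1.5 × the 12.5 s median; with a 0.90 quantile, 8 finished tasks out of 10 are not yet enough to trigger any check.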
Motivation
• Speculation speeds up straggler tasks, but adds overhead.
• The default configuration values are too aggressive in most cases.
• Speculating tasks that run for only a few seconds is mostly wasteful.
• We wanted to investigate the impact of data skew, overloaded shuffle services, etc. with speculation enabled.
• What is the impact of enabling speculation by default in a large-scale, multi-tenant cluster?
Speculative Execution improvements
• Tasks that run for only a few seconds get speculated, wasting resources unnecessarily.
• Solution: prevent tasks that run for only a few seconds from being speculated.
• Internally, we introduced a new Spark configuration (spark.speculation.minRuntimeThreshold) that prevents tasks that have run for less than the minimum threshold time from being speculated.
• A similar feature was later added to Apache Spark in SPARK-33741.
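A minimum-runtime threshold can be folded into the same kind of slowness check. This sketch assumes the threshold is combined as max(multiplier × median, min_runtime), which is one plausible way to express the rule above; the slides do not show the internal implementation, and the function name and parameters are hypothetical. The quantile gate is omitted for brevity.

```python
import statistics
from typing import Dict, List


def speculatable_with_min_runtime(finished_durations: List[float],
                                  running_elapsed: Dict[int, float],
                                  multiplier: float,
                                  min_runtime: float) -> List[int]:
    """Speculate only tasks that are slow relative to the median AND have
    already run longer than `min_runtime` seconds (assumed combination)."""
    median = statistics.median(finished_durations)
    threshold = max(multiplier * median, min_runtime)
    return sorted(tid for tid, elapsed in running_elapsed.items() if elapsed > threshold)


# A stage of short tasks: median 2 s, two tasks still running at 3.5 s and 4 s.
short = [1.5, 2.0, 2.0, 2.5]
running = {5: 4.0, 6: 3.5}
print(speculatable_with_min_runtime(short, running, multiplier=1.5, min_runtime=0.0))   # [5, 6]
print(speculatable_with_min_runtime(short, running, multiplier=1.5, min_runtime=30.0))  # []
```

Without a floor, a 4-second task counts as a straggler; with a 30-second floor, no task in this short stage is speculated.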
Speculative execution metrics
• Additional metrics are required to understand both the usefulness and the overhead of speculative execution.
• The existing onTaskStart and onTaskEnd event handling in AppStatusListener is enriched to produce speculation summary metrics for each stage.
Speculative execution metrics
• We added the following metrics for stages with speculative execution:
• Number of speculated tasks
• Number of successful speculated tasks
• Number of killed speculated tasks
• Number of failed speculated tasks
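A per-stage speculation summary like this can be derived from task-end information. The TaskEnd record below is a hypothetical stand-in for what a listener might extract from Spark's task-end events; it is not the AppStatusListener implementation. For simplicity, it counts a speculated task when the task ends, whereas a real listener could count launches in onTaskStart.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TaskEnd:
    """Hypothetical, simplified task-end record (not a Spark class)."""
    stage_id: int
    speculative: bool
    status: str  # "SUCCESS", "KILLED" or "FAILED"


def speculation_summary(events: Iterable[TaskEnd]) -> Dict[int, Counter]:
    """Per-stage counts of speculated, successful, killed and failed speculative tasks."""
    summary: Dict[int, Counter] = {}
    for ev in events:
        if not ev.speculative:
            continue  # regular tasks do not contribute to the speculation summary
        counts = summary.setdefault(ev.stage_id, Counter())
        counts["speculated"] += 1
        counts[ev.status.lower()] += 1  # "success", "killed" or "failed"
    return summary


events = [
    TaskEnd(stage_id=3, speculative=False, status="SUCCESS"),
    TaskEnd(stage_id=3, speculative=True, status="SUCCESS"),
    TaskEnd(stage_id=3, speculative=True, status="KILLED"),
    TaskEnd(stage_id=4, speculative=True, status="FAILED"),
]
summary = speculation_summary(events)
print(dict(summary[3]))  # {'speculated': 2, 'success': 1, 'killed': 1}
```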
Speculative execution metrics
• Speculation summary for a stage with additional metrics using
the existing events.
Updated Speculation Parameter Values
• Upstream Spark's default speculation parameter values are not a good fit for us.
• LinkedIn's Spark workloads are mainly offline batch jobs, plus some interactive analytics workloads.
• We set the speculation parameters to the following default values for most LinkedIn applications. Users can still override them to suit their individual needs.
Configuration parameter | Upstream default | LinkedIn default
spark.speculation | false | true
spark.speculation.interval | 100ms | 1 sec
spark.speculation.multiplier | 1.5 | 4.0
spark.speculation.quantile | 0.75 | 0.90
spark.speculation.min.threshold | N/A | 30 sec
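One way to apply cluster-wide defaults while still letting users override them is to merge the two sets of settings and render them as spark-submit --conf flags. This is a sketch: the values mirror the LinkedIn column above written as Spark time strings, spark.speculation.min.threshold is LinkedIn's internal setting rather than an upstream configuration (its value format here is an assumption), and the helper name is hypothetical.

```python
from typing import Dict, List, Optional

# Cluster-wide defaults from the table above. The min-threshold key is a
# LinkedIn-internal setting; upstream Spark does not recognize it.
LINKEDIN_SPECULATION_DEFAULTS: Dict[str, str] = {
    "spark.speculation": "true",
    "spark.speculation.interval": "1s",
    "spark.speculation.multiplier": "4.0",
    "spark.speculation.quantile": "0.90",
    "spark.speculation.min.threshold": "30s",  # assumed value format
}


def spark_submit_conf_args(overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Merge per-job overrides over the defaults and render --conf flags."""
    merged = {**LINKEDIN_SPECULATION_DEFAULTS, **(overrides or {})}
    args: List[str] = []
    for key in sorted(merged):
        args.extend(["--conf", f"{key}={merged[key]}"])
    return args


# A user who wants slightly more aggressive speculation for one job.
print(" ".join(spark_submit_conf_args({"spark.speculation.multiplier": "3.0"})))
```

Keeping the defaults in one place and applying user overrides last means a job only has to state the parameters it wants to change.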
Metrics and Analysis
• We care about ROI (Return On Investment).
• We analyzed
• The return or performance gain, and
• The investment/overhead or additional cost
• We measured various metrics for one week on a large
cluster with 10K+ machines.
• A multi-tenant environment with 40K+ Spark applications running daily.
• Dynamic allocation was enabled.
• With resource sharing and contention, performance varies due to transient delays and congestion.
Task Level Statistics
• Additional tasks: 1.24% (ratio of all launched speculation tasks over all tasks)
• Duration delta: 0.32% (ratio of the duration of all speculation tasks over the duration of all regular tasks)
• Success rate: 60% (success rate of speculated tasks)
• Speculated tasks: 2.73M (total number of launched speculation tasks)
• Fruitful tasks: 1.65M (total number of fruitful speculation tasks)
● A speculation task is fruitful if it finishes earlier than the corresponding regular task.
● The conservative values of the config parameters lead to a high success rate.
Stage Level Statistics
• Total eligible stages: 447K
• Stages with speculation tasks: 184K
• Stages with fruitful speculation tasks: 140K
● A stage is eligible for speculation if its duration is > 30 seconds and it has at least 10 tasks.
● 41% of the eligible stages launched speculation tasks.
● Among the stages that launched speculation tasks, 76% received performance benefits.
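The stage-level percentages follow directly from the counts. The sketch encodes the eligibility rule stated above and recomputes the two ratios; both helper names are hypothetical, and because the counts are rounded to thousands, the recomputed percentages are approximate.

```python
from typing import Tuple


def is_eligible_stage(duration_s: float, num_tasks: int) -> bool:
    """Eligibility rule used in the analysis: duration > 30 s and at least 10 tasks."""
    return duration_s > 30 and num_tasks >= 10


def funnel_ratios(total: int, speculated: int, fruitful: int) -> Tuple[float, float]:
    """Return, as percentages, the share of units that launched speculation
    and the share of those that benefited from it."""
    return 100 * speculated / total, 100 * fruitful / speculated


launched_pct, benefit_pct = funnel_ratios(447_000, 184_000, 140_000)
print(f"{launched_pct:.0f}% launched, {benefit_pct:.0f}% benefited")  # 41% launched, 76% benefited
```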
Application Level Statistics
• Total applications: 157K
• Applications with speculation tasks: 59K
• Applications with fruitful speculation tasks: 51K
● 38% of all Spark applications launched speculation tasks.
● 87% of those applications benefited from speculative execution.
● Overall, 32% of all Spark applications benefited from speculative execution.
Case Study
• We analyzed the impact on a mission-critical application.
• It has a total of 29 Spark application flows.
• Some flows run daily; some run hourly.
• Each flow has a well-defined SLA.
• We took measurements of all the flows for:
• two weeks before enabling speculation, and
• two weeks after enabling speculation.
Metric (minutes) | Before enabling | After enabling | After/Before ratio
Geometric mean of average elapsed times of all flows | 7.44 | 6.47 | 87% (decreased by 13%)
Geometric mean of standard deviation of elapsed times for all flows | 2.91 | 1.71 | 59% (decreased by 41%)
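The case-study metrics are geometric means, across flows, of per-flow statistics. A minimal sketch, assuming each flow's elapsed times are available as a list of minutes and that the sample standard deviation is used (the slides do not say which variant); the flow names and run times in the usage example are made up for illustration.

```python
import statistics
from typing import Dict, List, Tuple


def flow_summary(elapsed_by_flow: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return the geometric means, across flows, of each flow's average
    elapsed time and of its standard deviation.

    Each flow needs at least two runs (for the standard deviation), and every
    value must be positive for the geometric mean to be defined.
    """
    averages = [statistics.mean(runs) for runs in elapsed_by_flow.values()]
    stdevs = [statistics.stdev(runs) for runs in elapsed_by_flow.values()]
    return statistics.geometric_mean(averages), statistics.geometric_mean(stdevs)


# Made-up elapsed times in minutes for two flows.
gm_mean, gm_std = flow_summary({"flow_a": [4.0, 6.0], "flow_b": [8.0, 12.0]})
print(round(gm_mean, 2), round(gm_std, 2))  # 7.07 2.0

# After/before ratios using the numbers in the table above.
for metric, before, after in [("average elapsed time", 7.44, 6.47),
                              ("standard deviation", 2.91, 1.71)]:
    print(f"{metric}: {after / before:.0%}")  # 87%, then 59%
```

A geometric mean keeps a few long daily flows from dominating the summary the way an arithmetic mean over hourly and daily flows would.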
Resource Consumption Impact
• Resource consumption decreased by 24% after enabling speculation.
User Guidance: Where speculation can help
• A mapper task is slow because its executor is too busy and/or the system hangs due to hardware/software issues.
• We used to see occasional 'run-away' tasks caused by system hangs.
• After enabling speculation, we rarely see 'run-away' tasks.
• The 'run-away' tasks were killed because their corresponding speculation tasks finished earlier.
• The network route is congested somewhere.
• There exists another data copy.
• The regular task normally reads the 'NODE_LOCAL'/'RACK_LOCAL' copy, whereas the speculation task usually reads an 'ANY' data copy.
• If the regular task was launched with suboptimal locality, its speculation task can get better locality.
User Guidance: Where speculation cannot help
• Data skew
• Overloaded shuffle services causing reducer task delays
• Not enough memory, causing tasks to spill
• When the Spark driver launches a speculation task, it does not know the root cause of the task's slowness.
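A toy cost model shows why duplicating a task does not help when the slowness comes from the task's input rather than its host. The model (run time = input size / throughput × host slowdown) and all numbers are assumptions for illustration only.

```python
def task_duration(input_mb: float, mb_per_s: float, host_slowdown: float = 1.0) -> float:
    """Toy model: run time grows with input size and with host slowdown."""
    return input_mb / mb_per_s * host_slowdown


THROUGHPUT = 50.0  # MB/s, hypothetical per-task processing rate

# Slow host: a speculative copy on a healthy host processes the same input faster.
original = task_duration(1_000, THROUGHPUT, host_slowdown=5.0)  # 100 s
copy = task_duration(1_000, THROUGHPUT)                          # 20 s

# Skewed partition: both copies read the same oversized input, so the copy
# is just as slow and only consumes extra resources.
skewed_original = task_duration(20_000, THROUGHPUT)  # 400 s
skewed_copy = task_duration(20_000, THROUGHPUT)      # 400 s
print(original, copy, skewed_original, skewed_copy)
```

Since the copy is also launched later than the original, in the skewed case it cannot finish first at all.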
Summary
• At LinkedIn, we enhanced the Spark engine to monitor speculation statistics.
• We shared our configuration settings for effectively managing speculative execution.
• Depending on your performance goal, you need to decide how much overhead you can tolerate.
• ROI when speculation parameters are properly set (I = investment, R = return):
• I: small increase in network messages
• I: small overhead in the Spark driver
• R: good savings in executor resources
• R: good reduction in job elapsed times
• R: significant reduction in the variation of elapsed times, leading to more predictable and consistent performance.
Future Work
• Add intelligence to the Spark driver to decide whether or not to launch speculation tasks.
• Distinguish between manageable and unmanageable causes of slowness.
• In the cloud, resources may be effectively unlimited; however, we may need to factor in the monetary cost.
• What is the cost of launching additional executors?
Acknowledgement
We want to thank
▪ Eric Baldeschweiler
▪ Sunitha Beeram
▪ LinkedIn Spark Team
for their enlightening discussions
and insightful comments.

  • 1. Best Practices for Enabling Speculative Execution on Large Scale Platforms (Ron Hu, Venkata Sowrirajan, LinkedIn)
  • 2. Agenda
      ▪ Motivation
      ▪ Enhancements
      ▪ Configuration
      ▪ Metrics and analysis
      ▪ User guidance
      ▪ Future work
  • 3. Speculative Execution
      ▪ A stage consists of many parallel tasks, so a stage can finish only as fast as its slowest task.
      ▪ If one task in a stage runs very slowly, the Spark driver launches a speculation task for it on a different host.
      ▪ Whichever of the regular task and the speculation task finishes first is used; the slower one is killed.
      [Figure: task timeline; a speculation task is launched for the slow task, the first to finish (✅) is kept and the other (❌) is killed]
  • 4. Default Speculation Parameters
      ▪ spark.speculation (default: false): if set to "true", performs speculative execution of tasks.
      ▪ spark.speculation.interval (default: 100ms): how often Spark checks for tasks to speculate.
      ▪ spark.speculation.multiplier (default: 1.5): how many times slower than the median a task must be to be considered for speculation.
      ▪ spark.speculation.quantile (default: 0.75): fraction of tasks that must be complete before speculation is enabled for a particular stage.
  • 5. Motivation
      ▪ Speculation speeds up straggler tasks, but at the cost of additional overhead.
      ▪ The default configs are too aggressive in most cases.
      ▪ Speculating tasks that run for only a few seconds is mostly wasteful.
      ▪ Investigate the impact of data skew, overloaded shuffle services, etc. with speculation enabled.
      ▪ What is the impact of enabling speculation by default in a multi-tenant, large-scale cluster?
  • 6. Speculative Execution improvements
      ▪ Tasks that run for only a few seconds get speculated, wasting resources unnecessarily.
      ▪ Solution: prevent tasks that run for only a few seconds from being speculated.
      ▪ Internally, we introduced a new Spark configuration (spark.speculation.minRuntimeThreshold) that prevents tasks that have run for less than the minimum threshold time from being speculated.
      ▪ A similar feature was later added to Apache Spark in SPARK-33741.
  • 7. Speculative execution metrics
      ▪ Additional metrics are required to understand both the usefulness of speculative execution and the overhead it introduces.
      ▪ The existing onTaskStart and onTaskEnd handlers in AppStatusListener are enriched to produce speculation summary metrics for a stage.
  • 8. Speculative execution metrics
      ▪ We added the following metrics for stages with speculative execution:
          ▪ Number of speculated tasks
          ▪ Number of successful speculated tasks
          ▪ Number of killed speculated tasks
          ▪ Number of failed speculated tasks
  • 9. Speculative execution metrics
      ▪ Speculation summary for a stage, showing the additional metrics computed from the existing events.
  • 10. Updated Speculation Parameter Values
      ▪ Upstream Spark's default speculation parameter values do not suit our workloads.
      ▪ LinkedIn's Spark workloads are mainly offline batch jobs, plus some interactive analytics.
      ▪ We use the following values as defaults for most of LinkedIn's applications; users can still override them to suit their individual needs.
          ▪ spark.speculation (upstream default: false; LinkedIn default: true)
          ▪ spark.speculation.interval (upstream default: 100ms; LinkedIn default: 1 sec)
          ▪ spark.speculation.multiplier (upstream default: 1.5; LinkedIn default: 4.0)
          ▪ spark.speculation.quantile (upstream default: 0.75; LinkedIn default: 0.90)
          ▪ spark.speculation.min.threshold (upstream default: N/A; LinkedIn default: 30 sec)
  • 11. Metrics and Analysis
      ▪ We care about ROI (return on investment). We analyzed:
          ▪ the return, or performance gain, and
          ▪ the investment, or overhead/additional cost.
      ▪ We measured various metrics for one week on a large cluster with 10K+ machines:
          ▪ a multi-tenant environment with 40K+ Spark applications running daily;
          ▪ dynamic allocation enabled;
          ▪ with resource sharing and contention, performance varies due to transient delays and congestion.
  • 12. Task Level Statistics
      ▪ Speculated tasks: 2.73M (total number of launched speculation tasks)
      ▪ Fruitful tasks: 1.65M (total number of fruitful speculation tasks). A speculation task is fruitful if it finishes earlier than the corresponding regular task.
      ▪ Success rate: 60% (success rate of speculated tasks)
      ▪ Additional tasks: ratio of all launched speculation tasks over all tasks
      ▪ Duration delta: ratio of the duration of all speculation tasks over the duration of all regular tasks
      ▪ (Values shown on the slide for these two ratios: 1.24% and 0.32%.)
      ▪ The conservative values of the config parameters lead to the high success rate.
  • 13. Stage Level Statistics
      ▪ Total eligible stages: 447K
      ▪ Stages with speculation tasks (speculated stages): 184K
      ▪ Stages with fruitful speculation tasks (fruitful stages): 140K
      ▪ A stage is eligible for speculation if it runs for more than 30 seconds and has at least 10 tasks.
      ▪ 41% of eligible stages launched speculation tasks.
      ▪ Among the stages that launched speculation tasks, 76% received performance benefits.
  • 14. Application Level Statistics
      ▪ Total applications: 157K
      ▪ Applications with speculation tasks (speculated apps): 59K
      ▪ Applications with fruitful speculation tasks (fruitful apps): 51K
      ▪ 38% of all Spark applications launched speculation tasks.
      ▪ 87% of those benefited from speculative execution.
      ▪ Overall, 32% of all Spark applications benefited from speculative execution.
  • 15. Case Study
      ▪ We analyzed the impact on a mission-critical application.
      ▪ It has a total of 29 Spark application flows.
      ▪ Some flows run daily; others run hourly.
      ▪ Each flow has a well-defined SLA.
      ▪ We measured all the flows for two weeks before enabling speculation and for two weeks after enabling it.
  • 16. Case study: before vs. after enabling speculation (numbers in minutes)
      ▪ Geometric mean of the average elapsed times of all flows: 7.44 before, 6.47 after (after/before ratio 87%, a 13% decrease)
      ▪ Geometric mean of the standard deviations of elapsed times of all flows: 2.91 before, 1.71 after (after/before ratio 59%, a 41% decrease)
  • 18. User Guidance: Where speculation can help
      ▪ A mapper task is slow because the executor running it is too busy and/or the system hangs due to hardware/software issues.
          ▪ We used to see occasional 'run-away' tasks caused by system hangs.
          ▪ After enabling speculation, we rarely see 'run-away' tasks.
          ▪ The 'run-away' tasks were later killed because their corresponding speculation tasks finished earlier.
      ▪ The network route is congested somewhere.
      ▪ Another copy of the data exists.
          ▪ The regular task normally reads the 'NODE_LOCAL'/'RACK_LOCAL' copy, while the speculation task usually reads the 'ANY' data copy.
          ▪ If the initial task was launched suboptimally, its speculative task can have better locality.
  • 19. User Guidance: Where speculation cannot help
      ▪ Data skew.
      ▪ Overloaded shuffle services causing reducer task delays.
      ▪ Insufficient memory causing tasks to spill.
      ▪ When the Spark driver launches a speculation task, it does not know the root cause of the original task's slowness.
  • 20. Summary
      ▪ At LinkedIn, we further enhanced the Spark engine to monitor speculation statistics.
      ▪ We shared our configuration settings for effectively managing speculative execution.
      ▪ Depending on your performance goal, you need to decide how much overhead you can tolerate.
      ▪ ROI when speculation parameters are properly set (I = investment, R = return):
          ▪ I: small increase in network messages
          ▪ I: small overhead in the Spark driver
          ▪ R: good savings in executor resources
          ▪ R: good reduction in job elapsed times
          ▪ R: significant reduction in the variation of elapsed times, leading to more predictable and consistent performance
  • 21. Future Work
      ▪ Add intelligence to the Spark driver to decide whether or not to launch speculation tasks.
          ▪ Distinguish between manageable and unmanageable causes.
      ▪ On the cloud, we may have unlimited resources; however, we may need to factor in the monetary cost.
          ▪ What is the cost of launching additional executors?
  • 22. Acknowledgement: We want to thank Eric Baldeschweiler, Sunitha Beeram, and the LinkedIn Spark Team for their enlightening discussions and insightful comments.
  • 23. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
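The selection rule summarized on the "Default Speculation Parameters" slide can be sketched in a few lines of Python. This is a simplified model written for illustration, not Spark's scheduler code. Once at least `quantile` of a stage's tasks have finished, every running task whose elapsed time exceeds `multiplier` times the median duration of the finished tasks becomes a candidate for a speculative copy. The function name and its inputs are hypothetical, and `spark.speculation.interval` would correspond to how often such a check runs.

```python
import statistics


def speculatable_tasks(finished_durations, running_elapsed, total_tasks,
                       quantile=0.75, multiplier=1.5):
    """Return the sorted IDs of running tasks that qualify for speculation.

    Hypothetical helper, simplified model: finished_durations holds the
    durations (seconds) of successfully finished tasks in the stage, and
    running_elapsed maps each running task ID to its elapsed time (seconds).
    """
    # Speculation is only considered once enough of the stage has finished.
    if not finished_durations or len(finished_durations) < quantile * total_tasks:
        return []
    threshold = multiplier * statistics.median(finished_durations)
    # A running task is a straggler if it has run longer than the threshold.
    return sorted(task_id for task_id, elapsed in running_elapsed.items()
                  if elapsed > threshold)


# 8 of 10 tasks have finished (>= 0.75 * 10). The median finished duration is
# 10 s, so the threshold is 1.5 * 10 = 15 s and only task 9 qualifies.
done = [8, 9, 10, 10, 10, 10, 11, 12]
print(speculatable_tasks(done, {8: 14.0, 9: 40.0}, total_tasks=10))  # [9]
```

With the more conservative LinkedIn values (quantile 0.90, multiplier 4.0), the same stage would not yet qualify: 8 finished tasks out of 10 is below the 90% quantile.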
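The minimum-runtime threshold from the "Speculative Execution improvements" slide can be sketched as a floor on the speculation threshold. The sketch assumes the floor is combined with the multiplier rule by taking the larger of the two values; the helper name is hypothetical, and the 30-second floor is the LinkedIn default from the "Updated Speculation Parameter Values" slide. In upstream Apache Spark 3.2 and later, the corresponding setting is, to our knowledge, `spark.speculation.minTaskRuntime`.

```python
def speculation_threshold(median_s, multiplier=1.5, min_runtime_s=0.0):
    """Elapsed time (seconds) a running task must exceed to be speculated.

    Hypothetical helper: the floor min_runtime_s keeps short tasks from being
    speculated even when they are many times slower than the stage median.
    """
    return max(multiplier * median_s, min_runtime_s)


# A stage of 2-second tasks: without a floor, a task that has run for 4 s is
# already past the 3 s threshold and would get a speculative copy.
print(speculation_threshold(2.0))                      # 3.0
# With a 30 s floor, no task is speculated until it has run for 30 s.
print(speculation_threshold(2.0, min_runtime_s=30.0))  # 30.0
```

For long-running stages the floor has no effect: with a 20 s median and a multiplier of 4.0, the threshold stays at 80 s.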
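The per-stage counters on the "Speculative execution metrics" slides can be illustrated by folding a stream of task start and end events into a summary. This minimal sketch models events as plain Python dataclasses instead of Spark's listener event classes. `TaskEvent`, `SpeculationSummary`, `summarize`, and the status strings are hypothetical names; only the idea of deriving the four counts from existing task events comes from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskEvent:
    stage_id: int
    speculative: bool  # True if this task attempt is a speculative copy
    kind: str          # "start" or "end"
    status: str = ""   # for "end" events: "SUCCESS", "KILLED" or "FAILED"


@dataclass
class SpeculationSummary:
    speculated: int = 0
    successful: int = 0
    killed: int = 0
    failed: int = 0


def summarize(events):
    """Build per-stage speculation counters from a stream of task events."""
    summaries = defaultdict(SpeculationSummary)
    for ev in events:
        if not ev.speculative:
            continue  # regular attempts do not affect speculation metrics
        s = summaries[ev.stage_id]
        if ev.kind == "start":
            s.speculated += 1
        elif ev.status == "SUCCESS":
            s.successful += 1
        elif ev.status == "KILLED":
            s.killed += 1
        elif ev.status == "FAILED":
            s.failed += 1
    return dict(summaries)


events = [
    TaskEvent(1, True, "start"), TaskEvent(1, True, "end", "SUCCESS"),
    TaskEvent(1, True, "start"), TaskEvent(1, True, "end", "KILLED"),
    TaskEvent(1, False, "end", "SUCCESS"),
]
print(summarize(events)[1])
# SpeculationSummary(speculated=2, successful=1, killed=1, failed=0)
```

Because the counters are derived only from start and end events, no new event types are needed, which matches the slides' approach of enriching the existing handlers.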
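The LinkedIn defaults from the "Updated Speculation Parameter Values" slide could be kept in one mapping, merged with per-application overrides, and rendered as `--conf` arguments for `spark-submit`. This is a sketch: the dictionary, the helper, and `app.py` are hypothetical, and `spark.speculation.min.threshold` is LinkedIn's internal setting, which an upstream Spark build will not act on.

```python
import shlex

# Values from the "LinkedIn Default" column. spark.speculation.min.threshold is
# LinkedIn-internal; on upstream Spark 3.2+ the closest option is assumed to be
# spark.speculation.minTaskRuntime.
PLATFORM_SPECULATION_DEFAULTS = {
    "spark.speculation": "true",
    "spark.speculation.interval": "1s",
    "spark.speculation.multiplier": "4.0",
    "spark.speculation.quantile": "0.90",
    "spark.speculation.min.threshold": "30s",
}


def speculation_conf_args(user_overrides=None):
    """Merge user overrides over the platform defaults; return --conf args."""
    conf = {**PLATFORM_SPECULATION_DEFAULTS, **(user_overrides or {})}
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# A latency-sensitive user opts for a slightly more aggressive multiplier.
args = speculation_conf_args({"spark.speculation.multiplier": "3.0"})
print(shlex.join(["spark-submit", *args, "app.py"]))
```

Keeping overrides as a plain mapping layered over the platform defaults mirrors the slide's policy: the platform sets conservative values, and individual users can still override them.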
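The percentages on the "Stage Level Statistics" and "Application Level Statistics" slides, and the 60% task success rate on the "Task Level Statistics" slide, follow from the reported counts. The sketch below recomputes them with a hypothetical `funnel` helper. Because the inputs are rounded to thousands, a result can be off by a percentage point: 51K/59K gives 86%, whereas the slide reports 87%, presumably computed from unrounded counts.

```python
def funnel(total, speculated, fruitful):
    """Return (speculated share, fruitful share of speculated, overall fruitful share)."""
    return speculated / total, fruitful / speculated, fruitful / total


def pct(x):
    """Format a ratio as a whole-number percentage string."""
    return f"{round(100 * x)}%"


# Stage level: 447K eligible stages, 184K with speculation, 140K fruitful.
print([pct(r) for r in funnel(447_000, 184_000, 140_000)])  # ['41%', '76%', '31%']
# Application level: 157K applications, 59K with speculation, 51K fruitful.
print([pct(r) for r in funnel(157_000, 59_000, 51_000)])    # ['38%', '86%', '32%']
# Task level: success rate of speculation tasks (fruitful / speculated).
print(pct(1_650_000 / 2_730_000))                           # 60%
```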
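The case-study comparison aggregates flows with a geometric mean. This is a reasonable choice, presumably because each flow then contributes by its relative rather than absolute elapsed time, so a few long daily flows do not dominate the shorter hourly ones. The minimal sketch below computes a geometric mean in log space and reproduces the after/before ratios. The per-flow input values are made up for illustration; only 7.44, 6.47, 2.91, and 1.71 minutes come from the slide.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-flow average elapsed times (minutes) for three flows.
print(round(geometric_mean([2.0, 8.0, 32.0]), 2))  # 8.0

# After/before ratios from the reported geometric means.
print(round(6.47 / 7.44, 2))  # 0.87 -> elapsed time down 13%
print(round(1.71 / 2.91, 2))  # 0.59 -> variability down 41%
```

Working in log space avoids overflow when multiplying many values. Python 3.8+ also offers `statistics.geometric_mean` for the same computation.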