Hybrid Spark
Architecture with
Yarn and Kubernetes
Catalin Toda (Sr Engineer @ Lyft)
Rohit Menon (Staff Engineer @ Lyft)
Agenda
▪ Spark @ Lyft
▪ Challenges with K8s
▪ Hybrid Model
▪ Spark Operator
▪ Image Hierarchy
▪ Spark Wrapper
▪ Progress & Future Plans
What is Spark Used For @ Lyft
• Primarily Python Shop with some Scala
• Running in AWS with S3 as permanent storage
• Interactive Development & ML with Jupyter and Spark
• ML Batch Use Cases
• Pricing
• ETA/Routing
• Mapping
• ETL Use Cases
• Event Ingestion
• GraphQL Offline Model
• Financial Datasets / SOX Compliant Datasets / Experimentation Offline Batch
Analysis and many more
• HiveQL to Spark-SQL Migration
2018 Spark on Yarn
• Every major use case had its own ephemeral yarn
cluster
• Management overhead for infra team
• Custom Dependency management per cluster to pull
in python dependencies
• Tough to test/maintain cluster bootstrap scripts
• Custom IAM role/permission overhead
2019 Spark On Kubernetes
• Lyft Infra supported Kubernetes deployment
• Google OSS spark-on-k8s operator availability
• Flyte (container-native scheduler) took off at Lyft for
ML use cases
• Containerized workloads with easier python
dependency management
• Simpler support for per job IAM roles
Spark On Kubernetes Architecture
2020 Spark on Kubernetes
• Maturing support for Spark on K8s
• Lyft Hadoop/Hive infrastructure as K8s deployment
• Auto-scaling handled for YARN cluster based on RM
load
• Spark ETL workloads move over from Yarn to Spark
on k8s
• Start hitting limits with Lyft k8s Infra setup
• Custom solutions required to support growing scale
• Group jobs to reduce spiky requests to k8s and AWS control plane
• Add new k8s clusters to support stronger isolation model
Current Spark Scale on K8s
- 1000 concurrent jobs
- 5 Kubernetes Clusters
- 20k executors at peak
- 1 AWS region
- 5 AWS Availability Zones
Challenges with K8s Model
• IPv4 Address Shortage
• Shortage across all 5 AZs
• Leads to driver and executor
startup delays
- IAM Wait Delays
- AWS IAM assignment could be
throttled
- Waiting to confirm IAM assignment
increases delays
- Infrastructure issues
- Etcd size tuning
- Impact of bad k8s node
Challenges with K8s Model
• Image Overheads
• Every project has its own image
• Registration of images for different
environments
• Startup delays caused by uncached image
• New nodes
• New image releases
• Release model
• Infra prepares a base image with the
latest Spark changes
• Customers manage the final release once the
image is tested
• Rollout can take up to 1 month because
several images have to be updated
Challenges with K8s Model
- K8s scheduling
• Fixed per namespace quota
• Containers not admitted if over quota
• No priority between jobs in namespaces
- Control plane limits
- Maximum number of containers
- Short running containers are not typical K8s
workload
• Hive Deprecation
• Today hive scale = 5k jobs at peak
• Expected spark load to increase by 6x
• For interactive workloads, pod startup
latency is high on k8s
2021 Hybrid Model (YARN + K8s)
Separation by Workload Type
• Containerized (K8s)
• Pyspark + Custom Dependencies
• ML interactive
• Non-containerized (YARN)
• SQL Interactive
• SQL Batch
• Scala Workloads
• Simple Pyspark with no dependencies
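The split above can be read as a routing rule. A minimal sketch, assuming a hypothetical `Workload` record and `choose_resource_manager` helper (neither name comes from the talk); the rule only mirrors the two lists on this slide and rejects anything the slide does not mention:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical job description; field names are illustrative only."""
    kind: str                          # "pyspark", "ml_interactive", "sql", "scala"
    custom_dependencies: bool = False  # only meaningful for "pyspark"


def choose_resource_manager(w: Workload) -> str:
    """Return "k8s" for containerized workloads and "yarn" otherwise."""
    # Containerized (K8s): PySpark with custom dependencies, interactive ML.
    if w.kind == "ml_interactive":
        return "k8s"
    if w.kind == "pyspark":
        return "k8s" if w.custom_dependencies else "yarn"
    # Non-containerized (YARN): SQL interactive, SQL batch, Scala workloads.
    if w.kind in {"sql", "scala"}:
        return "yarn"
    # Workload types not listed on the slide are left undecided on purpose.
    raise ValueError(f"no routing rule for workload kind: {w.kind!r}")
```

For example, `choose_resource_manager(Workload("pyspark", custom_dependencies=True))` returns `"k8s"`, while the same job without custom dependencies goes to `"yarn"`.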
Hybrid Model Architecture
Advantages of Hybrid Model
• YARN executors have low startup latency and can
handle spikes
• Easier Queue and Resource management
• Workloads without custom dependencies do not get
penalized with k8s infra overheads
• Mature support for dynamic allocation and external
shuffle service
Single Entry Point
• Spark-on-k8s-operator as single entry point for both YARN
and K8s
• Integrates easily with open source
• Compatibility with Flyte
• No Lyft specific code
• Multi version branch by default
• Driver runs on k8s in client mode
• We plan to contribute this to OSS
• Current OSS design adds the overhead of a spark-submit pod
• Sets up the infra team to move workloads seamlessly between
resource managers
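Because the driver runs in client mode in both cases, moving a job between resource managers mostly comes down to the `--master` value passed to `spark-submit`. A rough sketch, assuming a hypothetical `build_submit_args` helper and an in-cluster API server URL; this is not the operator's actual code:

```python
from typing import Dict, List


def build_submit_args(resource_manager: str, app: str, conf: Dict[str, str],
                      k8s_api_url: str = "https://kubernetes.default.svc") -> List[str]:
    """Build a client-mode spark-submit command line for YARN or K8s."""
    if resource_manager == "yarn":
        master = "yarn"
    elif resource_manager == "k8s":
        # Spark expects the K8s master as k8s://<api-server-url>.
        master = f"k8s://{k8s_api_url}"
    else:
        raise ValueError(f"unknown resource manager: {resource_manager!r}")

    args = ["spark-submit", "--master", master, "--deploy-mode", "client"]
    # Sort keys so the generated command is deterministic.
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app)
    return args
```

Only the `--master` argument differs between the two targets; the application and its `--conf` settings pass through unchanged.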
Spark Wrapper Design
Stage 1
- Part of base image
- Downloads and runs stage2
Stage 2
- Manipulate configs
- Run spark driver
- Capture job logs and results
- Push application metrics
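The "manipulate configs" step in stage 2 can be pictured as layering overrides on top of the user's Spark conf. A minimal sketch under stated assumptions: `ENV_OVERRIDES`, `RM_OVERRIDES`, and `resolve_conf` are hypothetical names, and the override values are made up; only the Spark property keys are real settings.

```python
from typing import Dict

# Hypothetical per-environment overrides (values are illustrative).
ENV_OVERRIDES: Dict[str, Dict[str, str]] = {
    "staging": {"spark.executor.memory": "2g"},
    "production": {},
}

# Hypothetical per-resource-manager overrides.
RM_OVERRIDES: Dict[str, Dict[str, str]] = {
    "yarn": {
        "spark.master": "yarn",
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",
    },
    "k8s": {
        "spark.master": "k8s://https://kubernetes.default.svc",
    },
}


def resolve_conf(user_conf: Dict[str, str], environment: str,
                 resource_manager: str) -> Dict[str, str]:
    """Layer environment, then resource-manager overrides over the user conf."""
    conf = dict(user_conf)  # copy so the caller's dict is not mutated
    conf.update(ENV_OVERRIDES.get(environment, {}))
    conf.update(RM_OVERRIDES[resource_manager])
    return conf
```

Applying the resource-manager layer last means infra-owned settings such as `spark.master` win over anything a job sets for itself.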
Spark Wrapper
• Custom image entry point
• Allows config management based on environments
• Allows switching between resource managers
• Metrics
• Push to events - queryable/dashboard using lyft stack
• Push to statsd for real time monitoring/alerting
• Integrates well with Lyft Infra
• Spark on k8s operator remains in sync with upstream
• Lyft specific logic that integrates with in-house tools
• Adds runtime controller to images
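For the statsd path, a wrapper only needs to emit plain-text lines over UDP. A small sketch, assuming the standard StatsD line format (`name:value|type`) and the conventional port 8125; the metric name in the usage note and both helper names are hypothetical, not Lyft's:

```python
import socket


def format_statsd(name: str, value: float, metric_type: str = "c") -> str:
    """Format one StatsD line: counter ("c"), gauge ("g"), or timer ("ms")."""
    if metric_type not in {"c", "g", "ms"}:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def push_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one StatsD line over UDP (fire-and-forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

A wrapper could call `push_metric(format_statsd("spark.wrapper.driver_startup", 850, "ms"))` once the driver is up; UDP keeps the metrics push from blocking the job.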
Image Hierarchy/Distribution
• One base image for spark per spark version
• Containerized spark extends base image
• Users can further extend containerized base image to add custom
dependencies
• ML base image
• Users maintain their own image
• Non-containerized use the base image directly
• Infra updates the image
• Consistent experience across use cases
Progress so far - Best of Both Worlds
- Spark driver startup < 1s
- Resource allocation managed in YARN
- K8s scale reduced by 20x
- IP addresses requirement reduced by 20x
- Per job IAM Roles using Web Identity provider
Progress so far - Best of Both Worlds
- No migration needed for containerized customers
- Python dependency management using a utility
library
- The latest version is synced in all environments
(adhoc, k8s, YARN)
Future Plans
- Consolidate Batch Compute on Spark (Hive -> Spark)
- Evaluate Data Lake technologies
- Continue to scale k8s and Spark Infrastructure
Conclusion
- YARN vs K8s
- Workload analysis is required before identifying the best solution
- For Lyft - existing YARN infrastructure helped in choosing a hybrid model
- Fixing K8s model requires:
- IPv6 - supported by K8s in the latest versions
- Scaling - Number of k8s clusters and a gateway to perform the routing between them
- Image - Design considerations/overheads with high number of images
- Quota - Investing in projects trying to solve this aspect
- Web Identity Provider - Custom Roles in K8s
Q & A
Contact Info:
ctoda@lyft.com
rmenon@lyft.com
