SlideShare une entreprise Scribd logo
1  sur  40
Télécharger pour lire hors ligne
Cost Efficiency Strategies for
Managed Apache Spark Services
Adi Polak
Microsoft
Find me on Social Media – Adi Polak
Twitter @adipolak
Medium – https://medium.com/@adipolak
Dev.to - dev.to/adipolak
LinkedIn - https://www.linkedin.com/in/adi-polak-68548365/
Agenda
▪ Motivation
▪ Tools
▪ Azure Databricks
▪ Cost Optimizations Strategies
▪ Wrap-up
Motivation
Start from the Beginning
Business idea ( Product Manager )
Prioritization Process (R&D)
Design & Build (Software Dev HLD)
HDL
▪ Requirements
▪ Features
▪ Architecture
▪ Test Plans
▪ Security
▪ Deployments
▪ Monitoring /Audit Trails
▪ Maintenance
High Level Design
Yeah, but why should I care about costs ?!
- Understand how Budget works – P&L
- Be able to influence Technical Decisions
- Culture of Financial Accountability
https://www.insightpartners.com/blog/product-leaders-are-rd-costs-part-of-your-strategic-discussions/
Tools
Many, many Services
Apache Spark & Cloud Computing delivery model
IaaS vs. PaaS vs. SaaS
Cloud Pricing Calculator
Azure Pricing Calculator - https://azure.microsoft.com/pricing/calculator/
AWS Pricing Calculator - https://calculator.aws/
GCP Pricing Calculator - https://cloud.google.com/products/calculator
Organize Resources for Cost Awareness
▪ Report and billings - Azure Cost Management
▪ Organize – Resources Groups and/or subscriptions
control, reporting, and attribution of costs
Subscription and Billing models
• Pay as you go
• Enterprise Agreements
• …
Azure Databricks
Where to run Spark Workloads
▪ Small - Mid-size Team
▪ Spark expertise
▪ Optimizations
▪ Machines –VMs
▪ Network
▪ Storage
▪ DBU
Kubernetes/IaaS vs. Azure Databricks ( Managed Spark Service )
▪ Bigger Team
▪ K8s + Spark expertise
▪ Optimizations Experts
▪ Machines – VMs
▪ Network
▪ Storage
Resource Consumed:
Plan Tiers
Premium vs Standard
Performance
Security
Monitoring
Databricks Units
▪ DATA ENGINEERING LIGHT
▪ DATA ENGINEERING
▪ DATA ANALYTICS
Three levels of service, AWS + Azure have the same levels
Databricks Data Engineering Light supports:
▪ scheduled JAR, Python, or spark-submit job
▪ Only.
Databricks Light does NOT support:
▪ Delta Lake
▪ Autopilot features such as autoscaling
▪ Highly concurrent, all-purpose clusters
▪ Notebooks, dashboards, and collaboration features
▪ Connectors to various data sources and BI tools
▪ Databricks Light is a runtime environment for jobs (or “automated
workloads”).
DBU: Standard vs. Premium
DBU Standard Premium
Analytics 0.4 0.55
Engineering 0.15 0.5
Engineering Light 0.07 0.22
https://bit.ly/2Tp5Zkh
Workloads Examples
▪ Scheduled Job - Data Engineer
▪ On Demand Job – triggered - Data Engineer / BI / Analytics
▪ Exploratory – Interactive - BI/ML
VMs and DBUs
Prosenjit Chakraborty - https://medium.com/@cprosenjit/azure-databricks-cost-optimizations-5e1e17b39125
Scenario Breakdown
▪ # VMs = 400
▪ Hours run = 1
▪ Cores in VM = 4
▪ General workload
Cost - Engineering VS. Engineering Light - Standart
$$$ = (#VMs*VMs $/hour + $RuntimeType*#DBUs)*(1-performance factor)
156.6
140.94
125.28
109.62
93.96
78.3
132.6 132.6 132.6 132.6 132.6 132.6
0
20
40
60
80
100
120
140
160
180
1 0.9 0.8 0.7 0.6 0.5
Engineering Engineering Light
1- Performance factor
Cost
#VMs = 400
VMs $/hour = 0.279
#DBUs = #VMs*0.75
$EngineeringLight = 0.07
$Engineering = 0.15
Cost - Engineering VS. Engineering Light - Premium
$$$ = (#VMs*VMs $/hour + $RuntimeType*#DBUs)*(1-performance factor)
#VMs = 400
VMs $/hour = 0.279
#DBUs = #VMs*0.75
$EngineeringLight = 0.22
$Engineering = 0. 5
261.6
235.44
209.28
183.12
156.96
130.8
177.6 177.6 177.6 177.6 177.6 177.6
0
50
100
150
200
250
300
1 0.9 0.8 0.7 0.6 0.5
Premium: Engineering VS. Engineering Light
Engineering Engineering Light
Cost 1- Performance factor
Cost Optimizations strategies
1 - Pre Purchase Plan – 1 & 3 years
2 – Select the right runtime & frameworks
• DeltaLake
▪ PySpark Pandas UDF
▪ Photon Engine
3 – Don’t use tmp/local files system storage
▪ dbutils storage is RA-GRS (read-access geo-
redundant storage) - you might not need this type
of storage!
https://bit.ly/2TdXsAi
Cost Optimizations tips
Manage Spending limit
▪ Per subscription
▪ Per management group
▪ Per resource group
▪ Enable Quota alerts
Enable AutoScale
▪ Scaling machines up and down automatically
https://bit.ly/3dMOROK
VMs
▪ Think about your needs
Thank You!
@adipolak

Contenu connexe

Tendances

Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryBizTalk360
 
DataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukDataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukErwin de Kreuk
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
대규모 온프레미스 하둡 마이그레이션을 위한 실행 전략과 최적화 방안 소개-유철민, AWS Data Architect / 박성열,AWS Pr...
대규모 온프레미스 하둡 마이그레이션을 위한 실행 전략과 최적화 방안 소개-유철민, AWS Data Architect / 박성열,AWS Pr...대규모 온프레미스 하둡 마이그레이션을 위한 실행 전략과 최적화 방안 소개-유철민, AWS Data Architect / 박성열,AWS Pr...
대규모 온프레미스 하둡 마이그레이션을 위한 실행 전략과 최적화 방안 소개-유철민, AWS Data Architect / 박성열,AWS Pr...Amazon Web Services Korea
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Cloud Cost Management and Apache Spark with Xuan Wang
Cloud Cost Management and Apache Spark with Xuan WangCloud Cost Management and Apache Spark with Xuan Wang
Cloud Cost Management and Apache Spark with Xuan WangDatabricks
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Altis: AWS Snowflake Practice
Altis: AWS Snowflake PracticeAltis: AWS Snowflake Practice
Altis: AWS Snowflake PracticeAltis Consulting
 
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfData & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfChris Bingham
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Daniel Toomey
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftAmazon Web Services
 

Tendances (20)

Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
DataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukDataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de Kreuk
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
대규모 온프레미스 하둡 마이그레이션을 위한 실행 전략과 최적화 방안 소개-유철민, AWS Data Architect / 박성열,AWS Pr...
대규모 온프레미스 하둡 마이그레이션을 위한 실행 전략과 최적화 방안 소개-유철민, AWS Data Architect / 박성열,AWS Pr...대규모 온프레미스 하둡 마이그레이션을 위한 실행 전략과 최적화 방안 소개-유철민, AWS Data Architect / 박성열,AWS Pr...
대규모 온프레미스 하둡 마이그레이션을 위한 실행 전략과 최적화 방안 소개-유철민, AWS Data Architect / 박성열,AWS Pr...
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Cloud Cost Management and Apache Spark with Xuan Wang
Cloud Cost Management and Apache Spark with Xuan WangCloud Cost Management and Apache Spark with Xuan Wang
Cloud Cost Management and Apache Spark with Xuan Wang
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
HDInsight for Architects
HDInsight for ArchitectsHDInsight for Architects
HDInsight for Architects
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Altis: AWS Snowflake Practice
Altis: AWS Snowflake PracticeAltis: AWS Snowflake Practice
Altis: AWS Snowflake Practice
 
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfData & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF Loft
 

Similaire à Cost Efficiency Strategies for Managed Apache Spark Service

Sunrun slide for informatica summit - Harish Ramachandraiah
Sunrun slide for informatica summit - Harish RamachandraiahSunrun slide for informatica summit - Harish Ramachandraiah
Sunrun slide for informatica summit - Harish RamachandraiahHarish Ramachandraiah
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science OrientationDuc Lai Trung Minh
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j
 
Why the Microsoft 365 Administrator should care about the Power Platform Gove...
Why the Microsoft 365 Administrator should care about the Power Platform Gove...Why the Microsoft 365 Administrator should care about the Power Platform Gove...
Why the Microsoft 365 Administrator should care about the Power Platform Gove...Sara Barbosa
 
Using Power BI and Azure as analytics engine for business applications
Using Power BI and Azure as analytics engine for business applicationsUsing Power BI and Azure as analytics engine for business applications
Using Power BI and Azure as analytics engine for business applicationsDigital Illustrated
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
(PUGDV04) Embedding Powerapps into a Power BI Dashboard.pptx
(PUGDV04) Embedding Powerapps into a Power BI Dashboard.pptx(PUGDV04) Embedding Powerapps into a Power BI Dashboard.pptx
(PUGDV04) Embedding Powerapps into a Power BI Dashboard.pptxaditya555320
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Shop talk - Project Server 2013
Shop talk - Project Server 2013Shop talk - Project Server 2013
Shop talk - Project Server 2013Chris Givens
 
The Microsoft Well Architected Framework For Data Analytics
The Microsoft Well Architected Framework For Data AnalyticsThe Microsoft Well Architected Framework For Data Analytics
The Microsoft Well Architected Framework For Data AnalyticsStephanie Locke
 
Solution Design & Architecture.pptx
Solution Design & Architecture.pptxSolution Design & Architecture.pptx
Solution Design & Architecture.pptxNikhileshSathyavarap
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Streamlining Workflows: Unleashing Automation with Azure and Power Automate
Streamlining Workflows: Unleashing Automation with Azure and Power AutomateStreamlining Workflows: Unleashing Automation with Azure and Power Automate
Streamlining Workflows: Unleashing Automation with Azure and Power AutomateHamida Rebai Trabelsi
 
Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...Vadym Kazulkin
 
Microsoft Flow advanced: tips, pitfalls, problems and warnings to be known be...
Microsoft Flow advanced: tips, pitfalls, problems and warnings to be known be...Microsoft Flow advanced: tips, pitfalls, problems and warnings to be known be...
Microsoft Flow advanced: tips, pitfalls, problems and warnings to be known be...BIWUG
 
Microsoft for BI and DW: Using the Right Tool for the Job
Microsoft for BI and DW: Using the Right Tool for the JobMicrosoft for BI and DW: Using the Right Tool for the Job
Microsoft for BI and DW: Using the Right Tool for the JobSenturus
 
How to Build TOGAF Architectures With System Architect (2).ppt
How to Build TOGAF Architectures With System Architect (2).pptHow to Build TOGAF Architectures With System Architect (2).ppt
How to Build TOGAF Architectures With System Architect (2).pptStevenShing
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Neo4j
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 

Similaire à Cost Efficiency Strategies for Managed Apache Spark Service (20)

Sunrun slide for informatica summit - Harish Ramachandraiah
Sunrun slide for informatica summit - Harish RamachandraiahSunrun slide for informatica summit - Harish Ramachandraiah
Sunrun slide for informatica summit - Harish Ramachandraiah
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
Why the Microsoft 365 Administrator should care about the Power Platform Gove...
Why the Microsoft 365 Administrator should care about the Power Platform Gove...Why the Microsoft 365 Administrator should care about the Power Platform Gove...
Why the Microsoft 365 Administrator should care about the Power Platform Gove...
 
Using Power BI and Azure as analytics engine for business applications
Using Power BI and Azure as analytics engine for business applicationsUsing Power BI and Azure as analytics engine for business applications
Using Power BI and Azure as analytics engine for business applications
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
(PUGDV04) Embedding Powerapps into a Power BI Dashboard.pptx
(PUGDV04) Embedding Powerapps into a Power BI Dashboard.pptx(PUGDV04) Embedding Powerapps into a Power BI Dashboard.pptx
(PUGDV04) Embedding Powerapps into a Power BI Dashboard.pptx
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Shop talk - Project Server 2013
Shop talk - Project Server 2013Shop talk - Project Server 2013
Shop talk - Project Server 2013
 
The Microsoft Well Architected Framework For Data Analytics
The Microsoft Well Architected Framework For Data AnalyticsThe Microsoft Well Architected Framework For Data Analytics
The Microsoft Well Architected Framework For Data Analytics
 
Solution Design & Architecture.pptx
Solution Design & Architecture.pptxSolution Design & Architecture.pptx
Solution Design & Architecture.pptx
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Streamlining Workflows: Unleashing Automation with Azure and Power Automate
Streamlining Workflows: Unleashing Automation with Azure and Power AutomateStreamlining Workflows: Unleashing Automation with Azure and Power Automate
Streamlining Workflows: Unleashing Automation with Azure and Power Automate
 
Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...
 
Microsoft Flow advanced: tips, pitfalls, problems and warnings to be known be...
Microsoft Flow advanced: tips, pitfalls, problems and warnings to be known be...Microsoft Flow advanced: tips, pitfalls, problems and warnings to be known be...
Microsoft Flow advanced: tips, pitfalls, problems and warnings to be known be...
 
Microsoft for BI and DW: Using the Right Tool for the Job
Microsoft for BI and DW: Using the Right Tool for the JobMicrosoft for BI and DW: Using the Right Tool for the Job
Microsoft for BI and DW: Using the Right Tool for the Job
 
How to Build TOGAF Architectures With System Architect (2).ppt
How to Build TOGAF Architectures With System Architect (2).pptHow to Build TOGAF Architectures With System Architect (2).ppt
How to Build TOGAF Architectures With System Architect (2).ppt
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 

Dernier

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfSayantanBiswas37
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themeitharjee
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 

Dernier (20)

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 

Cost Efficiency Strategies for Managed Apache Spark Service

  • 1. Cost Efficiency Strategies for Managed Apache Spark Services Adi Polak Microsoft
  • 2. Find me on Social Media – Adi Polak Twitter @adipolak Medium – https://medium.com/@adipolak Dev.to - dev.to/adipolak LinkedIn - https://www.linkedin.com/in/adi-polak-68548365/
  • 3. Agenda ▪ Motivation ▪ Tools ▪ Azure Databricks ▪ Cost Optimizations Strategies ▪ Wrap-up
  • 5. Start from the Beginning Business idea ( Product Manager ) Prioritization Process (R&D) Design & Build (Software Dev HLD)
  • 6. HDL ▪ Requirements ▪ Features ▪ Architecture ▪ Test Plans ▪ Security ▪ Deployments ▪ Monitoring /Audit Trails ▪ Maintenance High Level Design
  • 7. Yeah, but why should I care about costs ?! - Understand how Budget works – P&L - Be able to influence Technical Decisions - Culture of Financial Accountability
  • 11. Apache Spark & Cloud Computing delivery model IaaS vs. PaaS vs. SaaS
  • 12. Cloud Pricing Calculator Azure Pricing Calculator - https://azure.microsoft.com/pricing/calculator/ AWS Pricing Calculator - https://calculator.aws/ GCP Pricing Calculator - https://cloud.google.com/products/calculator
  • 13. Organize Resources for Cost Awareness ▪ Report and billings - Azure Cost Management ▪ Organize – Resources Groups and/or subscriptions control, reporting, and attribution of costs
  • 14.
  • 15. Subscription and Billing models • Pay as you go • Enterprise Agreements • …
  • 17. Where to run Spark Workloads ▪ Small - Mid-size Team ▪ Spark expertise ▪ Optimizations ▪ Machines –VMs ▪ Network ▪ Storage ▪ DBU Kubernetes/IaaS vs. Azure Databricks ( Managed Spark Service ) ▪ Bigger Team ▪ K8s + Spark expertise ▪ Optimizations Experts ▪ Machines – VMs ▪ Network ▪ Storage
  • 20.
  • 21.
  • 23. Databricks Units ▪ DATA ENGINEERING LIGHT ▪ DATA ENGINEERING ▪ DATA ANALYTICS Three levels of service, AWS + Azure have the same levels
  • 24. Databricks Data Engineering Light supports: ▪ scheduled JAR, Python, or spark-submit job ▪ Only.
  • 25. Databricks Light does NOT support: ▪ Delta Lake ▪ Autopilot features such as autoscaling ▪ Highly concurrent, all-purpose clusters ▪ Notebooks, dashboards, and collaboration features ▪ Connectors to various data sources and BI tools ▪ Databricks Light is a runtime environment for jobs (or “automated workloads”).
  • 26. DBU: Standard vs. Premium DBU Standard Premium Analytics 0.4 0.55 Engineering 0.15 0.5 Engineering Light 0.07 0.22 https://bit.ly/2Tp5Zkh
  • 27. Workloads Examples ▪ Scheduled Job - Data Engineer ▪ On Demand Job – triggered - Data Engineer / BI / Analytics ▪ Exploratory – Interactive - BI/ML
  • 28. VMs and DBUs Prosenjit Chakraborty - https://medium.com/@cprosenjit/azure-databricks-cost-optimizations-5e1e17b39125
  • 29. Scenario Breakdown ▪ # VMs = 400 ▪ Hours run = 1 ▪ Cores in VM = 4 ▪ General workload
  • 30. Cost - Engineering VS. Engineering Light - Standart $$$ = (#VMs*VMs $/hour + $RuntimeType*#DBUs)*(1-performance factor) 156.6 140.94 125.28 109.62 93.96 78.3 132.6 132.6 132.6 132.6 132.6 132.6 0 20 40 60 80 100 120 140 160 180 1 0.9 0.8 0.7 0.6 0.5 Engineering Engineering Light 1- Performance factor Cost #VMs = 400 VMs $/hour = 0.279 #DBUs = #VMs*0.75 $EngineeringLight = 0.07 $Engineering = 0.15
  • 31. Cost - Engineering VS. Engineering Light - Premium $$$ = (#VMs*VMs $/hour + $RuntimeType*#DBUs)*(1-performance factor) #VMs = 400 VMs $/hour = 0.279 #DBUs = #VMs*0.75 $EngineeringLight = 0.22 $Engineering = 0. 5 261.6 235.44 209.28 183.12 156.96 130.8 177.6 177.6 177.6 177.6 177.6 177.6 0 50 100 150 200 250 300 1 0.9 0.8 0.7 0.6 0.5 Premium: Engineering VS. Engineering Light Engineering Engineering Light Cost 1- Performance factor
  • 33. 1 - Pre Purchase Plan – 1 & 3 years
  • 34. 2 – Select the right runtime & frameworks • DeltaLake ▪ PySpark Pandas UDF ▪ Photon Engine
  • 35. 3 – Don’t use tmp/local files system storage ▪ dbutils storage is RA-GRS (read-access geo- redundant storage) - you might not need this type of storage! https://bit.ly/2TdXsAi
  • 37. Manage Spending limit ▪ Per subscription ▪ Per management group ▪ Per resource group ▪ Enable Quota alerts
  • 38. Enable AutoScale ▪ Scaling machines up and down automatically https://bit.ly/3dMOROK
  • 39. VMs ▪ Think about your needs