SlideShare une entreprise Scribd logo
1  sur  19
Télécharger pour lire hors ligne
How R Developers Can Build and
Share Data and AI Applications
that Scale with Databricks and
Rstudio Connect
James Blair, Solutions Engineer, RStudio PBC
Rafi Kurlansik, Sr. Solutions Architect, Databricks
Agenda
Rafi Kurlansik, Databricks
Building Scalable R and Shiny apps with RStudio
and Databricks
James Blair, RStudio PBC
Deploying Scalable Shiny apps with RStudio
Connect and Databricks
Benchmarking performance of Shiny connections
to Spark
How to scale R and Shiny
with RStudio and Databricks
How can we open up the data lake to R users?
▪ Typical development patterns
▪ Local
▪ Cloud / On Prem VM
▪ Challenges with big data
▪ Server memory - can only process so much data in the app itself before crashing R
▪ Performance - even on a powerful VM, eventually see our app get less responsive as we reach 100+ GBs
▪ Managing big data infrastructure - app value must be higher to justify the energy investment
If only there was a technology with a familiar API in R that let our app scale to process 100s of GBs...
Imagine trying to do so with traditional R development...
Scale R Apps with Databricks and RStudio
▪ Development Patterns
▪ Hosted RStudio Server (Pro) on Databricks Cluster
▪ RStudio with remote Spark access using Databricks Connect
▪ Overcoming challenges with big data
▪ Auto-scaling Databricks Spark Clusters - dynamically respond to accommodate larger data processing tasks
▪ Consistently fast performance with Delta Lake and Databricks Runtime
▪ Managed service allows data teams to focus on building data products, not maintaining infrastructure
Databricks Spark, RStudio IDE
Hosted RStudio Server Pro on Databricks
RStudio with Databricks Connect
Local RStudio, Remote Spark
Sharing Scalable Shiny Apps
The Data Science Process
Shiny and Spark: A cautionary tale
ODBC to the Rescue
- The R + ODBC toolchain is robust and stable
- As performant as a native Spark connection
- Easy to migrate code from sparklyr to ODBC
- Spark still does all of the computation
- Databricks provides an optimized Spark ODBC driver
ODBC with RStudio Connect
ODBC Performance
Comparing sparklyr against two versions of the Databricks ODBC/JDBC Driver
Collecting Joins
Sparklyr to ODBC
▪ Interactive data analysis with SparkSQL
▪ sparklyr
▪ ODBC
▪ Other Spark APIs
▪ sparklyr
▪ Interactive data analysis with SparkSQL
▪ Shiny with ODBC
▪ Other Spark APIs
▪ ¯_(ツ)_/¯
▪ Deploy models with MLflow?
▪ Submit individual commands with Databricks
REST API 1.2?
Run sparklyr jobs from RStudio on Databricks
with bricksteR?
Stay tuned….
Deploy at scaleDevelop at scale
Conclusion
Additional Resources
▪ Hosted RStudio on Databricks
▪ Databricks Connect
▪ ODBC
▪ ODBC Configuration
▪ RStudio Connect
▪ Sparklyr
▪ blairj09-talks/spark-summit-2020
▪ RafiKurlansik/bricksteR
▪ delta-io/delta
▪ sparklyr/sparklyr
Related ReposDocumentation
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.
How R Developers Can Build and Share Data and AI Applications that Scale with Databricks and RStudio Connect

Contenu connexe

Tendances

Introdução ao Google Cloud Platform: Computação em Nuvem do Google
Introdução ao Google Cloud Platform: Computação em Nuvem do GoogleIntrodução ao Google Cloud Platform: Computação em Nuvem do Google
Introdução ao Google Cloud Platform: Computação em Nuvem do GoogleJose Papo, MSc
 
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...Veritas Technologies LLC
 
Introduction to Azure Blueprints
Introduction to Azure BlueprintsIntroduction to Azure Blueprints
Introduction to Azure BlueprintsCheah Eng Soon
 
DBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureDBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureEmiliano Fusaglia
 
Asp interview Question and Answer
Asp interview Question and Answer Asp interview Question and Answer
Asp interview Question and Answer home
 
Azure cosmos db
Azure cosmos dbAzure cosmos db
Azure cosmos dbBill Liu
 
Identity and Access Management from Microsoft and Razor Technology
Identity and Access Management from Microsoft and Razor TechnologyIdentity and Access Management from Microsoft and Razor Technology
Identity and Access Management from Microsoft and Razor TechnologyDavid J Rosenthal
 
ADF Demo_ppt.pptx
ADF Demo_ppt.pptxADF Demo_ppt.pptx
ADF Demo_ppt.pptxvamsytaurus
 

Tendances (13)

Microservices
MicroservicesMicroservices
Microservices
 
Introdução ao Google Cloud Platform: Computação em Nuvem do Google
Introdução ao Google Cloud Platform: Computação em Nuvem do GoogleIntrodução ao Google Cloud Platform: Computação em Nuvem do Google
Introdução ao Google Cloud Platform: Computação em Nuvem do Google
 
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Azure 900
Azure 900Azure 900
Azure 900
 
Introduction to Azure Blueprints
Introduction to Azure BlueprintsIntroduction to Azure Blueprints
Introduction to Azure Blueprints
 
DDS QoS Unleashed
DDS QoS UnleashedDDS QoS Unleashed
DDS QoS Unleashed
 
DBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureDBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructure
 
Guia de migración Azure
Guia de migración AzureGuia de migración Azure
Guia de migración Azure
 
Asp interview Question and Answer
Asp interview Question and Answer Asp interview Question and Answer
Asp interview Question and Answer
 
Azure cosmos db
Azure cosmos dbAzure cosmos db
Azure cosmos db
 
Identity and Access Management from Microsoft and Razor Technology
Identity and Access Management from Microsoft and Razor TechnologyIdentity and Access Management from Microsoft and Razor Technology
Identity and Access Management from Microsoft and Razor Technology
 
ADF Demo_ppt.pptx
ADF Demo_ppt.pptxADF Demo_ppt.pptx
ADF Demo_ppt.pptx
 

Similaire à How R Developers Can Build and Share Data and AI Applications that Scale with Databricks and RStudio Connect

Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...Databricks
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIdatamantra
 
Rajeev kumar apache_spark & scala developer
Rajeev kumar apache_spark & scala developerRajeev kumar apache_spark & scala developer
Rajeev kumar apache_spark & scala developerRajeev Kumar
 
Simplifying AI integration on Apache Spark
Simplifying AI integration on Apache SparkSimplifying AI integration on Apache Spark
Simplifying AI integration on Apache SparkDatabricks
 
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesA Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesArcadia Data
 
NuoDB 3.0: Getting Started with Community Edition
NuoDB 3.0: Getting Started with Community EditionNuoDB 3.0: Getting Started with Community Edition
NuoDB 3.0: Getting Started with Community EditionNuoDB
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkJerry Wen
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
Elevate MongoDB with ODBC/JDBC
Elevate MongoDB with ODBC/JDBCElevate MongoDB with ODBC/JDBC
Elevate MongoDB with ODBC/JDBCMongoDB
 
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESBig Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESMatt Stubbs
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond RelationalLynn Langit
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraAnant Corporation
 
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesA Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesArcadia Data
 
Data Engineer Intro - WeCloudData
Data Engineer Intro - WeCloudDataData Engineer Intro - WeCloudData
Data Engineer Intro - WeCloudDataWeCloudData
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3Databricks
 
List of Top Local Databases used for react native app developement in 2022
List of Top Local Databases used for react native app developement in 2022					List of Top Local Databases used for react native app developement in 2022
List of Top Local Databases used for react native app developement in 2022 Shelly Megan
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT
Apache Spark and Apache Ignite: Where Fast Data Meets the IoTApache Spark and Apache Ignite: Where Fast Data Meets the IoT
Apache Spark and Apache Ignite: Where Fast Data Meets the IoTDenis Magda
 

Similaire à How R Developers Can Build and Share Data and AI Applications that Scale with Databricks and RStudio Connect (20)

Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
 
Unlock the value of your big data infrastructure
Unlock the value of your big data infrastructureUnlock the value of your big data infrastructure
Unlock the value of your big data infrastructure
 
Rajeev kumar apache_spark & scala developer
Rajeev kumar apache_spark & scala developerRajeev kumar apache_spark & scala developer
Rajeev kumar apache_spark & scala developer
 
Simplifying AI integration on Apache Spark
Simplifying AI integration on Apache SparkSimplifying AI integration on Apache Spark
Simplifying AI integration on Apache Spark
 
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesA Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
 
ESGYN Overview
ESGYN OverviewESGYN Overview
ESGYN Overview
 
NuoDB 3.0: Getting Started with Community Edition
NuoDB 3.0: Getting Started with Community EditionNuoDB 3.0: Getting Started with Community Edition
NuoDB 3.0: Getting Started with Community Edition
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of spark
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Elevate MongoDB with ODBC/JDBC
Elevate MongoDB with ODBC/JDBCElevate MongoDB with ODBC/JDBC
Elevate MongoDB with ODBC/JDBC
 
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESBig Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
 
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesA Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
 
Data Engineer Intro - WeCloudData
Data Engineer Intro - WeCloudDataData Engineer Intro - WeCloudData
Data Engineer Intro - WeCloudData
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
List of Top Local Databases used for react native app developement in 2022
List of Top Local Databases used for react native app developement in 2022					List of Top Local Databases used for react native app developement in 2022
List of Top Local Databases used for react native app developement in 2022
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT
Apache Spark and Apache Ignite: Where Fast Data Meets the IoTApache Spark and Apache Ignite: Where Fast Data Meets the IoT
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Dernier

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 

Dernier (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 

How R Developers Can Build and Share Data and AI Applications that Scale with Databricks and RStudio Connect

  • 1.
  • 2. How R Developers Can Build and Share Data and AI Applications that Scale with Databricks and Rstudio Connect James Blair, Solutions Engineer, RStudio PBC Rafi Kurlansik, Sr. Solutions Architect, Databricks
  • 3. Agenda Rafi Kurlansik, Databricks Building Scalable R and Shiny apps with RStudio and Databricks James Blair, RStudio PBC Deploying Scalable Shiny apps with RStudio Connect and Databricks Benchmarking performance of Shiny connections to Spark
  • 4. How to scale R and Shiny with RStudio and Databricks
  • 5. How can we open up the data lake to R users? ▪ Typical development patterns ▪ Local ▪ Cloud / On Prem VM ▪ Challenges with big data ▪ Server memory - can only process so much data in the app itself before crashing R ▪ Performance - even on a powerful VM, eventually see our app get less responsive as we reach 100+ GBs ▪ Managing big data infrastructure - app value must be higher to justify the energy investment If only there was a technology with a familiar API in R that let our app scale to process 100s of GBs... Imagine trying to do so with traditional R development...
  • 6. Scale R Apps with Databricks and RStudio ▪ Development Patterns ▪ Hosted RStudio Server (Pro) on Databricks Cluster ▪ RStudio with remote Spark access using Databricks Connect ▪ Overcoming challenges with big data ▪ Auto-scaling Databricks Spark Clusters - dynamically respond to accommodate larger data processing tasks ▪ Consistently fast performance with Delta Lake and Databricks Runtime ▪ Managed service allows data teams to focus on building data products, not maintaining infrastructure Databricks Spark, RStudio IDE
  • 7. Hosted RStudio Server Pro on Databricks
  • 8. RStudio with Databricks Connect Local RStudio, Remote Spark
  • 10. The Data Science Process
  • 11. Shiny and Spark: A cautionary tale
  • 12. ODBC to the Rescue - The R + ODBC toolchain is robust and stable - As performant as a native Spark connection - Easy to migrate code from sparklyr to ODBC - Spark still does all of the computation - Databricks provides an optimized Spark ODBC driver
  • 13. ODBC with RStudio Connect
  • 14. ODBC Performance Comparing sparklyr against two versions of the Databricks ODBC/JDBC Driver Collecting Joins
  • 16. ▪ Interactive data analysis with SparkSQL ▪ sparklyr ▪ ODBC ▪ Other Spark APIs ▪ sparklyr ▪ Interactive data analysis with SparkSQL ▪ Shiny with ODBC ▪ Other Spark APIs ▪ ¯_(ツ)_/¯ ▪ Deploy models with MLflow? ▪ Submit individual commands with Databricks REST API 1.2? Run sparklyr jobs from RStudio on Databricks with bricksteR? Stay tuned…. Deploy at scaleDevelop at scale Conclusion
  • 17. Additional Resources ▪ Hosted RStudio on Databricks ▪ Databricks Connect ▪ ODBC ▪ ODBC Configuration ▪ RStudio Connect ▪ Sparklyr ▪ blairj09-talks/spark-summit-2020 ▪ RafiKurlansik/bricksteR ▪ delta-io/delta ▪ sparklyr/sparklyr Related ReposDocumentation
  • 18. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.