Uber’s Data Science Workbench
Randy Wei Peng Du
Mission
Unleash the productivity of the Data
Science community at Uber by
providing scalable infrastructure,
tools, customization and support.
Tools of the Trade: Jupyter Notebooks
Alternative to traditional CLIs
Interactive tool which combines
Prose (HTML, Markdown),
Code (Py, R, Scala),
Visualization (charts, maps, tables)
Shareable artifact of knowledge
Hosted webapp
Notebook, Notes, Cells
Each code cell is an independently executable block of code
Used for
Data exploration, Cleansing, Modeling
Dashboarding/reporting
[Figure: notebook screenshot with regions labeled HTML, Code, and Output]
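To make the prose/code/output split above concrete, here is a minimal sketch of how a notebook is stored on disk, using only the Python standard library. It follows the public Jupyter notebook JSON layout (nbformat 4); the helper names `markdown_cell` and `code_cell` and the toy trip data are assumptions made for illustration, not anything from Uber's workbench.

```python
import json


def markdown_cell(text):
    """Build a prose cell; the notebook UI renders it as Markdown/HTML."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Build a code cell; outputs stay empty until a kernel executes it."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


# A notebook is a JSON document holding an ordered list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("# Trip durations\nA quick look at a toy sample."),
        code_cell("durations = [12, 7, 31]\nsum(durations) / len(durations)"),
    ],
}

# The serialized text is what a .ipynb file contains.
ipynb_text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

Because the whole notebook is one JSON file, it can be shared, versioned, and searched like any other document, which is what makes it a shareable artifact.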
Tools of the Trade: RStudio Server
Browser interface to a remote R server
Centrally manage compute infrastructure
IDE for R
Syntax highlighting, code completion
Debugging
Charts
File Browser
RStudio also has Notebook functionality
R has a huge library repository
Used mostly for rapid prototyping of models
on small datasets (UbeR)
[Figure: RStudio screenshot with regions labeled Data, Code, and Output]
Tools of the Trade: Apache Spark
SparkR
● Distributed statistical computing framework
● Run R code without translating it to Java
● Choice of the Intelligent Decision, Insurance, etc. teams
PySpark
● Distributed machine learning framework
● Easy to integrate with scientific Python libraries
● Choice of the Fraud Detection, Sensing and Perception, etc. teams
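The idea behind a distributed statistical computing framework can be sketched without Spark itself: split the data into partitions, compute a small mergeable summary per partition, then combine the summaries. The sketch below uses the Python standard library only, with threads standing in for workers; it is not Spark API, and the function names and sample values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_parts):
    """Split a list into n_parts chunks (a stand-in for data partitions)."""
    return [values[i::n_parts] for i in range(n_parts)]


def partial_stats(chunk):
    """Per-partition work: return (sum, count), which can be merged later."""
    return (sum(chunk), len(chunk))


def distributed_mean(values, n_parts=4):
    """Compute a mean by merging per-partition (sum, count) pairs."""
    chunks = partition(values, n_parts)
    # Threads play the role of workers; a real cluster would run these remotely.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count  # assumes a non-empty input


trip_minutes = [12, 7, 31, 18, 22, 9, 15, 6]
mean_minutes = distributed_mean(trip_minutes)
print(mean_minutes)
```

Each worker returns a (sum, count) pair rather than a local mean because the pairs can be merged exactly, whereas averaging per-partition means would be wrong when partitions differ in size.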
Requirements
Productivity
● Py, R, Scala interpreters in Jupyter
● Hosted RStudio support
● Version Control
● Custom libraries/environment
● Single-pane lifecycle mgmt.
● PySpark, SparkR
Scale
● Scalable Jupyter Server infra.
● Large dist. computation backend
● Multitenancy
● File Persistence
● Security
Ecosystem Integration
● Scheduling: Piper
● Dashboards: Shiny
● Data Exploration: Query engine API
● Deploy: Machine learning platform
● Chargeback: Monitoring platform
Knowledge (Social)
● Search
● Access Controls
● Sharing Controls
● Publish
● Comments & Discussion
State of the Union
Problem
● Data Scientists (DSs) start at Uber with diverse skillsets and backgrounds
● Precious time wasted in infra. setup, version control, search, sharing...
● Teams are building their own solutions
Vision
● Web-based hub for all Data Scientists at Uber
● Ability to centrally:
○ provision tools
○ leverage dist. backend
○ search, comment, share
○ monitor
● Integrated with Uber’s data ecosystem
● Dedicated SRE
Opportunity
● Find and reuse knowledge
● Opportunity for a dedicated team to advocate for and build the tools needed to make DSs hyper-productive
● Cloud experience
● Chargeback
Similar offerings...
Architecture
[Diagram, three layers labeled Manage, Mesos, and Spark]
● Manage: Management Service (Create, Delete, Search, Share, Publish, Schedule)
● Mesos: RStudio (Docker) and Jupyter (Docker) containers on Uber Mesos infra, Shared File System
● Spark: MLlib, PySpark, and SparkR workers; Uber Spark debugging toolkit; Uber Spark development toolkit
Architecture
[Diagram components]
● Web GUI
● Application Management Service: session / file management, proxy
● Mesos Cluster: Docker containers running Jupyter Server (notebooks NB1, NB2) and RStudio Server
● Hadoop Cluster (Hive, Presto, Spark): distributed processing
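The management operations named in the architecture (create, delete, search, share) can be sketched as a small in-memory service. This is only an illustration of the responsibilities listed on the slides: the class names, method signatures, ownership rule, and sample users are all assumptions, and the real service would launch Docker containers on Mesos rather than store records in a dictionary.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: int
    owner: str
    tool: str  # "jupyter" or "rstudio"
    name: str
    shared_with: set = field(default_factory=set)


class ManagementService:
    """Hypothetical in-memory stand-in for the session management layer."""

    SUPPORTED_TOOLS = {"jupyter", "rstudio"}

    def __init__(self):
        self._sessions = {}
        self._ids = itertools.count(1)

    def create(self, owner, tool, name):
        if tool not in self.SUPPORTED_TOOLS:
            raise ValueError(f"unsupported tool: {tool}")
        session = Session(next(self._ids), owner, tool, name)
        self._sessions[session.session_id] = session
        return session

    def delete(self, session_id, user):
        # Assumed rule: only the owner may delete a session.
        if self._sessions[session_id].owner != user:
            raise PermissionError("only the owner can delete a session")
        del self._sessions[session_id]

    def share(self, session_id, owner, other_user):
        session = self._sessions[session_id]
        if session.owner != owner:
            raise PermissionError("only the owner can share a session")
        session.shared_with.add(other_user)

    def search(self, user, text):
        """Return sessions visible to `user` whose name contains `text`."""
        text = text.lower()
        return [
            s for s in self._sessions.values()
            if text in s.name.lower()
            and (s.owner == user or user in s.shared_with)
        ]


svc = ManagementService()
nb = svc.create("alice", "jupyter", "Fraud features")
svc.create("alice", "rstudio", "Pricing prototype")
svc.share(nb.session_id, "alice", "bob")
bob_results = [s.name for s in svc.search("bob", "fraud")]
print(bob_results)
```

Filtering search results by owner and share list is one simple way to combine the search and sharing-control requirements in a multitenant setting.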
Uber Ecosystem
[Diagram: the Data Science Workbench shown with Uber ML platform Palette, Hive, Cassandra, HDFS, Query Runner, and Spark (Spark SDK, Spark Debug tool, Spark templates); other labels: Models, Production, PySpark for ML, Data Visualization]
Workflow Demo
Q&A
