SlideShare une entreprise Scribd logo
1  sur  16
Télécharger pour lire hors ligne
AI Platform at Scale
Designing scalable platform for AI
Henry Saputra
Motivation for an AI Platform
● AI == ML for context of this presentation
● Developing AI Applications can easily incur technical debts
● Traditional software development assumes predictability during the lifetime
● Bring your own software and hardware
● Explainability and correctness are hard to quantify
● Data access and management is different from traditional software
● Compute and scale of workloads is different from traditional software
AI and ML code only small fraction ...
Reference: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
AI Platform in the wild
● eBay Krylov
● Facebook FBLearner Flow
● Uber Michelangelo
● Google TFX
● Salesforce Einstein Platform
● Amazon Sagemaker
Problems to Solve?
● Reduce plumbing work by data scientists
● Dependency on data pipeline, compute infrastructure, and networking
● Large variance of quality, metrics, and measurement of success
● Research vs Applied
● Online vs Offline
● Collaboration requires different approach - Eg: Machine Learning models not
directly re-usable
● Undeclared consumers
Goals of an AI Platform
● Provides a system where data scientists could build reliable, secured, easy, reproducible, and
automated AI model training, and scoring/ inference at scale.
● Address the problem of platform approach to unified infrastructure to run AI and ML jobs - no
longer running inside data scientists computer
● Standardizing on tools and pipeline to simplify AI and ML jobs from training to deploy models
● AI and ML algorithms should be implemented once and shareable
● Enable parallelism and distributed jobs to accelerate and scale
● Support exploration of metrics about past experiments
● Secure and Easy to use
Common Architecture and Components
● Access to Data - Data analysis, Feature store, Data Lake, Data Format
● ML Workflow or Pipeline - DAG, Orchestration vs Choreography
● Domain Specific Language (DSL)
● Computing Platform and Infrastructure - Cloud vs In-house
○ “Tall” instances, GPU accelerated
○ Distributed computing framework
○ Fast network for data ingest
○ Data locality to compute resources
○ Containers and Microservices
● Models and Experiments lifecycle and management
● Models deployment and serving flow - Batch and Realtime
● Metrics and monitoring - dashboards, reports, logs
● APIs - UI, CLI, Program bindings/ SDK, RESTful, RPC
● Supported ML libraries
AI Platform at eBay - Krylov Project
Challenges of Deploying AI Platform at Scale
● Defining the “right” architecture
● Open source - build vs buy? Early stage for AI Platform
● Extendible and Scale - horizontal vs vertical
● Secure environment for data access and compute
● Standards and common tooling for ML development - reduce complexity
● Sharing and re-use of algorithms and models
● Reduce tech debts - fast moving
● Tech refresh of hardware - Cloud vs In-house
Future Looking ...
● AutoML
● Online training/ learning and edge devices update
● Distributed Deep Learning for training - model vs data parallelism
● Graph as machine learning
● Improve of computing infrastructure hardware - GPU, TPU
● Faster network
● Next generation of storage for ML use cases
● Better support for AI applications - update and retrain models from devices
● Support for newer AI computing paradigm at scale - generative models,
reinforcement learning
Who do we need in AI Platform Team?
● Engineers and scientists
● Product Management
● Runtime support and infrastructure
eBay Krylov High Level Architecture
eBay Krylov Cluster Deployment
eBay Krylov Cluster Infrastructure
eBay Krylov Dashboard
Thank You

Contenu connexe

Tendances

Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflowDatabricks
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...Databricks
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on AzureTrivadis
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkroutconfluent
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&MDatabricks
 
Architecting an Enterprise API Management Strategy
Architecting an Enterprise API Management StrategyArchitecting an Enterprise API Management Strategy
Architecting an Enterprise API Management StrategyWSO2
 
Managing the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsManaging the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsFatih Baltacı
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_futureNisha Talagala
 
Observability in the world of microservices
Observability in the world of microservicesObservability in the world of microservices
Observability in the world of microservicesChandresh Pancholi
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLJordan Birdsell
 
Applying Domain-Driven Design to APIs and Microservices - Austin API Meetup
Applying Domain-Driven Design to APIs and Microservices  - Austin API MeetupApplying Domain-Driven Design to APIs and Microservices  - Austin API Meetup
Applying Domain-Driven Design to APIs and Microservices - Austin API MeetupLaunchAny
 
Multi-tenant Database Design for SaaS
Multi-tenant Database Design for SaaSMulti-tenant Database Design for SaaS
Multi-tenant Database Design for SaaSVõ Duy Tuấn
 
AIOps - The next 5 years
AIOps - The next 5 yearsAIOps - The next 5 years
AIOps - The next 5 yearsMoogsoft
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestAlluxio, Inc.
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOpsCarl W. Handlin
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleDatabricks
 
Event Driven Software Architecture Pattern
Event Driven Software Architecture PatternEvent Driven Software Architecture Pattern
Event Driven Software Architecture Patternjeetendra mandal
 

Tendances (20)

Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&M
 
Architecting an Enterprise API Management Strategy
Architecting an Enterprise API Management StrategyArchitecting an Enterprise API Management Strategy
Architecting an Enterprise API Management Strategy
 
Managing the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsManaging the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOps
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
 
Observability in the world of microservices
Observability in the world of microservicesObservability in the world of microservices
Observability in the world of microservices
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
Applying Domain-Driven Design to APIs and Microservices - Austin API Meetup
Applying Domain-Driven Design to APIs and Microservices  - Austin API MeetupApplying Domain-Driven Design to APIs and Microservices  - Austin API Meetup
Applying Domain-Driven Design to APIs and Microservices - Austin API Meetup
 
Multi-tenant Database Design for SaaS
Multi-tenant Database Design for SaaSMulti-tenant Database Design for SaaS
Multi-tenant Database Design for SaaS
 
AIOps - The next 5 years
AIOps - The next 5 yearsAIOps - The next 5 years
AIOps - The next 5 years
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at Pinterest
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
 
Event Driven Software Architecture Pattern
Event Driven Software Architecture PatternEvent Driven Software Architecture Pattern
Event Driven Software Architecture Pattern
 

Similaire à Ai platform at scale

Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowGoDataDriven
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRBWilliam Poos
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceKaran Sachdeva
 
DDDP 2019 - Brown to Green
DDDP 2019  - Brown to GreenDDDP 2019  - Brown to Green
DDDP 2019 - Brown to GreenJohn Archer
 
Deploying ML models in the enterprise
Deploying ML models in the enterpriseDeploying ML models in the enterprise
Deploying ML models in the enterprisedoppenhe
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018Adam Gibson
 
It Consulting & Services - Black Basil Technologies
It Consulting & Services  - Black Basil TechnologiesIt Consulting & Services  - Black Basil Technologies
It Consulting & Services - Black Basil TechnologiesBlack Basil Technologies
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...Abhinav Joshi
 
Making machine learning model deployment boring - Big Data Expo 2019
Making machine learning model deployment boring - Big Data Expo 2019Making machine learning model deployment boring - Big Data Expo 2019
Making machine learning model deployment boring - Big Data Expo 2019webwinkelvakdag
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Machine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache SparkMachine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache SparkDatabricks
 
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Databricks
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureFei Chen
 
Emergence of cloud computing and internet of things an overview
Emergence of cloud computing and internet of things   an overviewEmergence of cloud computing and internet of things   an overview
Emergence of cloud computing and internet of things an overviewSelvaraj Kesavan
 
Dell AI Telecom Webinar
Dell AI Telecom WebinarDell AI Telecom Webinar
Dell AI Telecom WebinarBill Wong
 
Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Debraj GuhaThakurta
 

Similaire à Ai platform at scale (20)

Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data Science
 
DDDP 2019 - Brown to Green
DDDP 2019  - Brown to GreenDDDP 2019  - Brown to Green
DDDP 2019 - Brown to Green
 
Deploying ML models in the enterprise
Deploying ML models in the enterpriseDeploying ML models in the enterprise
Deploying ML models in the enterprise
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
It Consulting & Services - Black Basil Technologies
It Consulting & Services  - Black Basil TechnologiesIt Consulting & Services  - Black Basil Technologies
It Consulting & Services - Black Basil Technologies
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
 
Making machine learning model deployment boring - Big Data Expo 2019
Making machine learning model deployment boring - Big Data Expo 2019Making machine learning model deployment boring - Big Data Expo 2019
Making machine learning model deployment boring - Big Data Expo 2019
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Machine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache SparkMachine Learning at Scale with MLflow and Apache Spark
Machine Learning at Scale with MLflow and Apache Spark
 
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Emergence of cloud computing and internet of things an overview
Emergence of cloud computing and internet of things   an overviewEmergence of cloud computing and internet of things   an overview
Emergence of cloud computing and internet of things an overview
 
Dell AI Telecom Webinar
Dell AI Telecom WebinarDell AI Telecom Webinar
Dell AI Telecom Webinar
 
Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...
 

Dernier

The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 

Dernier (20)

The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 

Ai platform at scale

  • 1. AI Platform at Scale Designing scalable platform for AI Henry Saputra
  • 2. Motivation for an AI Platform ● AI == ML for context of this presentation ● Developing AI Applications can easily incur technical debts ● Traditional software development assumes predictability during the lifetime ● Bring your own software and hardware ● Explainability and correctness are hard to quantify ● Data access and management is different from traditional software ● Compute and scale of workloads is different from traditional software
  • 3. AI and ML code only small fraction ... Reference: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
  • 4. AI Platform in the wild ● eBay Krylov ● Facebook FBLearner Flow ● Uber Michelangelo ● Google TFX ● Salesforce Einstein Platform ● Amazon Sagemaker
  • 5. Problems to Solve? ● Reduce plumbing work by data scientists ● Dependency on data pipeline, compute infrastructure, and networking ● Large variance of quality, metrics, and measurement of success ● Research vs Applied ● Online vs Offline ● Collaboration requires different approach - Eg: Machine Learning models not directly re-usable ● Undeclared consumers
  • 6. Goals of an AI Platform ● Provides a system where data scientists could build reliable, secured, easy, reproducible, and automated AI model training, and scoring/ inference at scale. ● Address the problem of platform approach to unified infrastructure to run AI and ML jobs - no longer running inside data scientists computer ● Standardizing on tools and pipeline to simplify AI and ML jobs from training to deploy models ● AI and ML algorithms should be implemented once and shareable ● Enable parallelism and distributed jobs to accelerate and scale ● Support exploration of metrics about past experiments ● Secure and Easy to use
  • 7. Common Architecture and Components ● Access to Data - Data analysis, Feature store, Data Lake, Data Format ● ML Workflow or Pipeline - DAG, Orchestration vs Choreography ● Domain Specific Language (DSL) ● Computing Platform and Infrastructure - Cloud vs In-house ○ “Tall” instances, GPU accelerated ○ Distributed computing framework ○ Fast network for data ingest ○ Data locality to compute resources ○ Containers and Microservices ● Models and Experiments lifecycle and management ● Models deployment and serving flow - Batch and Realtime ● Metrics and monitoring - dashboards, reports, logs ● APIs - UI, CLI, Program bindings/ SDK, RESTful, RPC ● Supported ML libraries
  • 8. AI Platform at eBay - Krylov Project
  • 9. Challenges of Deploying AI Platform at Scale ● Defining the “right” architecture ● Open source - build vs buy? Early stage for AI Platform ● Extendible and Scale - horizontal vs vertical ● Secure environment for data access and compute ● Standards and common tooling for ML development - reduce complexity ● Sharing and re-use of algorithms and models ● Reduce tech debts - fast moving ● Tech refresh of hardware - Cloud vs In-house
  • 10. Future Looking ... ● AutoML ● Online training/ learning and edge devices update ● Distributed Deep Learning for training - model vs data parallelism ● Graph as machine learning ● Improve of computing infrastructure hardware - GPU, TPU ● Faster network ● Next generation of storage for ML use cases ● Better support for AI applications - update and retrain models from devices ● Support for newer AI computing paradigm at scale - generative models, reinforcement learning
  • 11. Who do we need in AI Platform Team? ● Engineers and scientists ● Product Management ● Runtime support and infrastructure
  • 12. eBay Krylov High Level Architecture
  • 13. eBay Krylov Cluster Deployment
  • 14. eBay Krylov Cluster Infrastructure