Building a Machine Learning
Platform at Quora
Nikhil Garg
@nikhilgarg28
@Quora @MLconf 11/11/16
The Quora Answer To “Build vs Buy” For ML Platforms
● At Quora since 2012
● Currently leading two ML engineering teams:
○ Content Quality
○ ML Platform
A bit about me...
To Grow And Share World’s Knowledge
Over 100 million monthly uniques
Millions of questions & answers
In hundreds of thousands of topics
Supported by 80 engineers
What Slows Down ML Innovation?
● Pipeline jungles
● Lots of glue code to get data in/out of general
purpose packages.
● Strong coupling between business logic, data, ML
algorithms and configuration.
Curse Of Complexity
● Online vs offline
● Production vs experimentation
● C++ vs Python
● Engineering vs research
● ...even more glue code and pipeline jungles.
Clash Of Titans
● Hard to reuse existing features, data, algorithms,
tooling etc.
● Too costly to even get off the ground.
Getting New Applications Off The Ground
http://www.qvidian.com/blog/resistance-to-change-sales-organizations
Many Faces Of Chaos
One ring to bring them all and in
the darkness bind them!
Collection of systems to sustainably increase the
business impact of ML at scale.
Machine Learning Platform
ML Platform: Build or Buy?
The Quora Answer: Build
For Seven Reasons
Reason # 7
Just Can’t Buy Everything!
● No matter how powerful the platform is, still need to
maintain some form of integration
● This thin integration layer then becomes the platform.
● Real questions --
○ How much does this in-house layer delegate?
○ How much control does it have over delegation?
Degree Of Integration & Delegation
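The two questions above (how much the in-house layer delegates, and how much control it keeps over that delegation) can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the talk: the `IntegrationLayer` class, the backend names, and the two toy trainers only stand in for "in-house" and "external" implementations.

```python
from typing import Callable, Dict, List, Sequence

# A trainer takes feature rows plus labels and returns a predict function.
Predictor = Callable[[Sequence[float]], float]
Trainer = Callable[[List[Sequence[float]], List[float]], Predictor]


class IntegrationLayer:
    """Hypothetical thin in-house layer that controls where work is delegated."""

    def __init__(self) -> None:
        self._backends: Dict[str, Trainer] = {}
        self._routes: Dict[str, str] = {}

    def register_backend(self, name: str, trainer: Trainer) -> None:
        self._backends[name] = trainer

    def route(self, application: str, backend: str) -> None:
        # The layer, not the backend, decides which application goes where.
        if backend not in self._backends:
            raise KeyError(f"unknown backend: {backend}")
        self._routes[application] = backend

    def train(self, application: str, rows, labels) -> Predictor:
        backend = self._routes[application]
        return self._backends[backend](rows, labels)


def mean_label_trainer(rows, labels) -> Predictor:
    """Stand-in for an in-house algorithm: always predicts the mean label."""
    mean = sum(labels) / len(labels)
    return lambda row: mean


def first_feature_trainer(rows, labels) -> Predictor:
    """Stand-in for a delegated external package: echoes the first feature."""
    return lambda row: float(row[0])


layer = IntegrationLayer()
layer.register_backend("in_house", mean_label_trainer)
layer.register_backend("external", first_feature_trainer)

rows, labels = [[1.0], [3.0]], [0.0, 1.0]

layer.route("answer_ranking", "in_house")
prediction_in_house = layer.train("answer_ranking", rows, labels)([5.0])

# Switching the delegate is a one-line routing change; callers are untouched.
layer.route("answer_ranking", "external")
prediction_external = layer.train("answer_ranking", rows, labels)([5.0])
```

In this sketch the application code only ever talks to `layer.train`, so changing how much is delegated does not ripple into product code.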
Reason # 6
Fast Scalable Production Systems
End-To-End Online Production Systems
● External platforms can at best deploy “predictive models” as
services, not end-to-end online systems.
● Gains come from optimizing the whole pipeline, not just
algorithms.
● Latency: tens of milliseconds. Managing sharding, batching, data
locality, caching, streaming, stragglers, graceful degradation...
● Real world systems -- boosts, diversity constraints, holes in data,
skipping stages, hard filters… sounds familiar?
Candidate Generation
Feature Extraction
Scoring
Post Processing
Data
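As a rough illustration of the four stages named on this slide, the sketch below wires candidate generation, feature extraction, scoring and post processing into one function. It is a toy under stated assumptions, not Quora's system: the in-memory `FEATURES` table, the `BOOSTS` map, the default features and the spam threshold are all invented for the example. It shows three of the real-world concerns listed above: a hole in the data handled by graceful degradation, a hard filter, and a boost.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "Data" layer: per-candidate features, one of them missing.
FEATURES: Dict[str, Optional[Dict[str, float]]] = {
    "a1": {"quality": 0.9, "is_spam": 0.0},
    "a2": {"quality": 0.4, "is_spam": 0.0},
    "a3": {"quality": 0.8, "is_spam": 1.0},
    "a4": None,  # hole in data
}
DEFAULT_FEATURES = {"quality": 0.5, "is_spam": 0.0}  # assumed fallback values
BOOSTS = {"a2": 0.6}  # assumed business-rule boost


def generate_candidates() -> List[str]:
    return list(FEATURES)


def extract_features(candidate: str) -> Dict[str, float]:
    # Graceful degradation: fall back to defaults when features are missing.
    return FEATURES.get(candidate) or DEFAULT_FEATURES


def score(features: Dict[str, float]) -> float:
    # Stand-in for a real model: the score is just the quality feature.
    return features["quality"]


def post_process(
    scored: List[Tuple[str, float, Dict[str, float]]], limit: int
) -> List[str]:
    # Hard filter first, then apply boosts, then sort by final score.
    kept = [
        (cand, s + BOOSTS.get(cand, 0.0))
        for cand, s, feats in scored
        if feats["is_spam"] < 0.5
    ]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [cand for cand, _ in kept[:limit]]


def rank(limit: int = 3) -> List[str]:
    scored = []
    for candidate in generate_candidates():
        features = extract_features(candidate)
        scored.append((candidate, score(features), features))
    return post_process(scored, limit)


ranking = rank()
print(ranking)  # ['a2', 'a1', 'a4']
```

Note that the final order depends on the boost, the filter and the fallback as much as on the scoring function, which is the slide's point about optimizing the whole pipeline rather than just the algorithm.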
Reason # 5
Blurry Line Between
Experimentation & Production
● We want the same code/systems/tools to
work for both experimentation &
production.
● But we need to carefully “control” the
production code to keep it fast.
● So need to “control” offline
experimentation systems too.
Candidate Generation
Feature Extraction
Scoring
Post Processing
Data
Candidate Generation
Feature Extraction
Training
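One way to read "the same code for experimentation and production" is that training and online scoring call a single feature definition, so the two paths cannot drift apart. The sketch below assumes this reading; the feature names, the tiny logistic regression trainer and the example data are all hypothetical and only serve to make the idea concrete.

```python
import math
from typing import Dict, List, Tuple


def extract_features(answer: Dict[str, float]) -> List[float]:
    """Single feature definition shared by training and serving (hypothetical)."""
    return [1.0, math.log1p(answer["upvotes"]), answer["length"] / 1000.0]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(
    examples: List[Tuple[Dict[str, float], int]],
    epochs: int = 200,
    lr: float = 0.1,
) -> List[float]:
    """Offline path: plain SGD for logistic regression on the shared features."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for answer, label in examples:
            x = extract_features(answer)
            p = _sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            weights = [w + lr * (label - p) * xi for w, xi in zip(weights, x)]
    return weights


def score(weights: List[float], answer: Dict[str, float]) -> float:
    """Online path: reuses extract_features, so there is no train/serve skew."""
    x = extract_features(answer)
    return _sigmoid(sum(w * xi for w, xi in zip(weights, x)))


training_data = [
    ({"upvotes": 50, "length": 800}, 1),
    ({"upvotes": 0, "length": 100}, 0),
    ({"upvotes": 30, "length": 600}, 1),
    ({"upvotes": 1, "length": 50}, 0),
]
weights = train(training_data)
good = score(weights, {"upvotes": 40, "length": 700})
bad = score(weights, {"upvotes": 0, "length": 80})
```

The trade-off named on the slide shows up here too: because `extract_features` runs in the serving path, any change made for an offline experiment must also meet the production latency budget.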
Reason # 4
Openly Using Open Source
● Logistic Regression
● Elastic Nets
● Random Forests
● Gradient Boosted Decision Trees
● Matrix Factorization
● (Deep) Neural Networks
● LambdaMart
● Clustering
● Random walk based methods
● Word Embeddings
● LDA
● ...
Production ML Algorithms At Quora
Candidate Generation
Feature Extraction
Training/Scoring
Post Processing
Data
● Open source is great -- lots of great technologies!
● Commercial ML platforms are also open sourcing stuff.
● Learning and cherry-picking favorite parts from ANY
open source system.
● May write our own algorithms too (e.g. QMF)
● Building own platform = controlling the delegation, not
lack of delegation
Reason # 3
Commercial Platforms’ Offerings
Are Not Super Valuable To Us
● Main offerings of external platforms are:
○ Lower operational overhead of running machines
○ Out-of-box distributed training.
● Operational overhead
○ Gets amortized over time
○ Shared with non-ML infrastructure.
● Can often train most models on a single multi-core machine.
Reason # 2
Blurry Line Between ML & Product Dev
● Answer ranking
● Feed ranking
● Search ranking
● User recommendations
● Topic recommendations
● Duplicate questions
● Email Digest
● Request Answers
● Trending now
● Topic expertise prediction
● Spam, abuse detection
● ...
Blurry Line Between ML/Non-ML Product
Blurry Line Between ML/Non-ML Data
[Diagram: graph of Users, Questions, Answers, Topics, Votes and Comments, connected by relationships labeled Follow, Ask, Write, Cast, Have, Contain and Get]
Billions of relationships and words
Blurry Line Between ML/Non-ML Codebase
● Integration with other utility libraries/services,
e.g. A/B testing, debug tools, monitoring, alerting, data
transfer, ...
● Empowering all product engineers to do ML.
Reason # 1
ML As Quora’s Core Competency
● ML gives us a strategic competitive advantage.
● Want to control and develop deep expertise in the
whole stack.
● Quora has a long term focus -- investment in
platform more than pays off in the long term.
● Single most important reason to build ML Platform!
ML: Critical For Our Strategic Focus
Relevance
Quality Demand
Summary
● Anyone doing non-trivial ML needs an ML platform to
sustain innovation at scale.
● Build vs buy decision is not all-or-nothing.
● Surface area and importance of ML are deciding factors
in the build vs buy decision.
Nikhil Garg
@nikhilgarg28
Thank You!
YES, WE ARE HIRING :)
