SlideShare une entreprise Scribd logo
1  sur  36
11
Recommendations for
Building Machine Learning Software
Justin Basilico
Page Algorithms Engineering May 19, 2016
@JustinBasilico
22
Introduction
3
Change of focus
2006 2016
4
Netflix Scale
 > 81M members
 > 190 countries
 > 1000 device types
 > 3B hours/month
 > 36% of peak US
downstream traffic
5
Goal
Help members find content to watch and enjoy
to maximize member satisfaction and retention
6
Everything is a Recommendation
Rows
Ranking
Over 80% of what
people watch
comes from our
recommendations
Recommendations
are driven by
Machine Learning
7
Machine Learning Approach
Problem
Data
AlgorithmModel
Metrics
8
Models & Algorithms
 Regression (linear, logistic, elastic net)
 SVD and other Matrix Factorizations
 Factorization Machines
 Restricted Boltzmann Machines
 Deep Neural Networks
 Markov Models and Graph Algorithms
 Clustering
 Latent Dirichlet Allocation
 Gradient Boosted Decision
Trees/Random Forests
 Gaussian Processes
 …
9
Design Considerations
Recommendations
• Personal
• Accurate
• Diverse
• Novel
• Fresh
Software
• Scalable
• Responsive
• Resilient
• Efficient
• Flexible
10
Software Stack
http://techblog.netflix.com
1111
Recommendations
12
Be flexible about where and when
computation happens
Recommendation 1
13
System Architecture
 Offline: Process data
 Batch learning
 Nearline: Process events
 Model evaluation
 Online learning
 Asynchronous
 Online: Process requests
 Real-time
Netflix.Hermes
Netflix.Manhattan
Nearline
Computation
Models
Online
Data Service
Offline Data
Model
training
Online
Computation
Event Distribution
User Event
Queue
Algorithm
Service
UI Client
Member
Query results
Recommendations
NEARLINE
Machine
Learning
Algorithm
Machine
Learning
Algorithm
Offline
Computation Machine
Learning
Algorithm
Play, Rate,
Browse...
OFFLINE
ONLINE
More details on Netflix Techblog
14
Where to place components?
 Example: Matrix Factorization
 Offline:
 Collect sample of play data
 Run batch learning algorithm like
SGD to produce factorization
 Publish video factors
 Nearline:
 Solve user factors
 Compute user-video dot products
 Store scores in cache
 Online:
 Presentation-context filtering
 Serve recommendations
Netflix.Hermes
Netflix.Manhattan
Nearline
Computation
Models
Online
Data Service
Offline Data
Model
training
Online
Computation
Event Distribution
User Event
Queue
Algorithm
Service
UI Client
Member
Query results
Recommendations
NEARLINE
Machine
Learning
Algorithm
Machine
Learning
Algorithm
Offline
Computation Machine
Learning
Algorithm
Play, Rate,
Browse...
OFFLINE
ONLINE
V
sij=uivj Aui=b
sij
X≈UVt
X
sij>t
15
Design application software for
experimentation
Recommendation 2
16
Example development process
Idea Data
Offline
Modeling
(R, Python,
MATLAB, …)
Iterate
Implement in
production
system (Java,
C++, …)
Data
discrepancies
Missing post-
processing
logic
Performance
issues
Actual
output
Experimentation environment
Production environment
(A/B test) Code
discrepancies
Final
model
17
Solution: Share and lean towards production
 Developing machine learning is iterative
 Need a short pipeline to rapidly try ideas
 Want to see output of complete system
 So make the application easy to experiment with
 Share components between online, nearline, and offline
 Use the real code whenever possible
 Have well-defined interfaces and formats to allow you to go
off-the-beaten-path
18
Shared Engine
Avoid dual implementations
Experiment
code
Production
code
ProductionExperiment • Models
• Features
• Algorithms
• …
19
Make algorithms extensible and modular
Recommendation 3
20
Make algorithms and models extensible and modular
 Algorithms often need to be tailored for a
specific application
 Treating an algorithm as a black box is
limiting
 Better to make algorithms extensible and
modular to allow for customization
 Separate models and algorithms
 Many algorithms can learn the same model
(i.e. linear binary classifier)
 Many algorithms can be trained on the same
types of data
 Support composing algorithms
Data
Parameters
Data
Model
Parameters
Model
Algorithm
Vs.
21
Provide building blocks
 Don’t start from scratch
 Linear algebra: Vectors, Matrices, …
 Statistics: Distributions, tests, …
 Models, features, metrics, ensembles, …
 Loss, distance, kernel, … functions
 Optimization, inference, …
 Layers, activation functions, …
 Initializers, stopping criteria, …
 …
 Domain-specific components
Build abstractions on
familiar concepts
Make the software put
them together
22
Example: Tailoring Random Forests
Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry
Use a custom
tree split
Customize to
run it for an
hour
Report a
custom metric
each iteration
Inspect the
ensemble
23
Describe your input and output
transformations with your model
Recommendation 4
24
Application
Putting learning in an application
Feature
Encoding
Output
Decoding
?
Machine
Learned Model
Rd ⟶ Rk
Application or model code?
25
Example: Simple ranking system
 High-level API: List<Video> rank(User u, List<Video> videos)
 Example model description file:
{
“type”: “ScoringRanker”,
“scorer”: {
“type”: “FeatureScorer”,
“features”: [
{“type”: “Popularity”, “days”: 10},
{“type”: “PredictedRating”}
],
“function”: {
“type”: “Linear”,
“bias”: -0.5,
“weights”: {
“popularity”: 0.2,
“predictedRating”: 1.2,
“predictedRating*popularity”:
3.5
}
}
}
Ranker
Scorer
Features
Linear function
Feature transformations
26
Maximize out a single machine before
distributing your algorithms
Recommendation 5
27
Problem: Your great new algorithm doesn’t scale
 Want to run your algorithm on larger data
 Temptation to go distributed
 Spark/Hadoop/etc seem to make it easy
 But building distributed versions of non-trivial ML algorithms is hard
 Often means changing the algorithm or making lots of approximations
 So try to squeeze as much out of a single machine first
 Have a lot more communication bandwidth via memory than network
 You will be surprised how far one machine can go
 Example: Amazon announced today an X1 instance type with 2TB
memory and 128 virtual CPUs
28
How?
 Profile your code and think about memory
cache layout
 Small changes can have a big impact
 Example: Transposing a matrix can drop
computation from 100ms to 3ms
 Go multicore
 Algorithms like HogWild for SGD-type optimization
can make this very easy
 Use specialized resources like GPU (or TPU?)
 Only go distributed once you’ve optimized on
these dimensions (often you won’t need to)
29
Example: Training Neural Networks
 Level 1: Machines in different
AWS regions
 Level 2: Machines in same AWS
region
 Simple: Grid search
 Better: Bayesian optimization using
Gaussian Processes
 Mesos, Spark, etc. for coordination
 Level 3: Highly optimized, parallel
CUDA code on GPUs
30
Don’t just rely on metrics for testing
Recommendation 6
31
Machine Learning and Testing
 Temptation: Use validation metrics to test software
 When things work and metrics go up this seems great
 When metrics don’t improve was it the
 code
 data
 metric
 idea
 …?
32
Reality of Testing
 Machine learning code involves intricate math and logic
 Rounding issues, corner cases, …
 Is that a + or -? (The math or paper could be wrong.)
 Solution: Unit test
 Testing of metric code is especially important
 Test the whole system: Just unit testing is not enough
 At a minimum, compare output for unexpected changes across
versions
3333
Conclusions
34
Two ways to solve computational problems
Know
solution
Write code
Compile
code
Test code Deploy code
Know
relevant
data
Develop
algorithmic
approach
Train model
on data using
algorithm
Validate
model with
metrics
Deploy
model
Software Development
Machine Learning
(steps may involve Software Development)
35
Take-aways for building machine learning software
 Building machine learning is an iterative process
 Make experimentation easy
 Take a holistic view of application where you are placing
learning
 Design your algorithms to be modular
 Optimize how your code runs on a single machine before
going distributed
 Testing can be hard but is worthwhile
36
Thank You Justin Basilico
jbasilico@netflix.com
@JustinBasilico
We’re hiring

Contenu connexe

Tendances

Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized HomepageJustin Basilico
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixJustin Basilico
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsJustin Basilico
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Anoop Deoras
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at NetflixJustin Basilico
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsYves Raimond
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix ScaleAish Fenton
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at NetflixLinas Baltrunas
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender modelsParmeshwar Khurd
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelFaisal Siddiqi
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated RecommendationsHarald Steck
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixJaya Kawale
 
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Justin Basilico
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixGrace T. Huang
 

Tendances (20)

Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix Scale
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
 

Similaire à Recommendations for Building Machine Learning Software

Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & ValidationElmar Selbach
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?Matei Zaharia
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Greg Makowski
 
Automotive engineering design - Model Based Design
Automotive engineering design - Model Based DesignAutomotive engineering design - Model Based Design
Automotive engineering design - Model Based DesignVinayagam Mariappan
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Joachim Schlosser
 
Model driven engineering for big data management systems
Model driven engineering for big data management systemsModel driven engineering for big data management systems
Model driven engineering for big data management systemsMarcos Almeida
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
The Magic Of Application Lifecycle Management In Vs Public
The Magic Of Application Lifecycle Management In Vs PublicThe Magic Of Application Lifecycle Management In Vs Public
The Magic Of Application Lifecycle Management In Vs PublicDavid Solivan
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...Institute of Contemporary Sciences
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019Mark Tabladillo
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to ProductionMostafa Majidpour
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create PyData
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software EngineeringMiroslaw Staron
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Novedades de MongoDB 3.6
Novedades de MongoDB 3.6Novedades de MongoDB 3.6
Novedades de MongoDB 3.6MongoDB
 

Similaire à Recommendations for Building Machine Learning Software (20)

Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & Validation
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09
 
Automotive engineering design - Model Based Design
Automotive engineering design - Model Based DesignAutomotive engineering design - Model Based Design
Automotive engineering design - Model Based Design
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
 
Ds for finance day 4
Ds for finance day 4Ds for finance day 4
Ds for finance day 4
 
Model driven engineering for big data management systems
Model driven engineering for big data management systemsModel driven engineering for big data management systems
Model driven engineering for big data management systems
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
The Magic Of Application Lifecycle Management In Vs Public
The Magic Of Application Lifecycle Management In Vs PublicThe Magic Of Application Lifecycle Management In Vs Public
The Magic Of Application Lifecycle Management In Vs Public
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Novedades de MongoDB 3.6
Novedades de MongoDB 3.6Novedades de MongoDB 3.6
Novedades de MongoDB 3.6
 

Dernier

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Dernier (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Recommendations for Building Machine Learning Software

  • 1. 11 Recommendations for Building Machine Learning Software Justin Basilico Page Algorithms Engineering May 19, 2016 @JustinBasilico
  • 4. 4 Netflix Scale  > 81M members  > 190 countries  > 1000 device types  > 3B hours/month  > 36% of peak US downstream traffic
  • 5. 5 Goal Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 6. 6 Everything is a Recommendation Rows Ranking Over 80% of what people watch comes from our recommendations Recommendations are driven by Machine Learning
  • 8. 8 Models & Algorithms  Regression (linear, logistic, elastic net)  SVD and other Matrix Factorizations  Factorization Machines  Restricted Boltzmann Machines  Deep Neural Networks  Markov Models and Graph Algorithms  Clustering  Latent Dirichlet Allocation  Gradient Boosted Decision Trees/Random Forests  Gaussian Processes  …
  • 9. 9 Design Considerations Recommendations • Personal • Accurate • Diverse • Novel • Fresh Software • Scalable • Responsive • Resilient • Efficient • Flexible
  • 12. 12 Be flexible about where and when computation happens Recommendation 1
  • 13. 13 System Architecture  Offline: Process data  Batch learning  Nearline: Process events  Model evaluation  Online learning  Asynchronous  Online: Process requests  Real-time Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE More details on Netflix Techblog
  • 14. 14 Where to place components?  Example: Matrix Factorization  Offline:  Collect sample of play data  Run batch learning algorithm like SGD to produce factorization  Publish video factors  Nearline:  Solve user factors  Compute user-video dot products  Store scores in cache  Online:  Presentation-context filtering  Serve recommendations Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE V sij=uivj Aui=b sij X≈UVt X sij>t
  • 15. 15 Design application software for experimentation Recommendation 2
  • 16. 16 Example development process Idea Data Offline Modeling (R, Python, MATLAB, …) Iterate Implement in production system (Java, C++, …) Data discrepancies Missing post- processing logic Performance issues Actual output Experimentation environment Production environment (A/B test) Code discrepancies Final model
  • 17. 17 Solution: Share and lean towards production  Developing machine learning is iterative  Need a short pipeline to rapidly try ideas  Want to see output of complete system  So make the application easy to experiment with  Share components between online, nearline, and offline  Use the real code whenever possible  Have well-defined interfaces and formats to allow you to go off-the-beaten-path
  • 18. 18 Shared Engine Avoid dual implementations Experiment code Production code ProductionExperiment • Models • Features • Algorithms • …
  • 19. 19 Make algorithms extensible and modular Recommendation 3
  • 20. 20 Make algorithms and models extensible and modular  Algorithms often need to be tailored for a specific application  Treating an algorithm as a black box is limiting  Better to make algorithms extensible and modular to allow for customization  Separate models and algorithms  Many algorithms can learn the same model (i.e. linear binary classifier)  Many algorithms can be trained on the same types of data  Support composing algorithms Data Parameters Data Model Parameters Model Algorithm Vs.
  • 21. 21 Provide building blocks  Don’t start from scratch  Linear algebra: Vectors, Matrices, …  Statistics: Distributions, tests, …  Models, features, metrics, ensembles, …  Loss, distance, kernel, … functions  Optimization, inference, …  Layers, activation functions, …  Initializers, stopping criteria, …  …  Domain-specific components Build abstractions on familiar concepts Make the software put them together
  • 22. 22 Example: Tailoring Random Forests Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry Use a custom tree split Customize to run it for an hour Report a custom metric each iteration Inspect the ensemble
  • 23. 23 Describe your input and output transformations with your model Recommendation 4
  • 24. 24 Application Putting learning in an application Feature Encoding Output Decoding ? Machine Learned Model Rd ⟶ Rk Application or model code?
  • 25. 25 Example: Simple ranking system  High-level API: List<Video> rank(User u, List<Video> videos)  Example model description file: { “type”: “ScoringRanker”, “scorer”: { “type”: “FeatureScorer”, “features”: [ {“type”: “Popularity”, “days”: 10}, {“type”: “PredictedRating”} ], “function”: { “type”: “Linear”, “bias”: -0.5, “weights”: { “popularity”: 0.2, “predictedRating”: 1.2, “predictedRating*popularity”: 3.5 } } } Ranker Scorer Features Linear function Feature transformations
  • 26. 26 Maximize out a single machine before distributing your algorithms Recommendation 5
  • 27. 27 Problem: Your great new algorithm doesn’t scale  Want to run your algorithm on larger data  Temptation to go distributed  Spark/Hadoop/etc seem to make it easy  But building distributed versions of non-trivial ML algorithms is hard  Often means changing the algorithm or making lots of approximations  So try to squeeze as much out of a single machine first  Have a lot more communication bandwidth via memory than network  You will be surprised how far one machine can go  Example: Amazon announced today an X1 instance type with 2TB memory and 128 virtual CPUs
  • 28. 28 How?  Profile your code and think about memory cache layout  Small changes can have a big impact  Example: Transposing a matrix can drop computation from 100ms to 3ms  Go multicore  Algorithms like HogWild for SGD-type optimization can make this very easy  Use specialized resources like GPU (or TPU?)  Only go distributed once you’ve optimized on these dimensions (often you won’t need to)
  • 29. 29 Example: Training Neural Networks  Level 1: Machines in different AWS regions  Level 2: Machines in same AWS region  Simple: Grid search  Better: Bayesian optimization using Gaussian Processes  Mesos, Spark, etc. for coordination  Level 3: Highly optimized, parallel CUDA code on GPUs
  • 30. 30 Don’t just rely on metrics for testing Recommendation 6
  • 31. 31 Machine Learning and Testing  Temptation: Use validation metrics to test software  When things work and metrics go up this seems great  When metrics don’t improve was it the  code  data  metric  idea  …?
  • 32. 32 Reality of Testing  Machine learning code involves intricate math and logic  Rounding issues, corner cases, …  Is that a + or -? (The math or paper could be wrong.)  Solution: Unit test  Testing of metric code is especially important  Test the whole system: Just unit testing is not enough  At a minimum, compare output for unexpected changes across versions
  • 34. 34 Two ways to solve computational problems Know solution Write code Compile code Test code Deploy code Know relevant data Develop algorithmic approach Train model on data using algorithm Validate model with metrics Deploy model Software Development Machine Learning (steps may involve Software Development)
  • 35. 35 Take-aways for building machine learning software  Building machine learning is an iterative process  Make experimentation easy  Take a holistic view of application where you are placing learning  Design your algorithms to be modular  Optimize how your code runs on a single machine before going distributed  Testing can be hard but is worthwhile
  • 36. 36 Thank You Justin Basilico jbasilico@netflix.com @JustinBasilico We’re hiring

Notes de l'éditeur

  1. http://techblog.netflix.com/2013/03/system-architectures-for.html
  2. http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html