SlideShare une entreprise Scribd logo
1  sur  36
Building robust
machine learning
systems
Or, how to sleep well when running machine
learning systems in production
ravelin.com
@sjwhitworth
InfoQ.com: News & Community Site
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations
/ravelin-ml-best-practices
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon London
www.qconlondon.com
Building robust machine learning systems
Me & Ravelin
ravelin.com
● Co-founder and engineer at Ravelin - protect merchants from credit
card fraud in real time
● Ingest large amounts of data about customer behaviour
● Use ML to return likelihood of fraud in milliseconds
● Clients label customers to improve system continually
● Go and Python shop
● Use other strategies in addition to ML
● Merchants bear the cost of credit card fraud - can lose 1-5% of
revenue with no protection, with heavy fines
● Adversarial problem - fraudsters continually trying to evade us
● If our ML messes up:
○ False positives - good customers hassled
○ False negatives - let fraudsters through
● Move quickly, but want to be sure we’re getting it right
Building robust machine learning systems
Fraud detection & prevention
ravelin.com
ravelin.com
Systems are moving from the deterministic to
the probabilistic
ML learns functions from data, instead of
building them
Building robust machine learning systems
Software engineering - we aim for determinism
of individual components, provably correct
ravelin.com
Building robust machine learning systems
Machine learning - we learn functions from a
snapshot in time, with some level of predictive
performance - ‘ground truth’ is approximate
ravelin.com
Building robust machine learning systems
ravelin.com
How can we make machine learning systems
less of a mound of complexity and tech debt,
and more like normal software?
Building robust machine learning systems
Building robust machine learning systems
● Silent failures and performance degradation
● Current actions impact future performance
● Unsure if loss function is a good proxy
● High base complexity
● Lots of supporting infrastructure needed
● Easy to mess up
Why is running robust ML systems hard?
ravelin.com
Building robust machine learning systems
● We discuss algorithms, frameworks, papers
● We don’t really talk about designing, running, debugging, operating ML
systems
● Rules of Machine Learning: Best Practices for ML Engineering (Martin
Zinkevich)
● Tweet I saw in reaction: ‘Spent the last 10 years of my life learning these
problems the hard way’
We should talk about it more
ravelin.com
● Does what it needs to do, well
● Well tested
● Easily deployable/upgradeable
● Quick to debug problems, and avoid in future
● Audited, version controlled, automated,
reproducible
Key aspects of good software
ravelin.com
Building robust machine learning systems
Training
ravelin.com
Building robust machine learning systems
Building robust machine learning systems
Labels & ‘truth’
ravelin.com
● Can you trust your labels?
● Are they fuzzy or subject to feedback loops?
● Are you labels instant? Or do they suffer from a time delay?
● Implicit labels vs. explicit labels
● Your system’s actions may stop you getting labels
● Fuzzy problem, but the most important!
Building robust machine learning systems
Data
ravelin.com
● As a general rule, be wary when touching your training data - you’re
explicitly biasing
● Filtering examples out of your training data can work
● Sampling training data - sometimes computationally necessary
● Do you need a time machine?
Testing
ravelin.com
Building robust machine learning systems
Test your data
ravelin.com
● Features are a function of your code, and your data. Either could be
broken. Yay!
● Test invariants in your feature extraction process
● ‘This feature should monotonically increase with time per customer’
● ‘The mean of this feature across all examples should be 1’
● ‘This value should always be positive’
● We wrote a small system on top of Pandas to do so
● Flushes out bugs in feature extraction, and bad data
Building robust machine learning systems
● Build assertion set of examples that you’d be highly embarrassed to
get wrong
● Don’t fit to it - use as unit test
● Failures / regressions on this set indicates something has gone awfully
wrong
● Never trust any big jumps in performance without verification
● Understand/cluster most misclassified examples
● Errors should be somewhat random, not systemic
Test your models
ravelin.com
Building robust machine learning systems
Test your system + infrastructure
ravelin.com
● Test that you get the same prediction offline that you do online
● Ensure you don’t leak information from the future
● If system is latency sensitive, add performance benchmarks to
automation
● Live integration tests - be the client, send predictions, assert they’re
right
Building robust machine learning systems
Deploying
ravelin.com
Building robust machine learning systems
Shipping
● Ensure you share as much code as possible between offline and online
● If you have a difference between offline and online, your system will
fail silently
● Thus, throwing models over the wall to developers to rebuild from
scratch isn’t a great idea
● Abstract data source, to the computation of features on that data
● Pickles, PMML, weights - anything to let ML people ship quick
ravelin.com
Building robust machine learning systems
Rollout
ravelin.com
● Run new model side by side with live system, in production
● Canary, release in dark mode, A/B test
● Manually inspect biggest differences online - are they sensible or
spurious?
Building robust machine learning systems
Metrics and logging
ravelin.com
Building robust machine learning systems
● Measure the difference in probabilities between your live + offline
model. Small differences, lower risk in deploying
● Instrument (Prometheus, Datadog), and log (BigQuery / S3) everything
● Instrument feature extraction in the same way you tested offline data
● Instrument inference distribution (e.g. p99 of of probability offline
should match p99 online)
● Measure whatever business metric this system is trying to improve
Debugging bad predictions
ravelin.com
Building robust machine learning systems
Look at the data!
● This is so obvious as to be nearly insulting
● We love to just jump in and Do Machine Learning™
● Models learn the world that you present them
● Debug when training your model, not only live predictions
● Add to the ‘obvious set’, to prevent future regressions
ravelin.com
Building robust machine learning systems
Make your model explain itself
● Random forests: walk the trees for bad predictions
● Neural nets: T-SNE on embeddings
● Linear models - relatively easy
● Usual suspects: difference between offline/online, information
leakage, or model cheating
● Few packages to help: LIME, eli5
● The Mythos Of Model Interpretability - ZC Lipton
ravelin.com
Building robust machine learning systems
Automation
ravelin.com
Building robust machine learning systems
Why automate?
ravelin.com
● Training in notebooks or shells works, until something goes wrong
● End up training on ‘data_1_final_edited_really_final (1)(2).csv’
● Hard to replicate state for troubleshooting
● Experiments buggy and hard to replicate
● Wasting human time
Building robust machine learning systems
Automate everything
ravelin.com
● Choose a task manager (AirFlow/Luigi), use it
● Automate whole pipeline: extraction -> hyperparam search -> training
-> evaluation -> comparison against current system
● Fresh models + fresh data acts as integration test
● Automation helps distill things you care about
● Team much more productive
Building robust machine learning systems
Auditing models
ravelin.com
● Who trained the model?
● When? What git hash?
● On what raw data?
● Which features did you use?
● With which hyperparameters?
● Bake these answers into your models so you can interact with them
programmatically
Building robust machine learning systems
Reproducibility and archiving
● Can you reproduce it?
● Control random seeds through pipeline
● Archive training stage to (cloud) storage - features, logs,
hyperparameter searches, models
● A model is a snapshot of the world around it at a given time
● Taking a snapshot of that snapshot is a good idea - you’ll thank me
later
ravelin.com
Building robust machine learning systems
ravelin.com
‘Do machine learning like the great engineer you
are, not like the great machine learning expert
you aren’t’ - Zinkevich
Building robust machine learning systems
ravelin.com
Building robust machine learning systems
ML systems = normal software + extra paranoia
Thanks!
@sjwhitworth
ravelin.com
Watch the video with slide synchronization on
InfoQ.com!
https://www.infoq.com/presentations/ravelin-
ml-best-practices

Contenu connexe

Plus de C4Media

Plus de C4Media (20)

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with Brooklin
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Dernier (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Building Robust Machine Learning Systems

  • 1. Building robust machine learning systems Or, how to sleep well when running machine learning systems in production ravelin.com @sjwhitworth
  • 2. InfoQ.com: News & Community Site Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations /ravelin-ml-best-practices • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon London www.qconlondon.com
  • 4. Building robust machine learning systems Me & Ravelin ravelin.com ● Co-founder and engineer at Ravelin - protect merchants from credit card fraud in real time ● Ingest large amounts of data about customer behaviour ● Use ML to return likelihood of fraud in milliseconds ● Clients label customers to improve system continually ● Go and Python shop ● Use other strategies in addition to ML
  • 5. ● Merchants bear the cost of credit card fraud - can lose 1-5% of revenue with no protection, with heavy fines ● Adversarial problem - fraudsters continually trying to evade us ● If our ML messes up: ○ False positives - good customers hassled ○ False negatives - let fraudsters through ● Move quickly, but want to be sure we’re getting it right Building robust machine learning systems Fraud detection & prevention ravelin.com
  • 6. ravelin.com Systems are moving from the deterministic to the probabilistic ML learns functions from data, instead of building them Building robust machine learning systems
  • 7. Software engineering - we aim for determinism of individual components, provably correct ravelin.com Building robust machine learning systems
  • 8. Machine learning - we learn functions from a snapshot in time, with some level of predictive performance - ‘ground truth’ is approximate ravelin.com Building robust machine learning systems
  • 9. ravelin.com How can we make machine learning systems less of a mound of complexity and tech debt, and more like normal software? Building robust machine learning systems
  • 10.
  • 11. Building robust machine learning systems ● Silent failures and performance degradation ● Current actions impact future performance ● Unsure if loss function is a good proxy ● High base complexity ● Lots of supporting infrastructure needed ● Easy to mess up Why is running robust ML systems hard? ravelin.com
  • 12. Building robust machine learning systems ● We discuss algorithms, frameworks, papers ● We don’t really talk about designing, running, debugging, operating ML systems ● Rules of Machine Learning: Best Practices for ML Engineering (Martin Zinkevich) ● Tweet I saw in reaction: ‘Spent the last 10 years of my life learning these problems the hard way’ We should talk about it more ravelin.com
  • 13. ● Does what it needs to do, well ● Well tested ● Easily deployable/upgradeable ● Quick to debug problems, and avoid in future ● Audited, version controlled, automated, reproducible Key aspects of good software ravelin.com Building robust machine learning systems
  • 15. Building robust machine learning systems Labels & ‘truth’ ravelin.com ● Can you trust your labels? ● Are they fuzzy or subject to feedback loops? ● Are you labels instant? Or do they suffer from a time delay? ● Implicit labels vs. explicit labels ● Your system’s actions may stop you getting labels ● Fuzzy problem, but the most important!
  • 16. Building robust machine learning systems Data ravelin.com ● As a general rule, be wary when touching your training data - you’re explicitly biasing ● Filtering examples out of your training data can work ● Sampling training data - sometimes computationally necessary ● Do you need a time machine?
  • 18. Test your data ravelin.com ● Features are a function of your code, and your data. Either could be broken. Yay! ● Test invariants in your feature extraction process ● ‘This feature should monotonically increase with time per customer’ ● ‘The mean of this feature across all examples should be 1’ ● ‘This value should always be positive’ ● We wrote a small system on top of Pandas to do so ● Flushes out bugs in feature extraction, and bad data Building robust machine learning systems
  • 19. ● Build assertion set of examples that you’d be highly embarrassed to get wrong ● Don’t fit to it - use as unit test ● Failures / regressions on this set indicates something has gone awfully wrong ● Never trust any big jumps in performance without verification ● Understand/cluster most misclassified examples ● Errors should be somewhat random, not systemic Test your models ravelin.com Building robust machine learning systems
  • 20. Test your system + infrastructure ravelin.com ● Test that you get the same prediction offline that you do online ● Ensure you don’t leak information from the future ● If system is latency sensitive, add performance benchmarks to automation ● Live integration tests - be the client, send predictions, assert they’re right Building robust machine learning systems
  • 22. Shipping ● Ensure you share as much code as possible between offline and online ● If you have a difference between offline and online, your system will fail silently ● Thus, throwing models over the wall to developers to rebuild from scratch isn’t a great idea ● Abstract data source, to the computation of features on that data ● Pickles, PMML, weights - anything to let ML people ship quick ravelin.com Building robust machine learning systems
  • 23. Rollout ravelin.com ● Run new model side by side with live system, in production ● Canary, release in dark mode, A/B test ● Manually inspect biggest differences online - are they sensible or spurious? Building robust machine learning systems
  • 24. Metrics and logging ravelin.com Building robust machine learning systems ● Measure the difference in probabilities between your live + offline model. Small differences, lower risk in deploying ● Instrument (Prometheus, Datadog), and log (BigQuery / S3) everything ● Instrument feature extraction in the same way you tested offline data ● Instrument inference distribution (e.g. p99 of of probability offline should match p99 online) ● Measure whatever business metric this system is trying to improve
  • 25. Debugging bad predictions ravelin.com Building robust machine learning systems
  • 26. Look at the data! ● This is so obvious as to be nearly insulting ● We love to just jump in and Do Machine Learning™ ● Models learn the world that you present them ● Debug when training your model, not only live predictions ● Add to the ‘obvious set’, to prevent future regressions ravelin.com Building robust machine learning systems
  • 27. Make your model explain itself ● Random forests: walk the trees for bad predictions ● Neural nets: T-SNE on embeddings ● Linear models - relatively easy ● Usual suspects: difference between offline/online, information leakage, or model cheating ● Few packages to help: LIME, eli5 ● The Mythos Of Model Interpretability - ZC Lipton ravelin.com Building robust machine learning systems
  • 29. Why automate? ravelin.com ● Training in notebooks or shells works, until something goes wrong ● End up training on ‘data_1_final_edited_really_final (1)(2).csv’ ● Hard to replicate state for troubleshooting ● Experiments buggy and hard to replicate ● Wasting human time Building robust machine learning systems
  • 30. Automate everything ravelin.com ● Choose a task manager (AirFlow/Luigi), use it ● Automate whole pipeline: extraction -> hyperparam search -> training -> evaluation -> comparison against current system ● Fresh models + fresh data acts as integration test ● Automation helps distill things you care about ● Team much more productive Building robust machine learning systems
  • 31. Auditing models ravelin.com ● Who trained the model? ● When? What git hash? ● On what raw data? ● Which features did you use? ● With which hyperparameters? ● Bake these answers into your models so you can interact with them programmatically Building robust machine learning systems
  • 32. Reproducibility and archiving ● Can you reproduce it? ● Control random seeds through pipeline ● Archive training stage to (cloud) storage - features, logs, hyperparameter searches, models ● A model is a snapshot of the world around it at a given time ● Taking a snapshot of that snapshot is a good idea - you’ll thank me later ravelin.com Building robust machine learning systems
  • 33. ravelin.com ‘Do machine learning like the great engineer you are, not like the great machine learning expert you aren’t’ - Zinkevich Building robust machine learning systems
  • 34. ravelin.com Building robust machine learning systems ML systems = normal software + extra paranoia
  • 36. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ravelin- ml-best-practices