Emerging Best Practises for Machine Learning Engineering - Lex Toumbourou (ThoughtWorks)
©ThoughtWorks 2019 Commercial in Confidence
Why this talk?
● Aimed at individuals and organisations getting
started with Machine Learning.
● Reduce uncertainty, cost and time to deliver.
● Based on my own experience, that of my colleagues, and various other authors.
Photo by Ken Treloar on Unsplash
Talk overview
● Review of best practises or “sensible defaults” in
software projects.
● Consider challenges that ML projects introduce.
● Practises organised by ML project phase.
Photo by Casey Horner on Unsplash
Motivating example
● Predict how long a pet will take to be
adopted based on its profile.
● Combination of structured, NLP and
image data.
● Generates a continual supply of
training data.
Petfinder.my Adoption Prediction
https://www.kaggle.com/c/petfinder-adoption-prediction
Terminology
● Supervised learning - the process of learning a predictive function from a training dataset of input/output pairs.
● Model - another name for the learned function and its parameters.
● Features - another name for the model's inputs (sometimes engineered).
● Training set - the whole collection of feature -> output pairs used to train the model.
● Validation set - a portion of the training data set aside to tune the model.
● Test set - a held-out set of data used to evaluate the model.
Review of terminology used throughout talk
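The split terminology above can be sketched in a few lines of pure Python; the `age` feature and the 60/20/20 fractions are illustrative, not from the talk:

```python
import random

def split_dataset(pairs, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle (features, label) pairs and split into train/validation/test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = pairs[:n_test]               # held out purely for final evaluation
    val = pairs[n_test:n_test + n_val]  # used to tune the model
    train = pairs[n_test + n_val:]      # used to fit the model
    return train, val, test

train, val, test = split_dataset([({"age": i}, i % 2) for i in range(100)])
```

Shuffling before splitting is only safe when examples are independent; time-ordered data calls for a chronological split instead.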
Waterfall
● Based on the assumption that
sufficient upfront planning would
save time and money from rework
later in project
● Slow feedback loop
● Doesn’t account for unforeseeable
complexity
Iterative/Agile development
● Work in cycles (“sprints”) of requirements, design, code and release.
● Rapid Application Development (RAD), Rational Unified Process, Extreme Programming (XP), Scrum, Kanban, etc.
https://blog.itil.org/2014/08/allgemein/what-it-service-management-can-learn-from-the-agile-manifesto-and-vice-versa/
Modern software excellence
● Continuous delivery.
● Fast feedback.
● Rigorous testing.
● Continuous integration.
● Sophisticated version control.
● Infrastructure as code (DevOps).
From Continuous Delivery in a Nutshell by Zaiku
Uncertain of outcomes
● Paradigm shift for product managers.
● Is this even a problem I can effectively
solve with Machine Learning?
Photo by Miguel Bruna on Unsplash
Training data requirements
● Unstructured problems: collecting big datasets used to be a big barrier to entry.
● Structured datasets in the wild are often spread across multiple sources with different governance policies.
Source unknown
Reproducibility requirements
● State must be consistent to allow
experiments to build upon each
other.
● Large datasets and artifacts don’t fit into traditional version control tools.
Photo by 85Fifteen on Unsplash
Slow feedback
● Large models can take from hours to days
to train.
● Models can be fiddly and difficult to train.
Photo by Nick Abrams on Unsplash
Model drift
● Models trained to make predictions on
today’s data have no guarantees they will
work on future data.
Photo by Josh Yang on Unsplash
Blackbox-ness
● Hard to assess “correctness”.
● Production results may differ from
dev results.
● New class of concerns for QA and
support.
Photo by Emily Morter on Unsplash
Focus on product not tech
● Can you validate it without training any models?
● Is there an open-source or vendor solution that will get you close?
● Tip: if you use a vendor solution, you still need to evaluate its performance with a test set.
Project-wide practises
Photo by Nicolas Hoizey on Unsplash
Fast cycle time
● Start small and increase complexity as
needed.
● “If you're not embarrassed by the first
version of your product (model), you've
launched too late” - Reid Hoffman
Project-wide practises
Photo by Fabian Bächli on Unsplash
Consistent code structure
● Document where to put things and create a linter-enforced style guide. http://flake8.pycqa.org/en/latest/
● Cookiecutter Data Science to reduce “bike-shedding” and decision fatigue. https://github.com/drivendata/cookiecutter-data-science
Project-wide practises
Photo by Dan Ritson on Unsplash
Consider implications
● What happens if the model is bad?
● What are the implications of what
I’m optimising?
● Do we need a human in the loop?
Plan
https://www.slideshare.net/ThoughtWorks/social-implications-of-bias-in-machine-learning-fiona-coath-by-thoughtworks-133798261
Pick an evaluation metric
● Choose a “main” (ideally single) evaluation metric after considering your problem and data.
https://www.coursera.org/lecture/machine-learning-projects/single-number-evaluation-metric-wIKkC
● Establish a baseline metric by predicting at random or always predicting the majority class.
Plan
https://www.biochemia-medica.com/en/journal/22/3/10.11613/BM.2012.031
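A baseline is cheap to compute. Here is a minimal sketch of a majority-class baseline; the adoption labels are made up for illustration:

```python
from collections import Counter

def majority_class_baseline(labels):
    """Accuracy achieved by always predicting the most common label."""
    most_common, count = Counter(labels).most_common(1)[0]
    return most_common, count / len(labels)

labels = ["adopted"] * 70 + ["not_adopted"] * 30
pred, acc = majority_class_baseline(labels)  # a model must beat this accuracy
```

Any trained model that cannot beat this number is not yet adding value.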
Plan test set
● The test set should come from production data.
● Test set shouldn’t overlap with training
set.
● Newer data is (usually) most
important.
Plan
Determine run criteria
● How will our production infrastructure
constrain our model?
● How fast does the inference need to
be?
Plan
Photo by NORTHFOLK on Unsplash
Collect
1. Plan
2. Collect
3. Prepare
4. Train
5. Deploy
Photo by Phad Pichetbovornkul on Unsplash
Data scientist builds dataset
● If you are building models, you should have a
good understanding of how the dataset was
collected.
● Active learning can make this fast.
https://platform.ai/ https://prodi.gy
Collect
Building labelled dataset with platform.ai
Small data first
● Small datasets can (sometimes) go a long way.
● Transfer learning for image classification,
natural language processing and even
structured data.
Collect
Photo by Ayo Ogunseinde on Unsplash
More data > solution complexity
● “Most people overestimate the cost
associated with gathering and labeling
data, and underestimate the hardship
of solving problems in a data starved
environment.” - Emmanuel Ameisen
https://blog.insightdatascience.com/how-to-deliver-on-machine-learning-projects-c8d82ce642b0
Collect
Photo by Simon Maage on Unsplash
Share collected data
● Package and share collected
datasets.
https://dvc.org https://quiltdata.com
● Encourage centralised &
compliant storage (data lakes).
Collect
https://quiltdata.com/
Look at your data
● Look at random
examples.
● Histograms.
● Missingno for missing value visualisations. https://github.com/ResidentMario/missingno
Prepare
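Without reaching for missingno, a quick missing-value count over a list of row dicts can be sketched as follows; the pet columns are illustrative:

```python
from collections import Counter

def missing_counts(rows):
    """Count missing (None) values per column across a list of row dicts."""
    counts = Counter()
    for row in rows:
        for col, value in row.items():
            if value is None:
                counts[col] += 1
    return dict(counts)

rows = [
    {"age": 2, "breed": "tabby", "fee": None},
    {"age": None, "breed": "husky", "fee": 50},
    {"age": 4, "breed": None, "fee": None},
]
```

Columns with many missing values deserve a look before any feature engineering: is the gap random, or does it carry signal?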
ML-driven exploratory data analysis (EDA)
● Aim to train a model fast then use
interpretability and SME knowledge to guide
feature engineering and data collection.
From Fast.ai’s Machine Learning for Coders
● GBM (XGBoost, LightGBM, CatBoost) software can handle missing values, categorical values and varying scales out of the box.
Prepare
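A toy sketch of the interpretability-guided loop above: permutation-style feature importance, with a deterministic rotation standing in for the usual random shuffle, and a hand-written rule standing in for a trained GBM. Everything here (model, rows, labels) is made up for illustration:

```python
def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def feature_importance(model, rows, labels, feature):
    """Accuracy drop after permuting one feature across rows (a deterministic
    rotation stands in for the usual random shuffle)."""
    base = accuracy(model, rows, labels)
    vals = [r[feature] for r in rows]
    rotated = vals[1:] + vals[:1]
    permuted = [{**r, feature: v} for r, v in zip(rows, rotated)]
    return base - accuracy(model, permuted, labels)

# Toy "model": predicts quick adoption for pets under 3; age should matter,
# fee should not, and the importance scores should reflect that.
model = lambda r: r["age"] < 3
rows = [{"age": a, "fee": a * 10} for a in range(10)]
labels = [a < 3 for a in range(10)]
```

Features whose permutation barely moves the metric are candidates for removal; features that matter a lot deserve SME scrutiny and more data collection.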
Version artifacts and pipelines
● Version control artifacts.
● Track the pipelines used to generate
features. https://dvc.org
● Pipenv & Poetry for tracking dependency chains. https://pipenv.readthedocs.io/en/latest/ https://github.com/sdispater/poetry
Prepare
Practise good code hygiene
● Test-driven development for feature engineering code: unit, integration, etc.
● Refactor into modules.
● Fix bugs with your features before
worrying about hyperparameters.
Prepare
Photo by Piron Guillaume on Unsplash
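A sketch of test-driven feature engineering, assuming a hypothetical pet-profile schema (`age_months`, `photo_count`, `description`):

```python
import unittest

def engineer_features(profile):
    """Turn a raw pet profile (hypothetical schema) into model features."""
    return {
        "age_years": profile["age_months"] / 12,
        "has_photo": profile.get("photo_count", 0) > 0,
        "desc_length": len(profile.get("description", "")),
    }

class TestEngineerFeatures(unittest.TestCase):
    def test_age_is_converted_to_years(self):
        self.assertEqual(engineer_features({"age_months": 24})["age_years"], 2.0)

    def test_missing_photo_count_means_no_photo(self):
        self.assertFalse(engineer_features({"age_months": 6})["has_photo"])
```

Tests like these catch feature bugs cheaply, before any time is sunk into hyperparameter tuning.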
“Easiest” models first
● Favour simple, interpretable models
initially.
● GBMs are a great default choice for structured data.
Train
https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27
Fast feedback
● Overfit first.
● Train on samples or small images etc. while testing experiments: aim to keep training time < 5 minutes.
● Draw the validation set from the same distribution as the test set.
Train
https://www.bridgewateruk.com/2016/08/working-large-company-vs-working-small-company-pros-cons/
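Keeping runs under five minutes often comes down to subsampling; a seeded sample keeps experiments reproducible. The sizes here are illustrative:

```python
import random

def training_sample(dataset, max_rows=1000, seed=7):
    """Take a fixed-size, reproducible random sample for fast experiments."""
    if len(dataset) <= max_rows:
        return list(dataset)
    return random.Random(seed).sample(dataset, max_rows)

subset = training_sample(list(range(100_000)), max_rows=500)
```

Because the seed is fixed, two experiments run on `subset` see exactly the same rows, so metric differences come from the change under test rather than sampling noise.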
Transfer learning
● (Almost) always start with a
pretrained model if possible.
● Transfer learning for image
classification and recently natural
language processing.
Universal Language Model Fine-tuning for Text Classification
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Train
https://machinelearningmastery.com/transfer-learning-for-deep-learning/
Constrain training to run criteria
● Constrain model selection to
suit run criteria.
● CatBoost, as an alternative to XGBoost, supports fast inference and model-size regularisation.
Train
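Checking a candidate model against a latency run criterion can be sketched as below; the model here is a stand-in lambda, and the percentile estimate is deliberately crude:

```python
import time

def p95_latency_ms(predict, inputs, warmup=5):
    """Estimate 95th-percentile inference latency in milliseconds."""
    for x in inputs[:warmup]:
        predict(x)  # warm caches before measuring
    timings = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[int(len(timings) * 0.95) - 1]

fast_model = lambda x: x * 2  # stand-in for a real model's predict function
latency = p95_latency_ms(fast_model, list(range(100)))
```

Running this check in CI against the agreed run criteria keeps a slow model from reaching model selection in the first place.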
Perform error analysis
● Error analysis by hand: look at 100
examples of errors and determine
common themes.
● View the most confident and least confident predictions.
● Feature importance and ablation.
Train
From https://www.kdnuggets.com/2018/01/error-analysis-your-rescue.html
based on ideas by Andrew Ng
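Ranking predictions by their distance from 0.5 surfaces the most and least confident cases for manual review; the example ids and probabilities below are made up:

```python
def confidence_extremes(predictions, k=3):
    """predictions: (example_id, predicted_probability, true_label) tuples.
    Returns the k most and k least confident predictions for review."""
    ranked = sorted(predictions, key=lambda p: abs(p[1] - 0.5), reverse=True)
    return ranked[:k], ranked[-k:]

preds = [("pet1", 0.99, 1), ("pet2", 0.51, 0), ("pet3", 0.02, 0),
         ("pet4", 0.45, 1), ("pet5", 0.90, 1)]
most, least = confidence_extremes(preds, k=2)
```

Confident-but-wrong predictions often point at labelling errors; unconfident ones point at missing features or underrepresented cases.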
Go to prod early
● Test your model on data and
conditions in prod early.
● A/B deployments: the new model receives inputs alongside the production model to compare performance.
Deploy
[Diagram: Model A and Model B receiving the same inputs]
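A shadow variant of this idea can be sketched as follows: the candidate model sees production inputs and its outputs are logged for comparison, but only the production model's answer is ever served. Both models here are stand-in lambdas:

```python
def shadow_predict(prod_model, candidate_model, features, log):
    """Serve the production answer while logging the candidate's for comparison."""
    prod_out = prod_model(features)
    try:
        candidate_out = candidate_model(features)  # must never affect the response
    except Exception as exc:
        candidate_out = f"error: {exc}"
    log.append({"input": features, "prod": prod_out, "candidate": candidate_out})
    return prod_out

log = []
result = shadow_predict(lambda f: 30, lambda f: 25, {"age": 2}, log)
```

The try/except matters: a crashing candidate model should show up in the log, not in the user's response.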
Validate inputs (and outputs)
● Fast feedback on prod data not
accounted for in training / test set.
● Pydantic validates using Python types. https://github.com/samuelcolvin/pydantic
Deploy
Photo by Fancycrave on Unsplash
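Pydantic does this declaratively; as a library-free sketch of the same idea, here is a stdlib dataclass with a hypothetical input schema:

```python
from dataclasses import dataclass

@dataclass
class PetProfile:
    """Validated model input (hypothetical schema; pydantic adds coercion too)."""
    age_months: int
    breed: str

    def __post_init__(self):
        # Reject inputs the model was never trained on, before inference runs.
        if not isinstance(self.age_months, int) or self.age_months < 0:
            raise ValueError(f"age_months must be a non-negative int, got {self.age_months!r}")
        if not self.breed:
            raise ValueError("breed must be non-empty")

profile = PetProfile(age_months=24, breed="tabby")
```

Failing loudly at the boundary gives fast feedback on production data the training and test sets never accounted for.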
Minimise ops
● Aim for serverless and minimal infrastructure.
● Automate deployments.
● Developers and data scientists on call.
Deploy
Monitor metric
● Monitor metric by continually building
new test sets.
● Track performance over time.
● Schedule retraining.
Deploy
Photo by Kyle Hanson on Unsplash
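Tracking the metric over time can be as simple as bucketing labelled production predictions by week; the records below are illustrative:

```python
def accuracy_by_week(records):
    """records: (week, predicted, actual) tuples -> per-week accuracy."""
    totals, correct = {}, {}
    for week, pred, actual in records:
        totals[week] = totals.get(week, 0) + 1
        correct[week] = correct.get(week, 0) + (pred == actual)
    # A downward trend across weeks is a signal of model drift.
    return {week: correct[week] / totals[week] for week in totals}

records = [("w1", 1, 1), ("w1", 0, 0), ("w2", 1, 0), ("w2", 1, 1)]
weekly = accuracy_by_week(records)
```

A sustained drop in the weekly number is the trigger for the scheduled retraining mentioned above.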
Accessible interpretability tools
● Data scientists should create tools to make the model accessible to all.
● Interpretability dashboards to make predictions against real data and view interpretations.
https://www.thoughtworks.com/clients/arkose-labs
Deploy
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
A Unified Approach to Interpreting Model Predictions
Conclusion
● Research is uncertain but we can define clear goals.
● Data can be collected iteratively.
● Carefully track data, artifacts and pipelines for
reproducibility.
● Aim for fast feedback while training models.
● Deploy early and monitor production.
● Make interpretability tools accessible to the organisation.