Emerging Best Practises for Machine Learning Engineering - Lex Toumbourou (ThoughtWorks)
©ThoughtWorks 2019 Commercial in Confidence
Why this talk?
● Aimed at individuals and organisations getting
started with Machine Learning.
● Reduce uncertainty, cost and time to deliver.
● Based on my own experience, that of my colleagues, and various other authors.
Photo by Ken Treloar on Unsplash
Talk overview
● Review of best practises or “sensible defaults” in
software projects.
● Consider challenges that ML projects introduce.
● Practises organised by ML project phase.
Photo by Casey Horner on Unsplash
Motivating example
● Predict how long a pet will take to be
adopted based on its profile.
● Combination of structured, NLP and
image data.
● Generates a continual supply of
training data.
Petfinder.my Adoption Prediction
https://www.kaggle.com/c/petfinder-adoption-prediction
Terminology
● Supervised learning - the process of learning a predictive function from a training dataset of input/output pairs.
● Model - another name for the learned function and its parameters.
● Features - another name for the model's inputs (sometimes engineered).
● Training set - the whole collection of feature -> output pairs used to train the model.
● Validation set - a portion of the training data set aside to tune the model.
● Test set - a held-out set of data used to evaluate the model.
Review of terminology used throughout talk
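The split terminology above can be sketched in a few lines of pure Python; the `age` feature and the 60/20/20 fractions are illustrative, not from the talk:

```python
import random

def split_dataset(pairs, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle (features, label) pairs and split into train/validation/test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = pairs[:n_test]               # held out purely for final evaluation
    val = pairs[n_test:n_test + n_val]  # used to tune the model
    train = pairs[n_test + n_val:]      # used to fit the model
    return train, val, test

train, val, test = split_dataset([({"age": i}, i % 2) for i in range(100)])
```

Shuffling before splitting is only safe when examples are independent; time-ordered data calls for a chronological split instead.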
Waterfall
● Based on the assumption that
sufficient upfront planning would
save time and money from rework
later in project
● Slow feedback loop
● Doesn’t account for unforeseeable
complexity
Iterative/Agile development
● Work in cycles (“sprints”) of requirements, design, code and release.
● Rapid Application Development (RAD), Rational Unified Process, Extreme Programming (XP), Scrum, Kanban, etc.
https://blog.itil.org/2014/08/allgemein/what-it-service-management-can-learn-from-the-agile-manifesto-and-vice-versa/
Modern software excellence
● Continuous delivery.
● Fast feedback.
● Rigorous testing.
● Continuous integration.
● Sophisticated version control.
● Infrastructure as code (DevOps).
From Continuous Delivery in a Nutshell by Zaiku
Uncertain of outcomes
● Paradigm shift for product managers.
● Is this even a problem I can effectively
solve with Machine Learning?
Photo by Miguel Bruna on Unsplash
Training data requirements
● Unstructured problems: collecting big datasets used to be a big barrier to entry.
● Structured datasets in the wild are often spread across multiple sources with different governance policies.
Source unknown
Reproducibility requirements
● State must be consistent to allow
experiments to build upon each
other.
● Large datasets and artifacts don’t fit into traditional version control tools.
Photo by 85Fifteen on Unsplash
Slow feedback
● Large models can take from hours to days
to train.
● Models can be fiddly and difficult to train.
Photo by Nick Abrams on Unsplash
Model drift
● Models trained to make predictions on
today’s data have no guarantees they will
work on future data.
Photo by Josh Yang on Unsplash
Blackbox-ness
● Hard to assess “correctness”.
● Production results may differ from
dev results.
● New class of concerns for QA and
support.
Photo by Emily Morter on Unsplash
Focus on product not tech
● Can you validate it without training any models?
● Is there an open-source or vendor solution that will get you close?
● Tip: if you use a vendor solution, you still need to evaluate its performance with a test set.
Project-wide practises
Photo by Nicolas Hoizey on Unsplash
Fast cycle time
● Start small and increase complexity as
needed.
● “If you're not embarrassed by the first
version of your product (model), you've
launched too late” - Reid Hoffman
Project-wide practises
Photo by Fabian Bächli on Unsplash
Consistent code structure
● Document where to put things and create a linter-enforced style guide. http://flake8.pycqa.org/en/latest/
● Cookiecutter Data Science to reduce “bike-shedding” and decision fatigue. https://github.com/drivendata/cookiecutter-data-science
Project-wide practises
Photo by Dan Ritson on Unsplash
Consider implications
● What happens if the model is bad?
● What are the implications of what
I’m optimising?
● Do we need a human in the loop?
Plan
https://www.slideshare.net/ThoughtWorks/social-implications-of-bias-in-machine-learning-fiona-coath-by-thoughtworks-133798261
Pick an evaluation metric
● Choose a “main” (ideally single) evaluation metric after considering your problem and data.
https://www.coursera.org/lecture/machine-learning-projects/single-number-evaluation-metric-wIKkC
● Establish a baseline metric by predicting at random or always predicting the majority class.
Plan
https://www.biochemia-medica.com/en/journal/22/3/10.11613/BM.2012.031
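A baseline is cheap to compute. Here is a minimal sketch of a majority-class baseline; the adoption labels are made up for illustration:

```python
from collections import Counter

def majority_class_baseline(labels):
    """Accuracy achieved by always predicting the most common label."""
    most_common, count = Counter(labels).most_common(1)[0]
    return most_common, count / len(labels)

labels = ["adopted"] * 70 + ["not_adopted"] * 30
pred, acc = majority_class_baseline(labels)  # a model must beat this accuracy
```

Any trained model that cannot beat this number is not yet adding value.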
Plan test set
● The test set should come from production data.
● Test set shouldn’t overlap with training
set.
● Newer data is (usually) most
important.
Plan
Determine run criteria
● How will our production infrastructure
constrain our model?
● How fast does the inference need to
be?
Plan
Photo by NORTHFOLK on Unsplash
Collect
1. Plan
2. Collect
3. Prepare
4. Train
5. Deploy
Photo by Phad Pichetbovornkul on Unsplash
Data scientist builds dataset
● If you are building models, you should have a
good understanding of how the dataset was
collected.
● Active learning can make this fast.
https://platform.ai/ https://prodi.gy
Collect
Building labelled dataset with platform.ai
Small data first
● Small datasets can (sometimes) go a long way.
● Transfer learning for image classification,
natural language processing and even
structured data.
Collect
Photo by Ayo Ogunseinde on Unsplash
More data > solution complexity
● “Most people overestimate the cost
associated with gathering and labeling
data, and underestimate the hardship
of solving problems in a data starved
environment.” - Emmanuel Ameisen
https://blog.insightdatascience.com/how-to-deliver-on-machine-learning-projects-c8d82ce642b0
Collect
Photo by Simon Maage on Unsplash
Share collected data
● Package and share collected
datasets.
https://dvc.org https://quiltdata.com
● Encourage centralised &
compliant storage (data lakes).
Collect
https://quiltdata.com/
Look at your data
● Look at random
examples.
● Histograms.
● Missingno for missing value visualisations. https://github.com/ResidentMario/missingno
Prepare
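Without reaching for missingno, a quick missing-value count over a list of row dicts can be sketched as follows; the pet columns are illustrative:

```python
from collections import Counter

def missing_counts(rows):
    """Count missing (None) values per column across a list of row dicts."""
    counts = Counter()
    for row in rows:
        for col, value in row.items():
            if value is None:
                counts[col] += 1
    return dict(counts)

rows = [
    {"age": 2, "breed": "tabby", "fee": None},
    {"age": None, "breed": "husky", "fee": 50},
    {"age": 4, "breed": None, "fee": None},
]
```

Columns with many missing values deserve a look before any feature engineering: is the gap random, or does it carry signal?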
ML-driven exploratory data analysis (EDA)
● Aim to train a model fast then use
interpretability and SME knowledge to guide
feature engineering and data collection.
From Fast.ai’s Machine Learning for Coders
● GBM (XGBoost, LightGBM, CatBoost) software can handle missing values, categorical values and varying scales out of the box.
Prepare
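A toy sketch of the interpretability-guided loop above: permutation-style feature importance, with a deterministic rotation standing in for the usual random shuffle, and a hand-written rule standing in for a trained GBM. Everything here (model, rows, labels) is made up for illustration:

```python
def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def feature_importance(model, rows, labels, feature):
    """Accuracy drop after permuting one feature across rows (a deterministic
    rotation stands in for the usual random shuffle)."""
    base = accuracy(model, rows, labels)
    vals = [r[feature] for r in rows]
    rotated = vals[1:] + vals[:1]
    permuted = [{**r, feature: v} for r, v in zip(rows, rotated)]
    return base - accuracy(model, permuted, labels)

# Toy "model": predicts quick adoption for pets under 3; age should matter,
# fee should not, and the importance scores should reflect that.
model = lambda r: r["age"] < 3
rows = [{"age": a, "fee": a * 10} for a in range(10)]
labels = [a < 3 for a in range(10)]
```

Features whose permutation barely moves the metric are candidates for removal; features that matter a lot deserve SME scrutiny and more data collection.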
Version artifacts and pipelines
● Version control artifacts.
● Track the pipelines used to generate
features. https://dvc.org
● Pipenv & Poetry for tracking dependency chains. https://pipenv.readthedocs.io/en/latest/ https://github.com/sdispater/poetry
Prepare
Practise good code hygiene
● Test-driven development for feature engineering code: unit, integration, etc.
● Refactor into modules.
● Fix bugs with your features before
worrying about hyperparameters.
Prepare
Photo by Piron Guillaume on Unsplash
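A sketch of test-driven feature engineering, assuming a hypothetical pet-profile schema (`age_months`, `photo_count`, `description`):

```python
import unittest

def engineer_features(profile):
    """Turn a raw pet profile (hypothetical schema) into model features."""
    return {
        "age_years": profile["age_months"] / 12,
        "has_photo": profile.get("photo_count", 0) > 0,
        "desc_length": len(profile.get("description", "")),
    }

class TestEngineerFeatures(unittest.TestCase):
    def test_age_is_converted_to_years(self):
        self.assertEqual(engineer_features({"age_months": 24})["age_years"], 2.0)

    def test_missing_photo_count_means_no_photo(self):
        self.assertFalse(engineer_features({"age_months": 6})["has_photo"])
```

Tests like these catch feature bugs cheaply, before any time is sunk into hyperparameter tuning.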
“Easiest” models first
● Favour simple, interpretable models
initially.
● GBMs are a great default choice for structured data.
Train
https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27
Fast feedback
● Overfit first.
● Train on samples or small images etc. while testing experiments: aim to keep training time < 5 minutes.
● Draw the validation set from the same distribution as the test set.
Train
https://www.bridgewateruk.com/2016/08/working-large-company-vs-working-small-company-pros-cons/
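Keeping runs under five minutes often comes down to subsampling; a seeded sample keeps experiments reproducible. The sizes here are illustrative:

```python
import random

def training_sample(dataset, max_rows=1000, seed=7):
    """Take a fixed-size, reproducible random sample for fast experiments."""
    if len(dataset) <= max_rows:
        return list(dataset)
    return random.Random(seed).sample(dataset, max_rows)

subset = training_sample(list(range(100_000)), max_rows=500)
```

Because the seed is fixed, two experiments run on `subset` see exactly the same rows, so metric differences come from the change under test rather than sampling noise.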
Transfer learning
● (Almost) always start with a
pretrained model if possible.
● Transfer learning for image
classification and recently natural
language processing.
Universal Language Model Fine-tuning for Text Classification
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Train
https://machinelearningmastery.com/transfer-learning-for-deep-learning/
Constrain training to run criteria
● Constrain model selection to
suit run criteria.
● CatBoost, as an alternative to XGBoost, supports fast inference and model-size regularisation.
Train
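Checking a candidate model against a latency run criterion can be sketched as below; the model here is a stand-in lambda, and the percentile estimate is deliberately crude:

```python
import time

def p95_latency_ms(predict, inputs, warmup=5):
    """Estimate 95th-percentile inference latency in milliseconds."""
    for x in inputs[:warmup]:
        predict(x)  # warm caches before measuring
    timings = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[int(len(timings) * 0.95) - 1]

fast_model = lambda x: x * 2  # stand-in for a real model's predict function
latency = p95_latency_ms(fast_model, list(range(100)))
```

Running this check in CI against the agreed run criteria keeps a slow model from reaching model selection in the first place.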
Perform error analysis
● Error analysis by hand: look at 100
examples of errors and determine
common themes.
● View the most confident and least confident predictions.
● Feature importance and ablation.
Train
From https://www.kdnuggets.com/2018/01/error-analysis-your-rescue.html
based on ideas by Andrew Ng
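Ranking predictions by their distance from 0.5 surfaces the most and least confident cases for manual review; the example ids and probabilities below are made up:

```python
def confidence_extremes(predictions, k=3):
    """predictions: (example_id, predicted_probability, true_label) tuples.
    Returns the k most and k least confident predictions for review."""
    ranked = sorted(predictions, key=lambda p: abs(p[1] - 0.5), reverse=True)
    return ranked[:k], ranked[-k:]

preds = [("pet1", 0.99, 1), ("pet2", 0.51, 0), ("pet3", 0.02, 0),
         ("pet4", 0.45, 1), ("pet5", 0.90, 1)]
most, least = confidence_extremes(preds, k=2)
```

Confident-but-wrong predictions often point at labelling errors; unconfident ones point at missing features or underrepresented cases.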
Go to prod early
● Test your model on data and
conditions in prod early.
● A/B deployments: the new model receives inputs alongside the production model to compare performance.
Deploy
[Diagram: Model A and Model B receiving the same inputs]
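A shadow variant of this idea can be sketched as follows: the candidate model sees production inputs and its outputs are logged for comparison, but only the production model's answer is ever served. Both models here are stand-in lambdas:

```python
def shadow_predict(prod_model, candidate_model, features, log):
    """Serve the production answer while logging the candidate's for comparison."""
    prod_out = prod_model(features)
    try:
        candidate_out = candidate_model(features)  # must never affect the response
    except Exception as exc:
        candidate_out = f"error: {exc}"
    log.append({"input": features, "prod": prod_out, "candidate": candidate_out})
    return prod_out

log = []
result = shadow_predict(lambda f: 30, lambda f: 25, {"age": 2}, log)
```

The try/except matters: a crashing candidate model should show up in the log, not in the user's response.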
Validate inputs (and outputs)
● Fast feedback on prod data not
accounted for in training / test set.
● Pydantic validates using Python types. https://github.com/samuelcolvin/pydantic
Deploy
Photo by Fancycrave on Unsplash
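Pydantic does this declaratively; as a library-free sketch of the same idea, here is a stdlib dataclass with a hypothetical input schema:

```python
from dataclasses import dataclass

@dataclass
class PetProfile:
    """Validated model input (hypothetical schema; pydantic adds coercion too)."""
    age_months: int
    breed: str

    def __post_init__(self):
        # Reject inputs the model was never trained on, before inference runs.
        if not isinstance(self.age_months, int) or self.age_months < 0:
            raise ValueError(f"age_months must be a non-negative int, got {self.age_months!r}")
        if not self.breed:
            raise ValueError("breed must be non-empty")

profile = PetProfile(age_months=24, breed="tabby")
```

Failing loudly at the boundary gives fast feedback on production data the training and test sets never accounted for.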
Minimise ops
● Aim for serverless and minimal infrastructure.
● Automate deployments.
● Developers and data scientists on call.
Deploy
Monitor metric
● Monitor metric by continually building
new test sets.
● Track performance over time.
● Schedule retraining.
Deploy
Photo by Kyle Hanson on Unsplash
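Tracking the metric over time can be as simple as bucketing labelled production predictions by week; the records below are illustrative:

```python
def accuracy_by_week(records):
    """records: (week, predicted, actual) tuples -> per-week accuracy."""
    totals, correct = {}, {}
    for week, pred, actual in records:
        totals[week] = totals.get(week, 0) + 1
        correct[week] = correct.get(week, 0) + (pred == actual)
    # A downward trend across weeks is a signal of model drift.
    return {week: correct[week] / totals[week] for week in totals}

records = [("w1", 1, 1), ("w1", 0, 0), ("w2", 1, 0), ("w2", 1, 1)]
weekly = accuracy_by_week(records)
```

A sustained drop in the weekly number is the trigger for the scheduled retraining mentioned above.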
Accessible interpretability tools
● Data scientists should create tools to make the model accessible to all.
● Interpretability dashboards to make predictions against real data and view interpretations.
https://www.thoughtworks.com/clients/arkose-labs
Deploy
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
A Unified Approach to Interpreting Model Predictions
Conclusion
● Research is uncertain but we can define clear goals.
● Data can be collected iteratively.
● Carefully track data, artifacts and pipelines for
reproducibility.
● Aim for fast feedback while training models.
● Deploy early and monitor production.
● Make interpretability tools accessible to the organisation.