Anatomy of an Application: Machine Learning End-to-End - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
4. BigML, Inc #DutchMLSchool
Real-world ML Applications
• Should you sign that NDA?
• Upload the NDA to the website
• The service uses Machine Learning to decide if the terms are fair
https://ndalynn.com/
Real-world ML Applications
• Gathers over 500 features about companies:
• Crunchbase / Tweets / Patents / LinkedIn / etc.
• Creates a label for success/failure:
• IPO or acquisition = success
• Bankruptcy or irrelevance = failure
• Uses Machine Learning to build a model that predicts the success or failure of startups
• And puts all of the information together into an investor dashboard
https://preseries.com
ML Adoption
“The gap for most companies isn’t that machine learning doesn’t work, but that they struggle to actually use it”
• Why?
• Too much focus on algorithms
• Not enough focus on applying Machine Learning
Real-world ML Applications
https://thepointsguy.com/news/this-is-the-reason-you-arent-feeling-as-much-turbulence-on-delta-flights/
…collecting and analyzing “hundreds of thousands of data points,” with a plan to boost that to “millions,” creating a model that forecasts turbulence with a level of confidence heretofore unseen.
Not Important: the algorithm!
Machine Learning Evolution
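[Figure: the evolution of Machine Learning through the stages Genesis, Custom built, Product/Service, Commodity, and Utility/Ubiquity, with timeline markers 1950s, 1980 (1st Workshop on Machine Learning), 2000s, 2011, 2020, and 2030. Example tools: Weka and scikit-learn (custom built); BigML, Azure ML, Amazon ML, and Google Cloud ML (product/service). Users broaden from academics & researchers to scientists, developers, analysts, and finally everyone, as certainty moves from unknown and novel to defined and common.]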
• Machine Learning algorithms are fun to talk about: GPUs, NNs, etc.
• But the algorithms are largely a commodity already
• The difficulty is knowing how to apply ML
What is an ML Application
AIRLINE  ORIGIN  DESTINATION  DEPARTURE DELAY (min)  DISTANCE (mi)  ARRIVAL DELAY (min)
AS       ANC     SEA          -11                    1448           -22
AA       LAX     PBI          -8                     2330           -9
US       SFO     CLT          -2                     2296           5
AA       LAX     MIA          -5                     2342           -9
AS       SEA     ANC          -1                     1448           -21
DL       SFO     MSP          -5                     1589           8
NK       LAS     MSP          -6                     1299           -17
US       LAX     CLT          14                     2125           -10
AA       SFO     DFW          -11                    1464           -13
DL       LAS     ATL          3                      1747           -15
Finding patterns in data that can be used to make inferences…
Predictive Models
Consider: ML Definition
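To make “finding patterns in data that can be used to make inferences” concrete, here is a minimal sketch that fits an ordinary least-squares line predicting ARRIVAL DELAY from DEPARTURE DELAY on the ten rows of the table. Least squares is an illustrative choice, not something the slides prescribe, and ten rows are far too few for a trustworthy pattern; the point is only the shape of the workflow: learn from rows with known outcomes, then predict for a new one.

```python
from statistics import mean

# The ten rows from the table:
# (airline, origin, destination, departure delay min, distance mi, arrival delay min)
FLIGHTS = [
    ("AS", "ANC", "SEA", -11, 1448, -22),
    ("AA", "LAX", "PBI", -8, 2330, -9),
    ("US", "SFO", "CLT", -2, 2296, 5),
    ("AA", "LAX", "MIA", -5, 2342, -9),
    ("AS", "SEA", "ANC", -1, 1448, -21),
    ("DL", "SFO", "MSP", -5, 1589, 8),
    ("NK", "LAS", "MSP", -6, 1299, -17),
    ("US", "LAX", "CLT", 14, 2125, -10),
    ("AA", "SFO", "DFW", -11, 1464, -13),
    ("DL", "LAS", "ATL", 3, 1747, -15),
]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Finding the pattern: how arrival delay relates to departure delay
departure = [row[3] for row in FLIGHTS]
arrival = [row[5] for row in FLIGHTS]
intercept, slope = fit_line(departure, arrival)

def predict_arrival_delay(departure_delay):
    # Making an inference: apply the learned line to a new flight
    return intercept + slope * departure_delay

print(f"arrival ≈ {intercept:.2f} + {slope:.3f} * departure")
print(f"predicted arrival delay for a 10-min departure delay: {predict_arrival_delay(10):.1f} min")
```

With more rows and more input fields (airline, route, distance), the same learn-then-predict shape holds; only the model gets richer.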
What is an ML Application
Predictive Models
• Where does this data come from?
• How do you know what data?
• Is the data formatted correctly?
• What do you do with these models?
• How do you combine them?
• Will it work?
Reality of an ML Application
[Figure: an ML application has a seen part, the Predictive App, and an unseen part: Data Collection, Data Transformations, Feature Engineering, and Evaluation & Retraining.]
Where to Start?
[Figure: Step 1: “Let’s predict customer churn!” → Step 2: ??? → Finish: “Here are the customers we predict will leave our service”]
Where to Start?
[Figure: Step 1: “Let’s detect fraud!” → Step 2: ??? → Finish: “Here are the transactions we should stop immediately.”]
ML Application Guide
• Remember: ML finds patterns in data, enabling predictions about future events
• This means you need data
• What data depends on what you want to predict
• And the data you have or can collect
• Data needs to have patterns related to what you want to predict
• Not magic: still can’t predict random events, lotteries, etc.
• Your problem statement needs to be specific
• Not “Let’s predict churn”
• But “Let’s predict churn by looking at the profile data of all previous customers of our service who have/have not churned”
• This can be tricky…
State the problem as an ML Task
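Stated specifically, the churn problem maps directly onto a supervised learning task: previous customers with a known outcome become labeled training rows, and current customers become rows to score. A minimal sketch follows; the profile fields (`months_active`, `monthly_spend`, `support_tickets`), the `CustomerProfile` class, and the `build_churn_task` helper are all hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple

@dataclass
class CustomerProfile:
    # Hypothetical profile fields, for illustration only
    customer_id: str
    months_active: int
    monthly_spend: float
    support_tickets: int
    churned: Optional[bool]  # None = current customer, outcome not yet known

def build_churn_task(customers: List[CustomerProfile]) -> Tuple[List[dict], List[Tuple[str, dict]]]:
    """Labeled rows from previous customers; unlabeled rows (with IDs) for current ones."""
    training, to_score = [], []
    for c in customers:
        row = asdict(c)
        row.pop("customer_id")  # an identifier names a row; it does not describe it
        label = row.pop("churned")
        if label is None:
            to_score.append((c.customer_id, row))
        else:
            row["churned"] = label  # objective field the model will learn to predict
            training.append(row)
    return training, to_score

customers = [
    CustomerProfile("c-001", 24, 49.0, 0, False),
    CustomerProfile("c-002", 3, 19.0, 4, True),
    CustomerProfile("c-003", 8, 29.0, 1, None),
]
training, to_score = build_churn_task(customers)
print(len(training), "training rows;", len(to_score), "customers to score")
```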
Where to Start?
[Figure: Step 1: “Let’s predict the Oscars!” → Step 2: ??? → Finish: “Here are the predicted winners”]
• Statement is not specific enough!!!
• What data can we collect that predicts Oscar wins?
Predicting the Oscars
BigML Scoresheet
2018
• 6 out of 6 right!
• 8 out of 8 actually, but the probability of the predictions was “too low” for:
• Adapted Screenplay
• Original Screenplay
2019
• 4 out of 8 major awards correctly predicted
• Probabilities were lower this year
• This is still significantly better than guessing
How is this possible? Isn't the winner random?
How an Oscar is Won
[Figure: 7,000+ Academy members, each with an unknown voting intention]
Insight: winning awards is not a random event!
Let’s Predict Best Picture
[Figure: a film’s precursor results (Win: London Critics; Lose: Writers Guild; Win: Directors Guild; Win: Golden Globe; Win: BAFTA) leading to the Oscar: Win?]
• These events are *not* independent
• Similar, but not identical, factors contribute to each win…
• We can expect a higher probability for Shape of Water to win
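One way to see that precursor awards carry signal is to count, over past years, how often a precursor winner went on to win the Oscar. The sketch below does this for a single precursor; the record layout, the `DGA`/`BAFTA` keys, and the toy history are hypothetical, not real award data. Because the events are not independent, looking at one precursor at a time is only a first step; a model trained on all of them together can weigh how they combine.

```python
from collections import Counter

def win_rate_given_precursor(history, precursor):
    """Estimate P(Oscar win | precursor result) by counting past nominees.

    history: list of dicts like {"precursors": {"DGA": True, ...}, "oscar_win": bool}
    Returns {True: rate among precursor winners, False: rate among the rest}.
    """
    wins, totals = Counter(), Counter()
    for nominee in history:
        won_precursor = nominee["precursors"].get(precursor, False)
        totals[won_precursor] += 1
        wins[won_precursor] += int(nominee["oscar_win"])
    return {key: wins[key] / totals[key] for key in totals}

# Hypothetical toy history, NOT real award data
history = [
    {"precursors": {"DGA": True, "BAFTA": True}, "oscar_win": True},
    {"precursors": {"DGA": True, "BAFTA": False}, "oscar_win": True},
    {"precursors": {"DGA": True, "BAFTA": True}, "oscar_win": False},
    {"precursors": {"DGA": False, "BAFTA": True}, "oscar_win": False},
    {"precursors": {"DGA": False, "BAFTA": False}, "oscar_win": False},
    {"precursors": {"DGA": False, "BAFTA": False}, "oscar_win": False},
]
print(win_rate_given_precursor(history, "DGA"))
```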
Oscars Example
• When specifying the problem, be as specific as possible
• Not: “Let’s predict the Oscars”
• Instead: “Let’s predict the Oscars by correlating a series of award wins with the final Oscar win.”
• The statement of the problem will guide the data required
• Be aware of the cost of collecting the data versus the ROI:
Tidbits and Lessons Learned…
Ranking ML Applications
[Figure: 2x2 matrix of FEASIBILITY (increasing data availability / decreasing complexity) against ROI (impact and cost), with quadrants NO-BRAINERS (START HERE), BRAINERS, POSTPONABLE, and NO-GO]
Thinking about an ML Application?
Oscars Example
• When specifying the problem, be as specific as possible
• Not: “Let’s predict the Oscars”
• Instead: “Let’s predict the Oscars by correlating a series of award wins with the final Oscar win.”
• The statement of the problem will guide the data required
• Be aware of the cost of collecting the data versus the ROI:
• IMDb data is readily available
• We’re done, right?
• Nope. You can’t escape Feature Engineering
• Items: BAFTA_won_categories = list of nominations
• Aggregations: Nomination and Award counts
• You can’t escape Feature Selection
• Full user reviews costly to collect and not useful
Tidbits and Lessons Learned…
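The two feature-engineering ideas above, an items field listing the categories won and aggregated counts, can be sketched in a few lines. The input layout and the award keys are assumptions made for illustration; only the `BAFTA_won_categories` naming pattern comes from the slide.

```python
def engineer_features(film):
    """film: hypothetical record {"nominations": {award: [categories]}, "wins": {award: [categories]}}."""
    features = {}
    # Items fields: the list of categories won at each precursor award
    for award, categories in film["wins"].items():
        features[f"{award}_won_categories"] = sorted(categories)
    # Aggregations: total nominations and total wins across all awards
    features["nomination_count"] = sum(len(c) for c in film["nominations"].values())
    features["award_count"] = sum(len(c) for c in film["wins"].values())
    return features

# Hypothetical film record, for illustration only
film = {
    "nominations": {"BAFTA": ["Film", "Director", "Score"], "Golden_Globe": ["Director"]},
    "wins": {"BAFTA": ["Score", "Director"], "Golden_Globe": []},
}
print(engineer_features(film))
```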
Wait: How were you confident in the predictions?
Evaluating the Model
[Figure: the Original Dataset (2000-2016, 119 variables) split by year into a Train Dataset (2000-2012) and a Test Dataset (2013-2016), both with the same 119 variables]
• Ultimately, we want to use all the history to predict the winner for the current year
• In order to evaluate success, we use a model built from 2000-2012 data to predict the winners for 2013-2016
• Built a separate Deepnet for each award category
• Evaluation obtained a ROC AUC over 0.98 across all award categories
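The evaluation can be sketched as two pieces: a split by year (train on 2000-2012, test on 2013-2016) and the ROC AUC metric. The model itself is left out (the slide used a BigML Deepnet per category); `split_by_year` and the `year` key are hypothetical, and AUC is computed directly from its definition as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half.

```python
def split_by_year(rows, last_train_year):
    """Temporal split: train on years <= last_train_year, test on later years."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test

def roc_auc(labels, scores):
    """ROC AUC: probability a random positive outscores a random negative (ties = 1/2)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))

# Hypothetical usage: one row per year, 2000-2016
rows = [{"year": y} for y in range(2000, 2017)]
train, test = split_by_year(rows, last_train_year=2012)
print(len(train), len(test))
print(roc_auc([1, 0, 0, 1], [0.9, 0.2, 0.4, 0.7]))
```

Splitting by time rather than at random keeps the evaluation honest: the model never sees data from the years it is asked to predict. The pairwise loop is quadratic, which is fine for a handful of nominees per year.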
Great: The model seems OK, what next?
Effort of an ML Application
[Figure: effort by task. State the problem as an ML task: ~10% effort. Data wrangling and Feature engineering, together the data transformations: ~80% effort. Modeling and Evaluations: ~5% effort. Predictions and Measure Results: ~5% effort. Callouts: “This is only such low effort because of platforms like …” and “This is an area where … is currently innovating”.]
Reality Check
• All Machine Learned models are wrong
• Real-world Machine Learning is iterative
• End-to-end Machine Learning is compositional
Three Important Concepts in Applying ML…
End-to-end ML is Compositional
• Real-world problems
• Solved by applying a combination of algorithms
• Very rarely is it one-and-done
Feature Engineering
[Figure: DATASET → FILTER SOLD HOMES → NEW FEATURES → MODEL; DATASET → FILTER FOR SALE HOMES → NEW FEATURES → BATCH PREDICTION (with the MODEL) → DEALS DATASET]
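The home-deals workflow can be sketched in plain Python. Everything here is an illustrative assumption: the field names, the engineered `age` feature, the stand-in model (median price per square foot per neighborhood and 20-year age bucket), and the 10% margin that defines a deal.

```python
from collections import defaultdict
from statistics import median

def add_features(home):
    """NEW FEATURES: the home's age when listed (hypothetical feature)."""
    return {**home, "age": home["listed_year"] - home["built_year"]}

def segment(home):
    # Comparable homes: same neighborhood, same 20-year age bucket
    return (home["neighborhood"], home["age"] // 20)

def build_model(sold_homes):
    """MODEL: median sold price per square foot in each segment."""
    rates = defaultdict(list)
    for h in sold_homes:
        rates[segment(h)].append(h["sold_price"] / h["sqft"])
    return {key: median(values) for key, values in rates.items()}

def batch_predict(model, homes):
    """BATCH PREDICTION: estimated price for every home with comparable sales."""
    predictions = []
    for h in homes:
        rate = model.get(segment(h))
        if rate is not None:  # skip homes with no comparable sold homes
            predictions.append({**h, "predicted_price": rate * h["sqft"]})
    return predictions

def find_deals(dataset, margin=0.10):
    sold = [add_features(h) for h in dataset if h["status"] == "sold"]          # FILTER SOLD HOMES
    for_sale = [add_features(h) for h in dataset if h["status"] == "for_sale"]  # FILTER FOR SALE HOMES
    predictions = batch_predict(build_model(sold), for_sale)
    # DEALS DATASET: asking price more than `margin` below the predicted price
    return [h for h in predictions if h["asking_price"] < (1 - margin) * h["predicted_price"]]
```

Note that the same `add_features` step runs on both branches, so the model sees identically engineered inputs at training and prediction time.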
End-to-end ML is Compositional
• Real-world problems
• Solved by applying a combination of algorithms
• Very rarely is it one-and-done
• Each “step” is often multi-stage as well
• Filtering/Cleaning data
Anomaly Filter and Evaluate
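[Figure: DIABETES SOURCE → DIABETES DATASET → TRAIN SET and TEST SET. The full train set gives the ALL MODEL and the ALL EVALUATION; an ANOMALY DETECTOR and a FILTER turn the train set into a CLEAN DATASET, which gives the CLEAN MODEL and the CLEAN EVALUATION; finally, COMPARE EVALUATIONS.]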
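A minimal sketch of the anomaly-filter-and-evaluate workflow follows. Two substitutions are assumptions, not what the slide uses: a max-|z-score| rule stands in for the anomaly detector, and a nearest-centroid classifier stands in for the model. The structure matches the diagram: train on all rows, train on the filtered rows, evaluate both on the same test set, and compare.

```python
from statistics import mean, pstdev

def anomaly_scores(rows, features):
    """Max absolute z-score over the numeric features (a simple stand-in detector)."""
    stats = {}
    for f in features:
        values = [r[f] for r in rows]
        stats[f] = (mean(values), pstdev(values) or 1.0)  # avoid dividing by zero
    return [max(abs(r[f] - stats[f][0]) / stats[f][1] for f in features) for r in rows]

def train_centroids(rows, features, label):
    """Nearest-centroid model: the mean feature vector of each class."""
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    return {c: {f: mean(r[f] for r in rs) for f in features} for c, rs in by_class.items()}

def predict(centroids, row, features):
    return min(centroids, key=lambda c: sum((row[f] - centroids[c][f]) ** 2 for f in features))

def accuracy(centroids, rows, features, label):
    return mean(1.0 if predict(centroids, r, features) == r[label] else 0.0 for r in rows)

def compare_all_vs_clean(train, test, features, label, threshold=3.0):
    scores = anomaly_scores(train, features)
    clean = [r for r, s in zip(train, scores) if s < threshold]  # FILTER -> CLEAN DATASET
    return {
        "all": accuracy(train_centroids(train, features, label), test, features, label),
        "clean": accuracy(train_centroids(clean, features, label), test, features, label),
        "removed": len(train) - len(clean),
    }
```

Whether filtering helps is an empirical question, which is exactly why the diagram ends in a comparison rather than assuming the clean model wins.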
Fixing Missing Values
Fix Missing Values in a “Meaningful” Way
[Figure: Original Dataset → Filter Zeros → Clean Dataset → Model insulin; Original Dataset → Select insulin → Predict insulin → Amended Dataset → Fixed Dataset]
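The “meaningful” fix can be sketched as regression imputation. This assumes, as is common in the Pima diabetes data, that an insulin reading of 0 means “not measured”, and that glucose is a usable predictor of insulin; both are assumptions, as is the one-variable least-squares model.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def fix_insulin(rows, predictor="glucose"):
    """Replace insulin == 0 (assumed to mean 'missing') with a model prediction."""
    clean = [r for r in rows if r["insulin"] != 0]               # Filter zeros -> clean dataset
    intercept, slope = fit_line([r[predictor] for r in clean],  # Model insulin
                                [r["insulin"] for r in clean])
    fixed = []
    for r in rows:
        if r["insulin"] == 0:                                    # Select rows with missing insulin
            r = {**r, "insulin": intercept + slope * r[predictor]}  # Predict insulin (amended row)
        fixed.append(r)
    return fixed                                                 # Fixed dataset
```

Compared with replacing zeros by the column mean, a predicted value keeps each imputed reading consistent with the rest of that patient’s record.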
End-to-end ML is Compositional
• Real-world problems
• Solved by applying a combination of algorithms
• Very rarely is it one-and-done
• Each “step” is often multi-stage as well
• Filtering/Cleaning data
• Tuning a model for optimum performance
Ensemble Tuning
[Figure: SOURCE → DATASET → TRAINING and TEST; ENSEMBLE N=10, ENSEMBLE N=20, and ENSEMBLE N=1000 are each built from TRAINING and given an EVALUATION on TEST; CHOOSE the best]
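The tuning loop in the diagram amounts to: build one ensemble per candidate size, evaluate each on held-out data, and keep the best. The sketch below uses bagged one-feature threshold stumps as a stand-in ensemble (an assumption; the slide does not say which ensemble type). Choosing N on the same test set used to report final performance makes that number optimistic, so a separate validation set is often preferable.

```python
import random

def train_stump(rows):
    """Fit a one-feature threshold rule (predict 1 when x >= t) on (x, y) pairs."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in rows}):
        acc = sum(1 for x, y in rows if (x >= t) == bool(y)) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def train_ensemble(rows, n, seed=0):
    """Bagging: n stumps, each fit on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(rows) for _ in rows]) for _ in range(n)]

def predict(ensemble, x):
    """Majority vote of the stumps (ties go to class 1)."""
    votes = sum(1 for t in ensemble if x >= t)
    return 1 if 2 * votes >= len(ensemble) else 0

def evaluate(ensemble, rows):
    return sum(1 for x, y in rows if predict(ensemble, x) == y) / len(rows)

def tune_ensemble_size(train, test, sizes=(10, 20, 1000)):
    """Build one ensemble per candidate N, evaluate each, and choose the best."""
    results = {n: evaluate(train_ensemble(train, n), test) for n in sizes}
    best_n = max(results, key=results.get)  # the earliest listed size wins ties
    return best_n, results

train = [(x, int(x >= 10)) for x in range(20)]
test = [(2, 0), (5, 0), (12, 1), (17, 1)]
print(tune_ensemble_size(train, test))
```

Preferring the smaller N on ties is a deliberate choice here: it gives the same measured accuracy for less training and prediction cost.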
End-to-end ML is Compositional
• Real-world problems
• Solved by applying a combination of algorithms
• Very rarely is it one-and-done
• Each “step” is often multi-stage as well
• Filtering/Cleaning data
• Tuning a model for optimum performance
• Finding the best features
Best-first Feature Selection
[Figure: evaluate each single feature {F1}, {F2}, {F3}, {F4}, …, {Fn} and CHOOSE BEST → S = {Fa}; then evaluate S+{F1}, S+{F2}, S+{F3}, S+{F4}, …, S+{Fn-1} over the remaining features and CHOOSE BEST → S = {Fa, Fb}; repeat and CHOOSE BEST → S = {Fa, Fb, Fc}, and so on]
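The search in the diagram is a greedy forward selection: at each round, try adding every remaining feature to the current set S, keep the one that scores best, and stop when nothing improves. In this sketch the `score` function is a placeholder assumption; in practice it would train and evaluate a model (ideally with cross-validation) on each candidate subset.

```python
def forward_select(features, score, max_features=None):
    """Greedy forward selection.

    features: candidate feature names
    score: callable taking a list of features, returning a number (higher is better)
    """
    selected, remaining = [], list(features)
    best_score = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate S + {F} for every feature F not yet selected
        candidates = {f: score(selected + [f]) for f in remaining}
        f_best = max(candidates, key=candidates.get)
        if candidates[f_best] <= best_score:
            break  # no candidate improves on the current set: stop
        best_score = candidates[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_score

# Toy scorer (hypothetical): each feature adds a fixed amount
weights = {"glucose": 3.0, "bmi": 2.0, "age": 0.5, "noise": -1.0}
print(forward_select(weights, lambda fs: sum(weights[f] for f in fs)))
```

Each round costs one model evaluation per remaining feature, so the whole search needs on the order of n² evaluations in the worst case, far fewer than the 2ⁿ subsets an exhaustive search would try.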
End-to-end ML is Compositional
• Real-world problems
• Solved by applying a combination of algorithms
• Very rarely is it one-and-done
• Each “step” is often multi-stage as well
• Filtering/Cleaning data
• Tuning a model for optimum performance
• Finding the best features
• May require models for several domains of knowledge
• Multiple Training / Scoring
Multiple Domains
[Figure: TRANSACTIONS are AGGREGATED BY CARD, BY USER, and BY PROFILE; each aggregate feeds an ANOMALY detector (BY CARD, BY USER, BY PROFILE) that produces an ANOMALY SCORE for a NEW TRANSACTION; together the scores decide: APPROVED?]
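A sketch of the multi-domain idea, with its assumptions labeled: each domain’s anomaly detector is a simple z-score of the transaction amount against that card’s, user’s, or profile’s history, and the three scores are combined with a max-and-threshold rule. The slide does not say how the scores are combined; a learned model on top of them would be a natural alternative.

```python
from collections import defaultdict
from statistics import mean, pstdev

DOMAINS = ("card", "user", "profile")

def aggregate(transactions):
    """AGGREGATED BY CARD / USER / PROFILE: past amounts per key in each domain."""
    history = {d: defaultdict(list) for d in DOMAINS}
    for t in transactions:
        for d in DOMAINS:
            history[d][t[d]].append(t["amount"])
    return history

def domain_score(amounts, amount):
    """Stand-in anomaly score: |z| of the amount against that key's history."""
    if len(amounts) < 2:
        return 0.0  # too little history to judge; a real system might flag this instead
    sd = pstdev(amounts) or 1.0
    return abs(amount - mean(amounts)) / sd

def approve(history, tx, threshold=3.0):
    scores = {d: domain_score(history[d].get(tx[d], []), tx["amount"]) for d in DOMAINS}
    return max(scores.values()) < threshold, scores
```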
End-to-end ML is Compositional
• Real-world problems
• Solved by applying a combination of algorithms
• Very rarely is it one-and-done
• Each “step” is often multi-stage as well
• Filtering/Cleaning data
• Tuning a model for optimum performance
• Finding the best features
• May require models for several domains of knowledge
• Multiple Training / Scoring
• Even after deploying a model
• Workflow to monitor performance, know when to retrain
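Monitoring after deployment can be as simple as tracking accuracy on recent predictions whose true outcome is now known, and flagging retraining when it falls clearly below the accuracy measured at evaluation time. The `PerformanceMonitor` class, window size, and tolerance are hypothetical choices for illustration.

```python
from collections import deque

class PerformanceMonitor:
    """Rolling accuracy over the last `window` resolved predictions."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy  # accuracy measured when the model was evaluated
        self.tolerance = tolerance         # allowed drop before retraining is flagged
        self.outcomes = deque(maxlen=window)

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.accuracy() < self.baseline - self.tolerance
```

Accuracy only becomes measurable once true outcomes arrive, which can lag by weeks for problems like churn; monitoring input distributions is a common complement when labels are slow.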
Reality Check
• All Machine Learned models are wrong
• Real-world Machine Learning is iterative
• End-to-end Machine Learning is compositional
Three Important Concepts in Applying ML…
Tenets of Machine Learning
• All Machine Learned models are wrong, but some are useful
• Real-world Machine Learning is iterative
• End-to-end Machine Learning is compositional
• Better features always beat better algorithms
• Good algorithms already exist and are good enough
• Tools like OptiML exist that can help optimize performance
• The data is never good enough
• Automation is better than hand tuning: you need an API!
• When data changes quickly, training speed is more important than accuracy
• Repeatability is superior to a single strong result
• Problems are solved with workflows of algorithms
• An ML solution is not real until it is in production
• ML is here: now we need 100,000x more people applying ML