3. Agenda
10 Minutes of Content & 5 Minutes for Questions
• Traditional software systems versus ML & AI Solutions
Let’s apply some definitions for clarity. Why would we want to invest in ML?
• What do I want for lunch?
A simple example demonstrating the benefits, challenges, and operational
considerations of ML versus traditional software.
• Machine Learning in the field
What do we need to know about deploying a solution with ML?
• Where do I begin and what is this going to cost me?
Adding ML can be very cost effective when using pretrained models, but what about
training bespoke models for custom needs?
4. Definitions
Artificial Intelligence
Encompasses all approaches to
simulate human intelligence.
General AI is the goal.
Machine Learning
Algorithmic approach to parse
data, learn from it, and make
predictions.
Deep Learning
Massive artificial neural networks
targeting narrow AI.
Source – A great article by Michael Copeland
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
5. Why Machine Learning?
• Many problems do not
require ML.
• Where there is overlap,
ML offers
generalization.
• To the far right are
problem domains that
traditional software
cannot solve, e.g., speech
recognition, computer
vision, etc.
[Figure: Traditional Software and Machine Learning shown as overlapping regions on an axis from Descriptive to Predictive]
6. Lunchtime Ordering App
Integrating machine learning with a traditional app
Version 1
Users can order lunch
Cuisine? American, Italian, Indian, French
Menu? Hamburger, Fries, Chicken
Version 2
Order history remembered for quick reordering
Version 3
Provide recommendations based on past order history
Application Development Methodology (Scrum)
Data Science Methodology (CRISP-DM)
=
7. Lunchtime Ordering App
Machine learning versus conventional approaches
Version 3
Provide recommendations based on past order history
Non-ML Approach – Considerations
• Recommend whatever the user orders most often
• Recommend whatever the user ordered most recently
• Recommend based on price
• Mine reviews from Yelp and recommend based on user reviews
• Recommend based on location
?
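As a rough illustration of the non-ML approach, the first two rules can each be written in a few lines of Python. This is only a sketch: the order history and the function names are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical order history for one user, oldest order first (illustrative only).
order_history = ["Hamburger", "Chicken", "Hamburger", "Fries", "Hamburger", "Chicken"]


def recommend_most_ordered(history):
    """Rule 1: recommend the item the user has ordered most often."""
    return Counter(history).most_common(1)[0][0]


def recommend_most_recent(history):
    """Rule 2: recommend whatever the user ordered last."""
    return history[-1]


print(recommend_most_ordered(order_history))
print(recommend_most_recent(order_history))
```

Each rule is easy to write on its own, but the two already disagree on this history, and blending them with price, reviews, and location would require hand-tuned logic for every combination.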
ML Approach
There is no need for us to
programmatically try to understand all
the relationships between the influencing
factors that go into making a lunch
determination. We will let Machine
Learning determine this for us, but we
need to provide the inputs – these are
called Features.
8. Lunchtime Ordering App
Machine learning observations, features & labels
A feature is an individual property being observed
that we believe will have predictive power.
External features can
have significance
The label informs our
algorithm of the correct
result we seek to predict
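One way to picture the split between features and label is as a list of observations where one column is set aside as the value to predict. The sketch below assumes a "satisfaction" label and a handful of feature names; the rows and the helper function are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical observations: every key except "satisfaction" is a feature.
# "weather" stands in for an external feature that the app does not own.
observations = [
    {"cuisine": "American", "price": "$$", "yelp_rating": 4,
     "weather": "Clear", "satisfaction": "High"},
    {"cuisine": "Italian", "price": "$$$", "yelp_rating": 5,
     "weather": "Clear", "satisfaction": "Low"},
]


def split_features_and_label(rows, label_key):
    """Separate each observation into its features and its label."""
    features = [{k: v for k, v in row.items() if k != label_key} for row in rows]
    labels = [row[label_key] for row in rows]
    return features, labels


X, y = split_features_and_label(observations, "satisfaction")
print(X[0])
print(y)
```

A training algorithm would receive the features as input and the labels as the correct answers it should learn to reproduce.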
9. Lunchtime Ordering App
Machine learning training & model selection
Multiclass neural network
Accuracy, long training times
Multiclass logistic regression
Fast training times, linear model
Multiclass decision forest
Accuracy, fast training times
Multiclass decision jungle
Accuracy, small memory footprint
A good article on model performance – Accuracy, Precision, Recall
https://blogs.msdn.microsoft.com/andreasderuiter/2015/02/09/performance-measures-in-azure-ml-accuracy-precision-recall-and-f1-score/
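For readers who want to see how these measures are computed, here is a minimal sketch using the standard definitions of accuracy, precision, recall, and F1 for one class of a multiclass problem. The actual and predicted labels are made-up values, and the function name is an assumption for illustration.

```python
def classification_metrics(actual, predicted, positive):
    """Accuracy over all classes; precision, recall, and F1 for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical satisfaction labels (illustrative only).
actual = ["High", "High", "Medium", "Low", "High", "Low"]
predicted = ["High", "Medium", "Medium", "Low", "High", "High"]

metrics = classification_metrics(actual, predicted, positive="High")
print(metrics)
```

Comparing these numbers across candidate models, alongside training time and memory footprint, is one reasonable way to choose between them.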
10. Lunchtime Ordering App
Deploying our model and predicting where to eat!
Multiclass decision forest
Accuracy, fast training times
Observation | Day of Week | Time of Day | Ordered Recently | Distance | Vegan Option | Calories | Cuisine | Price | Yelp Rating | Weather
1 | Wednesday | 12:30 PM | No | 0 - 5 Miles | Yes | Medium | American | $$ | 4 | Clear
2 | Wednesday | 12:30 PM | No | 15 - 25 Miles | Yes | High | Indian | $ | 5 | Clear
3 | Wednesday | 12:30 PM | No | 6 - 10 Miles | Yes | Medium | American | $$$ | 4 | Clear
4 | Wednesday | 12:30 PM | No | 0 - 5 Miles | Yes | Medium | Italian | $ | 4 | Clear
5 | Wednesday | 12:30 PM | No | 15 - 25 Miles | Yes | High | American | $$ | 5 | Clear
6 | Wednesday | 12:30 PM | Yes | 6 - 10 Miles | Yes | High | Italian | $$$ | 5 | Clear
7 | Wednesday | 12:30 PM | No | 0 - 5 Miles | No | Medium | American | $ | 4 | Clear
8 | Wednesday | 12:30 PM | No | 15 - 25 Miles | Yes | Low | American | $$ | 4 | Clear
Observation | Scored Label | Scored Probability
1 | High | 0.92
2 | High | 0.80
3 | Medium | 0.94
4 | Medium | 0.70
5 | High | 0.80
6 | Low | 0.95
7 | Low | 0.80
8 | High | 0.50
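One plausible way to turn the scored output above into recommendations is to keep only observations scored "High" and rank them by probability. The sketch below uses the scored rows as shown; the 0.75 cutoff and the function name are assumptions, not part of the original deck.

```python
# (observation, scored label, scored probability) rows from the scored output.
scored = [
    (1, "High", 0.92), (2, "High", 0.80), (3, "Medium", 0.94),
    (4, "Medium", 0.70), (5, "High", 0.80), (6, "Low", 0.95),
    (7, "Low", 0.80), (8, "High", 0.50),
]


def top_recommendations(rows, label="High", min_probability=0.75):
    """Keep rows with the wanted label and enough confidence, best first.

    min_probability is an assumed cutoff chosen for this example.
    """
    keep = [r for r in rows if r[1] == label and r[2] >= min_probability]
    # sorted() is stable, so ties keep their original observation order.
    return sorted(keep, key=lambda r: r[2], reverse=True)


for observation, label, probability in top_recommendations(scored):
    print(observation, label, probability)
```

Observation 8 is also scored "High", but at 0.50 the model is no more confident than a coin flip, so a cutoff like this would leave it out.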
11. What does this cost?
Custom Model - ML
• Azure ML
• Microsoft R Server
• Google TensorFlow
• Amazon Machine Learning
• Big Data – Spark R
1-4 Months
Prebuilt Intelligence APIs
• Microsoft Azure Cognitive Services
• Google Cloud Prediction
• IBM Watson APIs
4-12+ Months
Deep Learning
• Microsoft Cognitive Toolkit
(CNTK)
• Google TensorFlow
• Custom Algorithm Neural
Network
6-18+ Months
12. Lunchtime Ordering App
Bonus section – Unsupervised learning (clustering)
Observation | User | Age | Income | Gender | Day of Week | Time of Day | Satisfaction
1 | Jeff | 44 | 50 - 75k | Male | Thursday | 11:00 AM | High
2 | Jeff | 44 | 50 - 75k | Male | Friday | 1:00 PM | Low
3 | Jeff | 44 | 50 - 75k | Male | Friday | 1:00 PM | Medium
4 | Tony | 43 | 75 - 100k | Male | Monday | 12:30 PM | Medium
5 | Tony | 43 | 75 - 100k | Male | Tuesday | 12:30 PM | High
6 | Tony | 43 | 75 - 100k | Male | Friday | 12:00 PM | Low
7 | Jill | 28 | 75 - 100k | Female | Friday | 11:30 AM | High
8 | Jill | 28 | 75 - 100k | Female | Friday | 2:00 PM | High
…
N -
Imagine adding demographic features
to our data set.
The label informs our
algorithm of the correct
result we seek to predict
What type of clusters do we
see for users that are highly
satisfied?
{Female, 24-30, 75-100k}
Perhaps an ad campaign?
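To make the question concrete, the sketch below counts demographic profiles among highly satisfied observations. Note that this is a simple group-and-count stand-in, not a clustering algorithm such as k-means; a real clustering step would discover the groups rather than have them defined up front. The rows mirror the table on this slide, while the age bands (other than 24-30) and the function names are hypothetical.

```python
from collections import Counter

# (user, age, income, gender, satisfaction) taken from the slide's table.
rows = [
    ("Jeff", 44, "50 - 75k", "Male", "High"),
    ("Jeff", 44, "50 - 75k", "Male", "Low"),
    ("Jeff", 44, "50 - 75k", "Male", "Medium"),
    ("Tony", 43, "75 - 100k", "Male", "Medium"),
    ("Tony", 43, "75 - 100k", "Male", "High"),
    ("Tony", 43, "75 - 100k", "Male", "Low"),
    ("Jill", 28, "75 - 100k", "Female", "High"),
    ("Jill", 28, "75 - 100k", "Female", "High"),
]

# Assumed age bands; only 24-30 appears on the slide.
AGE_BANDS = [(18, 23), (24, 30), (31, 40), (41, 50)]


def age_band(age):
    """Map an age to its band label, or "other" if no band matches."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"


def satisfied_profiles(data, level="High"):
    """Count (gender, age band, income) profiles at a satisfaction level."""
    return Counter(
        (gender, age_band(age), income)
        for _, age, income, gender, satisfaction in data
        if satisfaction == level
    )


print(satisfied_profiles(rows).most_common(1))
```

With only eight rows the result is anecdotal, but the same counting over many observations would hint at which segment an ad campaign might target.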
14. Data Analytics
CRISP Methodology
• Business Understanding
This initial phase focuses on understanding the project objectives and requirements
from a business perspective, and then converting this knowledge into a data mining
problem definition, and a preliminary plan designed to achieve the objectives.
• Data Understanding
The data understanding phase starts with an initial data collection and proceeds with
activities in order to get familiar with the data, to identify data quality problems, to
discover first insights into the data, or to detect interesting subsets to form hypotheses
for hidden information.
• Data Preparation
The data preparation phase covers all activities to construct the final dataset. This data
will be fed into the modeling tools from the initial raw data. Data preparation tasks are
likely to be performed multiple times.
• Modeling
Modeling techniques are selected and applied, and their parameters are calibrated to
optimal values. Typically, there are several techniques for the same data mining problem
type. Stepping back to the data preparation phase is often needed.
16. Additional Links
• Scrum Software Development
https://en.wikipedia.org/wiki/Scrum_(software_development)
• CRISP-DM, Cross Industry Standard Process for Data Mining
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining