Deriving Knowledge from Data at Scale
Models in Production
Deriving Knowledge from Data at Scale
Putting an ML Model into Production
• A/B Testing
Deriving Knowledge from Data at Scale
Controlled Experiments in One Slide
Concept is Trivial
• Must run statistical tests to confirm differences are not due to chance
• Best scientific way to prove causality, i.e., the changes in metrics are
caused by changes introduced in the treatment(s)
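As a concrete sketch of the "statistical tests" step, here is a two-proportion z-test on conversion counts in Python; the counts and sample sizes are hypothetical illustration values, not from the lecture.

```python
# Minimal sketch: is the treatment's conversion rate different from control's?
from statsmodels.stats.proportion import proportions_ztest

conversions = [1210, 1325]   # successes in control (A) and treatment (B) -- made up
samples = [25000, 25000]     # users exposed to each variant

z_stat, p_value = proportions_ztest(count=conversions, nobs=samples)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# If p < 0.05, we reject "the difference is due to chance" at the usual level.
```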
Deriving Knowledge from Data at Scale
Best Practice: A/A Test
Run A/A tests before the real experiment to validate the experimentation system
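A rough sketch of what an A/A test protects against: split identical traffic many times and confirm that the significance test fires at roughly the alpha rate. The numbers below are synthetic.

```python
# A/A simulation sketch: with no real difference, ~5% of tests should
# be "significant" at alpha = 0.05. A higher rate suggests the assignment
# or logging pipeline is broken.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
false_positives = 0
runs = 1000
for _ in range(runs):
    metric = rng.normal(loc=10.0, scale=2.0, size=20000)  # one population
    a, b = metric[:10000], metric[10000:]                 # random 50/50 split
    _, p = ttest_ind(a, b)
    false_positives += p < 0.05

print(f"A/A false positive rate: {false_positives / runs:.3f}  (expect ~0.05)")
```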
Deriving Knowledge from Data at Scale
Best Practice: Ramp-up
Start the treatment at a small percentage of traffic and ramp up gradually
Deriving Knowledge from Data at Scale
Best Practice: Run Experiments at 50/50%
Deriving Knowledge from Data at Scale
Cost-based learning
Deriving Knowledge from Data at Scale
Imbalanced Class Distribution & Error Costs
WEKA cost-sensitive learning: a weighting method. When false negatives (FN) are the costly errors, weight them more heavily so the learner tries to avoid false negatives.
Deriving Knowledge from Data at Scale
Imbalanced Class Distribution
WEKA cost-sensitive learning: load the data in Preprocess, then in Classify choose meta.CostSensitiveClassifier and set the cost of a FN to 10.0 and of a FP to 1.0. Any base learner that normally tries to optimize accuracy or error (e.g., a decision tree or rule learner) can be made cost-sensitive this way.
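The same FN = 10.0 / FP = 1.0 idea, sketched outside WEKA with scikit-learn class weights; the data is synthetic and imbalanced purely for illustration, and the 10:1 weight ratio mirrors the cost matrix above.

```python
# Cost-sensitive sketch: penalize a false negative 10x more than a false
# positive via class weights on a decision tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
costed = DecisionTreeClassifier(class_weight={0: 1.0, 1: 10.0},  # FN cost 10, FP cost 1
                                random_state=0).fit(X_tr, y_tr)

for name, model in [("plain", plain), ("cost-sensitive", costed)]:
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    print(f"{name}: FN={fn}, FP={fp}")
```

The cost-sensitive tree trades extra false positives for fewer of the expensive false negatives, which is exactly the behavior the cost matrix asks for.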
Deriving Knowledge from Data at Scale
Imbalanced Class Distribution
WEKA cost-sensitive learning
Deriving Knowledge from Data at Scale
A curated gold set completely specifies a problem and lets you measure progress: each set is paired with a metric, target SLAs, and a scoreboard.
Deriving Knowledge from Data at Scale
This isn’t easy…
• Building high quality gold sets is a challenge.
• It is time consuming.
• It requires making difficult and long-lasting
choices, and the rewards are delayed…
Deriving Knowledge from Data at Scale
Enforce a few principles:
1. Distribution parity
2. Testing blindness
3. Production parity
4. Single metric
5. Reproducibility
6. Experimentation velocity
7. Data is gold
Deriving Knowledge from Data at Scale
• Test set blindness
• Reproducibility and Data is gold
• Experimentation velocity
Deriving Knowledge from Data at Scale
Building Gold sets is hard work. Many common and avoidable mistakes are
made. This suggests having a checklist. Some questions will be trivial to
answer or not applicable, some will require work…
1. Metrics: For each gold set, choose one (1) metric. Having two metrics on the same
gold set is a problem (you can’t optimize both at once).
2. Weighting/Slicing: Not all errors are equal. This should be reflected in the metric, not
through sampling manipulation. Having the weighting in the metric has two
advantages: 1) it is explicitly documented and reproducible in the form of a metric
algorithm, and 2) production, train, and test sets results remain directly comparable
(automatic testing).
3. Yardstick(s): Define algorithms and configuration parameters for public yardstick(s).
There could be more than one yardstick. A simple yardstick is useful for ramping up.
Once one can reproduce/understand the simple yardstick’s result, it becomes easier
to improve on the latest “production” yardstick. Ideally yardsticks come with
downloadable code. The yardsticks provide a set of errors that suggests where
innovation should happen.
Deriving Knowledge from Data at Scale
4. Sizes and access: What are the set sizes? Each size corresponds to an innovation
velocity and a level of representativeness. A good rule of thumb is 5X size ratios
between gold sets drawn from the same distribution. Where should the data live? If
on a server, some services are needed for access and simple manipulations. There
should always be a size that is downloadable (< 1GB) to a desktop for high velocity
innovation.
5. Documentation and format: Create a format/API for the data. Is the data
compressed? Provide sample code to load the data. Document the format. Assign
someone to be the curator of the gold set.
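In the spirit of item 5's "provide sample code to load the data," here is a hypothetical loader for a gzipped, tab-separated gold set; the file name and column layout are invented for illustration, not a real format.

```python
# Hypothetical gold-set loader: documents the format and gives contributors
# a one-call entry point to the data.
import csv
import gzip

def load_gold_set(path="goldset_small.tsv.gz"):
    """Each row: <id> TAB <feature_1..n> TAB <label>; the first row is a header."""
    with gzip.open(path, mode="rt", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        rows = [(r[0], [float(v) for v in r[1:-1]], r[-1]) for r in reader]
    return header, rows
```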
Deriving Knowledge from Data at Scale
6. Features: What (gold) features go in the gold sets? Features must be pickled for results
to be reproducible. Ideally, we would have 2, and possibly 3 types of gold sets.
a. One set should have the deployed features (computed from the raw data). This provides the
production yardstick.
b. One set should be Raw (e.g. contains all information, possibly through tables). This allows
contributors to create features from the raw data to investigate its potential compared to existing
features. This set has more information per pattern and a smaller number of patterns.
c. One set should have an extended number of features. The additional features may be “building
blocks”, features that are scheduled to be deployed next, or high potential features. Moving some
features to a gold set is convenient if multiple people are working on the next generation. Not all
features are worth being in a gold set.
7. Feature optimization sets: Does the data require feature optimization? For instance,
an IP address, a query, or a listing id may be features. But only the most frequent 10M
instances are worth having specific trainable parameters. A pass over the data can
identify the top 10M instances. This is a form of feature optimization. Identifying these
features does not require labels. If a form of feature optimization is done, a separate
data set (disjoint from the training and test set) must be provided.
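A minimal sketch of the single, label-free pass in item 7 that finds the top-K most frequent instances; the field name and data source are hypothetical, and at a real 10M scale a streaming heavy-hitter sketch would replace the in-memory counter.

```python
# One pass over the data to find the feature values that earn their own
# trainable parameters; everything else shares a fallback bucket.
from collections import Counter

def top_k_values(rows, field="ip_address", k=10_000_000):
    counts = Counter(row[field] for row in rows)        # single pass, no labels needed
    return {value for value, _ in counts.most_common(k)}
```

Remember the constraint from item 7: the data this pass runs on must be disjoint from the training and test sets.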
Deriving Knowledge from Data at Scale
8. Stale rate, optimization, monitoring: How long does the set stay current? In many
cases, we hide the fact that the problem is a time series even though the goal is to
predict the future and we know that the distribution is changing. We must quantify
how much a distribution changes over a fixed period of time. There are several ways
to mitigate the changing distribution problem:
a. Assume the distribution is I.I.D. Regularly re-compute training sets and Gold sets. Determine the
frequency of re-computation, or set in place a system to monitor distribution drifts (monitor KPI
changes while the algorithm is kept constant).
b. Decompose the model along “distribution (fast) tracking parameters” and slow tracking parameters.
The fast tracking model may be a simple calibration with very few parameters.
c. Recast the problem as a time series problem: patterns are (input data from t-T to t-1, prediction at
time t). In this space, the patterns are much larger, but the problem is closer to being I.I.D.
9. The gold sets should have information that reveals the stale rate and allows algorithms
to differentiate themselves based on how they degrade with time.
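One possible shape for the monitoring in option (a): compare a reference window of a feature against the current window with a two-sample KS test. The data below is synthetic; the windows and threshold are assumptions.

```python
# Drift-monitoring sketch: alarm when a feature's distribution shifts
# between a reference window and the current window.
import numpy as np
from scipy.stats import ks_2samp

def drift_alarm(reference, current, alpha=0.01):
    """Return (alarm, statistic) for a two-sample Kolmogorov-Smirnov test."""
    stat, p = ks_2samp(reference, current)
    return p < alpha, stat

rng = np.random.default_rng(1)
ref = rng.normal(0.0, 1.0, 50_000)          # e.g., last month's feature values
cur = rng.normal(0.3, 1.0, 50_000)          # this week's values: the mean drifted
alarmed, stat = drift_alarm(ref, cur)
print(f"drift detected: {alarmed}, KS statistic = {stat:.3f}")
```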
Deriving Knowledge from Data at Scale
10. Grouping: Should the patterns be grouped? For example, in handwriting, examples are
grouped per writer. A set built by shuffling the words is misleading because training
and testing would have word examples from the same writer, which makes
generalization much easier. If the words are grouped per writer, then a writer is
unlikely to appear in both the training and test set, which requires the system to generalize
to never-before-seen handwriting (as opposed to never-before-seen words). Do we
have this type of constraint? Should we group per advertiser, campaign, or user to
generalize across new instances of these entities (as opposed to generalizing to new
queries)? ML requires training and testing to be drawn from the same distribution.
Drawing duplicates is not a problem. Problems arise when one partially draws
examples from the same entity into both training and testing on a small set of entities.
This breaks the IID assumption and makes generalization on the test set look much
easier than it actually is. A group-aware split, sketched at the end of this slide, avoids this.
11. Sampling production data: What strategy is used for sampling? Uniform? Are any of
the following filtered out: fraud, bad configurations, duplicates, non-billable, adult,
overwrites, etc? Guidance: use the production sameness principle.
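The group-aware split referenced in item 10, sketched with scikit-learn's GroupShuffleSplit on toy data: every group (writer, advertiser, user) lands entirely in train or entirely in test.

```python
# Group-aware train/test split: no writer appears on both sides.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(20).reshape(-1, 1)                 # toy features
y = np.tile([0, 1], 10)                          # toy labels
writers = np.repeat(np.arange(5), 4)             # 5 writers, 4 examples each

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=writers))

# No writer leaks across the split:
assert not set(writers[train_idx]) & set(writers[test_idx])
```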
Deriving Knowledge from Data at Scale
12. Unlabeled set: If the number of labeled examples is small, a large data set of
unlabeled data with the same distribution should be collected and be made a gold
set. This enables the discovery of new features using intermediate classifiers and
active labeling.
Deriving Knowledge from Data at Scale
Greatest Challenge in Machine Learning
Deriving Knowledge from Data at Scale
gender  age  smoker  eye color  lung cancer
male    19   yes     green      no
female  44   yes     gray       yes
male    49   yes     blue       yes
male    12   no      brown      no
female  37   no      brown      no
female  60   no      brown      yes
male    44   no      blue       no
female  27   yes     brown      no
female  51   yes     green      yes
female  81   yes     gray       no
male    22   yes     brown      no
male    29   no      blue       no
male    77   yes     gray       yes
male    19   yes     green      no
female  44   no      gray       no

Train → ML Model
Deriving Knowledge from Data at Scale
The greatest challenge in Machine Learning?
Lack of Labelled Training Data…
What to Do?
• Controlled Experiments – get feedback from users to serve as labels;
• Mechanical Turk – pay people to label data to build training set;
• Ask Users to Label Data – report as spam, ‘hot or not?’, review a product,
observe their click behavior (ad retargeting, search results, etc).
Deriving Knowledge from Data at Scale
What if you can't get labeled Training Data?
Traditional Supervised Learning
• Promotion on bookseller’s web page
• Customers can rate books.
• Will a new customer like this book?
• Training set: observations on previous customers
• Test set: new customers
What happens if only a few customers rate a book?
Training Data (attributes: Age, Income; target label: LikesBook)
Age  Income  LikesBook
24   60K     +
65   80K     -
60   95K     -
35   52K     +
20   45K     +
43   75K     +
26   51K     +
52   47K     -
47   38K     -
25   22K     -
33   47K     +

Test Data
Age  Income  LikesBook
22   67K     ?
39   41K     ?

Model Prediction
Age  Income  LikesBook
22   67K     +
39   41K     -

© 2013 Datameer, Inc. All rights reserved.
Deriving Knowledge from Data at Scale
Semi-Supervised Learning
Can we make use of the unlabeled data?
In theory: no
... but we can make assumptions
Popular Assumptions
• Clustering assumption
• Low density assumption
• Manifold assumption
Deriving Knowledge from Data at Scale
The Clustering Assumption
Clustering
• Partition instances into groups (clusters) of similar instances
• Many different algorithms: k-Means, EM, etc.
Clustering Assumption
• The two classification targets are distinct clusters
• Simple semi-supervised learning: cluster, then perform majority vote
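A minimal sketch of "cluster, then majority vote" on synthetic blobs: cluster everything, then give each cluster the majority label of its few labeled members.

```python
# Semi-supervised baseline: cluster all instances, then vote with the
# handful of labeled instances inside each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=300, centers=2, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
labeled[:10] = True                              # pretend only 10 labels exist

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
y_pred = np.empty(len(X), dtype=int)
for c in (0, 1):
    members = clusters == c
    votes = y_true[members & labeled]            # labels inside this cluster
    y_pred[members] = np.bincount(votes).argmax() if votes.size else 0

print("accuracy:", (y_pred == y_true).mean())
```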
Deriving Knowledge from Data at Scale
Generative Models
Mixture of Gaussians
• Assumption: the data in each cluster is generated by a normal distribution
• Find the most probable location and shape of clusters given the data
Expectation-Maximization
• Two-step optimization procedure
• Keeps estimates of cluster-assignment probabilities for each instance
• Might converge to a local optimum
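The mixture-of-Gaussians view, sketched with scikit-learn (EM runs inside GaussianMixture.fit); multiple restarts hedge against the local-optimum issue noted above. Data is synthetic.

```python
# Fit a 2-component Gaussian mixture with EM and read off the soft
# cluster-assignment probabilities per instance.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=2, cluster_std=1.5, random_state=0)

gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)  # EM inside
probs = gmm.predict_proba(X)                  # soft assignments per instance
print("first instance:", np.round(probs[0], 3))
# n_init=5 restarts EM from several initializations, since any single run
# might converge to a local optimum.
```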
Deriving Knowledge from Data at Scale
Beyond Mixtures of Gaussians
Expectation-Maximization
• Can be adjusted to all kinds of mixture models
• E.g., use Naive Bayes as the mixture model for text classification
Self-Training
• Learn model on labeled instances only
• Apply model to unlabeled instances
• Learn new model on all instances
• Repeat until convergence
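The self-training loop above, sketched with scikit-learn's SelfTrainingClassifier wrapper; -1 marks unlabeled instances, and the threshold controls which predictions get promoted to labels. Data is synthetic.

```python
# Self-training sketch: learn on labeled data, label the confident
# unlabeled instances, retrain, repeat until nothing new is added.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=1000, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1                              # only 50 instances stay labeled

model = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
model.fit(X, y_partial)                          # label, retrain, repeat
print("accuracy against true labels:", model.score(X, y))
```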
Deriving Knowledge from Data at Scale
The Low Density Assumption
Assumption
• The area between the two classes has low density
• Does not assume any specific form of cluster
Support Vector Machine
• Decision boundary is linear
• Maximizes margin to closest instances
Deriving Knowledge from Data at Scale
The Low Density Assumption
Semi-Supervised SVM
• Minimize distance to labeled and unlabeled instances
• Parameter to fine-tune the influence of unlabeled instances
• Additional constraint: keep the class balance correct
Implementation
• Simple extension of the SVM
• But a non-convex optimization problem
Deriving Knowledge from Data at Scale
Semi-Supervised SVM
Stochastic Gradient Descent
• One run over the data in random order
• Each misclassified or unlabeled instance moves the classifier a bit
• Steps get smaller over time
Implementation on Hadoop
• Mapper: send data to the reducer in random order
• Reducer: update a linear classifier for unlabeled or misclassified instances
• Many random runs to find the best one
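A toy numpy sketch of the SGD idea (not the lecture's Hadoop implementation): labeled points get the usual hinge-loss update, while unlabeled points get a "hat loss" update for max(0, 1 − |w·x|), which pushes the boundary into low-density regions. Parameter names and values are assumptions.

```python
# S3VM-style SGD sketch: one pass in random order per epoch, shrinking steps.
import numpy as np

def s3vm_sgd(X, y, epochs=20, lr=0.1, lam=1e-3, u_weight=0.5, seed=0):
    """y in {-1, +1} for labeled rows, 0 for unlabeled rows."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(epochs):
        step = lr / (1 + t)                       # steps get smaller over time
        for i in rng.permutation(len(X)):         # one run in random order
            margin = w @ X[i]
            w *= 1 - step * lam                   # L2 regularization shrinkage
            if y[i] != 0 and y[i] * margin < 1:   # labeled, inside the margin
                w += step * y[i] * X[i]           # hinge-loss gradient step
            elif y[i] == 0 and abs(margin) < 1:   # unlabeled, near the boundary
                w += step * u_weight * np.sign(margin) * X[i]  # hat-loss step
    return w
```

Because the objective is non-convex, the "many random runs to find the best one" bullet corresponds to calling this with different seeds and keeping the run with the lowest objective value.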
Deriving Knowledge from Data at Scale
The Manifold Assumption
The Assumption
• Training data is (roughly) contained in a low-dimensional manifold
• One can perform learning in a more meaningful low-dimensional space
• Avoids the curse of dimensionality
Similarity Graphs
• Idea: compute similarity scores between instances
• Create a network where the nearest neighbors are connected
Deriving Knowledge from Data at Scale
Label Propagation
Main Idea
• Propagate label information to neighboring instances
• Then repeat until convergence
• Similar to PageRank
Theory
• Known to converge under weak conditions
• Equivalent to matrix inversion
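Label propagation over a k-nearest-neighbor similarity graph, sketched with scikit-learn; -1 marks unlabeled points, and labels spread across the graph until convergence. Data is synthetic.

```python
# Label propagation sketch: a few seed labels spread over a kNN graph.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=300, noise=0.1, random_state=0)
y_partial = np.full_like(y, -1)                  # -1 means "unlabeled"
y_partial[::30] = y[::30]                        # keep 1 label in 30

model = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y_partial)
print("accuracy on all points:", model.score(X, y))
```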
Deriving Knowledge from Data at Scale
Conclusion
Semi-Supervised Learning
• Only a few training instances have labels
• Unlabeled instances can still provide valuable signal
Different assumptions lead to different approaches
• Cluster assumption: generative models
• Low density assumption: semi-supervised support vector machines
• Manifold assumption: label propagation
Deriving Knowledge from Data at Scale
10 Minute Break…
Deriving Knowledge from Data at Scale
Controlled Experiments
Deriving Knowledge from Data at Scale
• A
• B
Deriving Knowledge from Data at Scale
OEC
Overall Evaluation Criterion
Picking a good OEC is key
Deriving Knowledge from Data at Scale
• Lesson #2: GET THE DATA
Deriving Knowledge from Data at Scale
• Lesson #2: Get the data!
Deriving Knowledge from Data at Scale
Lesson #3: Prepare to be humbled
Left Elevator Right Elevator
Deriving Knowledge from Data at Scale
• Lesson #1
• Lesson #2
• Lesson #3
15% Bing
Deriving Knowledge from Data at Scale
• The HiPPO stopped the project
From Greg Linden’s Blog: http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html
Deriving Knowledge from Data at Scale
TED talk
Deriving Knowledge from Data at Scale
• Must run statistical tests to confirm differences are not due to chance
• Best scientific way to prove causality, i.e., the changes in metrics are
caused by changes introduced in the treatment(s)
Deriving Knowledge from Data at Scale
• Raise your right hand if you think A Wins
• Raise your left hand if you think B Wins
• Don’t raise your hand if you think they’re about the same
A B
Deriving Knowledge from Data at Scale
• A was 8.5% better
Deriving Knowledge from Data at Scale
A
B
Differences: A has a taller search box (overall size is the same), a magnifying glass icon, and “popular searches”; B has a big search button
• Raise your right hand if you think A Wins
• Raise your left hand if you think B Wins
• Don’t raise your hand if you think they’re about the same
Deriving Knowledge from Data at Scale
Deriving Knowledge from Data at Scale
A B
• Raise your right hand if you think A Wins
• Raise your left hand if you think B Wins
• Don’t raise your hand if you think they’re about the same
Deriving Knowledge from Data at Scale
Get the data; prepare to be humbled
Deriving Knowledge from Data at Scale
Any statistic that appears interesting is almost certainly a mistake
• If something is “amazing,” find the flaw!
• Examples:
• If you have a mandatory birth date field and people think it’s unnecessary, you’ll find lots of 11/11/11 or 01/01/01
• If you have an optional drop-down, do not default to the first alphabetical entry, or you’ll have lots of jobs = Astronaut
• The previous Office example assumes click maps to revenue. Seemed reasonable, but when the results look so extreme, find the flaw (conversion rate is not the same; see why?)
Deriving Knowledge from Data at Scale
Data Trumps Intuition
Deriving Knowledge from Data at Scale
Sir Ken Robinson
Deriving Knowledge from Data at Scale
• OEC = Overall Evaluation Criterion
Deriving Knowledge from Data at Scale
• Controlled Experiments in one slide
• Examples: you’re the decision maker
Deriving Knowledge from Data at Scale
It is difficult to get a man to understand something when his
salary depends upon his not understanding it.
-- Upton Sinclair
Deriving Knowledge from Data at Scale
Hubris
Deriving Knowledge from Data at Scale
Cultural Stage 2
Insight through Measurement and Control
• Semmelweis worked at Vienna’s General Hospital, an important teaching/research hospital, in the 1830s–40s
• In 19th-century Europe, childbed fever killed more than a million women
• Measurement: the mortality rate for women giving birth was
• 15% in his ward, staffed by doctors and students
• 2% in the other ward at the hospital, attended by midwives
Deriving Knowledge from Data at Scale
Cultural Stage 2
Insight through Measurement and Control
• He tried to control all differences
• Birthing positions, ventilation, diet, even the way laundry was done
• He was away for 4 months, and the death rate fell significantly while he was away. Could it be related to him?
• Insight:
• Doctors were performing autopsies each morning on cadavers
• Conjecture: particles (called germs today) were being transmitted to healthy patients on the hands of the physicians
• He experimented with cleansing agents
• Chlorinated lime was effective: the death rate fell from 18% to 1%
Deriving Knowledge from Data at Scale
Semmelweis Reflex
• A 2005 study found that inadequate hand washing is one of the prime contributors to the 2 million health-care-associated infections and 90,000 related deaths annually in the United States
Deriving Knowledge from Data at Scale
Fundamental Understanding
Deriving Knowledge from Data at Scale
Hubris → Measure and Control → Accept Results (avoid the Semmelweis Reflex) → Fundamental Understanding
Deriving Knowledge from Data at Scale
• Controlled Experiments in one slide
• Examples: you’re the decision maker
• Cultural evolution: hubris, insight through measurement,
Semmelweis reflex, fundamental understanding
Deriving Knowledge from Data at Scale
• Real data for the city of Oldenburg, Germany
• X-axis: stork population
• Y-axis: human population
What your mother told you about babies and storks when you were three is still not right, despite the strong correlational “evidence”
Ornithologische Monatsberichte 1936;44(2)
Deriving Knowledge from Data at Scale
Women have smaller palms and live 6 years longer
on average
But…don’t try to bandage your hands
Deriving Knowledge from Data at Scale
causal
Deriving Knowledge from Data at Scale
If you don't know where you are going, any road will take you there
—Lewis Carroll
Deriving Knowledge from Data at Scale
before
Deriving Knowledge from Data at Scale
• Hippos kill more humans than any other (non-human) mammal (really)
• OEC
• Get the data
• Prepare to be humbled
The less data, the stronger the opinions…
Deriving Knowledge from Data at Scale
Out of Class Reading
Eight (8) page conference paper
40 page journal version…
Deriving Knowledge from Data at Scale
Course Project
Due Oct. 25th
Deriving Knowledge from Data at Scale
Open Discussion on
Course Project…
Deriving Knowledge from Data at Scale
Gallery of Experiments
Contributed by the community
Deriving Knowledge from Data at Scale
Azure Machine Learning Studio
Deriving Knowledge from Data at Scale
Sample Experiments
To help you get started
Deriving Knowledge from Data at Scale
Experiment
Tools you can use in your experiment: feature selection, plus a large set of machine learning algorithms
Deriving Knowledge from Data at Scale
Experiment canvas callouts: getting data for the experiment → splitting into training and testing datasets → using classification algorithms → evaluating the model
Deriving Knowledge from Data at Scale
http://gallery.azureml.net/browse/?tags=[%22Azure%20ML%20Book%22
Deriving Knowledge from Data at Scale
Customer Churn Model
Deriving Knowledge from Data at Scale
Deployed web service endpoints
that can be consumed by applications
and for batch processing
Deriving Knowledge from Data at Scale
Define Objective → Access and Understand the Data → Pre-processing → Feature and/or Target construction
1. Define the objective and quantify it with a metric – optionally with constraints,
if any. This typically requires domain knowledge.
2. Collect and understand the data; deal with the vagaries and biases in the data
acquisition (missing data, outliers due to errors in the data collection process,
more sophisticated biases due to the data collection procedure, etc.).
3. Frame the problem in terms of a machine learning problem – classification,
regression, ranking, clustering, forecasting, outlier detection etc. – some
combination of domain knowledge and ML knowledge is useful.
4. Transform the raw data into a “modeling dataset”, with features, weights,
targets etc., which can be used for modeling. Feature construction can often
be improved with domain knowledge. The target must be identical to (or a very
good proxy for) the quantitative metric identified in step 1.
Deriving Knowledge from Data at Scale
Train/Test split → Feature selection → Model training → Model scoring → Evaluation
5. Train, test and evaluate, taking care to control bias/variance and ensure the
metrics are reported with the right confidence intervals (cross-validation helps
here); be vigilant against target leaks (which typically lead to unbelievably good
test metrics) – this is the ML-heavy step.
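A sketch of step 5's evaluation hygiene: cross-validate the metric and report it with a confidence interval, on synthetic data; the model and scoring choices are illustrative assumptions.

```python
# Cross-validated metric with a 95% confidence interval.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=10, scoring="roc_auc")

mean, sem = scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))
print(f"AUC = {mean:.3f} +/- {1.96 * sem:.3f} (95% CI)")
# A test metric that looks too good to be true is often a target leak:
# re-check that no feature secretly encodes the label.
```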
Deriving Knowledge from Data at Scale
6. Iterate steps (2)–(5) until the test metrics are satisfactory.
6. Iterate steps (2) – (5) until the test metrics are satisfactory
Deriving Knowledge from Data at Scale
Scoring pipeline: Access Data → Pre-processing → Feature construction → Model scoring
Deriving Knowledge from Data at Scale
Book Recommendation
Deriving Knowledge from Data at Scale
That’s all for our course….

Contenu connexe

Tendances

Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learningShishir Choudhary
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_financeStefan Duprey
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandrySri Ambati
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachinePulse
 
Fairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine LearningFairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine LearningHJ van Veen
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018HJ van Veen
 
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelH2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelSri Ambati
 
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsMl1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsankit_ppt
 
Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine LearningJoel Graff
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction TechniquesVishal Patel
 
Module 5: Decision Trees
Module 5: Decision TreesModule 5: Decision Trees
Module 5: Decision TreesSara Hooker
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentationHJ van Veen
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Tutori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Tutori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Tutori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Tutori...Edureka!
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksBICA Labs
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsDarius Barušauskas
 

Tendances (20)

Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learning
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Fairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine LearningFairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine Learning
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelH2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
 
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsMl1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
 
Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine Learning
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction Techniques
 
Module 5: Decision Trees
Module 5: Decision TreesModule 5: Decision Trees
Module 5: Decision Trees
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Tutori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Tutori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Tutori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Tutori...
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitions
 

Similaire à Barga Data Science lecture 10

The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining ProcessMarc Berman
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
Types of Machine Learning- Tanvir Siddike Moin
Types of Machine Learning- Tanvir Siddike MoinTypes of Machine Learning- Tanvir Siddike Moin
Types of Machine Learning- Tanvir Siddike MoinTanvir Moin
 
Pricing like a data scientist
Pricing like a data scientistPricing like a data scientist
Pricing like a data scientistMatthew Evans
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
data-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfdata-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfDanilo Cardona
 
DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...Hakka Labs
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxcloudserviceuit
 
Understanding Mahout classification documentation
Understanding Mahout  classification documentationUnderstanding Mahout  classification documentation
Understanding Mahout classification documentationNaveen Kumar
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
Week 4 advanced labeling, augmentation and data preprocessing
Week 4   advanced labeling, augmentation and data preprocessingWeek 4   advanced labeling, augmentation and data preprocessing
Week 4 advanced labeling, augmentation and data preprocessingAjay Taneja
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Alok Singh
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needGibDevs
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluationeShikshak
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 

Similaire à Barga Data Science lecture 10 (20)

The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Types of Machine Learning- Tanvir Siddike Moin
Types of Machine Learning- Tanvir Siddike MoinTypes of Machine Learning- Tanvir Siddike Moin
Types of Machine Learning- Tanvir Siddike Moin
 
Pricing like a data scientist
Pricing like a data scientistPricing like a data scientist
Pricing like a data scientist
 
Analytics
AnalyticsAnalytics
Analytics
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
data-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfdata-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdf
 
DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
 
Understanding Mahout classification documentation
Understanding Mahout  classification documentationUnderstanding Mahout  classification documentation
Understanding Mahout classification documentation
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Week 4 advanced labeling, augmentation and data preprocessing
Week 4   advanced labeling, augmentation and data preprocessingWeek 4   advanced labeling, augmentation and data preprocessing
Week 4 advanced labeling, augmentation and data preprocessing
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
Data processing
Data processingData processing
Data processing
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 

Plus de Roger Barga

RS Barga STRATA'18 New York City
RS Barga STRATA'18 New York CityRS Barga STRATA'18 New York City
RS Barga STRATA'18 New York CityRoger Barga
 
Barga Strata'18 presentation
Barga Strata'18 presentationBarga Strata'18 presentation
Barga Strata'18 presentationRoger Barga
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteRoger Barga
 
Data Driven Engineering 2014
Data Driven Engineering 2014Data Driven Engineering 2014
Data Driven Engineering 2014Roger Barga
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkRoger Barga
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteRoger Barga
 

Plus de Roger Barga (6)

RS Barga STRATA'18 New York City
RS Barga STRATA'18 New York CityRS Barga STRATA'18 New York City
RS Barga STRATA'18 New York City
 
Barga Strata'18 presentation
Barga Strata'18 presentationBarga Strata'18 presentation
Barga Strata'18 presentation
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
 
Data Driven Engineering 2014
Data Driven Engineering 2014Data Driven Engineering 2014
Data Driven Engineering 2014
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited Talk
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
 

Dernier

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Dernier (20)

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Barga Data Science lecture 10

  • 1. Deriving Knowledge from Data at Scale
  • 2. Deriving Knowledge from Data at Scale Models in Production
  • 3. Deriving Knowledge from Data at Scale Putting an ML Model into Production • A/B Testing
  • 4. Deriving Knowledge from Data at Scale Controlled Experiments in One Slide Concept is Trivial • Must run statistical tests to confirm differences are not due to chance • Best scientific way to prove causality, i.e., the changes in metrics are caused by changes introduced in the treatment(s)
  • 5. Deriving Knowledge from Data at Scale Best Practice: A/A Test Run A/A tests before
  • 6. Deriving Knowledge from Data at Scale Best Practice: Ramp-up Ramp-up
  • 7. Deriving Knowledge from Data at Scale Best Practice: Run Experiments at 50/50%
  • 8. Deriving Knowledge from Data at Scale Cost based learning
  • 9. Deriving Knowledge from Data at Scale Imbalanced Class Distribution & Error Costs WEKA cost sensitive learning weighting method false negatives, FN try to avoid false negatives
  • 10. Deriving Knowledge from Data at Scale Imbalanced Class Distribution WEKA cost sensitive learning Preprocess Classify meta.CostSensitiveClassifier set the FN to 10.0 FP to 1.0 tries to optimize accuracy or error can be cost-sensitive decision trees rule learner
  • 11. Deriving Knowledge from Data at Scale Imbalanced Class Distribution WEKA cost sensitive learning
  • 12. Deriving Knowledge from Data at Scale
  • 13. Deriving Knowledge from Data at Scale curated completely specify a problem measure progress paired with a metric target SLAs score board
  • 14. Deriving Knowledge from Data at Scale This isn’t easy… • Building high quality gold sets is a challenge. • It is time consuming. • It requires making difficult and long lasting choices, and the rewards are delayed…
  • 15. Deriving Knowledge from Data at Scale enforce a few principles 1. Distribution parity 2. Testing blindness 3. Production parity 4. Single metric 5. Reproducibility 6. Experimentation velocity 7. Data is gold
  • 16. Deriving Knowledge from Data at Scale • Test set blindness • Reproducibility and Data is gold • Experimentation velocity
  • 17. Deriving Knowledge from Data at Scale Building Gold sets is hard work. Many common and avoidable mistakes are made. This suggests having a checklist. Some questions will be trivial to answer or not applicable, some will require work… 1. Metrics: For each gold set, chose one (1) metric. Having two metrics on the same gold set is a problem (you can’t optimize both at once). 2. Weighting/Slicing: Not all errors are equal. This should be reflected in the metric, not through sampling manipulation. Having the weighting in the metric has two advantages: 1) it is explicitly documented and reproducible in the form of a metric algorithm, and 2) production, train, and test sets results remain directly comparable (automatic testing). 3. Yardstick(s): Define algorithms and configuration parameters for public yardstick(s). There could be more than one yardstick. A simple yardstick is useful for ramping up. Once one can reproduce/understand the simple yardstick’s result, it becomes easier to improve on the latest “production” yardstick. Ideally yardsticks come with downloadable code. The yardsticks provide a set of errors that suggests where innovation should happen.
  • 18. Deriving Knowledge from Data at Scale 4. Sizes and access: What are the set sizes? Each size corresponds to an innovation velocity and a level of representativeness. A good rule of thumb is 5X size ratios between gold sets drawn from the same distribution. Where should the data live? If on a server, some services are needed for access and simple manipulations. There should always be a size that is downloadable (< 1GB) to a desktop for high velocity innovation. 5. Documentation and format: Create a format/API for the data. Is the data compressed? Provide sample code to load the data. Document the format. Assign someone to be the curator of the gold set.
  • 19. Deriving Knowledge from Data at Scale 6. Features: What (gold) features go in the gold sets? Features must be pickled for results to be reproducible. Ideally, we would have 2, and possibly 3, types of gold sets. a. One set should have the deployed features (computed from the raw data). This provides the production yardstick. b. One set should be Raw (e.g. contains all information, possibly through tables). This allows contributors to create features from the raw data to investigate its potential compared to existing features. This set has more information per pattern and a smaller number of patterns. c. One set should have an extended number of features. The additional features may be “building blocks”, features that are scheduled to be deployed next, or high potential features. Moving some features to a gold set is convenient if multiple people are working on the next generation. Not all features are worth being in a gold set. 7. Feature optimization sets: Does the data require feature optimization? For instance, an IP address, a query, or a listing id may be features. But only the most frequent 10M instances are worth having specific trainable parameters. A pass over the data can identify the top 10M instances. This is a form of feature optimization. Identifying these features does not require labels. If a form of feature optimization is done, a separate data set (disjoint from the training and test set) must be provided.
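The counting pass described in item 7 can be as simple as one frequency count over the raw data. A minimal sketch, with a hypothetical "ip" field and a toy k:

```python
# Sketch of the feature-optimization pass in item 7: one unlabeled
# pass over the data to find the most frequent instances of a feature
# (here, a hypothetical 'ip' field), which then get trainable parameters.
from collections import Counter

def top_instances(records, field="ip", k=10_000_000):
    counts = Counter(r[field] for r in records)
    return {value for value, _ in counts.most_common(k)}

records = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]
frequent = top_instances(records, k=2)
print(frequent)  # {'10.0.0.1', '10.0.0.2'}
```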
  • 20. Deriving Knowledge from Data at Scale 8. Stale rate, optimization, monitoring: How long does the set stay current? In many cases, we hide the fact that the problem is a time series even though the goal is to predict the future and we know that the distribution is changing. We must quantify how much a distribution changes over a fixed period of time. There are several ways to mitigate the changing distribution problem: a. Assume the distribution is I.I.D. Regularly re-compute training sets and Gold sets. Determine the frequency of re-computation, or set in place a system to monitor distribution drifts (monitor KPI changes while the algorithm is kept constant). b. Decompose the model along “distribution (fast) tracking parameters” and slow tracking parameters. The fast tracking model may be a simple calibration with very few parameters. c. Recast the problem as a time series problem: patterns are (input data from t-T to t-1, prediction at time t). In this space, the patterns are much larger, but the problem is closer to being I.I.D. 9. The gold sets should have information that reveals the stale rate and allows algorithms to differentiate themselves based on how they degrade with time.
  • 21. Deriving Knowledge from Data at Scale 10. Grouping: Should the patterns be grouped? For example, in handwriting, examples are grouped per writer. A set built by shuffling the words is misleading because training and testing would have word examples for the same writer, which makes generalization much easier. If the words are grouped per writer, then a writer is unlikely to appear in both training and test set, which requires the system to generalize to never-seen-before handwriting (as opposed to never-seen-before words). Do we have these types of constraints? Should we group per advertiser, campaign, or user to generalize across new instances of these entities (as opposed to generalizing to new queries)? ML requires training and testing to be drawn from the same distribution. Drawing duplicates is not a problem. Problems arise when one partially draws examples from the same entity into both training and testing on a small set of entities. This breaks the IID assumption and makes the generalization on the test set much easier than it actually is. 11. Sampling production data: What strategy is used for sampling? Uniform? Are any of the following filtered out: fraud, bad configurations, duplicates, non-billable, adult, overwrites, etc? Guidance: use the production sameness principle.
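The grouping constraint in item 10 is exactly what group-aware splitters implement. A sketch using scikit-learn's GroupShuffleSplit, with hypothetical writer ids as the groups:

```python
# Sketch: keep all examples from the same writer on one side of the
# train/test split, so the test set measures generalization to
# never-seen-before writers (groups are hypothetical writer ids).
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(12).reshape(-1, 1)          # 12 word examples
writers = np.repeat([0, 1, 2, 3], 3)      # 4 writers, 3 words each

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=writers))
assert set(writers[train_idx]).isdisjoint(writers[test_idx])
print(writers[train_idx], writers[test_idx])
```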
  • 22. Deriving Knowledge from Data at Scale 12. Unlabeled set: If the number of labeled examples is small, a large data set of unlabeled data with the same distribution should be collected and made a gold set. This enables the discovery of new features using intermediate classifiers and active labeling.
  • 23. Deriving Knowledge from Data at Scale Greatest Challenge in Machine Learning
  • 24. Deriving Knowledge from Data at Scale. Train ML Model:
    gender   age   smoker   eye color   lung cancer
    male     19    yes      green       no
    female   44    yes      gray        yes
    male     49    yes      blue        yes
    male     12    no       brown       no
    female   37    no       brown       no
    female   60    no       brown       yes
    male     44    no       blue        no
    female   27    yes      brown       no
    female   51    yes      green       yes
    female   81    yes      gray        no
    male     22    yes      brown       no
    male     29    no       blue        no
    male     77    yes      gray        yes
    male     19    yes      green       no
    female   44    no       gray        no
  • 25. Deriving Knowledge from Data at Scale The greatest challenge in Machine Learning? Lack of Labelled Training Data… What to Do? • Controlled Experiments – get feedback from users to serve as labels; • Mechanical Turk – pay people to label data to build a training set; • Ask Users to Label Data – report as spam, ‘hot or not?’, review a product, observe their click behavior (ad retargeting, search results, etc.).
  • 26. Deriving Knowledge from Data at Scale. What if you can't get labeled Training Data? Traditional Supervised Learning • Promotion on bookseller’s web page • Customers can rate books • Will a new customer like this book? • Training set: observations on previous customers • Test set: new customers. What happens if only a few customers rate a book?
    Training Data (Age, Income, LikesBook): (24, 60K, +), (65, 80K, -), (60, 95K, -), (35, 52K, +), (20, 45K, +), (43, 75K, +), (26, 51K, +), (52, 47K, -), (47, 38K, -), (25, 22K, -), (33, 47K, +)
    Test Data: (22, 67K, ?), (39, 41K, ?) → Model Prediction: (22, 67K, +), (39, 41K, -)
    Attributes: Age, Income. Target Label: LikesBook. © 2013 Datameer, Inc. All rights reserved.
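Using the toy bookseller data above, the standard supervised fit/predict loop might look like the following sketch (the choice of classifier and the numeric encoding of income are ours):

```python
# Sketch: fit a classifier on the labeled customers from the slide
# and predict for the two new customers. Income encoded in thousands.
from sklearn.tree import DecisionTreeClassifier

train = [(24, 60), (65, 80), (60, 95), (35, 52), (20, 45), (43, 75),
         (26, 51), (52, 47), (47, 38), (25, 22), (33, 47)]
likes = ['+', '-', '-', '+', '+', '+', '+', '-', '-', '-', '+']

model = DecisionTreeClassifier(random_state=0).fit(train, likes)
print(model.predict([(22, 67), (39, 41)]))  # slide's predictions: ['+', '-']
```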
  • 27. Deriving Knowledge from Data at Scale Semi-Supervised Learning. Can we make use of the unlabeled data? In theory: no ... but we can make assumptions. Popular Assumptions • Clustering assumption • Low density assumption • Manifold assumption
  • 28. Deriving Knowledge from Data at Scale The Clustering Assumption. Clustering • Partition instances into groups (clusters) of similar instances • Many different algorithms: k-Means, EM, etc. Clustering Assumption • The two classification targets are distinct clusters • Simple semi-supervised learning: cluster, then perform majority vote
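A minimal sketch of "cluster, then perform majority vote", assuming synthetic two-cluster data and that only ten labels are known:

```python
# Sketch of simple semi-supervised learning under the clustering
# assumption: cluster all points, then label each cluster by majority
# vote over its few labeled members. Data here is synthetic.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=200, centers=2, random_state=0)
labeled_idx = np.arange(10)              # pretend only 10 labels are known

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

cluster_label = {}
for c in np.unique(clusters):
    members = [i for i in labeled_idx if clusters[i] == c]
    votes = Counter(y_true[i] for i in members)
    cluster_label[c] = votes.most_common(1)[0][0] if votes else None

y_pred = np.array([cluster_label[c] for c in clusters])
print((y_pred == y_true).mean())
```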
  • 32. Deriving Knowledge from Data at Scale Generative Models Mixture of Gaussians • Assumption: the data in each cluster is generated by a normal distribution • Find most probable location and shape of clusters given data Expectation-Maximization • Two step optimization procedure • Keeps estimates of cluster assignment probabilities for each instance • Might converge to local optimum
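EM for a mixture of Gaussians is available off the shelf; a sketch with scikit-learn's GaussianMixture on synthetic data:

```python
# Sketch: fit a 2-component Gaussian mixture with EM and read off
# each instance's cluster-assignment probabilities. Synthetic data.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)

# Several restarts (n_init) guard against EM's local optima.
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)
print(gmm.means_)                 # most probable cluster locations
print(gmm.predict_proba(X[:3]))   # soft assignment probabilities
```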
  • 38. Deriving Knowledge from Data at Scale Beyond Mixtures of Gaussians Expectation-Maximization • Can be adjusted to all kinds of mixture models • E.g. use Naive Bayes as mixture model for text classification Self-Training • Learn model on labeled instances only • Apply model to unlabeled instances • Learn new model on all instances • Repeat until convergence
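The self-training loop can be written in a few lines; in the sketch below the base model (Naive Bayes) and the 0.95 confidence threshold are our choices, not from the lecture:

```python
# Sketch of self-training: train on labeled data, pseudo-label the
# unlabeled instances the model is confident about, retrain, repeat.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB

X, y_true = make_blobs(n_samples=300, centers=2, random_state=1)
labeled = np.zeros(len(X), dtype=bool)
labeled[:20] = True                       # only 20 labels known
y_work = np.where(labeled, y_true, -1)    # -1 marks "unknown"

model = GaussianNB()
for _ in range(10):
    model.fit(X[labeled], y_work[labeled])
    if labeled.all():
        break                             # nothing left to pseudo-label
    proba = model.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95  # our threshold choice
    if not confident.any():
        break                             # converged: no confident additions
    idx = np.flatnonzero(~labeled)[confident]
    y_work[idx] = model.predict(X[idx])   # pseudo-labels
    labeled[idx] = True

print((y_work[labeled] == y_true[labeled]).mean())
```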
  • 39. Deriving Knowledge from Data at Scale The Low Density Assumption Assumption • The area between the two classes has low density • Does not assume any specific form of cluster Support Vector Machine • Decision boundary is linear • Maximizes margin to closest instances
  • 42. Deriving Knowledge from Data at Scale The Low Density Assumption Semi-Supervised SVM • Minimize distance to labeled and unlabeled instances • Parameter to fine-tune influence of unlabeled instances • Additional constraint: keep class balance correct Implementation • Simple extension of SVM • But non-convex optimization problem
  • 45. Deriving Knowledge from Data at Scale Semi-Supervised SVM Stochastic Gradient Descent • One run over the data in random order • Each misclassified or unlabeled instance moves classifier a bit • Steps get smaller over time Implementation on Hadoop • Mapper: send data to reducer in random order • Reducer: update linear classifier for unlabeled or misclassified instances • Many random runs to find best one
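A toy numpy sketch of this SGD scheme for a semi-supervised linear SVM: labeled points get a hinge-loss update, unlabeled points are pushed away from the decision boundary (the low density assumption). Hyperparameters and data are ours; this is the idea rather than a tuned implementation:

```python
# Toy SGD for a semi-supervised linear SVM on synthetic 2-D data.
import numpy as np

rng = np.random.default_rng(0)
# Two clusters; only 10 of 200 points keep their labels.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])
labeled = set(rng.choice(200, size=10, replace=False).tolist())

w, b = np.zeros(2), 0.0
lam, C_u = 0.01, 0.1                       # regularization, unlabeled weight
t = 0
for _ in range(20):                        # several passes in random order
    for i in rng.permutation(200):
        t += 1
        eta = 1.0 / (lam * t)              # steps get smaller over time
        w *= 1.0 - eta * lam               # regularization shrinkage
        f = X[i] @ w + b
        if i in labeled:
            if y[i] * f < 1:               # hinge loss on labeled points
                w += eta * y[i] * X[i]
                b += eta * y[i]
        else:                              # "hat" loss on unlabeled points:
            s = 1.0 if f >= 0 else -1.0    # push them away from the boundary
            if s * f < 1:
                w += eta * C_u * s * X[i]
                b += eta * C_u * s

print("accuracy:", np.mean(np.sign(X @ w + b) == y))
```

On Hadoop, the mapper's job would be the shuffling (send data to the reducer in random order) and the reducer would perform these per-instance updates, with many random runs to pick the best classifier, as the slide describes.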
  • 50. Deriving Knowledge from Data at Scale The Manifold Assumption The Assumption • Training data is (roughly) contained in a low dimensional manifold • One can perform learning in a more meaningful low-dimensional space • Avoids curse of dimensionality Similarity Graphs • Idea: compute similarity scores between instances • Create a network where the nearest neighbors are connected
  • 53. Deriving Knowledge from Data at Scale Label Propagation Main Idea • Propagate label information to neighboring instances • Then repeat until convergence • Similar to PageRank Theory • Known to converge under weak conditions • Equivalent to matrix inversion
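scikit-learn ships an implementation; a sketch with LabelPropagation over a kNN similarity graph, keeping only ten known labels on synthetic data:

```python
# Sketch: label propagation over a kNN similarity graph. Unlabeled
# instances are marked -1; labels spread to neighbors until convergence.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
y_partial = np.full_like(y, -1)
y_partial[:5], y_partial[-5:] = y[:5], y[-5:]   # keep only 10 labels

lp = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y_partial)
print((lp.transduction_ == y).mean())           # labels inferred for all points
```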
  • 57. Deriving Knowledge from Data at Scale Conclusion Semi-Supervised Learning • Only a few training instances have labels • Unlabeled instances can still provide valuable signal Different assumptions lead to different approaches • Cluster assumption: generative models • Low density assumption: semi-supervised support vector machines • Manifold assumption: label propagation
  • 58. Deriving Knowledge from Data at Scale 10 Minute Break…
  • 59. Deriving Knowledge from Data at Scale Controlled Experiments
  • 60. Deriving Knowledge from Data at Scale • A • B
  • 61. Deriving Knowledge from Data at Scale OEC = Overall Evaluation Criterion. Picking a good OEC is key
  • 62. Deriving Knowledge from Data at Scale
  • 63. Deriving Knowledge from Data at Scale • Lesson #2: GET THE DATA
  • 64. Deriving Knowledge from Data at Scale
  • 65. Deriving Knowledge from Data at Scale • Lesson #2: Get the data!
  • 66. Deriving Knowledge from Data at Scale Lesson #3: Prepare to be humbled Left Elevator Right Elevator
  • 67. Deriving Knowledge from Data at Scale • Lesson #1 • Lesson #2 • Lesson #3 15% Bing
  • 68. Deriving Knowledge from Data at Scale • The HiPPO (Highest Paid Person’s Opinion) stopped the project From Greg Linden’s Blog: http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html
  • 69. Deriving Knowledge from Data at Scale TED talk
  • 70. Deriving Knowledge from Data at Scale • Must run statistical tests to confirm differences are not due to chance • Best scientific way to prove causality, i.e., the changes in metrics are caused by changes introduced in the treatment(s)
  • 71. Deriving Knowledge from Data at Scale
  • 72. Deriving Knowledge from Data at Scale • Raise your right hand if you think A Wins • Raise your left hand if you think B Wins • Don’t raise your hand if you think they’re about the same A B
  • 73. Deriving Knowledge from Data at Scale • A was 8.5% better
  • 74. Deriving Knowledge from Data at Scale A B Differences: A has taller search box (overall size is the same), has magnifying glass icon, “popular searches” B has big search button • Raise your right hand if you think A Wins • Raise your left hand if you think B Wins • Don’t raise your hand if you think they are about the same
  • 75. Deriving Knowledge from Data at Scale
  • 76. Deriving Knowledge from Data at Scale
  • 77. Deriving Knowledge from Data at Scale A B • Raise your right hand if you think A Wins • Raise your left hand if you think B Wins • Don’t raise your hand if you think they are about the same
  • 78. Deriving Knowledge from Data at Scale Get the data; prepare to be humbled
  • 79. Deriving Knowledge from Data at Scale Any statistic that appears interesting is almost certainly a mistake. • If something is “amazing,” find the flaw! • Examples: • If you have a mandatory birth date field and people think it’s unnecessary, you’ll find lots of 11/11/11 or 01/01/01 • If you have an optional drop-down, do not default to the first alphabetical entry, or you’ll have lots of jobs = Astronaut • The previous Office example assumes click maps to revenue. Seemed reasonable, but when the results look so extreme, find the flaw (conversion rate is not the same; see why?)
  • 80. Deriving Knowledge from Data at Scale Data Trumps Intuition
  • 81. Deriving Knowledge from Data at Scale Sir Ken Robinson
  • 82. Deriving Knowledge from Data at Scale • OEC = Overall Evaluation Criterion
  • 83. Deriving Knowledge from Data at Scale • Controlled Experiments in one slide • Examples: you’re the decision maker
  • 84. Deriving Knowledge from Data at Scale It is difficult to get a man to understand something when his salary depends upon his not understanding it. -- Upton Sinclair
  • 85. Deriving Knowledge from Data at Scale Hubris
  • 86. Deriving Knowledge from Data at Scale Cultural Stage 2 Insight through Measurement and Control • Semmelweis worked at Vienna’s General Hospital, an important teaching/research hospital, in the 1830s-40s • In 19th-century Europe, childbed fever killed more than a million women • Measurement: the mortality rate for women giving birth was • 15% in his ward, staffed by doctors and students • 2% in the adjacent ward, attended by midwives
  • 87. Deriving Knowledge from Data at Scale Cultural Stage 2 Insight through Measurement and Control • He tried to control all differences • Birthing positions, ventilation, diet, even the way laundry was done • He was away for 4 months and the death rate fell significantly while he was away. Could it be related to him? • Insight: • Doctors were performing autopsies each morning on cadavers • Conjecture: particles (called germs today) were being transmitted to healthy patients on the hands of the physicians • He experimented with cleansing agents • Chlorinated lime was effective: the death rate fell from 18% to 1%
  • 88. Deriving Knowledge from Data at Scale Semmelweis Reflex • A 2005 study: inadequate hand washing is one of the prime contributors to the 2 million health-care-associated infections and 90,000 related deaths annually in the United States
  • 89. Deriving Knowledge from Data at Scale Fundamental Understanding
  • 90. Deriving Knowledge from Data at Scale Hubris Measure and Control Accept Results avoid Semmelweis Reflex Fundamental Understanding
  • 91. Deriving Knowledge from Data at Scale • Controlled Experiments in one slide • Examples: you’re the decision maker • Cultural evolution: hubris, insight through measurement, Semmelweis reflex, fundamental understanding
  • 92. Deriving Knowledge from Data at Scale
  • 93. Deriving Knowledge from Data at Scale • Real Data for the city of Oldenburg, Germany • X-axis: stork population • Y-axis: human population What your mother told you about babies and storks when you were three is still not right, despite the strong correlational “evidence” Ornithologische Monatsberichte 1936;44(2)
  • 94. Deriving Knowledge from Data at Scale Women have smaller palms and live 6 years longer on average. But… don’t try to bandage your hands
  • 95. Deriving Knowledge from Data at Scale causal
  • 96. Deriving Knowledge from Data at Scale If you don't know where you are going, any road will take you there —Lewis Carroll
  • 97. Deriving Knowledge from Data at Scale
  • 98. Deriving Knowledge from Data at Scale before
  • 99. Deriving Knowledge from Data at Scale
  • 100. Deriving Knowledge from Data at Scale
  • 101. Deriving Knowledge from Data at Scale • Hippos kill more humans than any other (non-human) mammal (really) • OEC • Get the data • Prepare to be humbled The less data, the stronger the opinions…
  • 102. Deriving Knowledge from Data at Scale Out of Class Reading Eight (8) page conference paper 40 page journal version…
  • 103. Deriving Knowledge from Data at Scale
  • 104. Deriving Knowledge from Data at Scale Course Project Due Oct. 25th
  • 105. Deriving Knowledge from Data at Scale Open Discussion on Course Project…
  • 106. Deriving Knowledge from Data at Scale
  • 107. Deriving Knowledge from Data at Scale
  • 108. Deriving Knowledge from Data at Scale Gallery of Experiments Contributed by the community
  • 109. Deriving Knowledge from Data at Scale Azure Machine Learning Studio
  • 110. Deriving Knowledge from Data at Scale Sample Experiments To help you get started
  • 111. Deriving Knowledge from Data at Scale Experiment: tools that you can use in your experiment, including feature selection and a large set of machine learning algorithms
  • 112. Deriving Knowledge from Data at Scale
  • 113. Deriving Knowledge from Data at Scale Getting data for the experiment • Splitting into training and testing datasets • Using classification algorithms • Evaluating the model
  • 114. Deriving Knowledge from Data at Scale http://gallery.azureml.net/browse/?tags=[%22Azure%20ML%20Book%22
  • 115. Deriving Knowledge from Data at Scale Customer Churn Model
  • 116. Deriving Knowledge from Data at Scale Deployed web service endpoints that can be consumed by applications and for batch processing
  • 117. Deriving Knowledge from Data at Scale
  • 118. Deriving Knowledge from Data at Scale Define Objective • Access and Understand the Data • Pre-processing • Feature and/or Target construction 1. Define the objective and quantify it with a metric – optionally with constraints, if any. This typically requires domain knowledge. 2. Collect and understand the data; deal with the vagaries and biases in the data acquisition (missing data, outliers due to errors in the data collection process, more sophisticated biases due to the data collection procedure, etc.). 3. Frame the problem in terms of a machine learning problem – classification, regression, ranking, clustering, forecasting, outlier detection, etc. – some combination of domain knowledge and ML knowledge is useful. 4. Transform the raw data into a “modeling dataset”, with features, weights, targets, etc., which can be used for modeling. Feature construction can often be improved with domain knowledge. The target must be identical to (or a very good proxy for) the quantitative metric identified in step 1.
  • 119. Deriving Knowledge from Data at Scale Feature selection • Model training • Model scoring • Evaluation • Train/Test split 5. Train, test and evaluate, taking care to control bias/variance and to ensure the metrics are reported with the right confidence intervals (cross-validation helps here); be vigilant against target leaks (which typically lead to unbelievably good test metrics) – this is the ML-heavy step.
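Step 5 in code form might look like the following sketch: cross-validate on the training portion to report the metric with a spread rather than a single number, then check once on the held-out test set. The data and model choice are illustrative:

```python
# Sketch of step 5: hold out a test set, cross-validate on the training
# portion, and report mean +/- std of the metric. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0)
scores = cross_val_score(model, X_tr, y_tr, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {scores.mean():.3f} +/- {scores.std():.3f}")

model.fit(X_tr, y_tr)   # final check on the held-out test set
print("test AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```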
  • 120. Deriving Knowledge from Data at Scale Define Objective Access and Understand the data Pre-processing Feature and/or Target construction Feature selection Model training Model scoring Evaluation Train/ Test split 6. Iterate steps (2) – (5) until the test metrics are satisfactory
  • 121. Deriving Knowledge from Data at Scale Access Data • Pre-processing • Feature construction • Model scoring
  • 122. Deriving Knowledge from Data at Scale
  • 123. Deriving Knowledge from Data at Scale
  • 124. Deriving Knowledge from Data at Scale Book Recommendation
  • 125. Deriving Knowledge from Data at Scale That’s all for our course….