SlideShare une entreprise Scribd logo
1  sur  50
Télécharger pour lire hors ligne
A field guide to the machine
learning zoo
Theodore Vasiloudis SICS/KTH
From idea to objective function
Formulating an ML problem
Formulating an ML problem
● Common aspects
Source: Xing (2015)
Formulating an ML problem
● Common aspects
○ Model (θ)
Source: Xing (2015)
Formulating an ML problem
● Common aspects
○ Model (θ)
○ Data (D)
Source: Xing (2015)
Formulating an ML problem
● Common aspects
○ Model (θ)
○ Data (D)
● Objective function: L(θ, D)
Source: Xing (2015)
Formulating an ML problem
● Common aspects
○ Model (θ)
○ Data (D)
● Objective function: L(θ, D)
● Prior knowledge: r(θ)
Source: Xing (2015)
Formulating an ML problem
● Common aspects
○ Model (θ)
○ Data (D)
● Objective function: L(θ, D)
● Prior knowledge: r(θ)
● ML program: f(θ, D) = L(θ, D) + r(θ)
Source: Xing (2015)
Formulating an ML problem
● Common aspects
○ Model (θ)
○ Data (D)
● Objective function: L(θ, D)
● Prior knowledge: r(θ)
● ML program: f(θ, D) = L(θ, D) + r(θ)
● ML Algorithm: How to optimize f(θ, D)
Source: Xing (2015)
Example: Improve retention at Twitter
● Goal: Reduce the churn of users on Twitter
● Assumption: Users churn because they don’t engage with the platform
● Idea: Increase the retweets, by promoting tweets more likely to be
retweeted
Example: Improve retention at Twitter
● Goal: Reduce the churn of users on Twitter
● Assumption: Users churn because they don’t engage with the platform
● Idea: Increase the retweets, by promoting tweets more likely to be
retweeted
● Data (D):
● Model (θ):
● Objective function - L(D, θ):
● Prior knowledge (Regularization):
● Algorithm:
Example: Improve retention at Twitter
● Goal: Reduce the churn of users on Twitter
● Assumption: Users churn because they don’t engage with the platform
● Idea: Increase the retweets, by promoting tweets more likely to be
retweeted
● Data (D): Features and labels, xi
, yi
● Model (θ):
● Objective function - L(D, θ):
● Prior knowledge (Regularization):
● Algorithm:
Example: Improve retention at Twitter
● Goal: Reduce the churn of users on Twitter
● Assumption: Users churn because they don’t engage with the platform
● Idea: Increase the retweets, by promoting tweets more likely to be
retweeted
● Data (D): Features and labels, xi
, yi
● Model (θ): Logistic regression, parameters w
○ p(y|x, w) = Bernouli(y | sigm(wΤ
x))
● Objective function - L(D, θ):
● Prior knowledge (Regularization):
● Algorithm:
Example: Improve retention at Twitter
● Goal: Reduce the churn of users on Twitter
● Assumption: Users churn because they don’t engage with the platform
● Idea: Increase the retweets, by promoting tweets more likely to be
retweeted
● Data (D): Features and labels, xi
, yi
● Model (θ): Logistic regression, parameters w
○ p(y|x, w) = Bernouli(y | sigm(wΤ
x))
● Objective function - L(D, θ): NLL(w) = Σ log(1 + exp(-y wΤ
xi
))
● Prior knowledge (Regularization): r(w) = λ*wΤ
w
● Algorithm:
Warning:
Notation abuse
Example: Improve retention at Twitter
● Goal: Reduce the churn of users on Twitter
● Assumption: Users churn because they don’t engage with the platform
● Idea: Increase the retweets, by promoting tweets more likely to be
retweeted
● Data (D): Features and labels, xi
, yi
● Model (θ): Logistic regression, parameters w
○ p(y|x, w) = Bernouli(y | sigm(wΤ
x))
● Objective function - L(D, θ): NLL(w) = Σ log(1 + exp(-y wΤ
xi
))
● Prior knowledge (Regularization): r(w) = λ*wΤ
w
● Algorithm: Gradient Descent
Data problems
Data problems
● GIGO: Garbage In - Garbage Out
Data readiness
Source: Lawrence (2017)
Data readiness
● Problem: “Data” as a concept is hard to reason about.
● Goal: Make the stakeholders aware of the state of the data at all stages
Source: Lawrence (2017)
Data readiness
Source: Lawrence (2017)
Data readiness
● Band C
○ Accessibility
Source: Lawrence (2017)
Data readiness
● Band C
○ Accessibility
● Band B
○ Representation and faithfulness
Source: Lawrence (2017)
Data readiness
● Band C
○ Accessibility
● Band B
○ Representation and faithfulness
● Band A
○ Data in context
Source: Lawrence (2017)
Data readiness
● Band C
○ “How long will it take to bring our user data to C1 level?”
● Band B
○ “Until we know the collection process we can’t move the data to B1.”
● Band A
○ “We realized that we would need location data in order to have an A1 dataset.”
Source: Lawrence (2017)
Data readiness
● Band C
○ “How long will it take to bring our user data to C1 level?”
● Band B
○ “Until we know the collection process we can’t move the data to B1.”
● Band A
○ “We realized that we would need location data in order to have an A1 dataset.”
Selecting algorithm & software:
“Easy” choices
Selecting algorithms
An ML algorithm “farm”
Source: scikit-learn.org
The neural network zoo
Source: Asimov
Institute (2016)
Selecting algorithms
● Always go for the simplest model you can afford
Selecting algorithms
● Always go for the simplest model you can afford
○ Your first model is more about getting the infrastructure right
Source: Zinkevich (2017)
Selecting algorithms
● Always go for the simplest model you can afford
○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.
Source: Zinkevich (2017)
Selecting algorithms
● Always go for the simplest model you can afford
○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.
○ Complex model erode boundaries
Source: Sculley et al. (2015)
Selecting algorithms
● Always go for the simplest model you can afford
○ Your first model is more about getting the infrastructure right
○ Simple models are usually interpretable. Interpretable models are easier to debug.
○ Complex model erode boundaries
■ CACE principle: Changing Anything Changes Everything
Source: Sculley et al. (2015)
Selecting software
The ML software zoo
Leaf
Your model vs. the world
What are the problems with ML systems?
Data ML Code Model
Expectation
What are the problems with ML systems?
Data ML Code Model
Reality
Sculley et al. (2015)
Things to watch out for
● Data dependencies
Things to watch out for
Sculley et al. (2015)
& Zinkevich (2017)
● Data dependencies
○ Unstable dependencies
Things to watch out for
Sculley et al. (2015)
& Zinkevich (2017)
● Data dependencies
○ Unstable dependencies
● Feedback loops
Things to watch out for
Sculley et al. (2015)
& Zinkevich (2017)
● Data dependencies
○ Unstable dependencies
● Feedback loops
○ Direct
Things to watch out for
Sculley et al. (2015)
& Zinkevich (2017)
● Data dependencies
○ Unstable dependencies
● Feedback loops
○ Direct
○ Indirect
Things to watch out for
Sculley et al. (2015)
& Zinkevich (2017)
Bringing it all together
Bringing it all together
● Define your problem as optimizing your objective function using data
● Determine (and monitor) the readiness of your data
● Don't spend too much time at first choosing an ML framework/algorithm
● Worry much more about what happens when your model meets the world.
Thank you.
@thvasilo
tvas@sics.se
Sources
● Google auto-replies: Shared photos, and text
● Silver et al. (2016): Mastering the game of Go
● Xing (2015): A new look at the system, algorithm and theory foundations of Distributed ML
● Lawrence (2017): Data readiness levels
● Asimov Institute (2016): The Neural Network Zoo
● Zinkevich (2017): Rules of Machine Learning - Best Practices for ML Engineering
● Sculley et al. (2015): Hidden Technical Debt in Machine Learning Systems

Contenu connexe

Similaire à A field guide the machine learning zoo

General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DSRoopesh Kohad
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning SystemsXavier Amatriain
 
PyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessPyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessChia-Chi Chang
 
A Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationA Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationJulián Urbano
 
Online Machine Learning: introduction and examples
Online Machine Learning:  introduction and examplesOnline Machine Learning:  introduction and examples
Online Machine Learning: introduction and examplesFelipe
 
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systemsBIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systemsXavier Amatriain
 
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Codiax
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIPramit Choudhary
 
D7 MarkPlus - Machine Learning Algorithm.pdf
D7 MarkPlus - Machine Learning Algorithm.pdfD7 MarkPlus - Machine Learning Algorithm.pdf
D7 MarkPlus - Machine Learning Algorithm.pdfikraizn
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learningTheodoros Vasiloudis
 
DSDT Meetup April 2021
DSDT Meetup April 2021DSDT Meetup April 2021
DSDT Meetup April 2021DSDT_MTL
 
London TensorFlow Meetup - 18th July 2017
London TensorFlow Meetup - 18th July 2017London TensorFlow Meetup - 18th July 2017
London TensorFlow Meetup - 18th July 2017Daniel Ecer
 
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
An Empirical Comparison of Knowledge Graph Embeddings for Item RecommendationAn Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
An Empirical Comparison of Knowledge Graph Embeddings for Item RecommendationEnrico Palumbo
 
Improving How We Deliver Machine Learning Models (XCONF 2019)
Improving How We Deliver Machine Learning Models (XCONF 2019)Improving How We Deliver Machine Learning Models (XCONF 2019)
Improving How We Deliver Machine Learning Models (XCONF 2019)David Tan
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jjexp
 

Similaire à A field guide the machine learning zoo (20)

General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DS
 
L15.pptx
L15.pptxL15.pptx
L15.pptx
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
PyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessPyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darkness
 
A Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationA Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR Evaluation
 
Online Machine Learning: introduction and examples
Online Machine Learning:  introduction and examplesOnline Machine Learning:  introduction and examples
Online Machine Learning: introduction and examples
 
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systemsBIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
 
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AI
 
D7 MarkPlus - Machine Learning Algorithm.pdf
D7 MarkPlus - Machine Learning Algorithm.pdfD7 MarkPlus - Machine Learning Algorithm.pdf
D7 MarkPlus - Machine Learning Algorithm.pdf
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learning
 
DSDT Meetup April 2021
DSDT Meetup April 2021DSDT Meetup April 2021
DSDT Meetup April 2021
 
Data science
Data scienceData science
Data science
 
London TensorFlow Meetup - 18th July 2017
London TensorFlow Meetup - 18th July 2017London TensorFlow Meetup - 18th July 2017
London TensorFlow Meetup - 18th July 2017
 
Learning for Big Data-林軒田
Learning for Big Data-林軒田Learning for Big Data-林軒田
Learning for Big Data-林軒田
 
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
An Empirical Comparison of Knowledge Graph Embeddings for Item RecommendationAn Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
 
Improving How We Deliver Machine Learning Models (XCONF 2019)
Improving How We Deliver Machine Learning Models (XCONF 2019)Improving How We Deliver Machine Learning Models (XCONF 2019)
Improving How We Deliver Machine Learning Models (XCONF 2019)
 
C2_W1---.pdf
C2_W1---.pdfC2_W1---.pdf
C2_W1---.pdf
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 

Dernier

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 

Dernier (20)

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 

A field guide the machine learning zoo

  • 1. A field guide to the machine learning zoo Theodore Vasiloudis SICS/KTH
  • 2. From idea to objective function
  • 4. Formulating an ML problem ● Common aspects Source: Xing (2015)
  • 5. Formulating an ML problem ● Common aspects ○ Model (θ) Source: Xing (2015)
  • 6. Formulating an ML problem ● Common aspects ○ Model (θ) ○ Data (D) Source: Xing (2015)
  • 7. Formulating an ML problem ● Common aspects ○ Model (θ) ○ Data (D) ● Objective function: L(θ, D) Source: Xing (2015)
  • 8. Formulating an ML problem ● Common aspects ○ Model (θ) ○ Data (D) ● Objective function: L(θ, D) ● Prior knowledge: r(θ) Source: Xing (2015)
  • 9. Formulating an ML problem ● Common aspects ○ Model (θ) ○ Data (D) ● Objective function: L(θ, D) ● Prior knowledge: r(θ) ● ML program: f(θ, D) = L(θ, D) + r(θ) Source: Xing (2015)
  • 10. Formulating an ML problem ● Common aspects ○ Model (θ) ○ Data (D) ● Objective function: L(θ, D) ● Prior knowledge: r(θ) ● ML program: f(θ, D) = L(θ, D) + r(θ) ● ML Algorithm: How to optimize f(θ, D) Source: Xing (2015)
  • 11. Example: Improve retention at Twitter ● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be retweeted
  • 12. Example: Improve retention at Twitter ● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be retweeted ● Data (D): ● Model (θ): ● Objective function - L(D, θ): ● Prior knowledge (Regularization): ● Algorithm:
  • 13. Example: Improve retention at Twitter ● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be retweeted ● Data (D): Features and labels, xi , yi ● Model (θ): ● Objective function - L(D, θ): ● Prior knowledge (Regularization): ● Algorithm:
  • 14. Example: Improve retention at Twitter ● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be retweeted ● Data (D): Features and labels, xi , yi ● Model (θ): Logistic regression, parameters w ○ p(y|x, w) = Bernouli(y | sigm(wΤ x)) ● Objective function - L(D, θ): ● Prior knowledge (Regularization): ● Algorithm:
  • 15. Example: Improve retention at Twitter ● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be retweeted ● Data (D): Features and labels, xi , yi ● Model (θ): Logistic regression, parameters w ○ p(y|x, w) = Bernouli(y | sigm(wΤ x)) ● Objective function - L(D, θ): NLL(w) = Σ log(1 + exp(-y wΤ xi )) ● Prior knowledge (Regularization): r(w) = λ*wΤ w ● Algorithm: Warning: Notation abuse
  • 16. Example: Improve retention at Twitter ● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be retweeted ● Data (D): Features and labels, xi , yi ● Model (θ): Logistic regression, parameters w ○ p(y|x, w) = Bernouli(y | sigm(wΤ x)) ● Objective function - L(D, θ): NLL(w) = Σ log(1 + exp(-y wΤ xi )) ● Prior knowledge (Regularization): r(w) = λ*wΤ w ● Algorithm: Gradient Descent
  • 18. Data problems ● GIGO: Garbage In - Garbage Out
  • 20. Data readiness ● Problem: “Data” as a concept is hard to reason about. ● Goal: Make the stakeholders aware of the state of the data at all stages Source: Lawrence (2017)
  • 22. Data readiness ● Band C ○ Accessibility Source: Lawrence (2017)
  • 23. Data readiness ● Band C ○ Accessibility ● Band B ○ Representation and faithfulness Source: Lawrence (2017)
  • 24. Data readiness ● Band C ○ Accessibility ● Band B ○ Representation and faithfulness ● Band A ○ Data in context Source: Lawrence (2017)
  • 25. Data readiness ● Band C ○ “How long will it take to bring our user data to C1 level?” ● Band B ○ “Until we know the collection process we can’t move the data to B1.” ● Band A ○ “We realized that we would need location data in order to have an A1 dataset.” Source: Lawrence (2017)
  • 26. Data readiness ● Band C ○ “How long will it take to bring our user data to C1 level?” ● Band B ○ “Until we know the collection process we can’t move the data to B1.” ● Band A ○ “We realized that we would need location data in order to have an A1 dataset.”
  • 27. Selecting algorithm & software: “Easy” choices
  • 29. An ML algorithm “farm” Source: scikit-learn.org
  • 30. The neural network zoo Source: Asimov Institute (2016)
  • 31. Selecting algorithms ● Always go for the simplest model you can afford
  • 32. Selecting algorithms ● Always go for the simplest model you can afford ○ Your first model is more about getting the infrastructure right Source: Zinkevich (2017)
  • 33. Selecting algorithms ● Always go for the simplest model you can afford ○ Your first model is more about getting the infrastructure right ○ Simple models are usually interpretable. Interpretable models are easier to debug. Source: Zinkevich (2017)
  • 34. Selecting algorithms ● Always go for the simplest model you can afford ○ Your first model is more about getting the infrastructure right ○ Simple models are usually interpretable. Interpretable models are easier to debug. ○ Complex model erode boundaries Source: Sculley et al. (2015)
  • 35. Selecting algorithms ● Always go for the simplest model you can afford ○ Your first model is more about getting the infrastructure right ○ Simple models are usually interpretable. Interpretable models are easier to debug. ○ Complex model erode boundaries ■ CACE principle: Changing Anything Changes Everything Source: Sculley et al. (2015)
  • 37. The ML software zoo Leaf
  • 38. Your model vs. the world
  • 39. What are the problems with ML systems? Data ML Code Model Expectation
  • 40. What are the problems with ML systems? Data ML Code Model Reality Sculley et al. (2015)
  • 41. Things to watch out for
  • 42. ● Data dependencies Things to watch out for Sculley et al. (2015) & Zinkevich (2017)
  • 43. ● Data dependencies ○ Unstable dependencies Things to watch out for Sculley et al. (2015) & Zinkevich (2017)
  • 44. ● Data dependencies ○ Unstable dependencies ● Feedback loops Things to watch out for Sculley et al. (2015) & Zinkevich (2017)
  • 45. ● Data dependencies ○ Unstable dependencies ● Feedback loops ○ Direct Things to watch out for Sculley et al. (2015) & Zinkevich (2017)
  • 46. ● Data dependencies ○ Unstable dependencies ● Feedback loops ○ Direct ○ Indirect Things to watch out for Sculley et al. (2015) & Zinkevich (2017)
  • 47. Bringing it all together
  • 48. Bringing it all together ● Define your problem as optimizing your objective function using data ● Determine (and monitor) the readiness of your data ● Don't spend too much time at first choosing an ML framework/algorithm ● Worry much more about what happens when your model meets the world.
  • 50. Sources ● Google auto-replies: Shared photos, and text ● Silver et al. (2016): Mastering the game of Go ● Xing (2015): A new look at the system, algorithm and theory foundations of Distributed ML ● Lawrence (2017): Data readiness levels ● Asimov Institute (2016): The Neural Network Zoo ● Zinkevich (2017): Rules of Machine Learning - Best Practices for ML Engineering ● Sculley et al. (2015): Hidden Technical Debt in Machine Learning Systems