SlideShare a Scribd company logo
1 of 20
Skillsto Master Machine Learning and
Data Science Project Cycle &
Strategiesto Avoid Common Pitfalls
MING LI, AMAZON
KDD 2019 WORKSHOP “INITIATIVE FOR ANALYTICS AND DATA SCIENCE STANDARDS”
It is rare to see machine learning and data
science presentations with topics focusing
on end-to-end project cycle. It is even less to
discuss project common pitfalls.
In this talk, we cover both!
End-to-End Machine Learning and
Data Science Project Cycle
Overview
 Model exploring and development
 Model training, validation, testing
 Model selection
Breakdown
o Python + sklearn
o R & various libs
o Keras + DL libs
o Spark MLlib
o Traditional machine
learning (ML) methods
o Deep learning methods
o General model training
and tuning process
 Data science formulation
 Data quality and availability
 Data preprocessing
 Feature engineering
Breakdown
o Statistical/ML thinking
o Idea abstraction
o Data generation process
o Data understanding
o Common preprocessing and
feature engineering ideas
o SQL
o Python: pandas
o R: tidyverse
o PySpark / SparkR
 Business problem definition and understanding
 Quantifying business value and define key metrics
 Computation resources assessment
 Key milestones and timeline
 Data security, privacy and legal review
Breakdown
o Business insight
o Collaboration
o Teamwork
o Planning
o Prototype
 A/B testing in production system
 Model deployment in production
environment
 Exception management
 Performance monitoring
Breakdown
o Production system
knowledge
o Launch plan
o Anomaly detection
o Dashboard
o Tableau
o R-Shiny
 Model tuning and re-training
 Model update and add-on
 Model failure and retirement
 …
Breakdown
o Quantify business
dynamics
o Consider different
scenario
o Plan and set criteria
o Auto ML
Breakdown  Business problem definition and
understanding
 Quantifying business value and define
key metrics
 Computation resources assessment
 Key milestones and timeline
 Data security, privacy and legal review
 Data science formulation
 Data quality and availability
 Data preprocessing
 Feature engineering
 Model exploring and development
 Model training, validation, testing
 Model selection
 A/B testing in production system
 Model deployment in production
environment
 Exception management
 Performance monitoring
 Model tuning and re-training
 Model update and add-on
 Model failure and retirement
Project
Cycle
Planning
Formulation
Modeling
Production
Post
Production
 Business teams
o Operation team
o Business analyst team
o Insight and reporting team
 Technology team
o Database and data warehouse team
o Data engineering team
o Infrastructure team
o Core machine learning team
o Software development team
o Visualization dashboard team
o Production implementation
 Project management team
 Program management team
 Product management team
 Senior leadership team
 Leaders across organizations
Cross Team
Collaboration
Agile-Style
Project
Management
Team Work
Common Pitfalls in Machine Learning and
Data Science Projects
Project Planning Stage
Solving the wrong problem
o Vague description of business needs
o Misalignment across many teams (Scientist, Developer, Operation, Project Managers etc.)
o Scientist team are not actively participating in the problem formulation process
Too optimistic about the timeline
o Project managers may not have past experience for ML and data science projects
o Many ML method-specific uncertainties are not accounted for at planning stage
o ML and data science projects are fundamentally different from each other and from software development
projects (such as online vs. offline model, batch model, real time training, re-training etc.)
Over promise on business value
o Unrealistic high expectation (i.e. advertisement vs actual product)
o Many assumptions about the project are usually not true
o Similar projects from other teams/companies are not evaluated thoroughly to set realistic expectation of time line
and outcome
Problem Formulation Stage
Too optimistic about standard statistical and ML methods
o Extra efforts are needed to abstract business problem into a set of analytics problems
o Standard methods are usually not enough to solve the business problems
Too optimistic about data availability and quality
o “Big data” is not a guarantee of good and relevant data, usually big and messy
o Ideal data for the business problem is almost always not available
o Unexpected efforts to bring the right data
o Under estimate effort to evaluate quality of data
Too optimistic about needed effort on data preprocessing
o Table or column descriptions are not detailed enough
o Lack in-depth understanding of the dataset
o Under estimate of date preprocessing (such as dealing missing data)
o Under estimate the effort for feature engineering
o Mismatch between different data sources (such as online vs offline, different tables etc.)
Modeling Stage
Un-representative data (such as lack of future outlook of what
will happen in production or biased data)
Too optimistic about model selection and hyper-parameter
tuning to reach desired performance
Overfitting and obsession for complicated models (heavy models
may leads poor production performance)
Take too long to fail
Productionization Stage
Bad production performance
oLack shadow mode dry run
oLack needed A/B testing
oData availability and stability issue in real time
oLack exception management on issues such as timeout and missing data
Fail to scale in real time applications
oComputation capacity limitation
oReal time data storage and processing limitation
oLatency constrains
oNot enough engineering resources (i.e. SDE, DE) during implementation
Post Production Stage
Missing necessary checkup
oLack model monitoring for key metrics
oLack exception notification
oLack model failures/timeout notification
oOnline feature not stored for future analysis
Production performance degradation
oNot aware of dynamic nature of the business problem
oNot aware of changing input data quality and availability
oLack model tuning and re-training plan
oLack model retirement or replacement plan
Strategies to Avoid Pitfalls
Be aware these pitfalls
Proactively discuss these pitfalls across teams
Be prepared when it happens
Have an end-to-end project cycle mindset
Ensure needed engineering recourses available
True collaboration among teams
Data Science Family Career Path
Skills Career Title Education
If you are good at all
three areas, you
become a "full-stack"
data scientist!
Each career title has
its own growth path
has promotion cycle.
Thank you!

More Related Content

What's hot

Challenges of Executing AI
Challenges of Executing AIChallenges of Executing AI
Challenges of Executing AI
Dr. Umesh Rao.Hodeghatta
 
Advanced Analytics - Frameworks, Platforms and Metholodologies v 1.0
Advanced Analytics - Frameworks, Platforms and Metholodologies v 1.0Advanced Analytics - Frameworks, Platforms and Metholodologies v 1.0
Advanced Analytics - Frameworks, Platforms and Metholodologies v 1.0
Dr. Mohan K. Bavirisetty
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Simplilearn
 

What's hot (20)

Data Science Salon: Building a Data Science Culture
Data Science Salon: Building a Data Science CultureData Science Salon: Building a Data Science Culture
Data Science Salon: Building a Data Science Culture
 
Data science for business leaders executive program
Data science for business leaders executive programData science for business leaders executive program
Data science for business leaders executive program
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Data Science Salon: Culture, Data Engineering and Hamburger Stands: Thoughts ...
Data Science Salon: Culture, Data Engineering and Hamburger Stands: Thoughts ...Data Science Salon: Culture, Data Engineering and Hamburger Stands: Thoughts ...
Data Science Salon: Culture, Data Engineering and Hamburger Stands: Thoughts ...
 
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
 
Data Science Salon: Applying Machine Learning to Modernize Business Processes
Data Science Salon: Applying Machine Learning to Modernize Business ProcessesData Science Salon: Applying Machine Learning to Modernize Business Processes
Data Science Salon: Applying Machine Learning to Modernize Business Processes
 
Challenges of Executing AI
Challenges of Executing AIChallenges of Executing AI
Challenges of Executing AI
 
2016 Data Science Salary Survey
2016 Data Science Salary Survey2016 Data Science Salary Survey
2016 Data Science Salary Survey
 
How academic institutions best support PhDs and postdocs in the transition to...
How academic institutions best support PhDs and postdocs in the transition to...How academic institutions best support PhDs and postdocs in the transition to...
How academic institutions best support PhDs and postdocs in the transition to...
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in data
 
Advanced Analytics - Frameworks, Platforms and Metholodologies v 1.0
Advanced Analytics - Frameworks, Platforms and Metholodologies v 1.0Advanced Analytics - Frameworks, Platforms and Metholodologies v 1.0
Advanced Analytics - Frameworks, Platforms and Metholodologies v 1.0
 
Data science as a professional career
Data science as a professional careerData science as a professional career
Data science as a professional career
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
 
Exploring What a Typical Data Science Project Looks Like
Exploring What a Typical Data Science Project Looks LikeExploring What a Typical Data Science Project Looks Like
Exploring What a Typical Data Science Project Looks Like
 
1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Data scientist the sexiest job of the 21st century by thomas h davenport and ...
Data scientist the sexiest job of the 21st century by thomas h davenport and ...Data scientist the sexiest job of the 21st century by thomas h davenport and ...
Data scientist the sexiest job of the 21st century by thomas h davenport and ...
 
Leveraged Analytics at Scale
Leveraged Analytics at ScaleLeveraged Analytics at Scale
Leveraged Analytics at Scale
 
Dr. Gábor Kismihók: Labour Market driven Learning Analytics
Dr. Gábor Kismihók: Labour Market driven Learning AnalyticsDr. Gábor Kismihók: Labour Market driven Learning Analytics
Dr. Gábor Kismihók: Labour Market driven Learning Analytics
 
Delivering Value Through Business Analytics
Delivering Value Through Business AnalyticsDelivering Value Through Business Analytics
Delivering Value Through Business Analytics
 

Similar to KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science Project Cycle & Strategies to Avoid Common Pitfalls - Ming Li

Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptxData Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptx
CarolineRebeccaD
 
Warehouse components
Warehouse componentsWarehouse components
Warehouse components
ganblues
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
Paco Nathan
 
Kiran - Senior_System_Business_Consultant_ETL_DW
Kiran - Senior_System_Business_Consultant_ETL_DWKiran - Senior_System_Business_Consultant_ETL_DW
Kiran - Senior_System_Business_Consultant_ETL_DW
NagaRaghuKiranPenuju
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Rohit Dubey
 
Marie Drahorad - Sr. Business Analyst
Marie Drahorad - Sr. Business AnalystMarie Drahorad - Sr. Business Analyst
Marie Drahorad - Sr. Business Analyst
Marie Drahorad
 

Similar to KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science Project Cycle & Strategies to Avoid Common Pitfalls - Ming Li (20)

Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptxData Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptx
 
Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field
 
Warehouse components
Warehouse componentsWarehouse components
Warehouse components
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Bussiness analyst training in india
Bussiness analyst training in indiaBussiness analyst training in india
Bussiness analyst training in india
 
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
 
Business Analyst Training in Hyderabad
Business Analyst Training in HyderabadBusiness Analyst Training in Hyderabad
Business Analyst Training in Hyderabad
 
Bussiness Analyst Online Training in Hyderabad
Bussiness Analyst Online Training in HyderabadBussiness Analyst Online Training in Hyderabad
Bussiness Analyst Online Training in Hyderabad
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 
Ajit Kumar
Ajit KumarAjit Kumar
Ajit Kumar
 
Ajit Kumar
Ajit KumarAjit Kumar
Ajit Kumar
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
 
Kiran - Senior_System_Business_Consultant_ETL_DW
Kiran - Senior_System_Business_Consultant_ETL_DWKiran - Senior_System_Business_Consultant_ETL_DW
Kiran - Senior_System_Business_Consultant_ETL_DW
 
ML Application Life Cycle
ML Application Life CycleML Application Life Cycle
ML Application Life Cycle
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
 
Marie Drahorad - Sr. Business Analyst
Marie Drahorad - Sr. Business AnalystMarie Drahorad - Sr. Business Analyst
Marie Drahorad - Sr. Business Analyst
 
Big data@work
Big data@workBig data@work
Big data@work
 

Recently uploaded

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
gajnagarg
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Recently uploaded (20)

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science Project Cycle & Strategies to Avoid Common Pitfalls - Ming Li

  • 1. Skillsto Master Machine Learning and Data Science Project Cycle & Strategiesto Avoid Common Pitfalls MING LI, AMAZON KDD 2019 WORKSHOP “INITIATIVE FOR ANALYTICS AND DATA SCIENCE STANDARDS”
  • 2. It is rare to see machine learning and data science presentations with topics focusing on end-to-end project cycle. It is even less to discuss project common pitfalls. In this talk, we cover both!
  • 3. End-to-End Machine Learning and Data Science Project Cycle
  • 5.  Model exploring and development  Model training, validation, testing  Model selection Breakdown o Python + sklearn o R & various libs o Keras + DL libs o Spark MLlib o Traditional machine learning (ML) methods o Deep learning methods o General model training and tuning process
  • 6.  Data science formulation  Data quality and availability  Data preprocessing  Feature engineering Breakdown o Statistical/ML thinking o Idea abstraction o Data generation process o Data understanding o Common preprocessing and feature engineering ideas o SQL o Python: pandas o R: tidyverse o PySpark / SparkR
  • 7.  Business problem definition and understanding  Quantifying business value and define key metrics  Computation resources assessment  Key milestones and timeline  Data security, privacy and legal review Breakdown o Business insight o Collaboration o Teamwork o Planning o Prototype
  • 8.  A/B testing in production system  Model deployment in production environment  Exception management  Performance monitoring Breakdown o Production system knowledge o Launch plan o Anomaly detection o Dashboard o Tableau o R-Shiny
  • 9.  Model tuning and re-training  Model update and add-on  Model failure and retirement  … Breakdown o Quantify business dynamics o Consider different scenario o Plan and set criteria o Auto ML
  • 10. Breakdown  Business problem definition and understanding  Quantifying business value and define key metrics  Computation resources assessment  Key milestones and timeline  Data security, privacy and legal review  Data science formulation  Data quality and availability  Data preprocessing  Feature engineering  Model exploring and development  Model training, validation, testing  Model selection  A/B testing in production system  Model deployment in production environment  Exception management  Performance monitoring  Model tuning and re-training  Model update and add-on  Model failure and retirement Project Cycle Planning Formulation Modeling Production Post Production
  • 11.  Business teams o Operation team o Business analyst team o Insight and reporting team  Technology team o Database and data warehouse team o Data engineering team o Infrastructure team o Core machine learning team o Software development team o Visualization dashboard team o Production implementation  Project management team  Program management team  Product management team  Senior leadership team  Leaders across organizations Cross Team Collaboration Agile-Style Project Management Team Work
  • 12. Common Pitfalls in Machine Learning and Data Science Projects
  • 13. Project Planning Stage Solving the wrong problem o Vague description of business needs o Misalignment across many teams (Scientist, Developer, Operation, Project Managers etc.) o Scientist team are not actively participating in the problem formulation process Too optimistic about the timeline o Project managers may not have past experience for ML and data science projects o Many ML method-specific uncertainties are not accounted for at planning stage o ML and data science projects are fundamentally different from each other and from software development projects (such as online vs. offline model, batch model, real time training, re-training etc.) Over promise on business value o Unrealistic high expectation (i.e. advertisement vs actual product) o Many assumptions about the project are usually not true o Similar projects from other teams/companies are not evaluated thoroughly to set realistic expectation of time line and outcome
  • 14. Problem Formulation Stage Too optimistic about standard statistical and ML methods o Extra efforts are needed to abstract business problem into a set of analytics problems o Standard methods are usually not enough to solve the business problems Too optimistic about data availability and quality o “Big data” is not a guarantee of good and relevant data, usually big and messy o Ideal data for the business problem is almost always not available o Unexpected efforts to bring the right data o Under estimate effort to evaluate quality of data Too optimistic about needed effort on data preprocessing o Table or column descriptions are not detailed enough o Lack in-depth understanding of the dataset o Under estimate of date preprocessing (such as dealing missing data) o Under estimate the effort for feature engineering o Mismatch between different data sources (such as online vs offline, different tables etc.)
  • 15. Modeling Stage Un-representative data (such as lack of future outlook of what will happen in production or biased data) Too optimistic about model selection and hyper-parameter tuning to reach desired performance Overfitting and obsession for complicated models (heavy models may leads poor production performance) Take too long to fail
  • 16. Productionization Stage Bad production performance oLack shadow mode dry run oLack needed A/B testing oData availability and stability issue in real time oLack exception management on issues such as timeout and missing data Fail to scale in real time applications oComputation capacity limitation oReal time data storage and processing limitation oLatency constrains oNot enough engineering resources (i.e. SDE, DE) during implementation
  • 17. Post Production Stage Missing necessary checkup oLack model monitoring for key metrics oLack exception notification oLack model failures/timeout notification oOnline feature not stored for future analysis Production performance degradation oNot aware of dynamic nature of the business problem oNot aware of changing input data quality and availability oLack model tuning and re-training plan oLack model retirement or replacement plan
  • 18. Strategies to Avoid Pitfalls Be aware these pitfalls Proactively discuss these pitfalls across teams Be prepared when it happens Have an end-to-end project cycle mindset Ensure needed engineering recourses available True collaboration among teams
  • 19. Data Science Family Career Path Skills Career Title Education If you are good at all three areas, you become a "full-stack" data scientist! Each career title has its own growth path has promotion cycle.

Editor's Notes

  1. At problem formulation stage, the biggest pitfall is “Solving the wrong problem”, it sounds funny, but it is sadly true. One of the main reason behind this problem is that, data scientist are not involved in the first place for the initial discussion. Over promise on business values in another big problem in problem formulation stage, often times, Data scientist’s data-driven and fact-base voice is not heard much behind this problem.
  2. At modeling stage, these pitfalls are familiar to statisticians, such as un-representative data and in the most times, the ideal data is not available for a specific business problem, but very biased data will definitely lead to model failure.