Polong Lin is a Data Scientist at IBM. He is a regular speaker on data science and develops content for free data education on bigdatauniversity.com using open data tools on datascientistworkbench.com. Polong earned his M.Sc. at the University of Tsukuba.
Polong Lin (林伯龍): How to Approach Data Science Problems from Start to End
1. How to Approach
Data Science Problems
from Start to End
Polong Lin
Data Scientist
IBM Analytics, Emerging Technologies
@polonglin
@bigdatau
Taiwan Data Science Conference (台灣資料科學年會)
2. • Free online courses
• Data Science & Data Engineering
• A community initiative led by IBM
• Certificates and Badges
• > 450,000 users
What is Big Data University (BDU)?
9. • Every project begins with business understanding.
• What is the project objective?
• What are we trying to do – what is our goal?
1. Formulate a clear question
2. Define problem and solution requirements
1. Business understanding
Flight delays: Build a solution that helps users predict
whether a flight on a given day will be delayed.
15. Data Preparation typically includes:
• Data cleaning
• Merging data
• Transforming data
• Feature engineering
• Text analysis
6. Data preparation
Flights are classified as “delayed” if >15 min late.
• Delayed? [True or False]
Does time of day for departure predict delays?
• Hour
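The two derived columns above can be sketched in plain Python. This is only an illustration under stated assumptions: the field names `delay_minutes` and `scheduled_departure`, the `HH:MM` time format, and the helper names are hypothetical and do not come from the slides. Only the 15-minute threshold is taken from the deck.

```python
from datetime import datetime

# From the slide: a flight counts as "delayed" if it is more than 15 minutes late.
DELAY_THRESHOLD_MIN = 15


def is_delayed(delay_minutes: float) -> bool:
    """Return True when the flight is more than 15 minutes late."""
    return delay_minutes > DELAY_THRESHOLD_MIN


def departure_hour(scheduled_departure: str) -> int:
    """Extract the hour (0-23) from an 'HH:MM' string (assumed input format)."""
    return datetime.strptime(scheduled_departure, "%H:%M").hour


def prepare(record: dict) -> dict:
    """Return a copy of one raw flight record with 'Delayed' and 'Hour' added.

    The input keys 'delay_minutes' and 'scheduled_departure' are hypothetical.
    """
    prepared = dict(record)
    prepared["Delayed"] = is_delayed(record["delay_minutes"])
    prepared["Hour"] = departure_hour(record["scheduled_departure"])
    return prepared
```

Calling `prepare({"delay_minutes": 40, "scheduled_departure": "06:05"})` would add `Delayed` and `Hour` to the record, which is the shape a later modeling step could consume.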
18. Modeling:
• Is a highly iterative process
• May involve building and testing multiple models
Modeling
Using inputs:
• Year
• Month
• Day of Month
• Hour of departure
• Distance
• Destination airport
Predict:
Delay (True/False)
Logistic Regression
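To make the modeling step concrete, here is a minimal sketch of logistic regression trained with batch gradient descent, written with the standard library only. It is not the code used in the talk, which likely relied on a library implementation. The function names, learning rate, epoch count, and the tiny single-feature dataset (departure hour scaled to 0-1) are all assumptions made up for illustration, not real flight records.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features) -> float:
    """Estimated probability that a flight with these features is delayed."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def fit(rows, labels, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            # Gradient of the log loss w.r.t. the score is (prediction - label).
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_delay(weights, bias, features) -> bool:
    """Predict Delay (True/False) using a 0.5 probability cutoff."""
    return predict_proba(weights, bias, features) >= 0.5


# Made-up toy data (NOT real flights): one feature, departure hour / 24.
hours = [[0.25], [0.30], [0.35], [0.75], [0.80], [0.90]]
delayed = [0, 0, 0, 1, 1, 1]
weights, bias = fit(hours, delayed)
```

In a real project the other inputs listed above (year, month, day of month, distance, destination airport) would be added as further features, with the categorical destination encoded numerically first.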
20. • Once finalized, the model is deployed into a production environment.
• May be in a limited / test environment until model is proven
• Involves additional groups, skills, and technologies
• Solution owner
• Marketing
• Application developers and designers
• IT administration
• Feedback to assess model performance
• Gathering and analysis of feedback for assessment of the model’s performance and impact
• Iterative process for model refinement and redeployment
• Accelerate through automated processes
Diagram: Deployment, Feedback, Prediction, Interpretation, Justification, Testing
Deployment and feedback
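One way to act on the feedback step is to compare the deployed model's predictions with what actually happened once flights have landed. The sketch below is a hedged illustration: the function names and the choice of accuracy, precision, and recall are assumptions, since the slides do not say which measures were used.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for boolean delay predictions."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["tp"] += 1   # predicted delayed, was delayed
        elif p and not a:
            counts["fp"] += 1   # predicted delayed, was on time
        elif not p and a:
            counts["fn"] += 1   # predicted on time, was delayed
        else:
            counts["tn"] += 1   # predicted on time, was on time
    return counts


def summarize(predicted, actual):
    """Return accuracy, precision, and recall; 0.0 when a ratio is undefined."""
    c = confusion_counts(predicted, actual)
    total = sum(c.values())
    flagged = c["tp"] + c["fp"]
    truly_delayed = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / flagged if flagged else 0.0,
        "recall": c["tp"] / truly_delayed if truly_delayed else 0.0,
    }
```

Tracking these numbers over time gives a simple signal for when the model should be refined and redeployed.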