SlideShare a Scribd company logo
1 of 24
Manoj Mishra
November 23, 2017
DataScience Lifecycle
 Introduction
 LifeCycle
 SkillTree
 Questions
AGENDA
DATA SCIENTIST
60%
19%
9%
7%
5%
Effort Organize & Clean Data
Collect data / Dataset
Data Mining to draw
pattern
Model Selection ,
training and refining
Other Tasks
Data
Acquisition
Data
Preparation
Hypothesis &
Modelling
DATA SCIENCE LIFE CYCLE
Evaluation &
Interpretation
DeploymentOperations Optimization
Business Understanding
DATA ACQUISITION
Static
• Feedback system
• CSV Data sets / text files
Live
• Logs data, memory dumps
• Sensors, controllers etc.
Virtual
• Data Virtualization
• Caching , Storing
DATA SAMPLE DATASET INVENTORY
PROJECT - PREDICTING FAILURE – PROACTIVE
MAINTENANCE
• Baseline normal operational
patterns by modelling the
unstructured Log data.
• Use Domain Experts to identify
patterns before failures.
• Use statistical measurements &
Machine Learning to determine
threshold.
• Identify patterns of activity to
anticipate and react to
circumstances that might
otherwise disrupt operations
SME -
Domain
DATA PREPARATION
• Need for Data Preparation
• Bad data or poor quality data can alter
accuracy & lead to incorrect Insights
• Gartner- Poor quality data costs an avg.
organization $13.5M / year.
• Dataset might contain discrepancies in the
names or codes.
• Dataset might contain outliers or errors.
• Dataset lacks your attributes of interest for
analysis.
• All in all the dataset is not qualitative but is
just quantitative.
• Steps Involved
DATA PREPARATION
• Includes steps to explore, preprocess, and condition data
• Create robust environment – analytics sandbox
• Data preparation tends to be the most labor-intensive step in the
analytics lifecycle
• Often at least 50 – 60% of the data science project’s time
• The data preparation phase is generally the most iterative and the one
that data scientists tend to underestimate most often 
Database :
Understand the data
Understand the Business
Airlines :
NYC FLIGHTS 13
e.g. tailnum :
A tail number refers to an
identification number painte
d on an aircraft, frequently
on the tail
Goal : Predicting
flight delays Modelling
PREDICTING FLIGHT DELAYS - NYC FLIGHTS 13
• Exploratory Data Analysis
of the flight data for
inbound and outbound
flights for year 2013 in
NYC.
• Find patterns, benchmark,
model and find
predictors.
• To predict the flight
delays for NYC Inbound/
Outbound flights.
• Ref. DataSet
PREDICTING FLIGHT DELAYS - NYC FLIGHTS 13
• Exploratory Data Analysis
of the flight data for
inbound and outbound
flights for year 2013 in
NYC.
• Find patterns, benchmark,
model and find
predictors.
• To predict the flight
delays for NYC Inbound/
Outbound flights.
• Ref. DataSet
OBSERVATIONS - NYC FLIGHTS 13
REF. DATASET
COMMON TOOLS - FOR DATA
PREPARATION
• Alpine Miner provides a graphical user interface for creating analytic
workflows
• OpenRefine (formerly Google Refine) is a free, open source tool for
working with messy data
• Similar to OpenRefine, Data Wrangler is an interactive tool for data
cleansing an transformation
• Alteryx and Informatica also can be tried.
HYPOTHESIS & MODELLING
• There are three main tasks addressed in this stage:
• Feature engineering: Create data features from the raw data to
facilitate model training.
• Model training: Find the model that answers the question most
accurately by comparing their success metrics.
• Determine if your model is suitable for production.
FEATURE SELECTION & ENGINEERING
Select
Features
Research
feature
relevance
Experiment
& Validate
Change the
feature set
if required
Go back to
feature
selection
FEATURE ENGINEERING
Date # footfalls in Dubai Mall
01/07/2017 124532
02/07/2017 65434
03/07/2017 12333
04/07/2017 60009
05/07/2017 46567
06/07/2017 98001
07/07/2017 146543
08/07/2017 112345
09/07/2017 76543
Date # footfalls in Dubai Mall IsHoliday?
01/07/2017 124532Yes
02/07/2017 65434No
03/07/2017 12333No
04/07/2017 60009No
05/07/2017 46567No
06/07/2017 98001No
07/07/2017 146543yes
08/07/2017 112345yes
09/07/2017 76543No
FEATURE ENGINEERING
Seasonality (holiday season):
Jun, Jul & Dec account for the
highest avg. delays
MODELLING
CREATE YOUR MODEL & EVALUATE
• Split the input data randomly for modeling into a training data set and a test data
set.
• Build the models by using the training data set.
• Evaluate the training and the test data set. Use a series of competing machine-
learning algorithms along with the various associated tuning parameters (known as
a parameter sweep) that are geared toward answering the question of interest with
the current data.
• Determine the “best” solution to answer the question by comparing the success
metrics between alternative methods.
CREATE YOUR MODEL & EVALUATE
• Supervised Learning
• Naive Bayes
• KNN
• Support Vector Machines
(SVM)
• Linear Regression
• Unsupervised Learning
• Principal Component Analysis.
• K Means
• Classification Metrics
• Accuracy Score
• Classification Report
• Confusion Matrix
• Regression Metrics
• Mean Absolute Error.
• Mean Squared Error
• R2 Score
• Clustering Metrics
• Adjusted Rand Index.
• Homogeneity
• V - measure
DEPLOYMENT
After you have a set of models that perform well, you can operationalize
them for other applications through APIs or other interface to consume
from various applications, such as:
• Online websites
• Spreadsheets
• Dashboards
• Line-of-business applications
• Back-end applications
THANK YOU !

More Related Content

What's hot

Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualizationDr. Hamdan Al-Sabri
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cyclehktripathy
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data miningHadi Fadlallah
 
DATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptxDATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptxAbdullahAbbasi55
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecasesSreenatha Reddy K R
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPTTrinath
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classificationKrish_ver2
 
Data preparation
Data preparationData preparation
Data preparationTony Nguyen
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysisDataminingTools Inc
 

What's hot (20)

OLAP
OLAPOLAP
OLAP
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
 
DATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptxDATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptx
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
Data preparation
Data preparationData preparation
Data preparation
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Ppt
PptPpt
Ppt
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
 

Similar to Data science life cycle

Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDeploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDatabricks
 
Data science workflow v1.1
Data science workflow v1.1Data science workflow v1.1
Data science workflow v1.1Jessie_N
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptxXanGwaps
 
So your boss says you need to learn data science
So your boss says you need to learn data scienceSo your boss says you need to learn data science
So your boss says you need to learn data scienceSusan Ibach
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Bhakthi Liyanage
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Databricks
 
Fractional Chief AI Officer Services For Hire
Fractional Chief AI Officer Services For HireFractional Chief AI Officer Services For Hire
Fractional Chief AI Officer Services For HireValue Amplify Consulting
 
Project Management Careers in Data Science
Project Management Careers in Data ScienceProject Management Careers in Data Science
Project Management Careers in Data ScienceGanes Kesari
 
Lecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptLecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptAsadkhan47384
 
What is the Value of SAS Analytics?
What is the Value of SAS Analytics?What is the Value of SAS Analytics?
What is the Value of SAS Analytics?SAS Canada
 
Mindfull - The Power of Predictive
Mindfull - The Power of PredictiveMindfull - The Power of Predictive
Mindfull - The Power of PredictiveMindfull_NZ
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your EnterpriseWSO2
 
The machine learning process: From ideation to deployment with Azure Machine ...
The machine learning process: From ideation to deployment with Azure Machine ...The machine learning process: From ideation to deployment with Azure Machine ...
The machine learning process: From ideation to deployment with Azure Machine ...Francesca Lazzeri, PhD
 
Audience Measurement: Nielsen Online vs Google Analytics
Audience Measurement: Nielsen Online vs Google AnalyticsAudience Measurement: Nielsen Online vs Google Analytics
Audience Measurement: Nielsen Online vs Google AnalyticsPrefix Technologies
 
Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015DataKitchen
 

Similar to Data science life cycle (20)

Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDeploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
 
Data science workflow v1.1
Data science workflow v1.1Data science workflow v1.1
Data science workflow v1.1
 
big-data-anallytics.pptx
big-data-anallytics.pptxbig-data-anallytics.pptx
big-data-anallytics.pptx
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
So your boss says you need to learn data science
So your boss says you need to learn data scienceSo your boss says you need to learn data science
So your boss says you need to learn data science
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
 
Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
 
Fractional Chief AI Officer Services For Hire
Fractional Chief AI Officer Services For HireFractional Chief AI Officer Services For Hire
Fractional Chief AI Officer Services For Hire
 
Project Management Careers in Data Science
Project Management Careers in Data ScienceProject Management Careers in Data Science
Project Management Careers in Data Science
 
Lecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptLecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.ppt
 
What is the Value of SAS Analytics?
What is the Value of SAS Analytics?What is the Value of SAS Analytics?
What is the Value of SAS Analytics?
 
Mindfull - The Power of Predictive
Mindfull - The Power of PredictiveMindfull - The Power of Predictive
Mindfull - The Power of Predictive
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your Enterprise
 
The machine learning process: From ideation to deployment with Azure Machine ...
The machine learning process: From ideation to deployment with Azure Machine ...The machine learning process: From ideation to deployment with Azure Machine ...
The machine learning process: From ideation to deployment with Azure Machine ...
 
Audience Measurement: Nielsen Online vs Google Analytics
Audience Measurement: Nielsen Online vs Google AnalyticsAudience Measurement: Nielsen Online vs Google Analytics
Audience Measurement: Nielsen Online vs Google Analytics
 
Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015
 

Recently uploaded

➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 

Recently uploaded (20)

➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 

Data science life cycle

  • 1. Manoj Mishra November 23, 2017 DataScience Lifecycle
  • 2.  Introduction  LifeCycle  SkillTree  Questions AGENDA
  • 3. DATA SCIENTIST 60% 19% 9% 7% 5% Effort Organize & Clean Data Collect data / Dataset Data Mining to draw pattern Model Selection , training and refining Other Tasks
  • 4. Data Acquisition Data Preparation Hypothesis & Modelling DATA SCIENCE LIFE CYCLE Evaluation & Interpretation DeploymentOperations Optimization Business Understanding
  • 5. DATA ACQUISITION Static • Feedback system • CSV Data sets / text files Live • Logs data, memory dumps • Sensors, controllers etc. Virtual • Data Virtualization • Caching , Storing
  • 7. PROJECT - PREDICTING FAILURE – PROACTIVE MAINTENANCE • Baseline normal operational patterns by modelling the unstructured Log data. • Use Domain Experts to identify patterns before failures. • Use statistical measurements & Machine Learning to determine threshold. • Identify patterns of activity to anticipate and react to circumstances that might otherwise disrupt operations SME - Domain
  • 8. DATA PREPARATION • Need for Data Preparation • Bad data or poor quality data can alter accuracy & lead to incorrect Insights • Gartner- Poor quality data costs an avg. organization $13.5M / year. • Dataset might contain discrepancies in the names or codes. • Dataset might contain outliers or errors. • Dataset lacks your attributes of interest for analysis. • All in all the dataset is not qualitative but is just quantitative. • Steps Involved
  • 9. DATA PREPARATION • Includes steps to explore, preprocess, and condition data • Create robust environment – analytics sandbox • Data preparation tends to be the most labor-intensive step in the analytics lifecycle • Often at least 50 – 60% of the data science project’s time • The data preparation phase is generally the most iterative and the one that data scientists tend to underestimate most often 
  • 10. Database : Understand the data Understand the Business Airlines : NYC FLIGHTS 13 e.g. tailnum : A tail number refers to an identification number painte d on an aircraft, frequently on the tail Goal : Predicting flight delays Modelling
  • 11. PREDICTING FLIGHT DELAYS - NYC FLIGHTS 13 • Exploratory Data Analysis of the flight data for inbound and outbound flights for year 2013 in NYC. • Find patterns, benchmark, model and find predictors. • To predict the flight delays for NYC Inbound/ Outbound flights. • Ref. DataSet
  • 12. PREDICTING FLIGHT DELAYS - NYC FLIGHTS 13 • Exploratory Data Analysis of the flight data for inbound and outbound flights for year 2013 in NYC. • Find patterns, benchmark, model and find predictors. • To predict the flight delays for NYC Inbound/ Outbound flights. • Ref. DataSet
  • 13. OBSERVATIONS - NYC FLIGHTS 13 REF. DATASET
  • 14. COMMON TOOLS - FOR DATA PREPARATION • Alpine Miner provides a graphical user interface for creating analytic workflows • OpenRefine (formerly Google Refine) is a free, open source tool for working with messy data • Similar to OpenRefine, Data Wrangler is an interactive tool for data cleansing an transformation • Alteryx and Informatica also can be tried.
  • 15. HYPOTHESIS & MODELLING • There are three main tasks addressed in this stage: • Feature engineering: Create data features from the raw data to facilitate model training. • Model training: Find the model that answers the question most accurately by comparing their success metrics. • Determine if your model is suitable for production.
  • 16. FEATURE SELECTION & ENGINEERING Select Features Research feature relevance Experiment & Validate Change the feature set if required Go back to feature selection
  • 17. FEATURE ENGINEERING Date # footfalls in Dubai Mall 01/07/2017 124532 02/07/2017 65434 03/07/2017 12333 04/07/2017 60009 05/07/2017 46567 06/07/2017 98001 07/07/2017 146543 08/07/2017 112345 09/07/2017 76543 Date # footfalls in Dubai Mall IsHoliday? 01/07/2017 124532Yes 02/07/2017 65434No 03/07/2017 12333No 04/07/2017 60009No 05/07/2017 46567No 06/07/2017 98001No 07/07/2017 146543yes 08/07/2017 112345yes 09/07/2017 76543No
  • 18. FEATURE ENGINEERING Seasonality (holiday season): Jun, Jul & Dec account for the highest avg. delays
  • 20. CREATE YOUR MODEL & EVALUATE • Split the input data randomly for modeling into a training data set and a test data set. • Build the models by using the training data set. • Evaluate the training and the test data set. Use a series of competing machine- learning algorithms along with the various associated tuning parameters (known as a parameter sweep) that are geared toward answering the question of interest with the current data. • Determine the “best” solution to answer the question by comparing the success metrics between alternative methods.
  • 21. CREATE YOUR MODEL & EVALUATE • Supervised Learning • Naive Bayes • KNN • Support Vector Machines (SVM) • Linear Regression • Unsupervised Learning • Principal Component Analysis. • K Means • Classification Metrics • Accuracy Score • Classification Report • Confusion Matrix • Regression Metrics • Mean Absolute Error. • Mean Squared Error • R2 Score • Clustering Metrics • Adjusted Rand Index. • Homogeneity • V - measure
  • 22. DEPLOYMENT After you have a set of models that perform well, you can operationalize them for other applications through APIs or other interface to consume from various applications, such as: • Online websites • Spreadsheets • Dashboards • Line-of-business applications • Back-end applications
  • 23.