DSA – 105 Introduction to Data Science
Week 3 – Steps involved in Data Science
Ferdin Joe John Joseph, PhD
Faculty of Information Technology
Thai-Nichi Institute of Technology
Week 3
Agenda
• Steps involved in Data Science
Process in Data Science Life Cycle (DSLC)
Business Understanding
Use data science to answer five types of questions:
• How much or how many? (regression)
• Which category? (classification)
• Which group? (clustering)
• Is this weird? (anomaly detection)
• Which option should be taken? (recommendation)
Data Mining
Decide on database usage
• Data Collection strategies and process
• Use of SQL queries
• Use of dataframe packages such as pandas
• Use of JSON
• Use of software to store and manage data
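The collection options listed above can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed workflow: the in-memory database, the `sales` table and its rows are made up for the example. A dataframe package such as pandas could take the place of the plain lists used here.

```python
import json
import sqlite3

# Hypothetical in-memory database with a made-up "sales" table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 30.0)],
)

# Collect data with a SQL query: total amount per region.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
records = [dict(row) for row in rows]
conn.close()

# Exchange the result as JSON and read it back.
payload = json.dumps(records)
restored = json.loads(payload)
print(restored)
```

Keeping the query result as a list of dictionaries makes it easy to hand over as JSON or to load into a dataframe later.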
Data Cleaning
• Also known as “data janitor” work; the most important component.
• The cleaner the data, the better the decisions.
• It consumes at least 50% of the entire process.
• E.g. manage the data type of the values and convert wherever needed,
for instance numerical values stored as integers or as strings.
• E.g. use a consistent format and spelling for categorical data:
‘Male’ or ‘male’
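The two cleaning examples above might look like this in plain Python. It is a sketch under assumptions: the records, the field names `age` and `gender`, and the choice to map unparseable numbers to `None` are all invented for illustration.

```python
# Made-up raw records: numbers stored as strings, inconsistent category spelling.
raw = [
    {"age": "34", "gender": "Male"},
    {"age": " 27 ", "gender": "male"},
    {"age": "n/a", "gender": "FEMALE "},
]


def to_int(value):
    """Return the value as an int, or None when it cannot be parsed."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def clean(record):
    """Convert the data type of 'age' and normalise the spelling of 'gender'."""
    return {
        "age": to_int(record["age"]),
        "gender": record["gender"].strip().lower(),
    }


cleaned = [clean(r) for r in raw]
print(cleaned)
```

After cleaning, ‘Male’ and ‘male’ fall into the same category, and values that cannot be converted are marked explicitly instead of silently staying as strings.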
Data Exploration
• Brainstorm what to do with the ‘cleaned’ data
• Understand the bias and patterns in the data
• Analyze a random subset of the data and visualize it
• Look for anomalies and outliers in the data’s patterns
• Form hypotheses about the data and the problem to guide how the
solution should be delivered
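Two of the exploration steps above, sampling a random subset and looking for outliers, can be sketched as follows. The numbers are made up, and the 1.5 × IQR rule is just one common convention for flagging outliers, chosen here as an assumption rather than taken from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sample is reproducible

# Made-up measurements with one obviously unusual value.
values = [10, 12, 11, 13, 12, 11, 10, 95, 12, 13, 11, 12]

# Look at a random subset first.
subset = random.sample(values, k=5)

# Flag outliers with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print("subset:", subset)
print("outliers:", outliers)
```

Whether a flagged value is an error or a genuine finding is a question for the hypotheses formed in this stage.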
Feature Engineering
• A feature is a measurable property or attribute of a phenomenon
being observed.
• Feature engineering is the process of using domain knowledge to
transform your raw data into informative features that represent the
business problem you are trying to solve.
• There are two tasks in feature engineering:
• Feature Selection
• Feature Construction
Feature Selection
• Feature selection is the process of cutting down the features that add
more noise than information.
• This avoids the complexity that comes with high-dimensional spaces
• There are three families of methods:
• Filter methods (apply a statistical measure to assign a score to each feature)
• Wrapper methods (frame the selection of features as a search problem and
use a heuristic to perform the search)
• Embedded methods (select features as part of model training, learning which
features contribute most to the accuracy)
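Of the three families above, a filter method is the simplest to sketch. The example below scores each feature by the absolute Pearson correlation with the target and keeps the top one. The feature names, the data and the choice of Pearson correlation as the statistical measure are assumptions for illustration. Wrapper and embedded methods are not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Made-up data: "hours" tracks the target closely, "shoe_size" does not.
features = {
    "hours": [1, 2, 3, 4, 5, 6],
    "shoe_size": [42, 38, 44, 39, 41, 40],
}
target = [52, 55, 61, 64, 70, 74]

# Filter step: score every feature independently of any model.
scores = {name: abs(pearson(col, target)) for name, col in features.items()}

# Keep the single highest-scoring feature.
ranked = sorted(scores, key=scores.get, reverse=True)
selected = ranked[:1]

print(scores)
print(selected)
```

Because each feature is scored on its own, a filter method is cheap, but it cannot see interactions between features.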
Feature Construction
• Involves creating new features from the ones that are already available.
• For example, if you have a feature for age, but your model only cares
about whether a person is an adult or a minor, you could threshold it at 18
and assign different categories to instances above and below that
threshold.
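The age example above can be written in a few lines. One assumption is made explicit here: a person aged exactly 18 is treated as an adult, which the slide does not specify.

```python
ADULT_AGE = 18  # threshold from the example; ">= 18 is adult" is an assumption


def age_group(age):
    """Map a numeric age to the constructed categorical feature."""
    return "adult" if age >= ADULT_AGE else "minor"


ages = [12, 17, 18, 45]
groups = [age_group(a) for a in ages]
print(groups)
```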
Predictive Modeling
• Predictive modeling is where machine learning finally comes into
your data science project.
• Based on the questions you asked in the business understanding
stage, this is where you decide which model to pick for your problem.
• The model that you end up training will be dependent on the size,
type and quality of your data, how much time and computational
resources you are willing to invest, and the type of output you intend
to derive.
• The trained model needs to be evaluated for accuracy using validation
techniques such as k-fold cross-validation.
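To show what k-fold cross-validation does, here is a hand-rolled sketch. The toy nearest-centroid classifier, the one-dimensional data and the helper names are all hypothetical. In practice a library such as scikit-learn provides `KFold` and `cross_val_score`, and the data would normally be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def fit_centroids(xs, ys):
    """Toy model: remember the mean feature value of each class."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Made-up, interleaved data so every fold contains both classes.
X = [1.0, 9.0, 1.5, 8.5, 2.0, 9.5, 0.5, 8.0, 1.2, 9.2]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

fold_scores = []
for train_idx, test_idx in k_fold_indices(len(X), k=5):
    # Train on k-1 folds, evaluate on the held-out fold.
    model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
    hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
    fold_scores.append(hits / len(test_idx))

mean_accuracy = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_accuracy)
```

Every sample is used for testing exactly once, so the mean score depends less on one lucky or unlucky split than a single train/test split would.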
Predictive Modeling
• The percentage of correct classifications is used to measure the accuracy of
a classification model
• ROC curves plot the true positive rate against the false positive rate
• The coefficient of determination, mean squared error (MSE) and mean
absolute error measure the correctness of regression models
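The measures named above can be computed directly from their definitions. The sketch below uses made-up labels, scores and regression values. The function names are illustrative, and the 0.5 cut-off for turning scores into labels is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_point(y_true, scores, threshold):
    """One ROC point (false positive rate, true positive rate) at a threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up classification labels and model scores.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
predicted_labels = [1 if s >= 0.5 else 0 for s in scores]
acc = accuracy(labels, predicted_labels)

# Sweeping the threshold traces out the ROC curve point by point.
roc = [roc_point(labels, scores, t) for t in (1.0, 0.5, 0.0)]

# Made-up regression targets and predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]

print(acc, roc)
print(mse(actual, predicted), mae(actual, predicted), r_squared(actual, predicted))
```

Accuracy summarises one threshold, while the ROC curve shows the trade-off between true and false positives across all thresholds.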
Data Visualisation
• Combines the fields of communication, psychology, statistics, and art.
• Communicates the data in a simple yet effective and visually pleasing
way.
• Jupyter notebooks can use many packages for visualization, e.g.
Matplotlib
• Drag-and-drop tools like Tableau and Plotly
Goals of Data Science Process
• The goal of this process is to continue to move a data science project
forward towards a clear engagement end point.
• We recognize that data science is a research activity and that progress
often entails an approach that moves two steps forward and one step
(or worse) backwards.
• Being able to clearly communicate this to customers can help avoid
misunderstanding and frustration for all parties involved, and increase
the odds of success.
Activity
• Perform the data science process on the Olympic medal tally for events
after World War II
• Tools and Technologies in Data Science
2019 DSA 105 Introduction to Data Science Week 3
