DATA SCIENCE PROJECT METHODOLOGY
Sergey Shelpuk
sergey@shelpuk.com
DATA SCIENCE IS ALL ABOUT BUSINESS
TOP BIG DATA CHALLENGES
[Bar chart © Gartner: each item ranked as top, second, or third challenge; axis scale 0 to 60]
- Determining how to get value from Big Data
- Defining our strategy
- Obtaining skills and capabilities needed
- Integrating multiple data sources
- Infrastructure and/or architecture
- Risk and governance issues
- Funding for Big Data related initiatives
- Understanding what is Big Data
- Leadership or organizational issues
- Other
METHODOLOGY IS A KEY TO SUCCESS
Cross-Industry Standard Process for Data Mining (CRISP-DM)
BUSINESS UNDERSTANDING
Determining Business Objectives
1. Gather background information
   - Compiling the business background
   - Defining business objectives
   - Business success criteria
2. Assessing the situation
   - Resource inventory
   - Requirements, assumptions, and constraints
   - Risks and contingencies
   - Cost/benefit analysis
3. Determining data science goals
   - Data science goals
   - Data science success criteria
4. Producing a project plan
© IBM
EXAMPLE OF THE PROJECT PLAN
Phase | Time | Resources | Risks
Business understanding | 1 week | All analysts | Economic change
Data understanding | 3 weeks | All analysts | Data problems, technology problems
Data preparation | 5 weeks | Data scientists, DB engineers | Data problems, technology problems
Modeling | 2 weeks | Data scientists | Technology problems, inability to build an adequate model
Evaluation | 1 week | All analysts | Economic change, inability to implement results
Deployment | 1 week | Data scientists, DB engineers, implementation team | Economic change, inability to implement results
© IBM
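As a sketch of how the example plan adds up, the Python below (standard library only) encodes the phases from the table and, assuming they run strictly one after another, derives the total duration and each phase's start and end week. The `Phase` class and the helper names are illustrative assumptions, not part of CRISP-DM; in practice the phases overlap and loop back, so treat the sequential schedule only as a rough baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One row of the example project plan (illustrative structure)."""
    name: str
    weeks: int
    resources: tuple[str, ...]
    risks: tuple[str, ...]


# Durations, resources, and risks taken from the example plan.
PLAN = [
    Phase("Business understanding", 1, ("All analysts",), ("Economic change",)),
    Phase("Data understanding", 3, ("All analysts",), ("Data problems", "Technology problems")),
    Phase("Data preparation", 5, ("Data scientists", "DB engineers"), ("Data problems", "Technology problems")),
    Phase("Modeling", 2, ("Data scientists",), ("Technology problems", "Inability to build an adequate model")),
    Phase("Evaluation", 1, ("All analysts",), ("Economic change", "Inability to implement results")),
    Phase("Deployment", 1, ("Data scientists", "DB engineers", "Implementation team"),
          ("Economic change", "Inability to implement results")),
]


def total_weeks(plan):
    """Total duration if no phases overlap."""
    return sum(p.weeks for p in plan)


def schedule(plan):
    """Return (name, start_week, end_week) tuples, assuming back-to-back phases."""
    rows, start = [], 0
    for p in plan:
        rows.append((p.name, start, start + p.weeks))
        start += p.weeks
    return rows


def share(plan, name):
    """Fraction of the total duration spent in the named phase."""
    return next(p.weeks for p in plan if p.name == name) / total_weeks(plan)
```

With these numbers the plan takes 13 weeks, and data preparation alone accounts for 5 of them (about 38%).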
READY FOR DATA UNDERSTANDING?
From a business perspective:
- What does your business hope to gain from this project?
- How will you define the successful completion of your efforts?
- Do you have the budget and resources needed to reach your goals?
- Do you have access to all the data needed for this project?
- Have you and your team discussed the risks and contingencies associated with this project?
- Do the results of your cost/benefit analysis make this project worthwhile?
From a data science perspective:
- How, specifically, can data mining help you meet your business goals?
- Do you have an idea of which data mining techniques might produce the best results?
- How will you know when your results are accurate or effective enough? (Have you set a measure of data mining success?)
- How will the modeling results be deployed? Have you considered deployment in your project plan?
- Does the project plan include all phases of CRISP-DM?
- Are risks and dependencies called out in the plan?
© IBM
DATA UNDERSTANDING
© IBM
1. Collect initial data
   - Existing data
   - Purchased data
   - Additional data
2. Describe data
   - Amount of data
   - Value types
   - Coding schemes
3. Explore data
4. Verify data quality
   - Missing data
   - Data errors
   - Coding inconsistencies
   - Bad metadata
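The "describe data" and "verify data quality" tasks can be partly automated. The sketch below is a minimal, standard-library-only illustration, assuming the data arrive as a list of dicts (for example, rows read with `csv.DictReader`) and that missing values show up as `None` or empty strings. The `profile` and `infer_type` functions and the sample records are hypothetical; the type inference is deliberately crude.

```python
from collections import Counter

MISSING = (None, "")  # assumption: missing values arrive as None or empty strings


def infer_type(values):
    """Rough value-type inference over the non-missing values of one column."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "empty"
    try:
        for v in present:
            float(v)
        return "numeric"
    except (TypeError, ValueError):
        return "categorical"


def profile(records):
    """Describe the data (amount, value types) and flag basic quality issues."""
    columns = sorted({key for r in records for key in r})
    report = {"rows": len(records), "columns": {}}
    for col in columns:
        values = [r.get(col) for r in records]
        info = {"type": infer_type(values), "missing": sum(v in MISSING for v in values)}
        if info["type"] == "categorical":
            # Distinct codes that collapse after trimming and lower-casing
            # (e.g. "M" vs "m ") point to coding inconsistencies.
            distinct = {str(v) for v in values if v not in MISSING}
            collapsed = Counter(v.strip().lower() for v in distinct)
            info["inconsistent_codes"] = sorted(k for k, n in collapsed.items() if n > 1)
        report["columns"][col] = info
    return report


# Hypothetical sample records.
records = [
    {"customer_id": "1", "gender": "M", "income": "52000"},
    {"customer_id": "2", "gender": "m ", "income": ""},
    {"customer_id": "3", "gender": "F", "income": "61000"},
    {"customer_id": "4", "gender": "", "income": "47000"},
]
report = profile(records)
```

Here `report["columns"]["gender"]` flags one missing value and the inconsistent code `"m"`. Note that `customer_id` is reported as numeric even though it is really a nominal identifier, which is one reason such a report should be reviewed by a person rather than trusted blindly.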
READY FOR DATA PREPARATION?
- Are all data sources clearly identified and accessed? Are you aware of any problems or restrictions?
- Have you identified key attributes from the available data?
- Did these attributes help you formulate hypotheses?
- Have you noted the size of all data sources?
- Are you able to use a subset of data where appropriate?
- Have you computed basic statistics for each attribute of interest? Did meaningful information emerge?
- Did you use exploratory graphics to gain further insight into key attributes? Did this insight reshape any of your hypotheses?
- What are the data quality issues for this project? Do you have a plan to address them?
- Are the data preparation steps clear? For instance, do you know which data sources to merge and which attributes to filter or select?
© IBM
DATA PREPARATION
© IBM
1. Select the right data
   - Select training examples
   - Select features
2. Clean data
   - Fill in missing data
   - Correct data errors
   - Make coding consistent
3. Extend data
   - Extend training examples
   - Extend features
4. Format data
   - Put the data into the format required to train the model
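A minimal sketch of the four preparation steps, using only the Python standard library. The column names (`gender`, `income`, `tenure_months`, `churned`), the gender code mapping, the median imputation, and the derived ratio feature are all hypothetical choices made for illustration. "Extend training examples" (for example, by adding data from another source) is not shown.

```python
from statistics import median

# Hypothetical column choices and code mapping, for illustration only.
FEATURES = ["gender", "income", "tenure_months"]
TARGET = "churned"
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def is_missing(v):
    return v is None or str(v).strip() == ""


def select(records):
    """1. Select data: keep labelled examples and the chosen feature columns."""
    keep = FEATURES + [TARGET]
    return [{k: r.get(k) for k in keep} for r in records if not is_missing(r.get(TARGET))]


def clean(rows):
    """2. Clean data: median-impute income and map gender codes to one scheme."""
    fill = median(float(r["income"]) for r in rows if not is_missing(r["income"]))
    cleaned = []
    for r in rows:
        r = dict(r)  # work on a copy
        r["income"] = fill if is_missing(r["income"]) else float(r["income"])
        r["gender"] = GENDER_CODES.get(str(r["gender"] or "").strip().lower(), "U")  # U = unknown
        r["tenure_months"] = float(r["tenure_months"])  # assumed never missing here
        cleaned.append(r)
    return cleaned


def extend(rows):
    """3. Extend data: derive an extra feature from existing ones."""
    for r in rows:
        r["income_per_month_of_tenure"] = r["income"] / max(r["tenure_months"], 1.0)
    return rows


def to_matrix(rows):
    """4. Format data: one-hot encode gender, return numeric X and integer y."""
    X = [[float(r["gender"] == "M"), float(r["gender"] == "F"), r["income"],
          r["tenure_months"], r["income_per_month_of_tenure"]] for r in rows]
    y = [int(r[TARGET]) for r in rows]
    return X, y


# Hypothetical raw records; the last one has no label and is dropped by select().
raw = [
    {"gender": "M", "income": "52000", "tenure_months": "24", "churned": "0"},
    {"gender": "m ", "income": "", "tenure_months": "6", "churned": "1"},
    {"gender": "female", "income": "61000", "tenure_months": "36", "churned": "0"},
    {"gender": "F", "income": "47000", "tenure_months": "0", "churned": "1"},
    {"gender": "F", "income": "50000", "tenure_months": "12", "churned": ""},
]
X, y = to_matrix(extend(clean(select(raw))))
```

Keeping each step in its own function makes it easier to document the decisions taken at each stage, as the readiness checklist for modeling asks.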
READY FOR MODELING?
- Based on your initial exploration and understanding, were you able to select relevant subsets of data?
- Have you cleaned the data effectively or removed unsalvageable items? Document any decisions in the final report.
- Are multiple data sets integrated properly? Were there any merging problems that should be documented?
- Have you researched the requirements of the modeling tools that you plan to use?
- Are there any formatting issues you can address before modeling? This includes both required formatting and tasks that may reduce modeling time.
© IBM
MODELING
© IBM
1. Select modeling techniques
   - Select data types available for analysis
   - Select an algorithm or a model
   - Define modeling goals
   - State specific modeling requirements
2. Build the model
   - Set up hyperparameters
   - Train the model
   - Describe the results
3. Assess the model
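To make "set up hyperparameters, train, assess" concrete, here is a standard-library-only sketch that uses k-nearest neighbours as a stand-in algorithm (chosen only because it is short to write, not because the methodology prescribes it). It holds out part of the data, tries several values of the hyperparameter `k`, and keeps the one with the best validation accuracy. The function names, the candidate values, and the synthetic two-cluster data are assumptions; a real project would more likely use a library such as scikit-learn and cross-validation.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, X, y, k):
    """Share of examples in (X, y) that the model trained on (train_X, train_y) gets right."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(X, y))
    return hits / len(y)


def select_k(X, y, candidates=(1, 3, 5, 7), valid_fraction=0.3, seed=0):
    """Hold out part of the data and pick the k with the best validation accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - valid_fraction))
    tr, va = idx[:cut], idx[cut:]
    tr_X, tr_y = [X[i] for i in tr], [y[i] for i in tr]
    va_X, va_y = [X[i] for i in va], [y[i] for i in va]
    scores = {k: accuracy(tr_X, tr_y, va_X, va_y, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores  # scores doubles as a description of the results


# Synthetic, well-separated two-class data for illustration.
rng = random.Random(42)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(30)]
X += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(30)]
y = [0] * 30 + [1] * 30
best_k, scores = select_k(X, y)
```

Because k-NN relies on distances, features on very different scales (such as income and tenure) would normally be standardized first; that step is omitted here.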
READY FOR EVALUATION?
- Are you able to understand the results of the models?
- Do the model results make sense to you from a purely logical perspective? Are there apparent inconsistencies that need further exploration?
- From your initial glance, do the results seem to address your organization’s business question?
- Have you used analysis nodes and lift or gains charts to compare and evaluate model accuracy?
- Have you explored more than one type of model and compared the results?
- Are the results of your model deployable?
© IBM
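The checklist mentions lift and gains charts. The sketch below computes the numbers behind a cumulative gains chart in plain Python: rank examples by model score, then, for each slice of the ranked population, report what fraction of all positives has been captured and the lift over random selection. The function name and the choice of equal-sized bins are assumptions made for illustration.

```python
def gains_table(scores, labels, n_bins=10):
    """Cumulative gains and lift by ranked population slice.

    scores: model scores, higher means more likely positive.
    labels: 0/1 outcomes aligned with scores.
    Returns (fraction_of_population, cumulative_fraction_of_positives, cumulative_lift).
    """
    # Sort labels by descending score; ties keep their original order.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n, total_pos = len(ranked), sum(ranked)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)
        captured = sum(ranked[:cutoff])
        pop_frac = cutoff / n
        gain = captured / total_pos
        lift = gain / pop_frac if pop_frac else float("nan")
        rows.append((pop_frac, gain, lift))
    return rows


# Hypothetical scores and outcomes for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
table = gains_table(scores, labels, n_bins=5)
```

In this toy example the top 20% of customers by score contain two of the three positives, a lift of about 3.3 over picking customers at random. Comparing such tables for several candidate models is one way to answer the "more than one type of model" question above.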
EVALUATION
© IBM
1. Evaluate the results
   - Are the results presented clearly?
   - Are there any novel findings?
   - Can the models and findings be applied to the business goals?
   - How well do the models and findings address the business goals?
   - What additional questions have the modeling results raised?
2. Review the process
   - Did each stage contribute to the value of the results?
   - What went wrong, and how can it be fixed?
   - Were there alternative decisions that could have been made?
3. Determine the next steps
DEPLOYMENT
© IBM
1. Plan for deployment
   - Summarize the models and findings
   - Create a deployment plan for each model
   - Identify any deployment problems and plan for contingencies
2. Plan monitoring and maintenance
   - Identify the models and findings that require support
   - How will accuracy and validity be evaluated?
   - How will you determine that a model has expired?
   - What should be done with expired models?
3. Conduct a final project review
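One common, though by no means the only, way to decide whether a deployed model "has expired" is to watch for drift between the score distribution seen at training time and the one seen in production. The sketch below computes the Population Stability Index (PSI) with bins taken from the baseline's quantiles. The `model_expired` rule and its 0.25 threshold are hypothetical; 0.25 is a frequently quoted rule of thumb, not something this methodology specifies. Once true outcomes become available, tracking accuracy directly is a more reliable signal.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a recent sample.

    Bin edges come from the baseline's quantiles; a small floor avoids log(0).
    """
    srt = sorted(expected)
    edges = [srt[int(len(srt) * i / bins)] for i in range(1, bins)]

    def shares(sample):
        counts = [0] * bins
        for v in sample:
            counts[sum(v > e for e in edges)] += 1  # index of the bin v falls into
        return [max(c / len(sample), 1e-6) for c in counts]

    e_sh, a_sh = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_sh, a_sh))


def model_expired(baseline_scores, recent_scores, threshold=0.25):
    """Hypothetical expiry rule: flag the model when score drift exceeds a threshold."""
    return psi(baseline_scores, recent_scores) > threshold


# Illustrative data: recent scores shifted upward relative to the baseline.
baseline = [i / 1000 for i in range(1000)]
recent = [v + 0.5 for v in baseline]
```

An identical distribution gives a PSI of 0, while the shifted sample above gives a value far above the threshold, so `model_expired(baseline, recent)` returns `True`.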
THANK YOU

CRISP-DM: a data science project methodology

  • 3. TOP BIG DATA CHALLENGES 0 10 20 30 40 50 60 Determining how to get value from Big Data Defining our strategy Obtaining skills and capabilities needed Integrating multiple data sources Infrastructure and/or architecture Risk and governance issues Funding for Big Data related initiatives Understanding what is Big Data Leadership or organizational issues Other Top challenge Second Third© Gartner
  • 4. METHODOLOGY IS A KEY TO SUCCESS Cross-Industry Standard Process for Data Mining (CRISP-DM)
  • 5. BUSINESS UNDERSTANDING Determining Business Objectives 1. Gather background information  Compiling the business background  Defining business objectives  Business success criteria 2. Assessing the situation  Resource Inventory  Requirements, Assumptions, and Constraints  Risks and Contingencies  Cost/Benefit Analysis 4. Determining data science goals  Data science goals  Data science success criteria 4. Producing a Project Plan © IBM
  • 6. EXAMPLE OF THE PROJECT PLAN Phase Time Resources Risks Business understanding 1 week All analysts Economic change Data understanding 3 weeks All analysts Data problems, technology problems Data preparation 5 weeks Data scientists, DB engineers Data problems, technology problems Modeling 2 weeks Data scientists Technology problems, inability to build adequate model Evaluation 1 week All analysts Economic change, inability to implement results Deployment 1 week Data scientist, DB engineers, implementation team Economic change, inability to implement results © IBM
  • 7. READY FOR THE DATA UNDERSTANDING? From a business perspective:  What does your business hope to gain from this project?  How will you define the successful completion of our efforts?  Do you have the budget and resources needed to reach our goals?  Do you have access to all the data needed for this project?  Have you and your team discussed the risks and contingencies associated with this project?  Do the results of your cost/benefit analysis make this project worthwhile? From a data science perspective:  How specifically can data mining help you meet your business goals?  Do you have an idea about which data mining techniques might produce the best results?  How will you know when your results are accurate or effective enough? (Have we set a measurement of data mining success?)  How will the modeling results be deployed? Have you considered deployment in your project plan?  Does the project plan include all phases of CRISP-DM?  Are risks and dependencies called out in the plan?© IBM
  • 8. DATA UNDERSTANDING
    1. Collect initial data
       - Existing data
       - Purchased data
       - Additional data
    2. Describe data
       - Amount of data
       - Value types
       - Coding schemes
    3. Explore data
    4. Verify data quality
       - Missing data
       - Data errors
       - Coding inconsistencies
       - Bad metadata
    © IBM
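Describing the data and verifying its quality can start with a few counts. The sketch below works on a small set of hypothetical customer records and reports three things: the number of records, the value types and missing values per attribute, and the raw codes used for a categorical attribute. The `describe` and `coding_variants` helpers are illustrative names. The code assumes every record has the same attributes as the first one.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "gender": "F", "income": 52000},
    {"age": None, "gender": "female", "income": 61000},
    {"age": 45, "gender": "M", "income": None},
    {"age": 29, "gender": "m", "income": 48000},
]

def describe(rows):
    """Per attribute: the value types observed and how many values are missing.

    Assumes every record has the same attributes as the first one.
    """
    report = {}
    for key in rows[0]:
        values = [r.get(key) for r in rows]
        report[key] = {
            "types": sorted({type(v).__name__ for v in values if v is not None}),
            "missing": sum(v is None for v in values),
        }
    return report

def coding_variants(rows, key):
    """Count the raw codes used for a categorical attribute to expose inconsistent coding."""
    return Counter(r[key] for r in rows if r.get(key) is not None)

print(len(records), "records")
print(describe(records))
print(coding_variants(records, "gender"))  # four spellings for two categories
```

Both `age` and `income` have a missing value, and `gender` is coded four different ways. These are exactly the "missing data" and "coding inconsistencies" items from the slide.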
  • 9. READY FOR THE DATA PREPARATION?
    - Are all data sources clearly identified and accessed? Are you aware of any problems or restrictions?
    - Have you identified key attributes from the available data?
    - Did these attributes help you to formulate hypotheses?
    - Have you noted the size of all data sources?
    - Are you able to use a subset of data where appropriate?
    - Have you computed basic statistics for each attribute of interest? Did meaningful information emerge?
    - Did you use exploratory graphics to gain further insight into key attributes? Did this insight reshape any of your hypotheses?
    - What are the data quality issues for this project? Do you have a plan to address these issues?
    - Are the data preparation steps clear? For instance, do you know which data sources to merge and which attributes to filter or select?
    © IBM
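Two of the questions above, computing basic statistics per attribute and working on a subset of the data, map directly onto Python's standard library. The sketch below is a minimal illustration. The income values are hypothetical, and the `basic_stats` and `sample_subset` helper names are illustrative.

```python
import random
import statistics

# Hypothetical values of one numeric attribute from one data source.
incomes = [52000, 61000, 48000, 75000, 39000, 58000, 120000, 47000]

def basic_stats(values):
    """Count, mean, median, sample standard deviation, min and max of a numeric attribute."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def sample_subset(values, k, seed=0):
    """Reproducible simple random sample, for exploring sources too large to handle whole."""
    return random.Random(seed).sample(values, k)

print(basic_stats(incomes))
print(sample_subset(incomes, k=4))
```

Here the mean (62,500) sits well above the median (55,000) because of the single 120,000 value. Basic statistics surface this kind of "meaningful information", and it may reshape a hypothesis or flag a possible data error.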
  • 10. DATA PREPARATION
    1. Select the right data
       - Select training examples
       - Select features
    2. Clean data
       - Fill in missing data
       - Correct data errors
       - Make coding consistent
    3. Extend data
       - Extend training examples
       - Extend features
    4. Format data
       - Put the data in the format required for training the model
    © IBM
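The cleaning, extending and formatting steps can be chained as small functions. The sketch below makes several choices that are assumptions rather than something the slides prescribe:

* Median imputation for missing numbers.
* A hand-written code mapping (`GENDER_CODES`) for consistent coding.
* An `age >= 40` flag as the derived feature.

The records and all helper names are hypothetical.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 34, "gender": "F", "income": 52000},
    {"age": None, "gender": "female", "income": 61000},
    {"age": 45, "gender": "M", "income": None},
    {"age": 29, "gender": "m", "income": 48000},
]

# Assumed mapping from every observed raw code to one canonical code.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

def fill_missing_with_median(rows, key):
    """Clean: replace missing values of a numeric attribute with the median of the observed ones."""
    median = statistics.median(r[key] for r in rows if r[key] is not None)
    return [{**r, key: median if r[key] is None else r[key]} for r in rows]

def make_coding_consistent(rows, key, mapping):
    """Clean: map raw codes to canonical ones; an unknown code raises KeyError so it gets reviewed."""
    return [{**r, key: mapping[r[key].strip().lower()]} for r in rows]

def to_feature_matrix(rows):
    """Extend with a derived flag, then format each record as a numeric row for training."""
    return [
        [r["age"], r["income"], int(r["age"] >= 40), int(r["gender"] == "F")]
        for r in rows
    ]

rows = fill_missing_with_median(raw, "age")
rows = fill_missing_with_median(rows, "income")
rows = make_coding_consistent(rows, "gender", GENDER_CODES)
X = to_feature_matrix(rows)
print(X)  # [[34, 52000, 0, 1], [34, 61000, 0, 1], [45, 52000, 1, 0], [29, 48000, 0, 0]]
```

Each step returns new records instead of mutating the input. This keeps the raw data available and makes it easier to document every cleaning decision in the final report, as the next slide asks.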
  • 11. READY FOR THE MODELING?
    - Based on your initial exploration and understanding, were you able to select relevant subsets of data?
    - Have you cleaned the data effectively or removed unsalvageable items? Document any decisions in the final report.
    - Are multiple data sets integrated properly? Were there any merging problems that should be documented?
    - Have you researched the requirements of the modeling tools that you plan to use?
    - Are there any formatting issues you can address before modeling? This includes both required formatting concerns and tasks that may reduce modeling time.
    © IBM
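Merging problems are easy to miss unless the join reports them explicitly. The sketch below is a plain-Python inner join of two hypothetical keyed sources, `crm` and `billing`. Alongside the merged records, it returns three lists worth documenting:

* keys found in only one source;
* fields that both sources define, where one value would silently overwrite the other.

All names here are illustrative. In pandas, `DataFrame.merge(..., indicator=True)` exposes similar information about unmatched rows.

```python
# Hypothetical sources keyed by customer ID.
crm = {101: {"segment": "retail"}, 102: {"segment": "sme"}, 103: {"segment": "retail"}}
billing = {101: {"monthly_spend": 40.0}, 103: {"monthly_spend": 95.5}, 104: {"monthly_spend": 12.0}}

def inner_merge(left, right):
    """Join two keyed sources on their keys and report the problems worth documenting."""
    matched = left.keys() & right.keys()
    merged = {k: {**left[k], **right[k]} for k in sorted(matched)}
    problems = {
        "only_in_left": sorted(left.keys() - right.keys()),
        "only_in_right": sorted(right.keys() - left.keys()),
        # Fields present in both sources: right-hand values overwrite left-hand ones.
        "overlapping_fields": sorted({f for k in matched for f in left[k].keys() & right[k].keys()}),
    }
    return merged, problems

merged, problems = inner_merge(crm, billing)
print(merged)
print(problems)  # customer 102 has no billing record; 104 is missing from the CRM
```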
  • 12. MODELING
    1. Select modeling techniques
       - Select the data types available for analysis
       - Select an algorithm or a model
       - Define modeling goals
       - State specific modeling requirements
    2. Build the model
       - Set up hyperparameters
       - Train the model
       - Describe the result
    3. Assess the model
    © IBM
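The build-and-assess loop can be illustrated with any algorithm. The sketch below uses k-nearest neighbours only because it fits in a few lines of standard-library Python; the slides do not prescribe it. Here `k` plays the role of the hyperparameter. Each candidate value is scored on a separate validation set, and the best one is kept. The data and helper names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (features, label).
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
         ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B")]
validation = [((1.2, 1.4), "A"), ((6.8, 6.1), "B"), ((2.4, 2.2), "A")]

def knn_predict(train, x, k):
    """Predict the majority label among the k nearest training examples (Euclidean distance)."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Assess the model: share of examples whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

# Set up the hyperparameter candidates, train/assess each one, and keep the best.
candidates = [1, 3, 5]
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(scores, key=scores.get)
print(scores, "best k =", best_k)
```

The validation set is used to choose `k`. A final, unbiased assessment should therefore use a third, held-out test set that took no part in that choice.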
  • 13. READY FOR THE EVALUATION?
    - Are you able to understand the results of the models?
    - Do the model results make sense to you from a purely logical perspective? Are there apparent inconsistencies that need further exploration?
    - From your initial glance, do the results seem to address your organization’s business question?
    - Have you used analysis nodes and lift or gains charts to compare and evaluate model accuracy?
    - Have you explored more than one type of model and compared the results?
    - Are the results of your model deployable?
    © IBM
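A gains or lift chart is built from a simple table. First, rank the examples by model score. Then, for each top fraction of that ranking, record the share of all positives it captures. Lift is that share divided by the fraction targeted, so 1.0 means no better than random targeting. The sketch below computes the table for hypothetical classifier scores and outcomes. It assumes at least one positive label. `gains_table` is an illustrative name.

```python
def gains_table(scores, labels, n_bins=10):
    """Cumulative gains and lift for examples ranked by descending model score.

    Returns (fraction_targeted, fraction_of_positives_captured, lift) per bin.
    Assumes labels are 0/1 and contain at least one positive.
    """
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n, total_pos = len(ranked), sum(ranked)
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = n * b // n_bins
        if cutoff == 0:
            continue  # more bins than examples: skip empty prefixes
        captured = sum(ranked[:cutoff]) / total_pos
        targeted = cutoff / n
        rows.append((targeted, captured, captured / targeted))
    return rows

# Hypothetical classifier scores and true outcomes (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for targeted, captured, lift in gains_table(scores, labels, n_bins=5):
    print(f"top {targeted:.0%}: {captured:.0%} of positives, lift {lift:.2f}")
```

Plotting `captured` against `targeted` gives the gains chart, and plotting `lift` against `targeted` gives the lift chart. Computing the table for several models on the same data set makes them directly comparable.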
  • 14. EVALUATION
    1. Evaluate the results
       - Are the results presented clearly?
       - Are there any novel findings?
       - Can the models and findings be applied to the business goals?
       - How well do the models and findings answer the business goals?
       - What additional questions have the modeling results raised?
    2. Review the process
       - Did each stage contribute to the value of the results?
       - What went wrong, and how can it be fixed?
       - Are there alternative decisions that could have been made?
    3. Determine the next steps
    © IBM
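"How well do the models and findings answer the business goals?" is easier to answer when the data science success criteria from business understanding are written down as measurable thresholds. The sketch below is a minimal illustration. The metric names, thresholds, and measured values are all hypothetical. A criterion with no measurement counts as not met.

```python
import operator

# Hypothetical data science success criteria, agreed during business understanding.
SUCCESS_CRITERIA = {
    "lift_at_top_20pct": (operator.ge, 2.0),
    "precision": (operator.ge, 0.6),
    "scoring_latency_ms": (operator.le, 200),
}

def evaluate_against_criteria(results, criteria):
    """Map each criterion to True if the measured value meets its threshold, else False."""
    return {
        name: name in results and compare(results[name], threshold)
        for name, (compare, threshold) in criteria.items()
    }

# Hypothetical measurements from the evaluation phase.
measured = {"lift_at_top_20pct": 2.5, "precision": 0.55, "scoring_latency_ms": 120}
verdict = evaluate_against_criteria(measured, SUCCESS_CRITERIA)
print(verdict)                                     # precision misses its target
print("All criteria met:", all(verdict.values()))  # All criteria met: False
```

A failed criterion does not automatically end the project. It feeds directly into "Determine the next steps": iterate on the model, revisit the data, or renegotiate the target with the business.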
  • 15. DEPLOYMENT
    1. Planning for deployment
       - Summarize models and findings
       - Create a deployment plan for each model
       - Identify any deployment problems and plan for contingencies
    2. Plan monitoring and maintenance
       - Identify the models and findings that require support
       - How can accuracy and validity be evaluated?
       - How will you determine that a model has expired?
       - What should be done with expired models?
    3. Conduct a final project review
    © IBM
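One common way to decide that a model may have expired is to watch for drift in its inputs or scores. The sketch below uses the Population Stability Index (PSI) for this. PSI is a widely used drift measure, but it is a choice made here, not one the slides specify. A frequently quoted rule of thumb treats a PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift. The 0.25 expiry threshold, the bin edges, and the score samples below are all assumptions, and `looks_expired` is a hypothetical helper.

```python
import bisect
import math

def population_stability_index(reference, recent, edges):
    """PSI between a reference sample (e.g. training-time scores) and a recent sample.

    `edges` must be sorted ascending; they split the value range into len(edges) + 1 bins.
    """
    def bin_shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at a tiny share so the logarithm stays defined.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(recent)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def looks_expired(psi, threshold=0.25):
    """Hypothetical expiry rule: flag the model for review once PSI exceeds the threshold."""
    return psi > threshold

# Hypothetical model scores at training time and in recent production traffic.
training_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
recent_scores = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95]
edges = [0.25, 0.5, 0.75]

psi = population_stability_index(training_scores, recent_scores, edges)
print(round(psi, 2), looks_expired(psi))
```

Drift alone does not prove that accuracy has dropped. Where true outcomes arrive later, comparing live accuracy against the evaluation-phase baseline is a more direct expiry test, and the two checks complement each other.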