Key Principles of Data Mining Presentation by Tobie Muir (Data-Decisions) Henry Stewart Briefing: An Introduction to Marketing Analytics London, 23rd June 2010
What is data mining? “Data mining is the process of finding patterns in your data which you can use to do your business better” Alan Montgomery, formerly Managing Director, Integral Solutions Limited (now part of IBM/SPSS)
These datasets can be incomprehensibly large – too large to analyse without the aid of computer-driven processes.
The role of data mining is to introduce (semi-)automated, computer-driven processes and statistical techniques to extract meaningful patterns from such data, with the goal of improving the business in question. A classic example in marketing is using data mining (DM) insights to generate revenue with a smaller marketing budget.
For very large datasets, data mining can focus on a sample within the dataset: instead of analysing millions (or billions) of records, which can be computationally expensive and slow, we analyse a subset of the data in the hope that patterns prevalent in the subset also apply to the entire dataset.
Careful analysis is then required to determine whether any patterns found are meaningful: they could be spurious or coincidental, or they may exist only in the subset.
Copyright © 2010 Data-Decisions Ltd
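As a rough illustration of the sampling idea above, the sketch below draws a simple random sample from a synthetic dataset and compares a response rate measured on the sample with the rate in the full data. The dataset, the 5% response rate, the sample size and the helper name `response_rate` are all assumptions made for illustration; none of them come from the presentation.

```python
import random


def response_rate(records):
    """Share of records flagged as responders (1) rather than non-responders (0)."""
    return sum(records) / len(records)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical full dataset: 200,000 customers, roughly 5% of them responders.
population = [1 if rng.random() < 0.05 else 0 for _ in range(200_000)]

# Analyse a simple random sample of 10,000 records instead of every record.
sample = rng.sample(population, k=10_000)

full_rate = response_rate(population)
sample_rate = response_rate(sample)
print(f"full data: {full_rate:.4f}, sample: {sample_rate:.4f}")
```

The two rates should be close but will rarely match exactly. That gap is one reason a pattern seen in a sample still needs to be checked before it is assumed to hold for the whole dataset.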
Where does data mining fit with BI tools?
Business intelligence tools can also encompass the extraction, storage, visualisation and distribution of business information, not just the analysis of business data.
Leading BI tools will typically contain data mining capabilities as well as other more general activities including decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis and forecasting.
Business Intelligence
Online analytical processing (OLAP)
Statistical analysis and forecasting
Query and Reporting
Data Mining
[Figure: Data Mining within Business Intelligence]
The Relationship between Data Mining and Advanced Analytics
Advanced Analytics
Data Mining
Focus on Customers
Everything else...
Optimise Best Media Mix
Optimise Responses
Customer Acquisition
Customer Retention
Cross-Sell

  • 18. Up-Sell, Customer Expansion
  • 19. The CRISP data mining process. CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It was developed by the CRISP-DM consortium, consisting of DaimlerChrysler (formerly Daimler-Benz), SPSS (formerly ISL) and NCR. The idea was to standardise the process of data mining across the industry: the collaborators established a common pattern for the data mining process, and CRISP-DM was also a mechanism to introduce uniform terminology and differentiation. CRISP-DM 1.0 was rolled out in August 2000, including detailed documentation. To the right is the standard six-part CRISP model of the data mining process, taken from that documentation. The model highlights the relationships and interdependencies between all six phases: the data mining process is a dynamic one.
  • 20. The CRISP data mining process: Phases 1 and 2. 1. Business understanding: We begin by understanding the requirements of the project from the business perspective. What does the company in question want to achieve? What are the priorities? How will we measure the outcome? We conclude this phase by producing a preliminary (phase) plan to tackle the established objectives. 2. Data understanding: The data understanding phase has two broad aims. The first is to test the data (on which the analysis will be based) in order to identify any quality issues. The second is to look for initial insights in the data that might provide additional meaningful information. Some basic data visualisation (scatter plots, bar charts, distribution analysis) is a great way to get to grips with the data, spot any immediate patterns and test the general sufficiency of the data, which leads logically on to the next phase, Data Preparation.
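The data understanding checks described above might look something like the following minimal sketch, which counts missing values per field and summarises one numeric field and one categorical field. The records, the field names (`age`, `region`, `spend`) and the helper functions are hypothetical and serve only to illustrate the kind of first-pass quality check meant here.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "region": "North", "spend": 120.0},
    {"age": None, "region": "South", "spend": 80.5},
    {"age": 51, "region": "North", "spend": None},
    {"age": 29, "region": None, "spend": 45.0},
    {"age": 42, "region": "South", "spend": 300.0},
]


def missing_counts(rows):
    """Count missing (None) values per field."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None:
                counts[field] += 1
    return dict(counts)


def numeric_summary(rows, field):
    """Min, median, mean and max of a numeric field, ignoring missing values."""
    values = [row[field] for row in rows if row[field] is not None]
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


missing = missing_counts(records)
spend_summary = numeric_summary(records, "spend")
region_counts = Counter(r["region"] for r in records if r["region"] is not None)

print(missing)
print(spend_summary)
print(region_counts)
```

On real data the same counts and summaries would normally sit alongside the plots mentioned above rather than replace them.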
  • 21. The CRISP data mining process: Phases 3 and 4. 3. Data preparation: The data preparation phase does exactly as its name suggests: this is the phase in which the initial (raw) data is modified to produce the final dataset on which the analysis will take place. It covers all activities that turn the raw data into the final dataset, ready for the modelling phase, including merging separate datasets and further data pooling, table/record/attribute selection, missing-value imputation, data cleaning, removal of spurious data, and transformation. It is also advisable to consider how to partition the data into modelling and testing segments (typically a 70/30 split, depending on data volumes). Data preparation, in my experience, is the most time-consuming, but absolutely ESSENTIAL, phase of the entire CRISP process.
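The 70/30 partition mentioned above can be sketched as follows. This is a minimal illustration that assumes a simple random split with a fixed seed; the function name `partition` and the record identifiers are hypothetical, and other schemes (such as splitting by time period) may suit some projects better.

```python
import random


def partition(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into modelling and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


customer_ids = list(range(1000))  # hypothetical record identifiers
modelling_set, testing_set = partition(customer_ids)
print(len(modelling_set), len(testing_set))
```

Shuffling before cutting avoids the split simply following whatever order the records were stored in, and fixing the seed makes the partition reproducible.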
  • 22. The CRISP data mining process: Phases 3 and 4. 4. Modelling: The modelling phase is the heart of the CRISP model. This is the point at which we take the prepared dataset and apply (typically) several modelling techniques. We use several techniques because no single technique is perfect, and comparing the range of results helps to offset the limitations of any one model. There is some interaction between phases 3 and 4: different techniques may require the data in different forms, so it may be necessary to prepare the data in more than one way for the various models. We will cover some of the different modelling techniques later in the presentation.
  • 23. The CRISP data mining process: Phase 5. 5. Evaluation: There are many different techniques and methods for evaluating the models created during the modelling phase. First and foremost you are looking to compare the model error rates or, inversely, the model accuracy rates. These are estimated from how well the models perform on the test data (data that was held back during the model-building phase). There are a number of ways to measure this, but most methods simply amount to providing a score that allows you to choose the model with the lowest error rate. Lift charts provide a very effective way to visualise and compare model performance over the test set. They are also a good way to assess whether you may need to combine models to arrive at a better overall solution.
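As a hedged sketch of the evaluation step above, the code below computes an error rate on a test set and the per-band lift figures that a lift chart would plot. The outcomes, scores, 0.5 cut-off and function names are assumptions chosen for illustration, and the banding is one simple variant (non-cumulative, equal-sized bands), not necessarily the one used in the presentation.

```python
def error_rate(actual, predicted):
    """Fraction of test records where the predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def lift_by_band(actual, scores, bands=5):
    """Lift per score band: responder rate in the band divided by the overall rate.

    Records are ranked from highest to lowest score and cut into equal-sized bands;
    any remainder left over after equal division is ignored in this sketch.
    """
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda pair: -pair[0])]
    overall = sum(actual) / len(actual)
    size = len(ranked) // bands
    return [
        (sum(ranked[i * size:(i + 1) * size]) / size) / overall
        for i in range(bands)
    ]


# Hypothetical test-set outcomes (1 = responded) and model scores.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]

test_error = error_rate(actual, predicted)
lifts = lift_by_band(actual, scores, bands=5)
print(test_error)
print(lifts)
```

A lift above 1 in the top bands means the model concentrates responders among its highest-scored records. Running the same calculation for several models on the same test set gives a like-for-like comparison.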
  • 25. The CRISP data mining process: Phase 6. 6. Deployment: The deployment phase consolidates the results that the model produces into a form that is usable by the customer. The data mining exercise may have been undertaken simply to increase knowledge of the data, but even within that restricted remit, and more generally, any knowledge gained from the exercise must be presented in a way that is of use to the customer. Depending on the nature of the data mining project, the deployment phase can range from simply generating a report to implementing a repeatable data mining process across the enterprise. It is not unusual for the customer (as opposed to the data analyst) to perform the deployment phase. In either case, it is important that the customer understands the actions that need to be carried out in order to make best use of the models created.
  • 36. Genetic algorithms, Decision trees, Bayes, Clustering
  • 37. How data mining models are built and applied
  • 39. Models need to be evaluated to see that the results produced are compatible with the project objectives.
  • 40. No model is ever perfect, so every model should be treated as a work in progress, subject to regularly scheduled refinement and improvement.
  • 41. Conclusion: “Data mining is the process of finding patterns in your data which you can use to do your business better”. Data mining is a subset of a much larger sphere known as Business Intelligence, which includes data parsing, visualisation, OLAP and data warehousing. Advanced analytics encompasses data mining but also includes non-customer-focussed activities that require mathematical and statistical approaches. CRISP is an established, proven data mining framework. The key emphasis in data mining must be on understanding; also, never underestimate the importance of data mining or the amount of work it involves. No model is ever perfect; each is only the starting point for future iterative improvements.
  • 46. Applied Data Mining: Statistical Methods for Business and Industry (Paolo Giudici)
  • 47. Data Mining Techniques: for Marketing, Sales and Customer Relationship Management (Berry and Linoff). Tobie Muir (Managing Director) E. tobie@data-decisions.co.uk T. 0208 144 7422 / 07903 525358 W. data-decisions.co.uk

Editor's notes

  1. Data mining could be thought of as essentially ‘customer analytics’ or, more precisely, analytics instigated at the request of a customer with the purpose of gaining insight into (knowledge of) some data. Typically we view customer analytics as predictive and descriptive modelling, usually in relation to large CRM (Customer Relationship Management)/marketing databases. Data mining exercises often model customers; however, any entity for which data is stored can be investigated. Other examples include households, web sessions, calls, etc. http://www.thebusinessintelligenceguide.com/bi_tools/Difference_Between_Analytics_and_Advanced_Analytics.php
  2. http://www.spss.fi/pdf/crisp-dm.pdf
  3. At this point, we must consider whether the model does indeed reflect the reality of what we are attempting to model and (more importantly) whether the model will in fact achieve the business objectives. Thus the model must be thoroughly evaluated, and this includes reviewing the steps taken to construct the model. In particular, it is essential that we ensure the model incorporates every important business issue. This may mean that the model needs to be reviewed and worked on, so we have some interaction between phases 4 and 5. This phase typically concludes with a decision on how the data mining results achieved will be used.
  4. http://msdn.microsoft.com/en-us/library/ms175428(SQL.100).aspx
  5. Data description and summarisation: Initial exploratory data analysis can help to investigate and understand the data, and provide potential hypotheses for hidden information. Summarisation also plays a significant role in the presentation of final results.
     Segmentation: A segmentation data mining analysis aims to separate the data into interesting and meaningful subgroups or classes, so that members of a subgroup share common characteristics. A classic example would be a shopping basket analysis where the segments of baskets depend on the items they contain.
     Concept descriptions: Concept description aims to give an understandable description of the concepts or classes. This is not done to produce complete models with high prediction accuracy, but in order to gain insights. For example, a company might be interested in learning more about its loyal and disloyal customers. From concept descriptions such as this, the company could then conclude what might be done to keep customers loyal, or to transform disloyal customers into loyal ones. Concept description has close connections with both segmentation and classification. Segmentation could lead to generating a concept or class of data without any really understandable description of the elements in that class.
     Classification: Classification has connections to almost all other problem types. For example, credit scoring attempts to assess the credit risk of a new customer. This problem can be transformed into a classification problem by partitioning customers into two classes: good customers and bad customers. The resulting model can then be used to assign prospective customers to one of the two classes, and hence either accept or reject them.
     Prediction: Prediction problems are similar to classification problems, with one major difference: in prediction, the target attribute (or class) is not a qualitative discrete attribute but a continuous one. This means that the aim of a prediction model is to find and assign a numerical value of a target attribute for unseen objects. In particular, if the prediction model is dealing with time series data, then it is often referred to as forecasting.
     Dependency analysis: Dependency analysis consists of finding a model that describes significant dependencies (or associations) between data items or events. Dependencies can be used to predict the value of a data item given information on other items. Dependencies can be used for predictive modelling; however, in general they are mostly used for understanding.
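The dependency analysis described above can be sketched with the support and confidence measures commonly used for association rules. These measures are a standard choice and are not named in the notes themselves. The shopping baskets and function names below are invented for illustration.

```python
# Invented shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "tea"},
    {"bread", "butter", "jam"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Share of all transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

# Rule under test: "baskets containing bread also contain butter".
rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
```

Here three of the five baskets contain both bread and butter, and three of the four baskets with bread also contain butter, so the rule describes a dependency that could support either understanding or a simple prediction about an unseen basket.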