Data Mining
Introduction
intro
Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers.
Data collections in the real world
- The ten largest transaction-processing databases range from 3 to 18 terabytes.
- The ten largest decision-support databases range from 10 to 29 terabytes.
- Sizes doubled or tripled between 2001 and the end of 2003.
Questions arise
- Is there any new, unexpected, and potentially useful information contained in this data?
- Can we use historical data to predict future outcomes (e.g. customer behavior, fraud detection)?
Some examples of data mining
1. Telecommunications
A huge amount of data is collected daily:
- Transactional data (about each phone call)
- Data on mobile phones, landline phones, the Internet, etc.
- Other customer data (billing, personal information, etc.)
- Additional data (network load, faults, etc.)
Questions arise
- Which customer groups are highly profitable, and which are not?
- Which customers should we advertise which special offers to?
- What call rates would increase profit without losing good customers?
- How do customer profiles change over time?
- Fraud detection (stolen mobile phones or phone cards)
Another example
2. Health
Data from different parts of the health system:
- Personal health records (at GPs, specialists, etc.)
- Hospital data (e.g. admission data, midwives' data, surgery data)
- Billing information (Medicare, PBS)
Questions
- Are doctors following the procedures (e.g. prescription of medication)?
- Adverse drug reactions (analysis of different data collections to find correlations)
- Are people committing fraud (e.g. doctor shoppers)?
- Are there correlations between social and environmental issues and people's health?
What is data mining?
Data mining is the automated extraction of previously unrecognized information from large data sources for the purpose of supporting business actions.
Some more definitions
- Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- Data mining is an information extraction activity whose goal is to discover hidden facts contained in databases.
- Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data.
Data mining process
- Extract, transform, and load transaction data into the data warehouse system.
- Store and manage the data in a multidimensional database system.
- Provide data access to business analysts and information technology professionals.
- Analyze the data with application software.
- Present the data in a useful format, such as a graph or table.
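The five steps above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not a production pipeline. An in-memory SQLite database (the standard-library `sqlite3` module) stands in for the data warehouse. A plain relational table stands in for the multidimensional store. The `sales` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw records from an operational system:
# (customer id, product name with inconsistent formatting, amount as text)
RAW = [
    ("c1", "Phone ", "120.00"),
    ("c2", "phone", "80.50"),
    ("c1", "SIM card", "10.00"),
    ("c3", "sim card", "12.00"),
]

def extract():
    """Extract: read the raw records (here, simply copy the in-memory list)."""
    return list(RAW)

def transform(rows):
    """Transform: normalise product names and convert amounts to numbers."""
    return [(cust, prod.strip().lower(), float(amount)) for cust, prod, amount in rows]

def load(conn, rows):
    """Load: write the cleaned rows into the stand-in warehouse table."""
    conn.execute("CREATE TABLE sales (customer TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def analyze(conn):
    """Access and analyze: total revenue per product, largest first."""
    query = ("SELECT product, ROUND(SUM(amount), 2) AS revenue "
             "FROM sales GROUP BY product ORDER BY revenue DESC")
    return conn.execute(query).fetchall()

def present(result):
    """Present: format the result as a plain-text table."""
    lines = [f"{'product':<10}{'revenue':>10}"]
    lines += [f"{product:<10}{revenue:>10.2f}" for product, revenue in result]
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
summary = analyze(conn)
print(present(summary))
```

In this sketch the cleaning happens before loading, so the two spellings of each product are counted together in the analysis step.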
DM is multidisciplinary
What they do
Detect patterns in data: rules, patterns, classes, associations and functional dependencies, outliers, data distributions, clusters
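Clusters are one of the pattern types listed above. The slide does not name a specific algorithm. As a hedged illustration, the sketch below uses k-means, one common clustering algorithm, to group hypothetical customers described by two invented attributes. Seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on 2-D points; returns (labels, centroids)."""
    # Simplification: seed centroids with the first k points (real tools use e.g. k-means++).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep its old centroid
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (calls per day, monthly bill in tens of dollars)
customers = [(1, 1), (10, 10), (1.5, 2), (2, 1), (10.5, 9), (9, 11)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

The two groups that emerge (light users and heavy users) are the kind of segments the "Market segmentation" use case later in the deck refers to.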
How they do it
Search through data and pattern space, non-parametric modelling, filtering, aggregation
How well they do it
Errors and biases, over-fitting, confounding effects, speed, scalability
Challenges in DM
- Data size
  - The size of data collections grows faster than linearly, doubling every 18 months.
  - Scalable algorithms are needed.
- Data complexity
  - Different types of data (free text, HTML, XML, multimedia)
  - The dimensionality of the data increases (more attributes).
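Taking the slide's 18-month doubling period at face value, the growth is exponential rather than linear. Here $S_0$ is the size at time zero and $t$ is measured in months; this notation is introduced for illustration only.

```latex
S(t) = S_0 \cdot 2^{t/18},
\qquad
\frac{S(120)}{S_0} = 2^{120/18} = 2^{20/3} \approx 101.6
```

At that rate, a decade multiplies the data volume roughly a hundredfold. Linear growth would only add a fixed amount each month.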
Challenges (continued)
- The curse of dimensionality affects many algorithms (for example, finding nearest neighbors in high dimensions).
- Data quality
  - Real-world data is messy and dirty (missing and out-of-date values, typographical errors, different codings/formats, etc.).
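One way to see the nearest-neighbor problem mentioned above is to measure how much the nearest and farthest points differ in distance from a query point. The hedged sketch below assumes points drawn uniformly at random from the unit hypercube. The function name `distance_contrast`, the sample size, and the seed are illustrative choices. As the number of dimensions grows, the relative contrast typically shrinks toward zero, so the "nearest" neighbor is barely nearer than any other point.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min between one random query point and
    n_points random points, all drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [math.dist(query, [rng.random() for _ in range(dim)])
                 for _ in range(n_points)]
    return (max(distances) - min(distances)) / min(distances)

# Exact values depend on the seed; the downward trend is the point.
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 2))
```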
Why mine data?
- Data is being recorded.
- Recorded data is being warehoused.
- Computing power is affordable.
- Competitive pressure is strong.
- Commercial DM products are available.
- It provides support for business decisions.
Value to business
- Market segmentation - Identify the common characteristics of customers who buy the same products from your company.
- Customer churn - Predict which customers are likely to leave your company and go to a competitor.
- Fraud detection - Identify which transactions are most likely to be fraudulent.
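For the fraud-detection item, a very simple statistical stand-in is to flag transaction amounts that sit far from the bulk of the data. The sketch below uses the modified z-score, which is built on the median and the median absolute deviation. It applies the commonly cited 0.6745 scaling constant and 3.5 cut-off. The function name and the sample amounts are hypothetical. Real fraud systems combine many features and learned models rather than a single threshold.

```python
import statistics

def flag_outliers(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds `threshold`.

    The score is based on the median and the median absolute deviation (MAD), so a
    few extreme values do not shift the baseline the way they would shift a mean.
    """
    median = statistics.median(amounts)
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        return []  # no spread around the median: this score is undefined
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - median) / mad > threshold]

# Hypothetical card transactions for one customer (amounts in dollars)
amounts = [25.0, 30.0, 27.5, 22.0, 31.0, 28.0, 950.0, 26.0]
print(flag_outliers(amounts))  # [6]
```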
Value to business
- Interactive marketing - Predict what each individual accessing a website is most likely interested in seeing.
- Market basket analysis - Understand what products or services are commonly purchased together, e.g. beer and diapers.
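Market basket analysis is usually expressed through the support and confidence of association rules. The brute-force sketch below counts item pairs only. The baskets are invented, and `pair_rules` and its `min_support` default are hypothetical. Practical tools use algorithms such as Apriori or FP-growth to scale to larger itemsets and bigger data.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4):
    """Return {(a, b): (support, confidence of a -> b)} for pairs meeting min_support.

    support(a, b)    = fraction of baskets containing both a and b
    confidence(a->b) = support(a, b) / support(a)
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])  # rule a -> b
            rules[(b, a)] = (support, count / item_counts[b])  # rule b -> a
    return rules

baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]
rules = pair_rules(baskets)
print(rules[("beer", "diapers")])  # (0.6, 0.75)
```

Support is symmetric, but confidence is not. In this data, everyone who bought chips also bought beer (confidence 1.0), while only half of beer buyers bought chips.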
Value to business
- Trend analysis - Reveal the difference between a typical customer this month and last month.
- Data mining can also help deal with missing, inconsistent, and noisy data.
- Direct marketing - Identify which prospects should be included in a mailing list to obtain the highest response rate.
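Trend analysis can be as simple as comparing a summary of the "typical customer" across two periods. The sketch below takes the median of each attribute as the typical value, which is an assumption rather than a method the slide specifies. The record layout (`calls`, `spend`) and the numbers are hypothetical.

```python
import statistics

def typical_profile(customers):
    """Median of each attribute across a list of customer records (dicts with the same keys)."""
    return {key: statistics.median([c[key] for c in customers]) for key in customers[0]}

def profile_change(last_month, this_month):
    """How the typical customer moved between two periods (this month minus last month)."""
    before, after = typical_profile(last_month), typical_profile(this_month)
    return {key: after[key] - before[key] for key in before}

last_month = [{"calls": 40, "spend": 30.0}, {"calls": 55, "spend": 42.0}, {"calls": 35, "spend": 25.0}]
this_month = [{"calls": 38, "spend": 35.0}, {"calls": 50, "spend": 48.0}, {"calls": 30, "spend": 29.0}]
print(profile_change(last_month, this_month))  # {'calls': -2, 'spend': 5.0}
```

The median is used instead of the mean so that one unusually heavy user does not redefine the "typical" customer.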

Uploaded by Niyitegekabilly
