Data Mining
Chris Nelson
CS 157 A
Fall 2007
Data Mining
 New buzzword, old idea.
 Inferring new information from already
collected data.
 Traditionally the job of data analysts.
 Computers have changed this.
Far more efficient to comb through data using
a machine than eyeballing statistical data.
Data Mining – Two Main Components
 Wikipedia definition: “Data mining is the entire process of applying
computer-based methodology, including new techniques for knowledge
discovery, from data.”
 Knowledge Discovery
Concrete information gleaned from known data: facts you may not have
known, but which are supported by the recorded data.
(e.g., the diapers and beer example from the previous presentation)
 Knowledge Prediction
Uses known data to forecast future trends, events, etc. (e.g., stock market
predictions)
 Wikipedia note: "some data mining systems such as neural networks are
inherently geared towards prediction and pattern recognition, rather than
knowledge discovery." These include applications in AI and symbol
analysis.
Data Mining vs. Data Analysis
 In terms of software and how it is marketed,
Data Mining != Data Analysis
 Data Mining implies that the software applies some intelligence
beyond simple grouping and partitioning of data to
infer new information.
 Data Analysis is more in line with standard
statistical software (e.g., web stats). Such tools usually
present information about subsets and relations
within the recorded data set (e.g., browser/search
engine usage, average visit time, etc.)
Data Mining Subtypes
 Data Dredging
The process of scanning a data set for relations and then
coming up with a hypothesis for the existence of those relations.
 MetaData
Data that describes other data. Can describe an individual
element, or a collection of elements.
Wikipedia example: “In a library, where the data is the
content of the titles stocked, metadata about a title would
typically include a description of the content, the author, the
publication date and the physical location”
 Applications for Data Dredging in business include Market
and Risk Analysis, as well as trading strategies.
 Applications for Science include disaster prediction.
Propositional vs. Relational Data
 Older data mining methods relied on Propositional Data:
data related to a single, central element that can
be represented in vector format (e.g., the purchasing history
of a single user; Amazon uses such vectors in its related-item
suggestions [a multidimensional dot product]).
 Current, advanced data mining methods rely on Relational
Data, or data that can be stored and modeled easily through
use of relational databases. An example of this would be data
used to represent interpersonal relations.
 Relational Data is more interesting than Propositional data to
miners in the sense that an entity, and all the entities to which
it is related, factor into the data inference process.
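The dot-product idea mentioned above can be sketched as follows. This is only an illustration, not Amazon's actual method: the catalog, the user names, and the choice of cosine similarity (a dot product scaled by vector lengths) are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Dot product of two equal-length vectors, scaled by their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty purchase history is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical purchase histories: one slot per catalog item, 1 = bought.
catalog = ["book", "dvd", "camera", "tripod"]
alice = [1, 0, 1, 1]
bob = [1, 0, 1, 0]
carol = [0, 1, 0, 0]

sim_ab = cosine_similarity(alice, bob)    # two shared items
sim_ac = cosine_similarity(alice, carol)  # no shared items

print(f"alice vs bob:   {sim_ab:.3f}")
print(f"alice vs carol: {sim_ac:.3f}")
```

Users whose vectors score close to 1 bought similar things, so items one of them owns and the other lacks (here, the tripod for bob) become candidate suggestions.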
Key Component of Data Mining
 Whether Knowledge Discovery or Knowledge
Prediction, data mining takes information that was
once quite difficult to detect and presents it in an
easily understandable format (e.g., graphical or
statistical).
 Data mining Techniques involve sophisticated
algorithms, including Decision Tree Classifications,
Association detection, and Clustering.
 Since data mining is not on the test, I will keep things
superficial.
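As a rough illustration of clustering, one of the techniques named above, here is a minimal one-dimensional k-means sketch. The visit-time data and the starting centers are made up for the example, and real tools use more robust variants.

```python
def kmeans_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then recenter."""
    centers = list(centers)
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical visit times in minutes: short visits and long visits.
visit_minutes = [1, 2, 3, 20, 22, 24]
centers, groups = kmeans_1d(visit_minutes, centers=[0, 10])

print(centers)
print(groups)
```

The algorithm is never told which visits are "short" or "long"; the two groups emerge from the data, which is the sense in which clustering discovers structure.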
Uses of Data Mining
 AI/Machine Learning
Combinatorial/Game Data Mining
Good for analyzing winning strategies in games, and thus
developing intelligent AI opponents (e.g., chess).
 Business Strategies
Market Basket Analysis
Identify customer demographics, preferences, and purchasing
patterns.
 Risk Analysis
Product Defect Analysis
Analyze product defect rates for given plants and predict
possible complications (read: lawsuits) down the line.
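Market basket analysis, mentioned above, usually comes down to counting which items appear together. The sketch below computes support and confidence for item pairs; the baskets, the function name `pair_rules`, and the 0.4 support threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per checkout.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]

def pair_rules(baskets, min_support=0.4):
    """Map (a, b) to (support, confidence) for the rule 'a implies b'."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # Confidence: of the baskets containing the first item,
        # the fraction that also contain the second.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules

rules = pair_rules(baskets)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high support and high confidence (here, diapers and beer) is the kind of pattern a retailer might act on, for example in shelf placement.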
Uses of Data Mining (Continued)
 User Behavior Validation
Fraud Detection
In the realm of cell phones
Comparing phone activity to calling records.
Can help detect calls made on cloned phones.
Similarly, with credit cards, comparing
purchases with historical purchases. Can
detect activity with stolen cards.
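Comparing a new purchase against historical purchases can be sketched with a simple statistical test. This is a toy example under stated assumptions: the amounts, the function name `is_suspicious`, and the three-standard-deviation threshold are invented here, and real fraud systems use many more signals.

```python
from statistics import mean, stdev

def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu  # every past purchase was identical
    return abs(amount - mu) / sigma > threshold

# Hypothetical past card purchases, in dollars.
past_purchases = [23.5, 41.0, 18.2, 35.9, 27.4, 30.1]

print(is_suspicious(past_purchases, 32.0))   # an ordinary purchase
print(is_suspicious(past_purchases, 950.0))  # far outside the usual range
```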
Uses of Data Mining (Continued)
 Health and Science
Protein Folding
Predicting protein interactions and functionality within
biological cells. Applications of this research include
determining causes and possible cures for Alzheimer's,
Parkinson's, and some cancers (caused by protein "misfolds")
Extra-Terrestrial Intelligence
Scanning radio telescope data for possible transmissions from
other planets.
 For more information see Stanford's Folding@home and
UC Berkeley's SETI@home projects. Both involve participation in a widely
distributed computer application.
Sources of Data for Mining
 Databases (most obvious)
 Text Documents
 Computer Simulations
 Social Networks
Privacy Concerns
 Public and government databases are mined,
though people have raised, and continue to raise, concerns.
 Wiki quote:
"data mining gives information that would not be
available otherwise. It must be properly interpreted
to be useful. When the data collected involves
individual people, there are many questions
concerning privacy, legality, and ethics."
Prevalence of Data Mining
 Your data is already being mined, whether you like it or not.
 Many web services require that you allow access to your information [for
data mining] in order to use the service.
 Google mines email data in Gmail accounts to present account owners
with ads.
 Facebook requires users to allow access to info from non-Facebook
pages. Facebook privacy policy:
"We may use information about you that we collect from other sources,
including but not limited to newspapers and Internet sources such as
blogs, instant messaging services and other users of Facebook, to
supplement your profile."
 This allows access to your blog RSS feed (rather innocuous), as well as
information obtained through partner sites (worthy of concern).
Data Mining Controversies
 Latest one: Facebook's Beacon Advertising program
(Just popped on Slashdot within the last week)
 What Beacon does:
“when you engage in consumer activity at a
[Facebook] partner website, such as Amazon, eBay,
or the New York Times, not only will Facebook
record that activity, but your Facebook connections
will also be informed of your purchases or actions.”
[taken from
http://trickytrickywhiteboy.blogspot.com/2007/11/beware-of-facebooks-beacon.html]
Controversies continued
 Implications: "Thus where Facebook used to be collecting data only
within the confines of its own website, it will now extend that ability to
harvest data across other websites that it partners with. Some of the
companies that have signed on to participate on the advertising side
include Coca-Cola, Sony, Verizon, Comcast, Ebay — and the CBC. The
initial list of 44 partner websites participating on the data collection side
include the New York Times, Blockbuster, Amazon, eBay, LiveJournal,
and Epicurious.”
[Remember the privacy policy on the previous slide]
 The verdict is still out. This may violate an old (100+ years) New York law
prohibiting advertising that uses endorsements without the endorser's
consent.
 Facebook currently offers users no way to opt out of Beacon (once it has
been activated?). Users can close their accounts, but account data is never
deleted.
Bottom Line
 Data obtained through Data Mining is
incredibly valuable
 Companies are understandably reluctant to
give up data they have obtained.
 Expect the prevalence of Data Mining, and of
(possibly subversive) methods, to increase in the
years to come.
Recommended Resources and
Works Consulted
 Wikipedia Data Mining entry
http://en.wikipedia.org/wiki/Data_mining
 "Privacy is Dead - Get Over It: Revisited"
Steve Rambam's Hope Number Six lecture
http://www.hopenumbersix.net/speakers.html#pid2
 Facebook's Faux Pas
http://www.newsweek.com/id/69275
 Beware of Facebook’s Beacon
http://trickytrickywhiteboy.blogspot.com/2007/11/beware-of-facebooks-beacon.html
 Facebook Data Mining guide
http://saunderslog.com/2007/11/25/facebook-market-research-secrets/
 Data Mining in Social Networks
http://kdl.cs.umass.edu/papers/jensen-neville-nas2002.pdf


Data mining

  • 1. Data Mining Chris Nelson CS 157 A Fall 2007
  • 2. Data Mining  New buzzword, old idea.  Inferring new information from already collected data.  Traditionally job of Data Analysts  Computers have changed this. Far more efficient to comb through data using a machine than eyeballing statistical data.
  • 3. Data Mining – Two Main Components  Wikipedia definition: “Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, from data.”  Knowledge Discovery Concrete information gleaned from known data. Data you may not have known, but which is supported by recorded facts. (ie: Diapers and beer example from previous presentation)  Knowledge Prediction Uses known data to forecast future trends, events, etc. (ie: Stock market predictions)  Wikipedia note: "some data mining systems such as neural networks are inherently geared towards prediction and pattern recognition, rather than knowledge discovery.“ These include applications in AI and Symbol analysis
  • 4. Data Mining vs. Data Analysis  In terms of software and the marketing thereof Data Mining != Data Analysis  Data Mining implies software uses some intelligence over simple grouping and partitioning of data to infer new information.  Data Analysis is more in line with standard statistical software (ie: web stats). These usually present information about subsets and relations within the recorded data set (ie: browser/search engine usage, average visit time, etc. )
  • 5. Data Mining Subtypes. Data Dredging: the process of scanning a data set for relations and then coming up with a hypothesis for the existence of those relations. Metadata: data that describes other data. It can describe an individual element or a collection of elements. Wikipedia example: "In a library, where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the author, the publication date and the physical location". Applications of Data Dredging in business include market and risk analysis, as well as trading strategies. Applications in science include disaster prediction.
  • 6. Propositional vs. Relational Data. Older data mining methods relied on Propositional Data: data related to a single, central element that can be represented in vector format (e.g., the purchasing history of a single user; Amazon uses such vectors in its related item suggestions [a multidimensional dot product]). Current, advanced data mining methods rely on Relational Data: data that can be stored and modeled easily using relational databases. An example would be data used to represent interpersonal relations. Relational Data is more interesting to miners than Propositional Data in the sense that an entity, and all the entities to which it is related, factor into the data inference process.
  • 7. Key Component of Data Mining. Whether Knowledge Discovery or Knowledge Prediction, data mining takes information that was once quite difficult to detect and presents it in an easily understandable format (e.g., graphical or statistical). Data mining techniques involve sophisticated algorithms, including decision tree classification, association detection, and clustering. Since data mining is not on the test, I will keep things superficial.
  • 8. Uses of Data Mining. AI/Machine Learning: Combinatorial/Game Data Mining. Good for analyzing winning strategies in games, and thus for developing intelligent AI opponents (e.g., chess). Business Strategies: Market Basket Analysis. Identify customer demographics, preferences, and purchasing patterns. Risk Analysis: Product Defect Analysis. Analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.
  • 9. Uses of Data Mining (Continued). User Behavior Validation: Fraud Detection. In the realm of cell phones, comparing phone activity to calling records can help detect calls made on cloned phones. Similarly, with credit cards, comparing new purchases with historical purchases can detect activity on stolen cards.
  • 10. Uses of Data Mining (Continued). Health and Science. Protein Folding: predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds"). Extra-Terrestrial Intelligence: scanning satellite receptions for possible transmissions from other planets. For more information see Stanford's Folding@home and UC Berkeley's SETI@home projects. Both involve participation in a widely distributed computer application.
  • 11. Sources of Data for Mining: Databases (most obvious); Text Documents; Computer Simulations; Social Networks.
  • 12. Privacy Concerns. Mining of public and government databases is done, though people have raised, and continue to raise, concerns. Wiki quote: "data mining gives information that would not be available otherwise. It must be properly interpreted to be useful. When the data collected involves individual people, there are many questions concerning privacy, legality, and ethics."
  • 13. Prevalence of Data Mining. Your data is already being mined, whether you like it or not. Many web services require that you allow access to your information [for data mining] in order to use the service. Google mines email data in Gmail accounts to present account owners with ads. Facebook requires users to allow access to info from non-Facebook pages. Facebook privacy policy: "We may use information about you that we collect from other sources, including but not limited to newspapers and Internet sources such as blogs, instant messaging services and other users of Facebook, to supplement your profile." This allows access to your blog RSS feed (rather innocuous), as well as to information obtained through partner sites (worthy of concern).
  • 14. Data Mining Controversies. Latest one: Facebook's Beacon advertising program (it popped up on Slashdot within the last week). What Beacon does: "when you engage in consumer activity at a [Facebook] partner website, such as Amazon, eBay, or the New York Times, not only will Facebook record that activity, but your Facebook connections will also be informed of your purchases or actions." [taken from http://trickytrickywhiteboy.blogspot.com/2007/11/beware-of-facebooks-beacon.html]
  • 15. Controversies continued. Implications: "Thus where Facebook used to be collecting data only within the confines of its own website, it will now extend that ability to harvest data across other websites that it partners with. Some of the companies that have signed on to participate on the advertising side include Coca-Cola, Sony, Verizon, Comcast, Ebay — and the CBC. The initial list of 44 partner websites participating on the data collection side include the New York Times, Blockbuster, Amazon, eBay, LiveJournal, and Epicurious." [Remember the privacy policy on the previous slide.] The verdict is still out. This may violate an old (100+ years) New York law prohibiting advertising that uses endorsements without the endorsee's consent. Facebook currently offers users no way to opt out of Beacon (once it has been activated?). Users can close their accounts, but account data is never deleted.
  • 16. Bottom Line. Data obtained through data mining is incredibly valuable. Companies are understandably reluctant to give up data they have obtained. Expect the prevalence of data mining, and of (possibly subversive) mining methods, to increase in years to come.
  • 17. Recommended Resources and Works Consulted. Wikipedia Data Mining entry: http://en.wikipedia.org/wiki/Data_mining ; "Privacy is Dead - Get Over It: Revisited", Steve Rambam's Hope Number Six lecture: http://www.hopenumbersix.net/speakers.html#pid2 ; Facebook's Faux Pas: http://www.newsweek.com/id/69275 ; Beware of Facebook's Beacon: http://trickytrickywhiteboy.blogspot.com/2007/11/beware-of-facebooks-beacon.html ; Facebook Data Mining guide: http://saunderslog.com/2007/11/25/facebook-market-research-secrets/ ; Data Mining in Social Networks: http://kdl.cs.umass.edu/papers/jensen-neville-nas2002.pdf
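
Slide 6 describes propositional data as a per-user vector (a purchasing history) and mentions a dot product as the basis for related-item suggestions. The sketch below is one rough way to picture that idea. It is not Amazon's actual method: the item list, the user names, and the helper functions `dot`, `most_similar`, and `suggest` are all hypothetical and made up for illustration.

```python
from typing import Dict, List

# Hypothetical catalogue: each vector position corresponds to one item.
ITEMS: List[str] = ["diapers", "beer", "wipes", "chips"]

# Hypothetical purchase histories (1 = bought, 0 = not bought).
HISTORIES: Dict[str, List[int]] = {
    "alice": [1, 1, 1, 0],
    "bob":   [1, 1, 0, 1],
    "carol": [0, 0, 0, 1],
}


def dot(u: List[int], v: List[int]) -> int:
    """Dot product of two equal-length purchase vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def most_similar(user: str, histories: Dict[str, List[int]]) -> str:
    """Return the other user whose history has the largest dot product."""
    target = histories[user]
    scores = {
        name: dot(target, vec)
        for name, vec in histories.items()
        if name != user
    }
    return max(scores, key=scores.get)


def suggest(user: str, histories: Dict[str, List[int]],
            items: List[str]) -> List[str]:
    """Suggest items the most similar user bought that `user` has not."""
    neighbour = histories[most_similar(user, histories)]
    mine = histories[user]
    return [
        item
        for item, have, theirs in zip(items, mine, neighbour)
        if theirs and not have
    ]


if __name__ == "__main__":
    print(suggest("alice", HISTORIES, ITEMS))
```

With 0/1 vectors the dot product simply counts the items two users have in common, so the "most similar" user is the one with the most shared purchases. A real system would likely normalize the vectors and work at a much larger scale.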