Data Mining
Dr. Kamal Gulati
Data Mining
 New buzzword, old idea.
 Inferring new information from already
collected data.
 Traditionally job of Data Analysts
 Computers have changed this.
Far more efficient to comb through data using
a machine than to eyeball statistical data.
Data Mining – Two Main Components
 Wikipedia definition: “Data mining is the entire process of applying
computer-based methodology, including new techniques for knowledge
discovery, from data.”
 Knowledge Discovery
Concrete information gleaned from known data. Information you may not have
known, but which is supported by recorded facts.
(e.g. the diapers-and-beer example from the previous presentation)
 Knowledge Prediction
Uses known data to forecast future trends, events, etc. (e.g. stock market
predictions)
 Wikipedia note: "some data mining systems such as neural networks are
inherently geared towards prediction and pattern recognition, rather than
knowledge discovery." These include applications in AI and symbol
analysis.
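Knowledge Prediction can be illustrated with a very small sketch: fit a straight trend line to past observations and extend it one step ahead. The function names and the price series are made up for illustration, and a real forecasting system would use far richer models than a straight line.

```python
def fit_trend(values):
    """Fit y = slope * t + intercept by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extend the fitted trend line past the last observation."""
    slope, intercept = fit_trend(values)
    return slope * (len(values) - 1 + steps_ahead) + intercept


# Hypothetical daily closing prices (illustrative data only).
closing_prices = [10.0, 12.0, 14.0, 16.0]
print(forecast(closing_prices))
```

The known data (past prices) is the only input; the forecast is an inference about a value that has not been recorded yet.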
Data Mining vs. Data Analysis
 In terms of software and the marketing thereof
Data Mining != Data Analysis
 Data Mining implies software uses some intelligence
over simple grouping and partitioning of data to
infer new information.
 Data Analysis is more in line with standard statistical
software (e.g. web stats). These usually present
information about subsets and relations within the
recorded data set (e.g. browser/search engine usage,
average visit time, etc.)
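As a rough illustration of the Data Analysis side, the sketch below only groups and averages what was recorded; it infers nothing new. The record layout (`browser`, `seconds`) and the sample visits are assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def summarize(visits):
    """Return per-browser visit counts and the average visit time in seconds."""
    browser_counts = Counter(v["browser"] for v in visits)
    average_seconds = mean(v["seconds"] for v in visits)
    return browser_counts, average_seconds


# Hypothetical web log records (illustrative data only).
visits = [
    {"browser": "Firefox", "seconds": 120},
    {"browser": "Firefox", "seconds": 60},
    {"browser": "Safari", "seconds": 30},
]

counts, average = summarize(visits)
print(counts.most_common())
print(average)
```

Everything printed here is a direct summary of the recorded data set, which is the distinction the slide draws against data mining.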
Data Mining Subtypes
 Data Dredging
The process of scanning a data set for relations and then
coming up with a hypothesis for the existence of those relations.
 MetaData
Data that describes other data. Can describe an individual
element, or a collection of elements.
Wikipedia example: “In a library, where the data is the
content of the titles stocked, metadata about a title would
typically include a description of the content, the author, the
publication date and the physical location”
 Applications for Data Dredging in business include Market
and Risk Analysis, as well as trading strategies.
 Applications for Science include disaster prediction.
Propositional vs. Relational Data
 Old data mining methods relied on Propositional Data, or data
that is related to a single, central element and can be
represented in vector format (e.g. the purchasing history of a
single user; Amazon uses such vectors in its related-item
suggestions [a multidimensional dot product]).
 Current, advanced data mining methods rely on Relational
Data, or data that can be stored and modeled easily through
use of relational databases. An example of this would be data
used to represent interpersonal relations.
 Relational Data is more interesting than Propositional data to
miners in the sense that an entity, and all the entities to which
it is related, factor into the data inference process.
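The dot-product idea mentioned for propositional data can be sketched as follows. Each item is represented by a purchase vector with one entry per user, and items whose vectors have a larger dot product were bought by more of the same users. This is an illustrative assumption about how such vectors might be compared, not a description of Amazon's actual system; the item names and data are made up.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def related_items(item, purchases):
    """Rank the other items by the dot product of their purchase vectors."""
    target = purchases[item]
    scores = {
        other: dot(target, vector)
        for other, vector in purchases.items()
        if other != item
    }
    return sorted(scores, key=scores.get, reverse=True)


# One vector per item, one entry per user (1 = that user bought the item).
# Hypothetical data for illustration only.
purchases = {
    "tent":         [1, 1, 0, 1],
    "sleeping bag": [1, 1, 0, 0],
    "blender":      [0, 0, 1, 0],
}

print(related_items("tent", purchases))
```

Each vector describes a single central element, which is what makes this data propositional; a relational approach would also take into account how the users themselves are connected.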
Key Component of Data Mining
 Whether Knowledge Discovery or Knowledge
Prediction, data mining takes information that was
once quite difficult to detect and presents it in an
easily understandable format (e.g. graphical or
statistical).
 Data mining Techniques involve sophisticated
algorithms, including Decision Tree Classifications,
Association detection, and Clustering.
 Since Data mining is not on test, I will keep things
superficial.
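In the same superficial spirit, here is a minimal sketch of one of the named techniques, clustering, using k-means on one-dimensional data. The starting centers are supplied by the caller, which is an assumption made to keep the example deterministic; the function name and data are hypothetical.

```python
def k_means_1d(points, centers, iterations=10):
    """Group 1-D points around the given starting centers (basic k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical measurements forming two obvious groups.
centers, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers)
print(clusters)
```

No labels are given in advance; the two groups emerge from the data alone.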
Uses of Data Mining
 AI/Machine Learning
Combinatorial/Game Data Mining
Good for analyzing winning strategies in games, and thus
developing intelligent AI opponents (e.g. chess).
 Business Strategies
Market Basket Analysis
Identify customer demographics, preferences, and purchasing
patterns.
 Risk Analysis
Product Defect Analysis
Analyze product defect rates for given plants and predict
possible complications (read: lawsuits) down the line.
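Market Basket Analysis is commonly expressed through two measures, support and confidence. The sketch below computes both for a rule such as "diapers implies beer". The baskets are invented for illustration and the function names are not taken from any particular library.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Hypothetical shopping baskets (illustrative data only).
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"bread", "milk"},
]

print(support(baskets, {"diapers", "beer"}))
print(confidence(baskets, {"diapers"}, {"beer"}))
```

A rule with high confidence and reasonable support is a candidate purchasing pattern; whether it is useful still depends on interpretation.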
Uses of Data Mining (Continued)
 User Behavior Validation
Fraud Detection
In the realm of cell phones, comparing current phone
activity with historical calling records can help
detect calls made on cloned phones.
Similarly, with credit cards, comparing new
purchases with historical purchases can
detect activity on stolen cards.
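Comparing a new purchase with historical purchases can be sketched as a simple outlier check: flag an amount that lies many standard deviations from the cardholder's past average. The threshold of 3 and the sample history are assumptions for illustration; real fraud detection systems use many more signals than the amount alone.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, z_threshold=3.0):
    """Flag `amount` if it is far from the historical mean, measured in
    standard deviations (a z-score test). Needs at least two past purchases."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past purchases were identical, so any deviation stands out.
        return amount != mu
    return abs(amount - mu) / sigma > z_threshold


# Hypothetical past purchase amounts for one card (illustrative data only).
history = [20, 25, 22, 30, 18, 27]

print(is_suspicious(history, 24))
print(is_suspicious(history, 900))
```

The same comparison applies to phone activity: replace purchase amounts with call durations or call counts per day.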
Uses of Data Mining (Continued)
 Health and Science
Protein Folding
Predicting protein interactions and functionality within
biological cells. Applications of this research include
determining causes and possible cures for Alzheimer's,
Parkinson's, and some cancers (caused by protein "misfolds").
Extra-Terrestrial Intelligence
Scanning radio telescope recordings for possible transmissions from
other planets.
 For more information see Stanford’s Folding@home and
UC Berkeley's SETI@home projects. Both involve participation in a widely
distributed computing application.
Sources of Data for Mining
 Databases (most obvious)
 Text Documents
 Computer Simulations
 Social Networks
Privacy Concerns
 Mining of public and government databases is done,
though people have raised, and continue to raise, concerns.
 Wiki quote:
"data mining gives information that would not be
available otherwise. It must be properly interpreted
to be useful. When the data collected involves
individual people, there are many questions
concerning privacy, legality, and ethics."
Prevalence of Data Mining
 Your data is already being mined, whether you like it or not.
 Many web services require that you allow access to your information [for
data mining] in order to use the service.
 Google mines email data in Gmail accounts to present account owners
with ads.
 Facebook requires users to allow access to info from non-Facebook pages.
Facebook privacy policy:
"We may use information about you that we collect from other sources,
including but not limited to newspapers and Internet sources such as
blogs, instant messaging services and other users of Facebook, to
supplement your profile."
 This allows access to your blog RSS feed (rather innocuous), as well as
information obtained through partner sites (worthy of concern).
Data Mining Controversies
 Latest one: Facebook's Beacon Advertising program
(Just popped on Slashdot within the last week)
 What Beacon does:
“when you engage in consumer activity at a
[Facebook] partner website, such as Amazon, eBay,
or the New York Times, not only will Facebook
record that activity, but your Facebook connections
will also be informed of your purchases or actions.”
[taken from
http://trickytrickywhiteboy.blogspot.com/2007/11/beware-of-facebooks-beacon.html]
Controversies continued
 Implications: "Thus where Facebook used to be collecting data only
within the confines of its own website, it will now extend that ability to
harvest data across other websites that it partners with. Some of the
companies that have signed on to participate on the advertising side
include Coca-Cola, Sony, Verizon, Comcast, Ebay — and the CBC. The
initial list of 44 partner websites participating on the data collection side
include the New York Times, Blockbuster, Amazon, eBay, LiveJournal,
and Epicurious.”
[Remember the privacy policy on the previous slide]
 Verdict is still out. This may violate an old (100+ years) New York law
prohibiting advertising using endorsements without the endorsee’s
consent.
 Facebook currently offers users no way to opt out of Beacon (once it has
been activated?). Users can close their accounts, but account data is never
deleted.
Bottom Line
 Data obtained through Data Mining is
incredibly valuable
 Companies are understandably reluctant to
give up data they have obtained.
 Expect to see prevalence of Data Mining and
(possibly subversive) methods increase in
years to come.

Data Mining

  • 2. Data Mining  New buzzword, old idea.  Inferring new information from already collected data.  Traditionally job of Data Analysts  Computers have changed this. Far more efficient to comb through data using a machine than eyeballing statistical data.
  • 3. Data Mining – Two Main Components  Wikipedia definition: “Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, from data.”  Knowledge Discovery Concrete information gleaned from known data. Data you may not have known, but which is supported by recorded facts. (ie: Diapers and beer example from previous presentation)  Knowledge Prediction Uses known data to forecast future trends, events, etc. (ie: Stock market predictions)  Wikipedia note: "some data mining systems such as neural networks are inherently geared towards prediction and pattern recognition, rather than knowledge discovery.“ These include applications in AI and Symbol analysis
  • 4. Data Mining vs. Data Analysis  In terms of software and the marketing thereof Data Mining != Data Analysis  Data Mining implies software uses some intelligence over simple grouping and partitioning of data to infer new information.  Data Analysis is more in line with standard statistical software (ie: web stats). These usually present information about subsets and relations within the recorded data set (ie: browser/search engine usage, average visit time, etc. )
  • 5. Data Mining Subtypes  Data Dredging The process of scanning a data set for relations and then coming up with a hypothesis for existence of those relations.  MetaData Data that describes other data. Can describe an individual element, or a collection of elements. Wikipedia example: “In a library, where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the author, the publication date and the physical location”  Applications for Data Dredging in business include Market and Risk Analysis, as well as trading strategies.  Applications for Science include disaster prediction.
  • 6. Propositional vs. Relational Data  Old data mining methods relied on Propositional Data, or data that was related to a single, central element, that could be represented in a vector format. (ie: the purchasing history of a single user. Amazon uses such vectors in its related item suggestions [a multidimensional dot product])  Current, advanced data mining methods rely on Relational Data, or data that can be stored and modeled easily through use of relational databases. An example of this would be data used to represent interpersonal relations.  Relational Data is more interesting than Propositional data to miners in the sense that an entity, and all the entities to which it is related, factor into the data inference process.
  • 7. Key Component of Data Mining: Whether Knowledge Discovery or Knowledge Prediction, data mining takes information that was once quite difficult to detect and presents it in an easily understandable format (e.g., graphical or statistical). Data mining techniques involve sophisticated algorithms, including decision tree classification, association detection, and clustering. Since data mining is not on the test, I will keep things superficial.
  • 8. Uses of Data Mining: AI/Machine Learning: Combinatorial/Game Data Mining. Good for analyzing winning strategies for games, and thus developing intelligent AI opponents (e.g., chess). Business Strategies: Market Basket Analysis. Identify customer demographics, preferences, and purchasing patterns. Risk Analysis: Product Defect Analysis. Analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.
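Market basket analysis is usually phrased in terms of the support and confidence of a rule such as "diapers => beer". The sketch below computes both over a handful of made-up baskets; the data and the function names are assumptions for illustration only.

```python
# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also
    contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

rule_support = support({"diapers", "beer"}, baskets)          # 2 of 5 baskets
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)  # 2 of 3 diaper baskets
```

A rule is normally reported only when both numbers clear chosen thresholds; the thresholds themselves are a judgment call.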
  • 9. Uses of Data Mining (Continued): User Behavior Validation: Fraud Detection. In the realm of cell phones, comparing phone activity to calling records can help detect calls made on cloned phones. Similarly, with credit cards, comparing purchases with historical purchases can detect activity on stolen cards.
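One simple way to compare a purchase with historical purchases is a z-score check: flag any amount that lies far from the cardholder's usual spending. This is a rough sketch under that assumption, not how any particular card issuer works; the history values and the threshold of 3 standard deviations are invented.

```python
from statistics import mean, stdev

def is_suspicious(amount, history, threshold=3.0):
    """Flag `amount` if it is more than `threshold` sample standard
    deviations away from the mean of past purchase amounts."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two past purchases
    if sigma == 0:
        # All past purchases identical: anything different stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Hypothetical purchase history for one card, in dollars.
history = [42.0, 38.5, 51.0, 45.25, 40.0, 47.5]

typical = is_suspicious(46.0, history)    # False: close to usual spending
unusual = is_suspicious(900.0, history)   # True: far outside the usual range
```

The same shape of check could be applied to call durations or destinations in the cloned-phone case.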
  • 10. Uses of Data Mining (Continued): Health and Science: Protein Folding. Predicting protein interactions and functionality within biological cells. Applications of this research include determining causes of, and possible cures for, Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds"). Extra-Terrestrial Intelligence: scanning radio telescope signals for possible transmissions from other planets. For more information, see Stanford’s Folding@home and UC Berkeley’s SETI@home projects. Both involve participation in a widely distributed computer application.
  • 11. Sources of Data for Mining: databases (most obvious), text documents, computer simulations, social networks.
  • 12. Privacy Concerns: Mining of public and government databases is done, though people have raised, and continue to raise, concerns. Wiki quote: "data mining gives information that would not be available otherwise. It must be properly interpreted to be useful. When the data collected involves individual people, there are many questions concerning privacy, legality, and ethics."
  • 13. Prevalence of Data Mining: Your data is already being mined, whether you like it or not. Many web services require that you allow access to your information [for data mining] in order to use the service. Google mines email data in Gmail accounts to present account owners with ads. Facebook requires users to allow access to information from non-Facebook pages. Facebook privacy policy: "We may use information about you that we collect from other sources, including but not limited to newspapers and Internet sources such as blogs, instant messaging services and other users of Facebook, to supplement your profile." This allows access to your blog RSS feed (rather innocuous), as well as to information obtained through partner sites (worthy of concern).
  • 14. Data Mining Controversies: Latest one: Facebook's Beacon advertising program (just popped up on Slashdot within the last week). What Beacon does: “when you engage in consumer activity at a [Facebook] partner website, such as Amazon, eBay, or the New York Times, not only will Facebook record that activity, but your Facebook connections will also be informed of your purchases or actions.” [taken from http://trickytrickywhiteboy.blogspot.com/2007/11/beware-of-facebooks-beacon.html]
  • 15. Controversies (Continued): Implications: “Thus where Facebook used to be collecting data only within the confines of its own website, it will now extend that ability to harvest data across other websites that it partners with. Some of the companies that have signed on to participate on the advertising side include Coca-Cola, Sony, Verizon, Comcast, Ebay — and the CBC. The initial list of 44 partner websites participating on the data collection side include the New York Times, Blockbuster, Amazon, eBay, LiveJournal, and Epicurious.” [Remember the privacy policy on the previous slide.] The verdict is still out. This may violate an old (100+ years) New York law prohibiting advertising that uses endorsements without the endorsee’s consent. Facebook currently offers users no way to opt out of Beacon (once it has been activated?). Users can close their accounts, but account data is never deleted.
  • 16. Bottom Line: Data obtained through data mining is incredibly valuable. Companies are understandably reluctant to give up data they have obtained. Expect the prevalence of data mining, and of (possibly subversive) methods, to increase in the years to come.