Top (10) challenging problems in data mining
Supervised by: Dr. Ali Haroun
Prepared by: Ahmed Ramzi Rashid, Ahmed Sedeeq Baker
Master
2017-3-11
Outline:
- Introduction
- Top 10 challenging problems in data mining
- Conclusions
- Suggestions
Introduction (1-1):
Data mining is sorting through data to identify patterns and establish relationships.
Data mining parameters include:
- Association;
- Sequence or path analysis;
- Classification;
- Clustering;
- Forecasting.
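As a small illustration of the first parameter, association, the sketch below computes the two usual measures of an association rule: support and confidence. The basket data and the function names are hypothetical and chosen only for this example; they do not come from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical shopping baskets, used only to exercise the functions.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
```

With these baskets, the rule "bread implies milk" has support 0.5 (two of four baskets) and confidence 2/3 (two of the three baskets that contain bread).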
Introduction (1-2):
- Data is power.
- There is a huge amount of data.
- Data is very complex.
- There are many algorithms.
- There are different ways to extract the information.
So we have the top 10 challenging problems in data mining.
Top 10 challenging problems in data mining (DM):
1. Developing a Unifying Theory of Data Mining:
Researchers do not yet have a single structure that contains the different data mining algorithms.
A unified (DM) process leads to knowledge that still has to be verified:
- Types of dataset: numeric, categorical, multimedia, text.
- Selection criterion: Akaike information criterion.
- Unified (DM) process: clustering, classification, association.
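The slide names the Akaike information criterion (AIC) as a selection criterion. A minimal sketch of how it can be used to choose between candidate models is given below, based on the standard definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood. The function names and the candidate values are assumptions made for illustration only.

```python
def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood


def select_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a model name to a (log_likelihood, num_params) pair.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

For example, a model with log-likelihood -12.0 and 2 parameters has AIC 28, while one with log-likelihood -11.5 and 6 parameters has AIC 35, so the simpler model is preferred even though it fits slightly worse.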
2. Scaling Up for High-Dimensional Data and High-Speed Data Streams:
The problem begins when the data becomes huge and complex.
- We need classifiers for ultra-high-dimensional classification problems (millions or billions of features).
- We also need methods for mining ultra-high-speed data streams.
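One common way to cope with a stream that is too fast to store is to process each item once and keep only a small summary. The sketch below uses Welford's online algorithm for the running mean and variance; this technique is chosen here for illustration and is not named in the slides, and the class name is hypothetical.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new stream item into the summary in O(1) time and memory."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the items seen so far."""
        return self._m2 / self.count if self.count else 0.0
```

The summary uses constant memory regardless of how many items have passed, which is the property a high-speed stream miner needs.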
3. Mining Sequence Data and Time Series Data:
• In this problem we want to see how to efficiently and accurately predict the direction (trend) of these data.
• Any practical design must take care of these three main components:
- Information
- Learner
- Predictor
(1) Qiang Yang, 10 Challenging Problems in Data Mining Research, International Journal of Information Technology & Decision Making, Vol. 5, No. 4 (2006) 597–604.
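To make "predicting the direction" of a time series concrete, the sketch below compares a short and a long moving average and reports whether the recent trend is up, down or flat. This is a deliberately simple baseline assumed for illustration; the slides do not prescribe a method, and the window sizes are arbitrary.

```python
def moving_average(series, window):
    """Mean of the last `window` values of `series`."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def predict_direction(series, short=3, long=6):
    """Return 'up', 'down' or 'flat' by comparing two moving averages."""
    recent = moving_average(series, short)
    baseline = moving_average(series, long)
    if recent > baseline:
        return "up"
    if recent < baseline:
        return "down"
    return "flat"
```

A real predictor for noisy series would also need the information-gathering and learning components listed above; this sketch covers only the final prediction step.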
4. Mining Complex Knowledge from Complex Data:
• Knowledge becomes complex when data are mined from multiple relations.
• In most domains, the objects of interest are not independent of each other.
• The objects are not of a single type.
Example domain: the World Wide Web
- HTML has a tree structure (nested tags).
- Text has a list structure (sequence of words).
- Hyperlinks form a graph structure (linked pages).
(1) Jarosław Stepaniuk, Rough – Granular in Knowledge Discovery and Data Mining, Volume 152 of the series, pp 99-110.
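The Web example above mixes three structures in a single page. The sketch below, which uses Python's standard html.parser module, pulls all three out of one HTML string: the nesting depth of the tag tree, the word sequence of the text, and the outgoing hyperlinks. The class name is hypothetical, and the depth count assumes that every opened tag is also closed.

```python
from html.parser import HTMLParser


class PageStructure(HTMLParser):
    """Collects tree depth, word list and link targets from one page."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0  # tree structure: deepest tag nesting
        self.words = []     # list structure: sequence of words
        self.links = []     # graph structure: edges to other pages

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Assumes well-formed input in which every tag is closed.
        self.depth -= 1

    def handle_data(self, data):
        self.words.extend(data.split())


def analyse(html):
    """Return (max nesting depth, words, links) for an HTML string."""
    parser = PageStructure()
    parser.feed(html)
    parser.close()
    return parser.max_depth, parser.words, parser.links
```

Collecting the link lists of many pages would give the edges of the hyperlink graph, which is the structure a graph-mining algorithm would then work on.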
5. Data Mining in a Network Setting:
We will discuss two parts of this problem:
5.1: Community and social networks:
• Communities are an important topic in the mining of social networks.
• Identifying communities is challenging for three reasons:
- It is critical to have the right characterization of the notion of "community".
- The entities involved are distributed.
- A snapshot-based dataset may not capture the real picture.
5.2: Mining in and for computer networks (high-speed mining of high-speed streams):
• This part studies how to provide good algorithms and how to detect an attack.
• For DoS (Denial of Service) attacks: how to detect an attack and how to discriminate it from other traffic.
(1) Qiang Yang, Hong Kong, 10 Challenging Problems in Data Mining Research, ICDM 2005, pp 8.
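As a rough illustration of detecting a DoS-style burst in a high-speed stream, the sketch below keeps a sliding time window of request timestamps per source and flags a source whose request count in the window exceeds a limit. The class, the threshold values and the assumption that timestamps arrive in non-decreasing order are all hypothetical; real detectors are considerably more involved.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flags sources that send more than `max_requests` per time window."""

    def __init__(self, window_seconds, max_requests):
        self.window = window_seconds
        self.max_requests = max_requests
        self._events = defaultdict(deque)  # source -> recent timestamps

    def observe(self, source, timestamp):
        """Record one request; return True if the source looks suspicious.

        Assumes timestamps for a given source are non-decreasing.
        """
        recent = self._events[source]
        recent.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) > self.max_requests
```

A simple rate limit like this cannot tell a legitimate traffic surge from an attack, which is exactly the discrimination problem raised above.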
6. Distributed Data Mining and Mining Multi-Agent Data:
• We need to correlate the data seen at the various probes (such as in a sensor network).
• The important problem is how to mine across multiple heterogeneous data sources.
• The goal is to minimize the amount of data shipped between the various sites, by combining data mining with game theory.
(1) Rao, Dr. S Vidyavathi, Distributed data mining and mining multi-agent data, International Journal on Computer Science and Engineering, Vol. 02, No. 04, 2010, 1237-1244.
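One simple way to reduce the amount of data shipped between sites is for each site to send a small summary instead of its raw records. The sketch below ships three numbers per site (count, sum, sum of squares) and combines them into a global mean and variance. It is an illustrative assumption about how the goal could be approached, not a method taken from the slides, and the function names are hypothetical.

```python
def local_summary(values):
    """Summary computed at one site: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def merge_summaries(summaries):
    """Combine per-site summaries into the global mean and variance."""
    n = sum(s[0] for s in summaries)
    if n == 0:
        raise ValueError("no data at any site")
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance
    return mean, variance
```

Each site ships a fixed three values however many records it holds, and the coordinator obtains the same mean and variance as if all raw data had been centralized (up to floating-point rounding).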
7. Data Mining for Biological and Environmental Problems:
• The world today is "resource-driven".
• So how can we best understand, and hence utilize, our environment?
• Researchers try to solve problems in these areas:
- Bioinformatics.
- Spatial data.
- Earthquakes.
- Landslides.
- Biological sequences.
- Cancer prediction.
(1) Pooja Shrivastava & Dr. Manoj Shukla, A Brief Survey on Data Mining for Biological and Environmental Problems, International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November 2013, pp 630-631.
8. Data Mining Process-Related Problems:
Key topics: data cleaning, automating (DM) operations, combining techniques.
• How to merge visual interactive and automatic (DM) techniques.
• How to perform systematic documentation of data cleaning.
• How to help users avoid mistakes in (DM).
• How to create a methodology for (DM).
(1) Qiang Yang, 10 Challenging Problems in Data Mining Research, ICDM 2005, pp 11.
9. Security, Privacy, and Data Integrity:
The challenges facing researchers:
• How to ensure the users' privacy while their data are being mined.
Knowledge integrity challenges:
• Develop efficient algorithms to compare the knowledge contents of the data before and after modification.
• Develop algorithms for estimating the impact that modifications of the data have on individual patterns.
• Not only evaluate knowledge integrity as a whole, but also develop measures to evaluate the knowledge integrity of individual patterns.
(1) Qiang Yang, 10 Challenging Problems in Data Mining Research, International Journal of Information Technology & Decision Making, Vol. 5, No. 4 (2006), pp 603.
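To illustrate what comparing "before and after" knowledge contents might look like, the sketch below mines the set of frequent items from two versions of a dataset and measures their overlap with the Jaccard similarity. Both the choice of frequent items as the "knowledge" and the Jaccard measure are assumptions made for this example; the slides do not specify a comparison method.

```python
from collections import Counter


def frequent_items(transactions, min_support):
    """Items that occur in at least `min_support` of the transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item for item, c in counts.items() if c / n >= min_support}


def knowledge_overlap(before, after, min_support=0.5):
    """Jaccard similarity of the frequent items mined from two versions."""
    a = frequent_items(before, min_support)
    b = frequent_items(after, min_support)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

A value of 1.0 means the modification left the mined knowledge unchanged, while values near 0 indicate that the modification altered most of what can be discovered.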
10. Dealing with Non-Static, Unbalanced and Cost-Sensitive Data:
• Sampling: historical actions in sampling and model building are not optimal.
• Correct the bias: the problem is how to correct the bias as much as possible.
• Deal with special data: deal with unbalanced and cost-sensitive data. Obtaining these costs relies on sampling methods.
(1) Qiang Yang, 10 Challenging Problems in Data Mining Research, International Journal of Information Technology & Decision Making, Vol. 5, No. 4 (2006), pp 603-604.
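Two standard devices for unbalanced and cost-sensitive data are sketched below: inverse-frequency class weights, and a decision threshold derived from misclassification costs. For a binary problem with zero cost for correct decisions, predicting "positive" has the lower expected cost when p >= c_fp / (c_fp + c_fn), where p is the estimated probability of the positive class. The function names and cost values are hypothetical, and the sketch assumes the costs are already known, which the slide notes is itself a difficulty.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def cost_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'positive' minimizes expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def decide(prob_positive, cost_fp, cost_fn):
    """Cost-sensitive decision for one example."""
    return prob_positive >= cost_threshold(cost_fp, cost_fn)
```

For instance, if a missed positive costs nine times as much as a false alarm, the threshold drops from 0.5 to 0.1, so an example with a 20% estimated probability is already treated as positive.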
Conclusions:
• The presentation highlights the 10 most important problems in data mining, in a concise manner.
• The order of the list does not reflect their level of importance.
• We must work hard to overcome these problems, because nowadays whoever owns the information has the power.
Suggestions:


Editor's notes

  1. Data mining parameters:
     - Association: looking for patterns where one event is connected to another event.
     - Sequence or path analysis: looking for patterns where one event leads to another later event.
     - Classification: looking for new patterns (may result in a change in the way the data is organized, but that's OK).
     - Clustering: finding and visually documenting groups of facts not previously known.
     - Forecasting: discovering patterns in data that can lead to reasonable predictions about the future (this area of data mining is known as predictive analytics).
  2. Some of the key issues that need to be addressed in the design of a practical data miner for noisy time series include:
     • Information/search agents to get information: use of wrong, too many, or too few search criteria; possibly inconsistent information from many sources; semantic analysis of (meta-) information; assimilation of information into inputs to predictor agents.
     • Learner/miner to modify information selection criteria: apportioning of biases to feedback; developing rules for Search Agents to collect information; developing rules for Information Agents to assimilate information.
     • Predictor agents to predict trends: incorporation of qualitative information; multi-objective optimization not in closed form.
  3. Mining graphs: data that are not i.i.d. (independent and identically distributed).
     1- Many objects are not independent of each other, and are not of a single type.
     2- Mine the rich structure of relations among objects.
     3- E.g.: interlinked Web pages, social networks, metabolic networks in the cell.
     Integration of data mining and knowledge inference. The biggest gap: miners are unable to relate the results of mining to the real-world decisions they affect; all they can do is hand the results back to the user.
     More research is needed on the interestingness of knowledge.
  4. First, it's critical to have the right characterization of the notion of "community" that is to be detected. Second, the entities/nodes involved are distributed in real-life applications, and hence distributed means of identification will be desired. Third, a snapshot-based dataset may not be able to capture the real picture.
  5. One thing we can be sure of: if we don't solve the privacy issue, data mining will become a derogatory term to the general public. Develop algorithms for estimating the impact that certain modifications of the data have on the statistical significance of individual patterns obtainable by broad classes of data mining algorithms.
  6. Historical actions in sampling and model building are not optimal, but they are not chosen randomly either. This gives rise to the following challenging phenomenon for the data collection process: how to correct the bias as much as possible.
  7. The opinions of many researchers who have worked in this field are summarized.