DATA MINING ON BIG DATA
Presented by - Swapnil H. Chaudhari
Guided by
Prof. B. R. Mandre
DEPARTMENT OF COMPUTER ENGINEERING
SSVPS’s B. S. DEORE COLLEGE OF ENGINEERING, DHULE
2013 - 2014
18-Jan-16
OBJECTIVES:
- Brief introduction to Big Data
- What is data mining?
- The rise of Big Data
- Big Data characteristics: the HACE theorem
- Data mining challenges with Big Data
- A Big Data processing framework
BIG DATA AND DATA MINING
- Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources.
- Data mining is the process of semi-automatically analyzing large databases to find patterns that are:
  - valid: they hold on new data with some certainty
  - useful: it should be possible to act on them
  - understandable: humans should be able to interpret them
- Also known as Knowledge Discovery in Databases (KDD)
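To make the "valid" criterion concrete, the Python sketch below measures how often a hypothetical pattern, "customers who buy bread also buy butter", holds on the data it was mined from and then on new data. The transactions are made up, and expressing the pattern as an association rule scored by confidence is an assumed choice; the slide does not name a specific technique.

```python
# Hypothetical transactions: each one is the set of items in a single purchase.
mined = [
    {"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"},
]
new = [
    {"bread", "butter"}, {"bread", "jam"}, {"bread", "butter", "eggs"}, {"bread", "butter"},
]

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = [t for t in with_antecedent if consequent <= t]
    return len(both) / len(with_antecedent)

rule = ({"bread"}, {"butter"})  # hypothetical pattern: bread -> butter
print(f"confidence on mined data: {confidence(mined, *rule):.2f}")  # 0.67
print(f"confidence on new data:   {confidence(new, *rule):.2f}")    # 0.75
```

A pattern whose confidence stays in a similar range on unseen data is "valid" in the slide's sense; a large drop would suggest it only described the original sample.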
HOW BIG IS BIG DATA?
- What is big today may not be big tomorrow.
- Fast-growing Big Data can challenge current technology in several ways:
  - Volume
  - Communication
  - Speed of generation
  - Meaningful analysis
BIG DATA VECTORS (4Vs)
- Volume: the amount of data
- Velocity: the rate at which data is collected, acquired, generated, or processed
- Variety: different data types such as audio, video, and image data (mostly unstructured data)
- Variability: semantics, i.e., the variability of meaning in language
[Gartner 2012]
EXAMPLES:
- Government
  - On 4 October 2012, the first presidential debate between President Barack Obama and Governor Mitt Romney triggered more than 10 million tweets within 2 hours.
- Private sector
  - Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
  - Facebook handles 40 billion photos from its user base.
  - Flickr, a public picture-sharing site, received 1.8 million photos per day, on average, from February to March 2012 [5]. Assuming each photo is 2 megabytes (MB), this requires 3.6 terabytes (TB) of storage every single day.
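The Flickr figure is simple arithmetic. The Python sketch below reproduces it using decimal units (1 TB = 1,000,000 MB) and adds an illustrative extrapolation that is not on the slide: what that rate would amount to over a year if it stayed constant.

```python
photos_per_day = 1_800_000  # average Flickr uploads per day, Feb-Mar 2012 (from the slide)
photo_size_mb = 2           # assumed average photo size in MB (from the slide)

daily_mb = photos_per_day * photo_size_mb
daily_tb = daily_mb / 1_000_000      # decimal units: 1 TB = 1,000,000 MB
yearly_pb = daily_tb * 365 / 1_000   # illustrative extrapolation; 1 PB = 1,000 TB

print(f"{daily_tb:.1f} TB per day")    # 3.6 TB per day
print(f"{yearly_pb:.2f} PB per year")  # 1.31 PB per year
```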
BIG DATA CHARACTERISTICS: HACE THEOREM
- HACE Theorem. Big Data starts with large-volume, Heterogeneous, Autonomous sources with distributed and decentralized control, and seeks to explore Complex and Evolving relationships among data [1].
Fig. The blind men and the giant elephant: the localized (limited) view of each blind man leads to a
biased conclusion.
BIG DATA CHARACTERISTICS
- Huge Data with Heterogeneous and Diverse Dimensionality
- Autonomous Sources with Distributed and Decentralized Control
- Complex and Evolving Relationships
CONCEPTUAL VIEW OF THE BIG DATA
PROCESSING FRAMEWORK
Fig. A Big Data processing framework
A BIG DATA PROCESSING FRAMEWORK:
- Tier I: focuses on low-level data accessing and computing.
- Tier II: concentrates on high-level semantics, application domain knowledge, and user privacy issues.
- Tier III: addresses the challenges in the actual mining algorithms.
TIER I: BIG DATA MINING PLATFORM
- Tier I focuses on low-level data accessing and computing.
- One of the most important characteristics of Big Data is that computing must be carried out on petabyte (PB)-level, or even exabyte (EB)-level, data through a complex computing process.
Small-scale data mining tasks:
- A single desktop computer, with its own hard disk and CPU, is sufficient to fulfill the data mining goals.
Medium-scale data mining tasks:
- Data are typically large (and possibly distributed) and cannot fit into main memory. Common solutions are to rely on parallel computing [3], [4] or collective mining [2] to sample and aggregate data from different sources, and then use parallel computing programming to carry out the mining.
Big Data mining tasks:
- The data mining task is deployed by running parallel programming tools, such as MapReduce or Enterprise Control Language (ECL), on a large number of computing nodes (i.e., clusters).
MAPREDUCE TECHNIQUE
- MapReduce is a programming model for distributed systems.
- A MapReduce program executes in three stages:
  - Map
  - Shuffle
  - Reduce
- MapReduce is a batch-oriented parallel computing model [7].
MAPREDUCE ALGORITHM
FLOW OF THE MAPREDUCE FUNCTION
Fig. MapReduce technique [IBM.COM]
EXAMPLE: WORD COUNT
Fig. MapReduce Technique for word count [IBM.COM]
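Because the figure itself is not reproduced here, the following Python sketch mimics the three MapReduce stages for word count inside a single process. The input splits are hypothetical. A real framework such as Hadoop would run the map and reduce functions on many nodes and perform the shuffle over the network, so this illustrates the data flow only and is not a distributed implementation.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1

def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by their key (the word)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: combine the values collected for one key."""
    return word, sum(counts)

def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster each split would be mapped on a different node;
    # here the three phases simply run one after another.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())

splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]  # hypothetical input
print(word_count(splits))  # {'deer': 2, 'bear': 2, 'river': 2, 'car': 3}
```

Keeping map and reduce as pure functions of their inputs is what lets a framework run many copies of them in parallel and re-run any that fail.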
TIER II: BIG DATA SEMANTICS AND
APPLICATION KNOWLEDGE
- Information Sharing and Data Privacy
  - To protect privacy, two common approaches are to:
    1. restrict access to the data, for example by adding certification or access control to the data entries, so that sensitive information is accessible only to a limited group of users;
    2. anonymize data fields, so that sensitive information cannot be pinpointed to an individual record [15].
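A minimal Python sketch of the two approaches follows. The medical record layout, role names, and secret key are all hypothetical and do not come from the slides. Keyed hashing (pseudonymization) combined with coarsening quasi-identifiers such as age and postal code is only one possible anonymization technique. On its own it does not guarantee that a record cannot be re-identified.

```python
import hashlib
import hmac

# Hypothetical secret, held by the data owner and never shared with analysts.
SECRET_KEY = b"replace-with-a-managed-secret"
# Hypothetical roles allowed to see raw records (approach 1: restrict access).
CERTIFIED_ROLES = {"privacy_officer", "treating_physician"}

def can_read_raw(user_role: str) -> bool:
    """Access restriction: only a small, certified group may read raw records."""
    return user_role in CERTIFIED_ROLES

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a truncated keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def anonymize(record: dict) -> dict:
    """Approach 2: strip or coarsen fields that could pinpoint an individual."""
    return {
        "patient": pseudonymize(record["name"]),
        "age_range": generalize_age(record["age"]),
        "zip_prefix": record["zip"][:3] + "**",
        "diagnosis": record["diagnosis"],  # the attribute analysts actually need
    }

record = {"name": "Alice Smith", "age": 34, "zip": "12345", "diagnosis": "asthma"}
print(can_read_raw("analyst"))  # False
print(anonymize(record))
```

The same pseudonym is produced for the same person every time, so records can still be linked across tables without revealing who they belong to.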
TIER II: BIG DATA SEMANTICS AND
APPLICATION KNOWLEDGE
- Domain and Application Knowledge
  - Domain and application knowledge [28] provides essential information for designing Big Data mining algorithms and systems. It helps:
    - identify the right features for modeling the underlying data;
    - design achievable business objectives using Big Data analytical techniques.
TIER III: BIG DATA MINING ALGORITHMS
- Local Learning and Model Fusion for Multiple Information Sources
- Mining from Sparse, Uncertain, and Incomplete Data
- Mining Complex and Dynamic Data
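The first two Tier III challenges can be illustrated together. In the Python sketch below, which uses hypothetical site names and readings, each autonomous source learns a tiny local model (just a count and a sum) from its own incomplete data, and only these summaries are shared and fused. Treating the "model" as a mean with additive sufficient statistics is a deliberately simple assumption; fusing real models such as classifiers is considerably harder.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LocalModel:
    """Summary a source can share without shipping its raw records."""
    count: int
    total: float

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")

def learn_locally(values: List[Optional[float]]) -> LocalModel:
    # Incomplete data: missing readings (None) are skipped rather than guessed.
    observed = [v for v in values if v is not None]
    return LocalModel(count=len(observed), total=sum(observed))

def fuse(models: List[LocalModel]) -> LocalModel:
    # Model fusion: adding the sufficient statistics gives exactly the mean over
    # all observed values, regardless of how much data each source holds.
    return LocalModel(count=sum(m.count for m in models),
                      total=sum(m.total for m in models))

# Hypothetical sources with missing readings.
sources = {
    "site_a": [10.0, 12.0, None, 14.0],
    "site_b": [20.0, None],
    "site_c": [None, None, 11.0, 13.0, 15.0],
}
local = {name: learn_locally(vals) for name, vals in sources.items()}
for name, model in local.items():
    print(f"{name}: mean = {model.mean:.2f}")
print(f"fused: mean = {fuse(list(local.values())).mean:.2f}")  # 95 / 7, about 13.57
```

Naively averaging the three local means would give 15.0, skewed by the single reading at site_b. This is the blind-men problem in miniature: each local view is accurate for its own region, but a sound global picture needs to know how much evidence each view rests on.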
CONCLUSION
- To explore Big Data, we have analyzed several challenges at the data, model, and system levels.
- To support Big Data mining, high-performance computing platforms are required; these impose systematic designs to unleash the full power of Big Data.
REFERENCES
1. X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, January 2014.
2. B. Brown, M. Chui, and J. Manyika, “Are You Ready for the Era of ‘Big Data’?” McKinsey Quarterly, McKinsey Global Institute, October 2011.
3. C. Bizer, P. Boncz, M. L. Brodie, and O. Erling, “The Meaningful Use of Big Data: Four Perspectives, Four Challenges,” SIGMOD Record, vol. 40, no. 4, December 2011.
4. D. Boyd and K. Crawford, “Six Provocations for Big Data,” A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, Oxford Internet Institute, September 2011.
5. D. Agrawal, S. Das, and A. El Abbadi, “Big Data and Cloud Computing: Current State and Future Opportunities,” EDBT 2011, Uppsala, Sweden.
6. D. Agrawal, S. Das, and A. El Abbadi, “Big Data and Cloud Computing: New Wine or Just New Bottles?” VLDB 2010, vol. 3, no. 2.
7. F. J. Alexander, A. Hoisie, and A. Szalay, “Big Data,” IEEE Computing in Science and Engineering, 2011.
8. O. Trelles, P. Prins, M. Snir, and R. C. Jansen, “Big Data, but Are We Ready?” Nature Reviews, February 2011.
9. K. Bakshi, “Considerations for Big Data: Architecture and Approach,” IEEE Aerospace Conference, 2012.
10. S. Lohr, “The Age of Big Data,” The New York Times, February 2012.
11. M. Nielsen, “A Guide to the Day of Big Data,” Nature, vol. 462, December 2009.

SPEAKER NOTES

  1. These characteristics make discovering useful knowledge from Big Data an extreme challenge.
  2. We can imagine a number of blind men trying to size up a giant elephant (see Fig. 1), which represents Big Data in this context. The goal of each blind man is to draw a picture (or conclusion) of the elephant from the part of the information he collects during the process. Because each person's view is limited to his local region, it is not surprising that the blind men each conclude independently that the elephant “feels” like a rope, a hose, or a wall, depending on the region each of them is limited to. To make the problem even more complicated, assume that 1) the elephant is growing rapidly and its pose changes constantly, and 2) each blind man may have his own (possibly unreliable and inaccurate) information sources that give him biased knowledge about the elephant (e.g., one blind man may exchange his impressions of the elephant with another blind man, where the exchanged knowledge is inherently biased). Exploring Big Data in this scenario is equivalent to aggregating heterogeneous information from different sources (blind men) to draw the best possible picture of the elephant's true posture in real time. Indeed, this task is not as simple as asking each blind man to describe his impressions of the elephant and then having an expert draw a single picture from the combined view, considering that each individual may speak a different language (heterogeneous and diverse information sources) and they may even have privacy concerns about the messages they share during the information exchange.
  3. Small-scale data mining tasks: a single desktop computer, with its own hard disk and CPU, is sufficient to fulfill the data mining goals. Medium-scale data mining tasks: data are typically large (and possibly distributed) and cannot fit into main memory. Common solutions are to rely on parallel computing [3], [4] or collective mining [2] to sample and aggregate data from different sources, and then use parallel computing programming (such as the Message ...) to carry out the mining process.