2. What is …… ?
• Data Mining
‣ computational process of discovering patterns in
large data sets
• Big Data
‣ it is the term for a collection of data sets so large
and complex that it becomes difficult to process
‣ data, both structured and unstructured, is
growing exponentially
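To make "discovering patterns in large data sets" concrete, here is a minimal sketch that counts item pairs occurring together in a toy list of transactions. The data, the function name `frequent_pairs`, and the `min_support` threshold are illustrative assumptions, not material from the slides; real data mining systems use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_pairs(transactions, min_support=3)
print(patterns)
```

The "pattern" found here is simply that bread and milk tend to be bought together; the same counting idea becomes hard to run once the data no longer fits on one machine.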
3. How much Data
exists?
• 2.5 quintillion bytes of data are created
EVERY DAY
• IBM: 90 percent of the data in the world today was
produced within the past two years
• Forms of Data????
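For a sense of scale, the daily figure above can be converted into more familiar units. This is a small arithmetic sketch; the decimal (SI) unit definitions and the variable names are assumptions added here, while the 2.5 quintillion figure comes from the slide.

```python
# Figure quoted on the slide: 2.5 quintillion bytes created every day.
BYTES_PER_DAY = 25 * 10**17  # 2.5 * 10**18, kept as an exact integer

# Decimal (SI) unit sizes; standard definitions, not taken from the slides.
EXABYTE = 10**18
TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

per_day_eb = BYTES_PER_DAY / EXABYTE
per_second_tb = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(f"{per_day_eb} exabytes per day")
print(f"about {per_second_tb:.1f} terabytes per second")
```

In other words, 2.5 quintillion bytes per day is 2.5 exabytes per day, or roughly 29 terabytes every second.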
4. Big Data Examples
• October 4th, 2012, the first presidential debate
• Flickr and its photos
5. Problem…!
• Data has grown tremendously
• This large amount of data is beyond the ability of
software tools to manage
• Exploring the large volume of data and extracting
useful information and knowledge is a challenge,
and sometimes, it is almost infeasible
6. HACE Theorem
• Heterogeneous, Autonomous, Complex, Evolving
• Big data starts with large volume, heterogeneous,
autonomous sources with distributed and
decentralized control, and seeks to explore
complex and evolving relationships among data
• These are characteristics of Big Data
• This is a theorem to model Big Data characteristics
8.
• Huge Data with heterogeneous and diverse
dimensionality
‣ represents a huge volume of data
• Autonomous sources with distributed and
decentralized control
‣ a main characteristic of Big Data
• Complex and evolving relationships
9. Data Mining Challenges with Big
Data
• Big Data Mining Platform
• Big Data Semantics and Application Knowledge
I. Information Sharing and Data Privacy
II. Domain and Application Knowledge
• Big Data Mining Algorithm
I. Local Learning and Model Fusion for Multiple
Information Sources
II. Mining from Sparse, Uncertain, and Incomplete Data
III. Mining Complex and Dynamic Data
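One very simple reading of "Local Learning and Model Fusion for Multiple Information Sources" can be sketched as follows: each autonomous source summarizes its own data locally, and only those summaries are merged into a global model, so raw records never leave the source. The nearest-centroid classifier, the function names (`local_learn`, `fuse`, `predict`), and the two toy sites are all assumptions chosen for illustration; the slides do not specify a particular algorithm.

```python
def local_learn(samples):
    """Run at each source: per-label (count, feature sums) for its own data.

    samples is a list of (feature_vector, label) pairs.
    """
    stats = {}
    for x, y in samples:
        count, total = stats.get(y, (0, [0.0] * len(x)))
        stats[y] = (count + 1, [t + xi for t, xi in zip(total, x)])
    return stats

def fuse(local_models):
    """Run centrally: merge local summaries into one centroid per label."""
    merged = {}
    for stats in local_models:
        for y, (count, total) in stats.items():
            c0, t0 = merged.get(y, (0, [0.0] * len(total)))
            merged[y] = (c0 + count, [a + b for a, b in zip(t0, total)])
    return {y: [t / c for t in total] for y, (c, total) in merged.items()}

def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sqdist(centroids[y]))

# Two hypothetical sources, each holding its own labelled records.
site_a = [([1.0, 1.0], "low"), ([2.0, 1.0], "low"), ([8.0, 9.0], "high")]
site_b = [([0.0, 2.0], "low"), ([9.0, 8.0], "high"), ([10.0, 10.0], "high")]

global_model = fuse([local_learn(site_a), local_learn(site_b)])
print(predict(global_model, [1.5, 1.5]))
```

Because counts and sums add up exactly, the fused centroids here match what a single machine would compute on the pooled data. Most models do not fuse this cleanly, which is part of why model fusion is listed as a challenge.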