Data mining with big data

  1. Data Mining with Big Data
     Presented by: Sandip B. Tipayle Patil
     Under the guidance of Prof. Y. N. Patil
     Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere
  2. Outline
     • Introduction
     • What is Big Data?
     • How much data really exists?
     • Literature review
     • 4 Vs of Big Data
     • Proposed system
     • System architecture
     • Big Data mining framework
     • Hadoop framework
     • Big Data challenges and solutions
     • Conclusion
  3. Introduction
  4. Interesting facts
     • The volume of business data worldwide, across all companies, doubles every 1.2 years (previously every 1.5 years).
     • About 2,500 quadrillion (2.5 quintillion) bytes of data are produced daily, and more than 90 percent of all data was produced within the past two years.
     • A regular person processes more data in a day than a 16th-century individual did in an entire lifetime.
     • In recent years, the cost of storage and processing power has dropped significantly.
     • Bad data or poor data quality costs US businesses $600 billion annually.
     • Facebook processes 10 TB of data every day; Twitter processes 7 TB.
     • Google has over 3 million servers and processed over 2 trillion searches in 2012 (only 22 million in 2000).
  5. What is Big Data?
  6. “Big Data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.” -- Forrester
  8. “Big data is the data characterized by 3 attributes: volume, variety and velocity.” -- IBM
  10. Big Data is not about the size of the data; it is about the value within the data.
  11. What is ...?
     • Data mining: the computational process of discovering patterns in large data sets.
     • Big Data: a term used to describe a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques.
  12. Big Data versus small data
     • "Big Data" is similar to "small data", but bigger.
     • Bigger data requires different approaches: techniques, tools, and architecture.
     • The aim is to solve new problems, or old problems in a better way.
  13. How much data exists?
     • 2.5 quintillion bytes of data are created every day.
     • IBM: 90 percent of the data in the world today was produced within the past two years.
     • Forms of data? Examples: Boeing jets, scientific data, sensor data, Internet data.
  14. Literature review
     • Data has grown tremendously.
     • This large amount of data is beyond the ability of common software tools to manage.
     • Exploring the large volume of data and extracting useful information and knowledge is a challenge, and sometimes it is almost infeasible.
     • Most people don't know what to do with all the data they already have.
  15. Giant Elephant
  16. Characteristics of Big Data
     • Huge data with heterogeneous and diverse dimensionality: represents the huge volume of data.
     • Autonomous sources with distributed and decentralized control: a main characteristic of Big Data.
     • Complex and evolving relationships.
  17. 4 Vs of Big Data
     • Volume: data quantity
     • Velocity: data speed
     • Variety: data types
     • Veracity: authenticity
  18. Proposed system
     • Identifies relationships between different ideas.
     • Is capable of handling huge volumes of data.
     • Uses distributed parallel computing with the help of Hadoop.
     • Provides a platform for processing data in different dimensions and summarizing the results.
     • The system architecture should be flexible enough that the components built on top of it, which express the various kinds of processing tasks, can tune it to run these different workloads efficiently.
     • The system will process the data within reasonable cost and time limits.
  19. Gap due to lack of analysis
  20. System architecture
  21. Hadoop framework
  22. Big Data mining framework
     • Big Data mining platform
     • Big Data semantics and application knowledge
         I. Information sharing and data privacy
         II. Domain and application knowledge
     • Big Data mining algorithms
         I. Local learning and model fusion for multiple information sources
         II. Mining from sparse, uncertain, and incomplete data
         III. Mining complex and dynamic data
  23. Big Data mining framework
  24. Challenges
     • Location of Big Data sources: Big Data is commonly stored in different locations.
     • Volume of Big Data: the size of Big Data grows continuously.
     • Hardware resources: RAM capacity.
     • Privacy: medical reports, bank transactions.
     • Having domain knowledge.
     • Getting meaningful information.
  25. Solutions
     • Parallel computing programming.
     • An efficient computing platform will not have centralized data storage; instead, the platform will be distributed across large-scale storage.
     • Restricting access to the data.
  26. Advantages
     • Fast response
     • Extraction of useful information
     • Prediction of required data from a large amount of data
     • Better results, presented in the form of visualizations
  27. Conclusion
     • We have entered an era of Big Data. Through better analysis of the large volumes of data that are becoming available, there is the potential for making faster advances in many scientific disciplines and for improving the profitability and success of many enterprises by using technologies such as Hadoop and Pig.
     • The proposed system will be fully serviceable across a large variety of application domains, since these challenges are not cost-effective to address in the context of one domain alone.
     • Furthermore, this system will provide transformative solutions and will address the needs of the next generation of industrial applications. We must support and encourage this proposed framework towards addressing the technical challenges of unstructured data if we are to achieve the promised benefits of Big Data.
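
The growth figures on slides 4 and 13 can be checked with simple arithmetic. The sketch below assumes steady exponential growth, which the slides do not state explicitly; they give only the doubling period (1.2 years) and the daily volume (2.5 quintillion bytes). The function names are illustrative. Under that assumption, six years contain five doublings, so volume grows roughly 32-fold, and 2.5 quintillion bytes is 2.5 exabytes in decimal units.

```python
def growth_factor(years: float, doubling_period_years: float = 1.2) -> float:
    """Factor by which data volume grows over `years`.

    Assumes volume doubles every `doubling_period_years`
    (1.2 years is the figure quoted on slide 4).
    """
    return 2.0 ** (years / doubling_period_years)


def bytes_to_exabytes(n_bytes: float) -> float:
    """Convert bytes to exabytes (1 EB = 10**18 bytes, decimal units)."""
    return n_bytes / 10**18


if __name__ == "__main__":
    # Growth over six years at a 1.2-year doubling period.
    print(growth_factor(6.0))
    # Daily data volume from slide 13, expressed in exabytes.
    print(bytes_to_exabytes(2.5e18))
```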
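
Slide 18 says the proposed system uses distributed parallel computing with the help of Hadoop, but the deck gives no implementation. As a rough illustration of the MapReduce programming model that Hadoop provides, here is a minimal single-process sketch in plain Python. It does not use the Hadoop API; the function names and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a real cluster each split could be mapped on a different node;
    # here the splits are processed one after another in a single process.
    emitted = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle_phase(emitted))


if __name__ == "__main__":
    splits = ["big data mining", "data mining with big data"]
    print(word_count(splits))
```

The point of the split into three phases is that the map and reduce steps touch only their own portion of the data, which is what lets a framework run them in parallel across machines.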
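
Slide 22 lists "local learning and model fusion for multiple information sources", and slide 25 calls for a platform without centralized data storage. The slides name the idea but do not specify an algorithm, so the following is only a simple stand-in: each site computes a small summary of its own data, and only those summaries are combined to obtain global statistics. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class LocalSummary:
    """Statistics computed at one site; the raw data never leaves the site."""
    count: int
    total: float
    total_sq: float


def summarize(values: Iterable[float]) -> LocalSummary:
    """Local learning step: reduce one site's data to a compact summary."""
    vals = list(values)
    return LocalSummary(len(vals), sum(vals), sum(v * v for v in vals))


def fuse(summaries: List[LocalSummary]) -> Tuple[float, float]:
    """Fusion step: combine site summaries into a global mean and
    population variance without access to the underlying records."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no data in any summary")
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean ** 2
    return mean, variance


if __name__ == "__main__":
    site_a = summarize([1.0, 2.0, 3.0])
    site_b = summarize([4.0, 5.0])
    print(fuse([site_a, site_b]))
```

Because each site shares only three numbers, this pattern also limits how much raw data is exposed, which relates to the privacy concern raised on slide 24.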

Editor's notes

  • Sources
    Social network
    Satellite data
    Geographical data
    Live streaming data
  • According to IBM
