
Big Data

Intern at Asian Institute of Medical Sciences (Official)
6 Dec 2014


  1. By: Priyanka Tuteja (2k14-mtech(cse)-mrce-012)
  2. Introduction
  3. Outline 1. What is Big Data? 2. Big Data generators 3. Why Big Data? 4. Characteristics of Big Data 5. Big Data: a worldwide problem 6. A solution for Big Data 7. Hadoop: HDFS, MapReduce 8. How Big Data impacts IT 9. The future of Big Data
  4. What is big data? Big data is a collection of large and complex data sets that are difficult to process using on-hand database management tools or traditional data processing applications. In simpler terms, Big Data is a term for the large volumes of data that organizations store and process.
  5. Huge amounts of data + From the beginning of recorded time until 2003, we created 5 exabytes (5 billion gigabytes) of data. + In 2011, the same amount was created every two days. + By 2013, the same amount of data was being created every 10 minutes.
  6. Types of Data Generators This data comes from everywhere: <> sensors used to gather climate information, <> posts to social media sites, <> digital pictures, <> online shopping, <> hospitality data, <> airlines, <> purchase transaction records, and many more. This data is "big data."
  7. Comparison, 1990s vs. 2014: hard disk 1-20 GB vs. 1 TB; RAM 64-128 MB vs. 4-16 GB; reading speed 10 KB/s vs. 100 MB/s
  8. What does Big Data require? • Handling the growth of Big Data requires: – an increase in storage capacities – an increase in processing power – availability of data (different data types) – Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last two years alone.
  9. Big Data stores • Choose the correct data store based on your data characteristics. • Data-center staff maintain these servers, which can be IBM, EMC servers, etc. • Whenever you want to process data: – fetch the data, – give it to your local machine, – then process it.
  10. Three Characteristics of Big Data (the 3 Vs): Volume • data quantity; Velocity • data speed; Variety • data types
  11. 1st Characteristic of Big Data: Volume • It refers to the vast amount of data generated every second. • The size of available data has been growing at an increasing rate. • Today, Facebook ingests 500 terabytes of new data every day. • Smartphones, the data they create and consume, and sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
  12. 2nd Characteristic of Big Data: Velocity • It refers to the speed at which new data is generated and the speed at which data moves around. • Clickstreams and ad impressions capture user behavior at millions of events per second. • Machine-to-machine processes exchange data between billions of devices. • Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
  13. 3rd Characteristic of Big Data: Variety • It refers to the different types of data we now use. • In the past we focused only on structured data that fit neatly into tables and relational databases. • Nowadays, 80% of data is unstructured (text, images, video, voice) or semi-structured (log files). • Big Data analysis includes all of these different types of data.
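The three varieties above can be illustrated with a small sketch. The records here are hypothetical examples, not data from the slides:

```python
import json

# Structured: fixed columns that fit a relational table row.
structured_row = ("user42", "2014-12-06", 3)

# Semi-structured: self-describing but flexible, e.g. a JSON log entry.
semi_structured = json.loads(
    '{"user": "user42", "event": "click", "tags": ["ad", "mobile"]}'
)

# Unstructured: free text with no fixed schema.
unstructured = "user42 clicked the mobile ad around noon"

print(semi_structured["tags"])  # ['ad', 'mobile']
```

The same underlying fact can live in all three forms; the analysis techniques needed to extract it differ in each case.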
  14. Big Data: A Worldwide Problem? • It is becoming very difficult for companies to store, retrieve, and process the ever-increasing data. • The problem lies in the use of traditional systems to store enormous data. • These systems were a success a few years ago, but with the increasing amount and complexity of data, they are quickly becoming obsolete.
  15. Contd. • When there is little data, processing speed is adequate. • As the data grows, processing performance degrades. • Thus, processing capacity must scale along with the data. • Hence, Hadoop was introduced as a solution.
  16. A Solution for Big Data! The good news is Hadoop: • a panacea for all those companies working with Big Data in a variety of applications • it has become integral for storing, handling, evaluating, and retrieving hundreds of terabytes or even petabytes of data.
  17. Apache Hadoop • Hadoop was developed by Doug Cutting and Michael J. Cafarella. • It is an open-source software framework that supports data-intensive distributed applications. • Hadoop is licensed under the Apache v2 license, hence the name Apache Hadoop.
  18. Core concepts of Hadoop • HDFS (Hadoop Distributed File System): a technique for storing huge amounts of data. • MapReduce: a technique for processing the data stored in HDFS.
  19. HDFS • It is a file system specially designed for storing huge data sets on a cluster of commodity hardware with a streaming access pattern. – Cluster: a set of machines working together. – Commodity hardware: cheap hardware. – Streaming access pattern: write once, read any number of times, but do not change the content of a file once it is stored in HDFS.
  20. Contd. • HDFS splits files into large blocks (default 64 MB or 128 MB) and distributes the blocks among the nodes in the cluster. • For processing, Hadoop MapReduce ships code to the nodes that hold the required data, and those nodes then process the data in parallel.
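The block-splitting rule above is simple arithmetic and can be sketched as follows, assuming the 128 MB default block size (the function name is illustrative, not a Hadoop API):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB in bytes

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block a file would be split into."""
    blocks = []
    offset = 0
    while offset < file_size_bytes:
        length = min(block_size, file_size_bytes - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 300 MB file becomes 3 blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))  # 3
```

Each of these blocks would then be stored on a different node (and replicated), which is what lets MapReduce process them in parallel.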
  21. MapReduce • It is a technique for processing the data stored in HDFS. • Hadoop runs MapReduce in the form of (key, value) pairs. • The Mapper and the Reducer also work with (key, value) pairs.
  22. Contd. • The record reader is an interface between an input split and a mapper; there is one record reader for every input split/mapper pair. • The record reader is handled by the Hadoop framework itself by default. • In the mapper code we write the logic that produces the (key, value) pairs. • The record reader converts records into (key, value) pairs based on three input formats: – TextInputFormat (default) – KeyValueTextInputFormat – SequenceFileInputFormat
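A minimal sketch of the mapper side, in the Hadoop Streaming style rather than Hadoop's native Java API: a TextInputFormat-style record reader hands the mapper one line per record, and the mapper emits one (key, value) pair per word, as in the classic word-count example:

```python
def mapper(lines):
    """Word-count mapper: one input record (line) in, (word, 1) pairs out."""
    for line in lines:               # each record is one line of text
        for word in line.split():
            yield (word.lower(), 1)  # emit a (key, value) pair per word

records = ["how is big data", "big data is big"]
print(list(mapper(records)))
# [('how', 1), ('is', 1), ('big', 1), ('data', 1),
#  ('big', 1), ('data', 1), ('is', 1), ('big', 1)]
```

These intermediate pairs are what the shuffling and sorting phases described next operate on.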
  23. • Shuffling: a phase on the intermediate data that combines all (key, value) pairs sharing the same key into one collection per key, e.g. (how, [1,1,1,1,1]), (is, [1,1,1,1,1]). • Sorting: another phase on the intermediate data that sorts all the (key, value) pairs by key.
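The shuffle and sort phases above can be sketched in a few lines of Python, as a simplification of what the Hadoop framework does between the mapper and the reducer:

```python
from collections import defaultdict

def shuffle_and_sort(pairs):
    """Group intermediate (key, value) pairs by key, then sort by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)   # shuffle: collect all values per key
    return sorted(grouped.items())   # sort: order the keys

def reducer(grouped):
    """Word-count reducer: sum the values collected for each key."""
    for key, values in grouped:
        yield (key, sum(values))

pairs = [("how", 1), ("is", 1), ("how", 1), ("big", 1)]
print(shuffle_and_sort(pairs))
# [('big', [1]), ('how', [1, 1]), ('is', [1])]
print(list(reducer(shuffle_and_sort(pairs))))
# [('big', 1), ('how', 2), ('is', 1)]
```

In real Hadoop, shuffling also moves each key's collection across the network to the node running its reducer; this sketch only shows the grouping logic.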
  24. How Big Data impacts IT • Big data is a disruptive force, presenting opportunities as well as challenges to IT organizations. • By 2015 there will be 4.4 million IT jobs in Big Data, 1.9 million of them in the US alone. • India will require a minimum of 100,000 (1 lakh) data scientists in the next couple of years, in addition to data analysts and data managers, to support the Big Data space.
  25. The Future of Big Data • $15 billion has been spent on software firms specializing solely in data management and analytics. • This industry is worth more than $100 billion on its own and is growing at almost 10% a year, roughly twice as fast as the software business as a whole. • In February 2012, the open-source analyst firm Wikibon released the first market forecast for Big Data, listing $5.1B in revenue for 2012 with growth to $53.4B by 2017. • The McKinsey Global Institute estimates that data volume is growing 40% per year.

Editor's notes

  1. According to IBM