Big data introduction

  1. Big Data
  2. Hello! I am Vikas Samant. I work with Entrench Electronics and Pentaho as a Big Data and Data Science Engineer.
  3. What you will learn: What is Big Data?; Characteristics of Big Data; Big Data and Data Science; Big Data use cases; Processing Big Data.
  4. What is Big Data?
  5. “Big data is a term that describes the large volume of data (structured, semi-structured, and unstructured) that overwhelms a business on a day-to-day basis.”
  6. Big Data (contd.): Big data can be analyzed for insights that lead to better decisions and strategic business moves.
  7. Big Data Characteristics
  8. Big Data, the 3 Vs: Volume, Velocity, Variety.
  9. Some make it 4 Vs by adding Veracity.
  10. Volume: Volume refers to the vast amounts of data generated every second. We are talking not about terabytes but about zettabytes or brontobytes. If we take all the data generated in the world between the beginning of time and 2000, the same amount of data will soon be generated every minute.
  11. Velocity: Velocity is the rate at which incoming data arrives and needs to be processed. The flow of data is massive and continuous. Think about how many SMS messages, Facebook status updates, or credit card swipes are sent over a particular telecom carrier every minute of every day, and you will have a good appreciation of velocity.
  12. Variety: Variety refers to the different types of data we can now use. In the past we focused only on structured data that fitted neatly into tables or relational databases, such as financial data. By an often-cited estimate, 80% of the world's data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types.
  13. Veracity: Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable. Just think of Twitter posts with hashtags, abbreviations, typos, and colloquial speech, as well as the uncertain reliability and accuracy of their content. Technology now allows us to work with this type of data.
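
To put the units from the Volume slide in proportion, here is a minimal arithmetic sketch. It assumes decimal (SI) units, where each step is a factor of 1000; the helper names `bytes_in` and `convert` are made up for illustration. "Brontobyte" is an informal term rather than an SI unit, so it is left out.

```python
# Decimal (SI) byte units; each step up is a factor of 1000.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit` (powers of 1000)."""
    return 1000 ** UNITS.index(unit)


def convert(value: float, src: str, dst: str) -> float:
    """Convert `value` expressed in `src` units into `dst` units."""
    return value * bytes_in(src) / bytes_in(dst)


# How many terabytes fit into a single zettabyte?
terabytes_per_zettabyte = bytes_in("ZB") // bytes_in("TB")
print(terabytes_per_zettabyte)  # 1000000000
```

One zettabyte is therefore a billion terabytes, which is the gap in scale the Volume slide is pointing at.
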
  14. Big Data: Data Structure. Four types: Structured, Semi-Structured, “Quasi”-Structured, Unstructured.
  15. Structured Data: Data containing a defined data type, format, and structure. Example: transaction data and data in databases.
  16. Semi-Structured Data: Textual data files with a discernible pattern, enabling parsing. Example: XML data files that are self-describing and defined by an XML schema.
  17. Quasi-Structured Data: Textual data with erratic data formats that can be formatted with effort, tools, and time. Example: web clickstream data that may contain some inconsistencies in data values and formats, such as http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1&oq=data+sci&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=d566e0fbd09c8604&biw=1382&bih=651
  18. Unstructured Data: Data that has no inherent structure and is usually stored as different types of files. Example: text documents, PDFs, images, and video.
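
The four data structure types above can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the transaction record, the XML document, and the PDF bytes are hypothetical, and the URL is a shortened form of the clickstream example on slide 17.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Structured: fixed fields with defined types (hypothetical transaction record).
transaction = {"id": 1001, "amount": 49.90, "currency": "USD"}

# Semi-structured: self-describing XML (hypothetical document, no schema file here).
order = ET.fromstring("<order id='1001'><item sku='A1' qty='2'/></order>")
item = order.find("item")

# Quasi-structured: a shortened form of the clickstream URL on slide 17.
# The key=value pairs sit in the URL fragment and need explicit parsing.
url = "http://www.google.com/#hl=en&cp=8&q=data+scientist&pq=big+data"
params = parse_qs(urlsplit(url).fragment)

# Unstructured: opaque bytes; only a file signature hints at the content type.
blob = b"%PDF-1.4 (hypothetical file contents)"
looks_like_pdf = blob.startswith(b"%PDF")

print(transaction["amount"], item.get("qty"), params["q"], looks_like_pdf)
```

The amount of parsing code grows as the structure weakens: the structured record is read directly, the XML needs a parser, the URL needs knowledge of where the parameters live, and the raw bytes expose nothing beyond a signature.
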
  19. Big Data and Data Science
  20. Big Data and Data Science: How does Big Data relate to Data Science?
  21. Big Data and Data Science: Data science is the process of deriving insights from big data to inform decisions and support organizations.
  22. Big Data and Data Science
  23. Big Data and Data Science: Data Science with Python and R
  24. Data Science Process
  25. Big Data Use Cases
  26. Big Data use cases: 1. Optimize Funnel Conversion; 2. Behavioral Analytics; 3. Customer Segmentation; 4. Fraud Detection.
  27. Use case 1: Optimize Funnel Conversion
  28. Optimize Funnel Conversion: Big data analytics allows companies to track leads through the entire sales conversion process, from a click on an AdWords ad to the final transaction, in order to uncover insights on how the conversion process can be improved.
  29. Company: T-Mobile. Industry: Communication. Employees: 38,000. Type: Optimize Funnel Conversion. Purpose: T-Mobile uses multiple indicators, such as billing and sentiment analysis, to identify customers who can be upgraded to higher-quality products, as well as those with a high customer lifetime value, so its team can focus on retaining those customers.
  30. Use case 2: Behavioral Analytics
  31. Behavioral Analytics: With access to data on consumer behavior, companies can learn what prompts a customer to stick around longer, as well as learn more about their customers' characteristics and purchasing habits, in order to improve marketing efforts and boost profits.
  32. Company: Nestle. Industry: Food and Beverage. Employees: 38,000. Type: Behavioral Analytics. Purpose: Customer complaints and PR crises have become more difficult to handle because of social media. To keep better track of customer sentiment and what is being said about the company online, Nestle created a 24/7 monitoring center to listen to all of the conversations about the company and its products on social media. The company actively engages with those who post about it online in order to mitigate damage and build customer loyalty.
  33. Use case 3: Customer Segmentation
  34. Customer Segmentation: By accessing data about the consumer from multiple sources, such as social media data and transaction history, companies can better segment and target their customers and start to make personalized offers to them.
  35. Company: Heineken. Industry: Food and Beverage. Employees: 64,270. Type: Customer Segmentation. Purpose: Thanks to its partnerships with Google and Facebook, Heineken has access to vast amounts of data about its customers, which it uses to create real-time, personalized marketing messages. One project provides real-time content to fans who happen to be watching a sponsored event.
  36. Use case 4: Fraud Detection
  37. Fraud Detection: Financial firms use big data to help them identify sophisticated fraud schemes by combining multiple points of data.
  38. Company: Discovery Health. Industry: Insurance. Employees: 5,000. Type: Fraud Detection. Purpose: Discovery Health uses big data analytics to identify fraudulent claims and possibly fraudulent prescriptions. For example, it can identify whether a healthcare provider is charging for a more expensive procedure than was actually performed.
  39. Processing Big Data
  40. Big Data Technologies
  41. Big Data Vendors
  42. What is the Hadoop framework? Hadoop is an open-source framework that supports the processing and storage of extremely large data sets in a distributed computing environment built on commodity hardware.
  43. Why Hadoop? Studies show that by 2020, 80% of all Fortune 500 companies will have adopted Hadoop. A study by the McKinsey Global Institute predicted that by 2020, annual GDP in the manufacturing and retail industries will increase by up to $325 billion with the use of big data analytics.
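
Slide 42 describes Hadoop as a framework for distributed processing. The slides do not name a programming model, but Hadoop's classic one is MapReduce, so the following is a rough single-process sketch of that idea in plain Python. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data", "Big Data and data science"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines, each working on its own share of the data. Here they run one after another in a single process, only to show how the data flows between the phases.
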
  44. Thanks! Any questions?