Big Data - Applications and Technologies Overview

  1. Big Data and its applications
  2. Introduction
     • Big Data: use cases, applications, technologies and vendors overview
     • Aimed at providing a high-level overview of tools and technologies related to Big Data
  3. Topics covered
     • Introduction to Big Data
       ◦ Definition, need for Big Data, hype cycle
     • Applications of Big Data
       ◦ Industry-wise applications
     • Big Data technologies overview
       ◦ Hadoop, Pig, Hive, NoSQL, columnar DB
     • Big Data vendors overview
       ◦ Amazon, Cloudera, Hortonworks, MapR, etc.
  4. Big Data - definition
     • A popular term used to describe the exponential growth and availability of data, both structured and unstructured.
     • A collection of data from traditional and digital sources, inside and outside your company, that represents a source for ongoing discovery and analysis.
     • May refer both to the volume of data and to the tools and processes used to work with it.
  5. 3 Vs of Big Data
  6. Need for Big Data
     • 2.7 ZB of data in the digital universe today
     • Facebook stores and analyzes 30+ PB of data
     • Walmart's data exceeds 2.5 PB
     • Better decision making and increased operational efficiency
  7. When to go for a Big Data solution
     • All types of data need to be analyzed
     • Most or all of the data is to be analyzed
     • The analysis is iterative and exploratory
     • Business measures are not predetermined
     • A traditional, schema-compliant warehouse is not suitable for unstructured data
  8. Gartner’s Hype Cycle
  9. Retail – Pricing Optimization
     • Analyze millions of items sold or for sale
     • Gain valuable insights about customers and markets in shorter timeframes
     • Aggregate data from multiple channels in multiple formats
     • Day-long jobs complete in minutes
  10. Retail – Smart shopping experience
     • Data sources: pricing data, POS, transactions, social media, call center records, promotions
     • Better understanding of customer preferences and shopping patterns
     • Geolocation apps deliver a personalized marketing experience
  11. Big Data in Finance
     • Customer segmentation
       ◦ Correlate purchase history, profile information and behaviour on social media
       ◦ Generate portfolio advice
     • Fraud detection systems
     • Wealth management
       ◦ Investment research: try out new investment ideas, improve algorithmic trading
       ◦ Customer knowledge: unified view of the customer
  12. Big Data in Finance
     • Regulatory compliance
       ◦ Impact of the 2008 credit crisis on regulatory compliance
       ◦ Stringent monitoring and reporting of data
     • Risk management
       ◦ Better analysis of investment positions and risk metrics
  13. Big Data in Healthcare
     • EMR: the Electronic Medical Records initiative in the US
     • Complete digitization of a patient’s medical information, such as profile, disease treatment and pharmacy visits
     • Records are shared across networks
     • Slow adoption and challenges in aggregation
  14. Big Data in Healthcare
     • Predicting health issues
       ◦ Build a model that predicts a patient’s risk
       ◦ Hospitals follow up with high-risk patients to avoid hospitalization
     • Predicting outbreaks
       ◦ IBM Research project: STEM
       ◦ The model correlates disease data with climate and temperature
       ◦ Can predict disease outbreaks for regions expecting climatic change
  15. Big Data – Internet of Things
     • Data generated by machines: RFID chips implanted in devices
     • 3 phases
       ◦ Data ingestion: cost
       ◦ Data storage: cost
       ◦ Analytics: real value
     • Outsource phases 1 and 2 to DBaaS (Redshift, Hortonworks, Cloudera)
  16. UPS – Case study
     • Aim
       ◦ Find the fastest and most fuel-efficient way to deliver packages to customers
     • ORION research project
       ◦ Captures driver behaviour and safety habits through GPS
       ◦ Collects sensor data on fuel emissions and consumption
       ◦ Monitors deliveries and customer service
       ◦ Runs advanced algorithms to optimize routes
  17. UPS – Case study
     • Early testing in 2011-2012 on 10,000 routes: 1.5 million gallons of fuel saved
     • Complete deployment on 55,000 routes throughout North America by 2017
  18. Big Data – Technologies
     • MapReduce
       ◦ Programming paradigm that lets massive jobs execute in parallel across thousands of servers
       ◦ Map task: the input dataset is converted into a different set of key/value pairs
       ◦ Reduce task: several of the outputs of the map task are combined to form a reduced set of tuples
  19. Big Data - Technologies
     • Hadoop
       ◦ Most popular open-source implementation of MapReduce
       ◦ Can work with multiple forms of data
       ◦ Can run processor-intensive machine learning jobs
     • Hive
       ◦ Developed by Facebook and later made open source
       ◦ SQL-like interface on top of Hadoop
       ◦ Queries data stored in a Hadoop cluster
  20. Big Data - Technologies
     • Pig
       ◦ Scripting language
       ◦ Transforms data present in a Hadoop cluster
       ◦ Developed by Yahoo and made open source
     • NoSQL
       ◦ Schema-less databases
       ◦ Storage and retrieval of huge amounts of unstructured data
       ◦ Scalable, flexible and cloud-friendly, but less consistent
  21. Other Big Data Technologies
     • Search engines
       ◦ Lucene, Solr, Elasticsearch, Amazon CloudSearch
     • Stream processing
       ◦ Apache Storm, Apache Spark, Cloudera’s Impala, Yahoo’s S4 and Apache Tez
  22. Big Data – Vendors
     • Amazon
       ◦ Elastic MapReduce: Amazon’s Hadoop distribution, run on AWS infrastructure
       ◦ “largest adoption of hadoop platforms in the market” – Forrester report
     • Cloudera
       ◦ Uses many aspects of open-source Hadoop
       ◦ Many features built on top of its Hadoop distribution, notably Cloudera Manager and Impala
  23. Big Data - Vendors
     • Hortonworks
       ◦ Builds an open-source Hadoop ecosystem
       ◦ Also innovates: Ambari, its cluster management software
     • IBM
       ◦ InfoSphere BigInsights: analytics at rest
       ◦ InfoSphere Streams: analytics in motion
       ◦ Hadoop-based analytics
       ◦ Stream computing
       ◦ Data warehousing
       ◦ Application development
  24. Big Data - Vendors
     • Intel
       ◦ Develops a custom Hadoop version for Xeon chips
       ◦ Closest affinity between hardware and software
     • MapR
       ◦ Fastest-growing Hadoop distribution company
       ◦ Highest scores for distribution architecture and data processing capabilities
  25. Big Data - Vendors
     • Microsoft
       ◦ Does not encourage open source but promotes Hadoop
       ◦ HDInsight: Hadoop as a service, run on Windows Azure and based on Hortonworks’ Hadoop distribution
       ◦ PolyBase: lets SQL Server query data stored in Hadoop
       ◦ Big presence in other markets enables delivering an end-to-end Hadoop solution
  26. Big Data - Vendors
     • Teradata
       ◦ Specializes in SQL and RDBMS
       ◦ Partnered with Hortonworks
       ◦ Integrated Hadoop with its existing SQL offerings
       ◦ Existing Teradata users can use the Hadoop platform to process warehouse data
  27. Questions
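
Slide 9 mentions aggregating data from multiple channels in multiple formats. A minimal sketch of that idea in Python, assuming two hypothetical feeds that are not in the slides: an in-store CSV export with `item`, `units` and `unit_price` columns, and web orders as JSON lines with `sku` and `total` fields. Each feed is normalized to `(item, revenue)` pairs before summing.

```python
import csv
import io
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Sale = Tuple[str, float]  # (item, revenue)


def from_store_csv(text: str) -> Iterator[Sale]:
    """Normalize a hypothetical in-store CSV export (item, units, unit_price)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["item"], int(row["units"]) * float(row["unit_price"])


def from_web_json(text: str) -> Iterator[Sale]:
    """Normalize hypothetical web orders, one JSON object per line (sku, total)."""
    for line in text.splitlines():
        if line.strip():
            order = json.loads(line)
            yield order["sku"], float(order["total"])


def revenue_by_item(*sources: Iterable[Sale]) -> Dict[str, float]:
    """Sum revenue per item across every normalized channel."""
    totals: Dict[str, float] = defaultdict(float)
    for source in sources:
        for item, revenue in source:
            totals[item] += revenue
    return dict(totals)


if __name__ == "__main__":
    store_csv = "item,units,unit_price\nmug,3,4.0\npen,10,1.5\n"
    web_json = '{"sku": "mug", "total": 8.0}\n{"sku": "lamp", "total": 20.0}\n'
    print(revenue_by_item(from_store_csv(store_csv), from_web_json(web_json)))
```

At the scale the slides describe, this per-channel normalization would run as a distributed job rather than in one process; the sketch only shows the shape of the step.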
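
Slide 15 splits machine-generated data handling into three phases: ingestion, storage and analytics. The sketch below walks one tiny batch through those phases in memory. The `device_id,value` message format, the function names and the per-device average are all assumptions made for illustration; the slides do not specify any of them.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Reading = Tuple[str, float]  # (device_id, value)


def ingest(raw_lines: List[str]) -> List[Reading]:
    """Phase 1: parse raw 'device_id,value' messages, skipping malformed ones."""
    readings: List[Reading] = []
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            readings.append((parts[0], float(parts[1])))
        except ValueError:
            continue
    return readings


def store(readings: List[Reading], storage: Dict[str, List[float]]) -> None:
    """Phase 2: append each reading to its device's history."""
    for device_id, value in readings:
        storage[device_id].append(value)


def analyze(storage: Dict[str, List[float]]) -> Dict[str, float]:
    """Phase 3: compute the average reading per device."""
    return {device_id: mean(values) for device_id, values in storage.items()}


if __name__ == "__main__":
    storage: Dict[str, List[float]] = defaultdict(list)
    store(ingest(["tag-1,20.0", "tag-2,30.0", "garbage", "tag-1,22.0"]), storage)
    print(analyze(storage))
```

The first two functions stand in for the phases the slide suggests outsourcing; only `analyze` corresponds to the phase it describes as the real value.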
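
The map and reduce tasks described on slide 18 can be illustrated with a single-process word count. This is only a sketch of the paradigm, not Hadoop's API: the function names are hypothetical, and the `shuffle` step (grouping values by key between map and reduce) is handled by the framework in a real cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_task(line: str) -> Iterator[Tuple[str, int]]:
    """Map: convert one input line into (word, 1) key/value pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values emitted by the map tasks under their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single tuple."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over the input lines."""
    pairs = (pair for line in lines for pair in map_task(line))
    return dict(reduce_task(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big jobs"]))
```

Because each `map_task` call depends only on its own line and each `reduce_task` call only on its own key, both stages can be spread across many servers, which is the property the slide highlights.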
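
Slide 20 describes NoSQL databases as schema-less. A toy in-memory document store shows what that means in practice: records under different keys can carry entirely different fields, and queries tolerate missing fields. The `DocumentStore` class and its methods are hypothetical and do not model any particular NoSQL product.

```python
from typing import Any, Dict, List, Optional


class DocumentStore:
    """Toy schema-less store: each key maps to a free-form dictionary."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        """Store a copy of the document; no schema is checked."""
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        """Return the document for key, or None if it is absent."""
        return self._docs.get(key)

    def find(self, field: str, value: Any) -> List[str]:
        """Return keys of documents whose field equals value; missing fields never match."""
        return [k for k, d in self._docs.items() if field in d and d[field] == value]


if __name__ == "__main__":
    db = DocumentStore()
    db.put("u1", {"name": "Ana", "city": "Lyon"})
    db.put("u2", {"name": "Raj", "tags": ["vip"], "city": "Lyon"})
    db.put("e1", {"type": "click", "page": "/home"})
    print(db.find("city", "Lyon"))
```

A relational table would require all three records to share one set of columns; here nothing enforces that, which is the flexibility (and the weaker guarantee) the slide alludes to.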
