Big data4businessusers

Sr. Solution Architect à EMC Corporation
6 Oct 2014
  1. Qlik Sense and Big Data Making Big Data Relevant for the Business User Bob Hardaway – Solution Architect 2 October, 2014
  2. And now they coming, yeah, now they coming Out from the shadows To take me to the club because they know That I shut this down, 'cause they been watching all my windows They gathered up the wall and listening You understand, they got a plan for us I bet you didn't know that I was dangerous
  3. Intelligence Community Comprehensive National Cyber Security Initiative Data Center (ICCNCSIDC) Capable of processing all forms of communication, including the complete contents of private emails, cell phone calls, and Internet searches, as well as all types of personal data trails—parking receipts, travel itineraries, bookstore purchases, and other digital 'pocket litter'.
  4. Big Data comes with big challenges The Big Data bottleneck Reports Data Scientists Business Users Big Data “ many organizations lack the skills required to exploit big data ” “ most of these skills are in short supply and rare in the market at large ” “ data science encompasses hard skills ” Source: Gartner Big Data Hype Cycle Report 2013
  5. Qlik relieves the Big Data bottleneck: Data Scientists produce Reports; Business Users get Analytics & Discovery. QlikView’s user-centric Business Discovery approach gives decision-makers access to the benefits of Big Data
  6. What is Big Data?
  7. Big Data happens in every part of History Paper Print Computer Internet • Medium to write ideas and information • Not enough writers to disseminate • Technology to distribute information • No place to store • Place to store • Can’t keep up with computing requirements • Distributed computing globally • Too many Emails to read We always create more than we can consume!
  8. The Internet of Things (IoT) • Cisco estimates 50B connected devices by 2020 • Intel says 15B by 2015 • Uber adds 70,000 drivers per week • AirBnB had 42M bookings last year • ZipCar lets you reserve a parking space anywhere. The Physical Web – a Google project to de-App devices: “People should be able to walk up to any smart device – a vending machine, a poster, a toy, a bus stop, a rental car – and not have to download an app first.” – Scott Jenson
  9. Quantifying Big Data – “Bigness is the least important thing … it’s the insights that can be gained from interactions vs. transactions … the customer experience vs. the value of what was purchased” - Stephen Brobst, CTO Teradata
     • Real-time streaming data: high volumes in, low latency; complexity in processing, analysis and deriving insights (12TB/day across 80 servers; 32 billion rows per day)
     • Very large data sets: on the order of 100s of TB to PBs (75TB of compressed data processed/day; 7,500+ analytical jobs per day)
     • Structured & unstructured data living together (OLTP, DW, data marts): text, audio, video, click streams, log files, etc. (15TB per day @ 1:7 compression ratio; 4 PB storage)
     • Images, flat files, DNA: 4TB of TIFF to 11M PDF files using Hadoop in < 24 hours
  10. A Less Alliterative Definition • Big Data is about analyzing ALL your data, ALL the time – Traditional BI systems operate on assumptions and limited data sets that preclude true discovery and insight – The same question gets asked over and over • The cost of analysis has always been the limiting factor for Business Intelligence – Solutions have to be justified before they are deployed • Big Data is about storing everything cheaply and letting the user look for value • Big Data is about driving the business based on data • Big Data doesn’t solve every problem, but it does put the user in charge of the process
  11. Hadoop – A Brief History
     • 2006: Cutting, working on the distributed search platform Nutch (inspired by Google’s paper on GFS), joins Yahoo, which estimates a billion-page index will cost $500k plus $30k/month to support
     • 2008: Hadoop is promoted to a top-level Apache project; search index creation time drops from 12 days to 8 hours; a 1,400-node Yahoo cluster sorts 500GB in 59s; Cloudera launches
     • 2011: Yahoo spins the remaining Hadoop team out into Hortonworks
     • 2013: the 3rd Hadoop World conference attracts 2,300 developers, vs. 275 the first time; Cloudera adds real-time search, based on Lucene, also created by Cutting
     • 2014: Apache Spark becomes the most contributed-to Hadoop-related project
  12. Real-time Analytics – Big Data is much more than just storage. Prepare for Big Data business demands: Real-Time Agility, Advanced Analytic Capability, Transformation and Exploration, and Advanced Data Management, spanning extreme analytic engines, Big Data exploration, DW/ETL pre-processing, and a Big Data cache + BI infrastructure
  13. Popular “Big Data” Myths • You need to have Ga-zinga-bytes to deploy a Big Data solution – A typical Cloudera cluster is 15-20 nodes, < 10TB of data – Hadoop storage is 3-4x cheaper than an EDW • Hadoop is all you need – Hadoop is an enabling technology that provides the foundation for Big Data solutions – Focus today is on data management • The RDBMS is dead – The RDBMS is still critical – but not for high volume, low quality analytics • We can’t handle Big Data – The reality is a human can’t handle Big Data – It’s all about the use case – Direct Discovery is a unique approach
  14. Gartner Top Big Data Challenges You need to determine your goals/objectives Qlik can help you with these challenges
  15. Turn Big Data (lots of dots) Into Small Data (Insights) The Value in Big Data Comes from Context and Relevance More History They’re both the same number of bricks! The same volume of data, same schema. You choose what is relevant to your analysis. More Categories
  16. The Big Data Value Chain
     Speed (t/TB): HDD 3,300s | SSD 300-1,000s | RAM 1s
     Price ($/TB): HDD $50 | SSD $500 | RAM $4,500
     • Keep data in memory when the value obtained from processing it is high
     • Leave data on disk when it is inactive or the value from processing it is low
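The slide's rule of thumb can be sketched as a tiny tier-placement policy. This is a toy model, not a Qlik or vendor algorithm: the per-tier figures are the slide's rough numbers, and the one-hour scan budget is an assumption chosen for illustration.

```python
# Illustrative cost/speed trade-off for the storage tiers above.
# Figures (seconds to scan 1TB, $ per TB) are the slide's rough numbers.
TIERS = {
    "HDD": {"scan_s_per_tb": 3300, "price_per_tb": 50},
    "SSD": {"scan_s_per_tb": 650,  "price_per_tb": 500},   # midpoint of 300-1,000s
    "RAM": {"scan_s_per_tb": 1,    "price_per_tb": 4500},
}

def place(tb: float, scans_per_day: int) -> str:
    """Pick the cheapest tier whose total daily scan time fits in one hour.

    A toy policy mirroring the slide's rule: hot, frequently processed
    data earns its place in memory; cold data stays on cheap disk.
    """
    for name in ("HDD", "SSD", "RAM"):  # cheapest first
        if TIERS[name]["scan_s_per_tb"] * tb * scans_per_day <= 3600:
            return name
    return "RAM"

print(place(0.5, 2))   # small, rarely touched -> "HDD"
print(place(2, 24))    # 2TB scanned hourly -> "RAM"
```

The interesting point is that the decision depends on access frequency, not size: the same 2TB lands on a different tier depending on how often it is processed.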
  17. Fine, Big Data is here, but what are the Big Data Use Cases that matter to my Business?
  18. Initially Hadoop Came About to Reduce Costs • How cheaply? – By one estimate, running a 75-node, 300TB Hadoop cluster costs $1.05 million over 3 years – Storage in an RDBMS may cost 2.5x as much over the same period • This type of savings means companies can keep ‘more’ or all of their data • Hadoop is for storage, not analytics – Data storage remains the most common use case for Hadoop • Example: – Expedia is moving from DB2 to Cloudera with expected savings of approximately $100 million per year
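A quick back-of-envelope check of the slide's claim, using only the figures on the slide (75 nodes, 300TB, $1.05M over 3 years, and the ~2.5x RDBMS multiplier):

```python
# Per-TB, per-year cost implied by the slide's hypothetical Hadoop cluster.
hadoop_total = 1_050_000                        # $ over 3 years
tb, years = 300, 3

hadoop_per_tb_year = hadoop_total / tb / years  # ~$1,167/TB/year
rdbms_per_tb_year = hadoop_per_tb_year * 2.5    # slide's ~2.5x multiplier

print(f"Hadoop: ${hadoop_per_tb_year:,.0f}/TB/year")  # $1,167
print(f"RDBMS:  ${rdbms_per_tb_year:,.0f}/TB/year")   # $2,917
```

At roughly $1,167/TB/year vs. ~$2,917/TB/year, the gap is what makes "keep everything" economically plausible.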
  19. But Big Data Technologies are Evolving Rapidly • 2010 – Download Apache Hadoop, cobble together surplus hardware, hire a couple of Java developers • 2012 – CDH 4 from Cloudera reduces deployment time from days to minutes • 2013 – AWS introduces Elastic Map Reduce (EMR) • 2014 – Google counters with Google Compute Engine (GCE) • Platform vendors cover more than just Hadoop-like capabilities – Map-Reduce for large-scale batch processing – NoSQL for real-time, ad hoc query with operational performance – Spark/Solr/Impala for real-time analytics – R integration for deep predictive/advanced analytics – All need a delivery agent (aka a visualization tool) to bring the benefit to the business
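The map-reduce pattern mentioned above can be sketched in a few lines of plain Python, no Hadoop required. Real frameworks wrap distribution, shuffling, and fault tolerance around exactly this shape; the documents here are made-up examples.

```python
# A minimal, single-process sketch of the map-reduce pattern.
from collections import defaultdict

def map_phase(doc: str):
    """Map: emit a (key, value) pair for every word in a document."""
    for word in doc.lower().split():
        yield word, 1

def reduce_phase(pairs):
    """Reduce: group pairs by key (the 'shuffle') and sum their values."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

docs = ["big data big insights", "big data pipelines"]
pairs = (kv for doc in docs for kv in map_phase(doc))
print(reduce_phase(pairs))
# {'big': 3, 'data': 2, 'insights': 1, 'pipelines': 1}
```

Because map emits independent pairs and reduce only needs pairs grouped by key, both phases parallelize across machines, which is the whole point of the model.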
  20. Big Data Use Cases are About Finding Value • Internet (Expedia) – Search Index Generation – User Engagement Behavior – Targeting / Advertising Optimizations – Recommendations • BioMed (Carefusion) – Computational BioMedical Systems – Bioinformatics – Data Mining and Genome Analysis • Financial (Metlife / Wells Fargo) – Prediction Models – Fraud Analysis – Portfolio Risk Management • Telecom (BritTelecom/DeutscheTele) – Call data records – Set top & DVR streams • Social (Facebook) – Recommendations – Network Graphs – Feed Updates • Enterprise Operations – email and image processing – Robust ETL – Data Archival – Natural Language Processing • Media & Entertainment (DIRECTV) – Customer 360 – Marketing Campaigns • Agriculture (ADM) – Process “agri” stream – Mineral Management • Image (Corbis) – Geo-Spatial processing • Education (State of …) – Systems Research – Statistical analysis of the web
  21. The Big Data Ecosystem is Much More Than Just Hadoop: data visualization, statistical & in-memory analytics; open-source distributed processing frameworks; Big Data analytic appliances; massively parallel processing platforms; Big Data integration; packaged MapReduce platforms (e.g. BigInsights & Streams, Big Data Appliance, HANA, Splunk)
  22. Qlik Brings Big Data to the Business User
  23. Insight Comes from Big Data, in Context NoSQL Databases SAP HANA Google BigQuery Batch Real-time Hadoop Advanced Analytics Platform Vendors
  24. Leveraging QlikView for Big Data Discovery – Define Your Use Case • A hybrid approach that – Provides any/all business stakeholders with a simple but powerful environment for exploring data, without – Limiting or filtering what data is available for analysis • Follow the value – Start with simple questions: • What data do we already have that we are not making good use of today? – Let your business decide where the exploration goes • The technologies are cost-effective, flexible and designed for a business-first methodology
  25. QlikView Direct Discovery • Combines the associative capabilities of the QlikView in-memory dataset with a query model where:  The aggregated query result is passed back to a QlikView object without being loaded into the QlikView data model  The result set is still part of the associative experience  Capability to Drill to Detail records QlikView In-Memory Data Model QlikView Application Direct Discovery Batch Load
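The idea behind Direct Discovery, pushing the aggregation down to the source so only the small result set is loaded rather than the detail rows, can be sketched with plain SQL. This is not QlikView's actual script syntax; `sqlite3` stands in for the big-data source, and the `sales` table and its columns are made up for illustration.

```python
# Sketch of aggregation pushdown: ship the query to the data,
# bring back only the aggregated result set.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('EMEA', 120.0), ('EMEA', 80.0), ('AMER', 200.0);
""")

# The user's selection in the UI becomes a WHERE clause; the detail
# rows never leave the source, only the aggregates cross the wire.
selected_regions = ("EMEA",)
placeholders = ",".join("?" * len(selected_regions))
rows = con.execute(
    f"SELECT region, SUM(amount) FROM sales "
    f"WHERE region IN ({placeholders}) GROUP BY region",
    selected_regions,
).fetchall()
print(rows)  # [('EMEA', 200.0)]
```

Drill-to-detail then works the same way in reverse: a second, narrowly filtered query fetches the underlying rows only when the user asks for them.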
  26. Complement Hadoop and EDW co-existence Data Warehouse Aggregates Direct Discovery Broad Application to discover new trends Deep Application to confirm and take action Move highly valuable data to EDW for more broad accessibility Point QlikView to new source
  27. Big Data Business Needs – Descriptive, Predictive and Prescriptive Analytics (DATA: clinical, claims, monitoring, others) • Descriptive: How are we doing? (How many claims did we pay today?) • Predictive: What might happen in the future? (Which of tomorrow’s claims might be requesting an Emergency Room (ER) admission?) • Prescriptive: the best course of action given objectives, requirements & constraints (What would be effective steps to reduce the probability of ER admission?) QlikView is a leader in Descriptive but barely plays in Predictive and Prescriptive; radically different algorithmic and visualization concepts are needed to play in that arena
  28. King.com: Big Data in Action • 1.6B rows of data per day in Hadoop – 211M rows per day extracted for analysis in QlikView • Customer browsing activity: – Player interactions within each game – Many additional metrics • Results: Marketing ROI of campaigns measured for the first time (# of players, # of games played, time played, etc.)
  29. Thank You

Editor's notes

  1. The Bloor Group writes in “Why In-Memory Technology Will Dominate Big Data” (from the Kognitio download site, http://www.kognitio.com/information-center/reports/): if the goal is to accelerate BI activities dramatically, the natural approach is to have an in-memory processing resource that can be used where it makes a difference, flowing the data from disk through SSD to memory in order to support those BI workloads. In other words, data is kept in memory when the value obtained from processing it is high, and data stays on disk when it is inactive or the value from processing it is low.
  2. Readwrite.com/2013/05/29/the-real-reason-hadoop-is-such-a-big-deal-in-big-data#awesm=-ov83pYC1hKZ58O Rainstor.com/compression-tames-big-data-on-hadoop