Webinar | Using Hadoop Analytics to Gain a Big Data Advantage

Learn about:
  • Why big data matters to your business: realize revenue, increase customer loyalty, and pinpoint effective strategies
  • The business and technical challenges of big data solutions
  • How to leverage big data for competitive advantage
  • The “must haves” of an effective big data solution
  • Real-world examples of Cloudera, Pentaho and Dell big data solutions in action

Published in: Technology, Business

  1. Using Hadoop Analytics to Gain a Big Data Advantage
     • Jonathan Seidman, Solution Architect, Cloudera
     • Ian Fyfe, VP Product Marketing, Pentaho
     • Jeff Stacey, Director of GTM Strategy, Channel & Sales Development, Dell
  2. Why big data matters to your business
     Jonathan Seidman, Cloudera
     Confidential Big Data Solutions
  3. Explosive Data Growth
     1.8 trillion gigabytes of data was created in 2011…
     • More than 90% is unstructured data
     • Approx. 500 quadrillion files
     • Quantity doubles every 2 years
     [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; y-axis 0 to 10,000]
     Source: IDC 2011
  4. The ‘Big Data’ Phenomenon
     Big Data drivers: More Content, More Devices, More Consumption, New & Better Information
     • The proliferation of data capture and creation technologies
     • Increased “interconnectedness” drives consumption (creating more data)
     • Inexpensive storage makes it possible to keep more, longer
     • Innovative software and analysis tools turn data into information
     Big Data encompasses not only the content itself, but how it’s consumed
     • Every gigabyte of stored content can generate a petabyte or more of transient data*
     • The information about you is much greater than the information you create
     *Source: IDC 2011
  5. The Opportunity: Quickly gain a competitive advantage
     • Big opportunity to drive revenue, e.g.
       - Predict customer behavior across all channels (Web site, social media, email, etc.)
       - Understand and monetize customer behavior
       - Predict customer churn
     • Big opportunity to reduce costs, e.g.
       - Networks: predict failure, neutralize attacks
       - Machines/sensors: predict failures
       - Financial risk management: reduce fraud, increase security
     Use Cases
     • Ecommerce: predict customer behavior across all channels to drive revenue
     • E-gaming: understand and better monetize customer behavior
     • Networks: predict failure, neutralize attacks to reduce costs
     • Customers: predict churn, optimize revenue
     • Machines/sensors: predict failures, reduce costs
     • Financial risk management: reduce fraud, increase security
  6. Big data challenges
     Ian Fyfe, Pentaho
  7. Big Data Challenges
     • Cost-effectively managing the volume, velocity and variety of data
     • Deriving value across structured and unstructured data
     • Adapting to context changes and integrating new data sources and types
  8. The Current Solutions
     Current database solutions are designed for structured data.
     • Optimized to answer known questions quickly
     • Schemas dictate form/context
     • Difficult to adapt to new data types and new questions
     • Expensive at petabyte scale
     [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; y-axis 0 to 10,000, with a 10% label]
  9. Common Data Analytics Architecture
     [Diagram components: data sources, data collection, storage-only grid (original raw data), ETL compute grid, RDBMS (aggregated data), BI reports & interactive apps, tape archive]
     • Offline data can’t be analyzed easily
     • Can’t explore original high fidelity data
     • Moving data to compute doesn’t scale
  10. Leveraging big data for competitive advantage
      All
  11. Success With Hadoop
  12. Big Data Analytics at TravelTainment
      Multi-channel distribution platform for the travel industry
      “Pentaho Business Analytics fits perfectly into our open source Big Data environment.”
      - Ibrahim Husseini, Director of Data Warehouse, TravelTainment
      Business challenge: Inefficient and time consuming reporting capabilities on big data sets with legacy system.
      Benefits
      • Ability to visualize its very large data volumes for reporting and analysis in such a way that non-technical users can also easily understand them
      • Can now run complex reports three times faster and with more flexibility than before
      • For the first time can offer clients user-friendly, self-service and ad-hoc reporting services, helping IT focus on their main business and not serve as support desk for reporting
      Why Pentaho
      • Capability to analyze data from Hadoop and Hive
      • Professional support for in-depth analysis
      • Self-service analysis and reporting for business customers
      • Cost effective solution
  13. Dell uncovers new insights and reduces IT costs by US$35 million with a business intelligence solution designed for big data
      Dell Business Intelligence Practice
      • Accelerated customer shipment time by 33 percent
      • Saved US$2 million by improving product quality
      • Integrated data silos
      • Reduced IT costs by US$35 million
      • Increased agility
  14. SecureWorks slashes the cost of storage with Dell | Cloudera Solution
      Organization
      SecureWorks is a true security partner to help protect your IT assets, comply with regulations and reduce costs, without having to build your internal security expertise from scratch.
      Challenge
      SecureWorks needed a highly scalable solution for collecting, processing, and analyzing massive amounts of data collected from customer environments.
      Solution
      The organization deployed the Dell™ | Cloudera® Solution with Cloudera’s distribution of Apache® Hadoop® software, Dell-developed Crowbar software framework, PowerEdge™ C2100 servers, Force10 switches, Dell and Cloudera services in a solution based on a Dell reference architecture.
      Benefits
      • Reduced the cost of data storage to 23 cents per gigabyte
      • Gained easy scalability for future growth
      • Leveraged open source software and commodity hardware to reduce time to market
      • Maintain high availability for critical services and flexibility to analyze structured and unstructured data
      “Our storage cost per gigabyte is 23 cents. We thought we had great economics previously when we were spending about seventeen dollars per gigabyte.”
      - Robert Scudiere, Director of Engineering, Dell SecureWorks
  15. Must-haves of an effective big data solution
      Jeff Stacey, Dell (16, then 24 to close) & Jonathan Seidman, Cloudera
  16. Big Data Solution Requirements
      • Cost-effectively manage the volume, variety and velocity of data
      • Process and analyze large, complex data sets…quickly
      • Flexibly adapt to context changes and new data types
  17. Why was Hadoop created?
      Exploding data volumes & types LEADS TO dramatic changes in enterprise data management
      Data types: digital content, operational data, web logs, social media, files, smart grids, transactional data, ad impressions, R&D data
      It’s difficult to handle data this diverse at this scale. Traditional platforms can’t keep pace.
      With Hadoop, you can…
      • NEW OPPORTUNITY: extract more value, from more data, more cost effectively, with greater flexibility
      • HARD PROBLEMS: deep analysis, exhaustive and detailed, sophisticated algorithms, quick results
      • BIG DATA: any kind, from any source, structured and unstructured, at scale
  18. What is Apache Hadoop?
      Hadoop is a platform for data storage and processing that is…
      • Scalable
      • Fault tolerant
      • Open source
      CORE HADOOP COMPONENTS
      • Hadoop Distributed File System (HDFS): file sharing and data protection across physical servers
      • MapReduce: distributed computing across physical servers
      Consolidates everything
      • A single repository for storing and mining any type of data
      • Not bound by a single schema
      Excels at complex analysis
      • Scale-out architecture divides workloads across multiple nodes
      • Flexible file system eliminates ETL bottlenecks
      Scales economically
      • Can be deployed on commodity hardware
      • Open source platform guards against vendor lock
  19. Core Hadoop: HDFS
      Self-healing, high bandwidth CLUSTERED STORAGE
      [Diagram: a file split into blocks 1 to 5, with each block copied onto multiple nodes of the cluster]
      HDFS breaks incoming files into blocks and stores them redundantly across the cluster
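The block-and-replica idea on the HDFS slide can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HDFS API: the function names, the 10-byte block size, the node names, the replication factor of 3 and the round-robin placement are all hypothetical choices for illustration. Real HDFS uses far larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is a hypothetical policy chosen for simplicity.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement


def surviving_blocks(placement: dict[int, list[str]], failed_node: str) -> set[int]:
    """Return the ids of blocks that still have a copy after one node fails."""
    return {b for b, holders in placement.items() if any(n != failed_node for n in holders)}


# Hypothetical 45-byte "file" split into five blocks across five nodes.
data = b"x" * 45
blocks = split_into_blocks(data, block_size=10)
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = place_blocks(len(blocks), nodes, replication=3)

for block_id, holders in placement.items():
    print(block_id, holders)
print("readable after node-a fails:", sorted(surviving_blocks(placement, "node-a")))
```

Because every block lives on three different nodes in this toy, losing any single node leaves all blocks readable, which is the redundancy the slide describes.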
  20. Core Hadoop: MapReduce
      Framework for DISTRIBUTED COMPUTING
      [Diagram: blocks 1 to 5 distributed across the nodes of the cluster]
      Processes many jobs in parallel across many nodes and combines the results
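The map-then-combine pattern on the MapReduce slide can be sketched as a word count in plain Python. This is an illustration of the programming model only, not Hadoop's MapReduce API: the function names and the sample blocks are hypothetical, and the map calls run sequentially here, whereas a cluster would run them in parallel on the nodes that hold each block.

```python
from collections import defaultdict


def map_phase(block: str) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input, already split into blocks as a distributed file system would do.
blocks = ["big data big insights", "data doubles", "big clusters"]

# Each block is mapped independently, so this loop is the part a cluster parallelizes.
mapped = [pair for block in blocks for pair in map_phase(block)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

The map step never looks at more than one block, which is why the work can be spread across many nodes and only the grouped results need to be brought together.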
  21. Major Hadoop Utilities
      • Apache Pig: High-level language for expressing data analysis programs
      • Apache Hive: SQL-like language and metadata repository
      • Apache HBase: The Hadoop database. Random, real-time read/write access
      • Hue: Browser-based desktop interface for interacting with Hadoop
      • Apache Zookeeper: Highly reliable distributed coordination service
      • Oozie: Server-based workflow engine for Hadoop activities
      • Flume: Distributed service for collecting and aggregating log and event data
      • Sqoop: Integrating Hadoop with RDBMS
      • Apache Whirr: Library for running Hadoop in the cloud
  22. Hadoop in Production
  23. The unrivaled leader in Hadoop
      • Worldwide #1 distribution of Apache Hadoop
      • 100% Open-Source Hadoop Distribution
      • Largest contributor to the open source Hadoop ecosystem
        - Project founders from 8 of the 13 leading Apache Projects
      • Cloudera has more Apache committers on staff than any other company
      • More than 100 enterprise & public sector customers across a wide variety of industries
  24. Dell | Cloudera Solution with Pentaho
      Dell Value
      • Business intelligence practice
      • Open & scalable infrastructure
      • Certified and tested platforms
      • Active community participation
      • Crowbar deployment tool
      • Reference Architecture
      • Deployment Guide & Services
      • Joint support with Cloudera
      • Actual customers
  25. Industry first: PowerEdge C8000
      Mix and match for the ultimate performance in a dense 4U package
      • Speed up your most resource-intensive workloads by mixing and matching compute, storage and/or GPU nodes in the same 4U shared infrastructure chassis
      • Get the cores, memory and I/O expansion you need for peak workload performance
      Great for: Big Data, Web 2.0/Hosting, HPC
      Get faster results with more compute power
      • Intel Xeon E5-2600 processors boost performance by 80%
      • Up to 135W support
      • 2x the I/O bandwidth with PCI Express Gen3
      Mix & Match
      • Mix compute, storage and GPUs in the same 4U chassis
      • More workload flexibility, HD & I/O options than the HP SL6500 or Super Micro 6047R
      Do more with less
      • Shared infrastructure reduces power & cooling costs by ~20%
      • Refresh with the latest components without having to replace the entire chassis
  26. [Untitled Pentaho slide]
      • Visual design for Hadoop
      • Reduces skills requirements
      • Deep integration with Hadoop
        - HDFS, MapReduce, Sqoop, Oozie
        - Runs as MapReduce in-Hadoop
      • Easily connects Hadoop to other enterprise data sources
      • Broadens Hadoop use to data analysts, business users and IT
      [Diagram labels: Reporting & Dashboards; Data Discovery Visualization; Predictive Analytics; Data Ingestion, Manipulation, Integration, Workflow]
  27. Fast Visual Development for Hadoop
      • Ingestion / Manipulation / Integration
      • Scheduling
      • Modeling
  28. Discovery > Proof of Value > Deployment
  29. Summary
      Dell | Cloudera Solution with Pentaho
      • Cost-effectively manage the volume, velocity and variety of data
      • Derive value across structured and unstructured data
      • Rapidly adapt to context changes and integrate new data sources and types
  30. Q&A
      Ian Fyfe, Pentaho
  31. Start getting big insights
      • Jonathan Seidman, Cloudera: jseidman@cloudera.com, www.cloudera.com
      • Ian Fyfe, Pentaho: ifyfe@pentaho.com, www.pentaho.com
      • Jeff Stacey, Dell: Hadoop@dell.com, www.dell.com/hadoop