Consolidate your data marts for fast, flexible analytics 5.24.18

In this webinar, Cloudera and AtScale will showcase:

How a company can modernize its analytic architecture to deliver flexibility and agility to more end users.
How AtScale’s Universal Semantic Layer can end the data chaos and let business users work with the data in the modern platform.
The performance of AtScale and Cloudera’s analytic database, shown with newly completed TPC-DS standard benchmarking.
Best practices for migrating from legacy appliances.

Published in: Technology

  1. CONSOLIDATE YOUR DATA MARTS FOR FAST, FLEXIBLE ANALYTICS
     Alex Gutow | Cloudera
     Josh Klahr | AtScale
  2. TRENDS WITHIN DATA WAREHOUSING
     • Expand to Do More: Nearly 40% want greater capacity for growing data, users, reports, analyses, etc. [1]
     • Improve Responsiveness: 67% of users are requesting to do more BI/analytics on their own [2]
     • Balance Business Critical & Exploration: 42% looking to augment EDW with modern platform [1]
     [1] Data Warehouse Modernization in the Age of Big Data Analytics, TDWI, 2016
     [2] Achieving Greater Agility with Business Intelligence, TDWI, 2014
     © Cloudera, Inc. All rights reserved.
  3. FIRST GENERATION
     [Diagram labels: EDW, ETL]
  4. THE RISE OF NEW TOOLS
     [Diagram labels: EDW, ETL, ETL, Analytic Appliances, Curated Data Sets/OLAP Cubes]
  5. POLL QUESTION #1
     What analytic appliances or data marts does your organization use? (Select all that apply)
     • Netezza
     • Vertica
     • AWS Redshift
     • Snowflake
     • Exadata
     • BI Tool Extracts (Tableau, Qlik, etc.)
     • EDW
     • None/Unknown
     • Other
  6. RISE OF THE DATA LAKE
     [Diagram labels: EDW, ETL, Analytic Appliances, Curated Data Sets/OLAP Cubes, Data Hub/Data Lake]
  7. LIMITATIONS OF EXISTING INFRASTRUCTURE
     • Not able to take on more reports, use cases, users, etc.
     • Constrained exploration to prevent risking critical SLAs
     • Proliferation of data silos to address additional workloads
     • Maintaining data copies causes inefficiencies for storage, processing, and people
     • Need to contain costs for existing workloads
     • Difficult to justify budget and maintenance for expansion
     • Struggle to do more with less
     • Designed for curated reports, not iterative, self-service analytics
     • Not built for elasticity or object store integration
     [Diagram labels: Compute, Store]
  8. POLL QUESTION #2
     What pain points or limitations are you facing with your data mart(s)? (Select all that apply)
     • Cannot meet internal SLA for delivering data
     • Struggle to contain costs
     • Long wait times for new data
     • Limited data for querying
     • Data exists across silos
     • Cannot connect BI tools
     • Migrating to cloud
     • Other
  9. DATA WAREHOUSING OPTIMIZATION & CONSOLIDATION
     [Diagram labels: CLOUDERA DATA WAREHOUSE, ATSCALE UNIVERSAL SEMANTIC LAYER, ENTERPRISE DATA WAREHOUSE, DATA MART(S), OPTIMIZE THROUGH OFFLOAD, COMPLETE MIGRATION]
  10. ADVANTAGES OF MODERN DATA WAREHOUSING
      • Data Flexibility
        ■ Iterative modeling and self-service accessibility
        ■ Portability: No proprietary formats or storage lock-in
      • Go Beyond SQL
        ■ Consolidate data silos with an open architecture
        ■ Shared data across SQL and non-SQL workloads
      • High-Performance SQL + Cost-Effective Scalability
        ■ Elastic scale in any environment
        ■ Cloud-native integration for optimized pay-per-use costs
        ■ Proven at massive scale
      • Hybrid Decoupled Architecture
        ■ Runs across multi-cloud & on-prem for zero lock-in
        ■ Multi-storage over S3, ADLS, HDFS, Kudu, Isilon, etc.
      [Diagram label: Shared Data]
  11. MODERNIZING FOR THE DATA AGE
      [Diagram labels: DATA SOURCES, EDW, MODERN DATA WAREHOUSE, UNIVERSAL SEMANTIC LAYER, Fixed Reports, Flexible Reporting, Advanced Analytics, Self-Service BI/Ad Hoc, Dashboards/Analytic Apps]
  12. MORE VALUE AT A LOWER COST/TB
      [Chart: Average Cost per TB]
      Observed 80% decrease in TCO
      Note: TCO calculations and migration opportunity depend on your environment and types of workloads. Please work with Cloudera to calculate potential savings for your environment.
  13. PROOF OF PERFORMANCE & OLAP FUNCTIONALITY
  14. IMPALA OUTPERFORMS ANALYTIC DATABASES
      • Impala outperforms on both single- and multi-user tests at 10TB
      • Impala lead expands with concurrency
      • Other SQL on Hadoop engines failed at 10TB

      Single-user Test – 10TB
      Metric           Impala   MPP DB   Impala Times Better
      Total seconds    11,898   21,093   1.77x
      Geometric mean       33       92   2.79x

      Multi-user Test – 10TB
      Streams   Impala QpH   MPP DB QpH   Impala Times Better
      2                 41           20   2.05x
      4                 75           20   3.75x
      8                133           16   8.31x
  15. TPC-DS 1TB: SINGLE-USER
      • The analytical db cohort (Impala / MPP) leads SQL on Hadoop
      • Impala outperforms
        ■ Presto by 8.3x
        ■ Hive w/ LLAP by 4x
        ■ Spark SQL by 2.8x

      TPC-DS 1TB Single-user, Relative Performance (Lower is Better)
      Metric    Impala   Greenplum   Spark SQL   Hive with LLAP   Presto
      Total     100%     97%         283%        405%             826%
      Geomean   100%     250%        367%        450%             1317%
  16. TPC-DS 1TB: MULTI-USER
      • At 1TB the analytic db cohort (Impala / MPP) expands lead with concurrency
      • With 16 streams Impala outperforms
        ■ Spark by ~22x
        ■ Hive by ~20x
        ■ Greenplum by ~3x

      Multi-user TPC-DS 1TB, Queries per Hour (Higher is Better; chart callout: 20x)
      Engine           4 streams   8 streams   16 streams
      Impala                 495         865        1,315
      Greenplum              381         417          462
      Spark SQL               76          70           61
      Hive with LLAP          58          62           66
  17. COST-EFFECTIVE PERFORMANCE With Cloudera + AtScale
      Cloudera + AtScale can meet or exceed the performance needs of traditional database users
      • Impala provides leading analytic database performance at high user concurrency
      • AtScale’s Intelligence Platform improves query performance
      • Dimensional Calc Engine allows complex OLAP-style calculations to run directly on cluster

      Fortune 50 Insurance Company
      Use Case                Netezza   AtScale + Cloudera   Speedup Factor
      Metric Trends           4.0       0.3                  15.6 times faster
      Customer Growth         6.8       0.3                  20.6 times faster
      Segment Benchmark       2.4       0.9                  2.7 times faster
      Time Series             5.5       1.1                  4.9 times faster
      Transaction Dashboard   5.5       0.4                  12.7 times faster
      Top Customers           5.4       0.9                  6.1 times faster
      Product Heatmap         5.5       0.2                  34.8 times faster
      Top Products            5.5       1.9                  2.9 times faster
      Top Suppliers           5.5       0.8                  6.9 times faster
  18. BI ON BIG DATA IS ABOUT MORE THAN JUST PERFORMANCE
      AtScale adds OLAP to Cloudera’s Analytic DB
      Benefits of AtScale + Impala
      • OLAP is the language of BI
        ■ Provides business users access to functions like time-intelligence, drill-down, and slicing and dicing
      • AtScale is optimized for Impala
        ■ Provides intelligent queries, optimized data structures
  19. DATA SCIENTISTS AND BUSINESS USE THE SAME DATA
      Curate the data once, deliver self-service BI and Data Science
      Benefits of the AtScale Universal Semantic Layer
      • Keeps the data in the Cloudera cluster
      • Avoids extractions or data marts
      • Simplifies and modernizes your data architecture
      • Allows any BI tool to access the data, including Excel, PowerBI, Tableau, etc.
  20. POLL QUESTION #3
      Which BI tools are you using? (Select all that apply)
      • Tableau
      • SAP Business Objects
      • Excel
      • PowerBI
      • Qlik
      • Spotfire
      • MicroStrategy
      • Zoomdata
      • Cognos
      • Other
  21. ATSCALE PRESENTS DATA FOR BI USERS, NOT DATA ENGINEERS
  22. DASHBOARDS DEMAND CONCURRENCY, THROUGHPUT
  23. JOINT CUSTOMER SUCCESS
  24. TOYOTA
      Self-Service BI for Finance & IoT
      Cloudera + AtScale
      • Deliver sub-second queries
      • Supports ad hoc Tableau and PowerBI
      • Data science embedded with business units and goals
      • Enable and protect enterprise data consumption
      [Metric callouts: Faster time-to-insights, Self-service users, Databases migrated, Self-delivered dashboards]
  25. GLOBAL PHARMACEUTICAL
      R&D Information Platform
      Cloudera + AtScale
      • Consistent, shared data access through BI tools (Tableau, Tibco)
      • Interactive query speeds
      • OLAP capabilities
      • HIPAA compliant
      Time-to-insights in minutes, not years:
      • Reduced cost and time to identify clinical trial groups
      • Accelerate new drug development
      • 1st time metrics and monitoring on compliance data
      Analytic Users: 70% Execs/Managers, 25% Analysts, 5% Data Scientists
      [Metric callouts: structured + unstructured, 100% unstructured data captured, Reduction in silos, Use cases]
  26. THE OFFLOAD PROCESS
  27. WHERE DO YOU START?
      [Chart: query volume by time of day, 12 am to 12 am, y-axis 400K to 1.6M]
      Query volume can be huge
      Numerous databases, thousands of tables, many users and applications
      Queries can be very complex
      • How do you determine what workloads to run on Cloudera’s platform?
      • Will the queries run efficiently?
      • What does it take to migrate?
      • How do you prioritize?
  28. DATA WAREHOUSE OPTIMIZATION
      Tools & Framework to help you through the process
      [Diagram labels, in slide order: Plan; Offload; Optimize; Estimate Effort; Risk Analysis; Schema Design; Test & Validate; Evaluate; Identify Use Cases; Impact Analysis; Set Objectives; Prioritized Plan; Initial POC; Offload each workload; Evaluate the need and scope; Impact analysis, prioritized plan; Optimize for performance; Identify Suitable Workloads; Offload Actions; Capacity Planning; Fine Tuning; Data Model on Hadoop; Optimize Queries for Performance; Validate ROI, Cost]
  29. WHERE TO LEARN MORE
      Learn more about how GSK and Toyota brought BI to the Data Lake: http://info.atscale.com/webinar_do_dont_bi_on_data_lake_2018-0
      Get the report on how to offload BI workloads: https://www.cloudera.com/content/dam/www/marketing/resources/analyst-reports/efficiently-offloading-and-optimizing-bi-workloads-to-hadoop-with-cloudera-navigator-optimizer.pdf.landing.html
      Contact us or check out:
  30. THANK YOU
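
The "Impala Times Better" figures on slide 14 and the approximate multipliers on slide 16 appear to be plain ratios of the published numbers. The sketch below recomputes them under that assumption. The helper names (`times_better`, `check_rows`) and the tolerances are illustrative choices, not part of the deck. For queries per hour, higher is better, so the ratio is Impala divided by the other engine; for seconds, lower is better, so the ratio is inverted.

```python
def times_better(impala: float, other: float, higher_is_better: bool = True) -> float:
    """Ratio by which Impala beats the other engine (assumed to be a plain quotient)."""
    return impala / other if higher_is_better else other / impala


# (label, Impala value, other value, higher_is_better, published ratio, tolerance)
# Values are copied from slides 14 and 16; tolerances are an assumption that
# allows for rounding in the published figures.
ROWS = [
    ("10TB multi-user, 2 streams (QpH)", 41, 20, True, 2.05, 0.01),
    ("10TB multi-user, 4 streams (QpH)", 75, 20, True, 3.75, 0.01),
    ("10TB multi-user, 8 streams (QpH)", 133, 16, True, 8.31, 0.01),
    ("10TB single-user, total seconds", 11898, 21093, False, 1.77, 0.01),
    ("10TB single-user, geometric mean", 33, 92, False, 2.79, 0.01),
    ("1TB 16 streams vs Spark SQL (QpH)", 1315, 61, True, 22.0, 0.5),
    ("1TB 16 streams vs Hive with LLAP (QpH)", 1315, 66, True, 20.0, 0.5),
    ("1TB 16 streams vs Greenplum (QpH)", 1315, 462, True, 3.0, 0.5),
]


def check_rows(rows):
    """Return (label, computed ratio, within tolerance of published ratio) per row."""
    results = []
    for label, impala, other, higher_is_better, published, tolerance in rows:
        computed = times_better(impala, other, higher_is_better)
        results.append((label, computed, abs(computed - published) <= tolerance))
    return results


if __name__ == "__main__":
    for label, computed, consistent in check_rows(ROWS):
        status = "consistent" if consistent else "differs"
        print(f"{label}: computed {computed:.3f}x ({status} with the slide)")
```

The same check does not carry over cleanly to the slide 17 table, where the displayed timings are rounded to one decimal place.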
