Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
1© Cloudera, Inc. All rights reserved.
Cloudera’s Analytic Database:
BI & SQL Analytics in a Hybrid
Cloud World
Alex Gutow...
2© Cloudera, Inc. All rights reserved.
What’s Driving Analytics to the Cloud?
Big data deployments in cloud
are accelerati...
3© Cloudera, Inc. All rights reserved.
POLL
What best describes your cloud strategy today?
•Standardizing on single-cloud ...
4© Cloudera, Inc. All rights reserved.
Key Applications
EDW
Optimization
Data Preparation
Self-Service BI &
Exploration
Us...
5© Cloudera, Inc. All rights reserved.
Key Benefits
An analytic database designed for Hadoop
High-Performance BI and SQL A...
6© Cloudera, Inc. All rights reserved.
Anatomy of an Analytic Database
Cloudera Decoupled by Design
Query Engine
Storage E...
7© Cloudera, Inc. All rights reserved.
Traditional Monolithic Analytic Databases
No Cloud Elasticity or Cloud
Storage Inte...
8© Cloudera, Inc. All rights reserved.
Advantages of Our Approach
Cloud-Native & On-Premise
Go Beyond SQL
• Open Architect...
9© Cloudera, Inc. All rights reserved.
Cost-Efficiencies & Flexibility in the Cloud
Primary Analytic Database Patterns
Onl...
10© Cloudera, Inc. All rights reserved.
Add Use Cases, Analytics,
and Data On-Demand
• Avoid the IT backlog with instant
a...
11© Cloudera, Inc. All rights reserved.
Demo
Querying against cloud object stores
12© Cloudera, Inc. All rights reserved.
Deployment Patterns: ETL in the Cloud
Object Storage
Batch
Cluster
Transient Batch...
13© Cloudera, Inc. All rights reserved.
Deployment Patterns: BI/Analytics in the Cloud
Three Architecture Options to Optim...
14© Cloudera, Inc. All rights reserved.
Persistent BI on Object Storage
Best for elasticity (and speed vs transient)
● Thi...
15© Cloudera, Inc. All rights reserved.
Persistent BI with Locally-Attached Storage
Best performance for consistent worklo...
16© Cloudera, Inc. All rights reserved.
Transient BI on Object Storage
Best TCO for infrequent usage
Object Storage
Cloude...
17© Cloudera, Inc. All rights reserved.
• Engine: Impala
• Data Formats:
• Use Parquet, Snappy, and 256 MB blocks for best...
18© Cloudera, Inc. All rights reserved.
Security Guidelines for BI in the cloud
Persistent clusters
• Identity:
• Tie the ...
19© Cloudera, Inc. All rights reserved.
Real-World Use Cases
20© Cloudera, Inc. All rights reserved.
Helping Companies with a Global Value
Chain Save Millions
• 360° view of supply ch...
21© Cloudera, Inc. All rights reserved.
Providing a complete view of consumer
watching and buying habits
• Helps customers...
22© Cloudera, Inc. All rights reserved.
Measure user interaction across the
ecosystem, help direct R&D and
development spe...
23© Cloudera, Inc. All rights reserved.
Next Steps
Learn about Operational and Data Engineering
Workloads in the Cloud
www...
24© Cloudera, Inc. All rights reserved.
Thank you
Prochain SlideShare
Chargement dans…5
×

Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World

1 956 vues

Publié le

3 Things to Learn About:
* On-premises versus the cloud: What’s the same and what’s different?

* Design and benefits of analytics in the cloud

* Best practices and architectural considerations

Publié dans : Logiciels
  • Soyez le premier à commenter

Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World

  1. 1. 1© Cloudera, Inc. All rights reserved. Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World Alex Gutow | Product Marketing Greg Rahn | Product Management
  2. 2. 2© Cloudera, Inc. All rights reserved. What’s Driving Analytics to the Cloud? Big data deployments in cloud are accelerating: ● Increased Agility: End-user self-service ● Elasticity: Optimize infrastructure usage ● Lower Overall TCO ● Executive Mandate: Minimize on-prem datacenter footprint
  3. 3. 3© Cloudera, Inc. All rights reserved. POLL What best describes your cloud strategy today? •Standardizing on single-cloud deployment (now/future) •Multi-cloud deployment (now/future) •Hybrid on-prem and cloud (now/future) •Not sure •No cloud plans yet
  4. 4. 4© Cloudera, Inc. All rights reserved. Key Applications EDW Optimization Data Preparation Self-Service BI & Exploration Use your EDW more efficiently by offloading workloads to Hadoop Fast, flexible ETL over large data volumes, so data is always ready for your business Fastest time-to-insights with a modern analytic database designed with Hadoop’s flexibility and agility
  5. 5. 5© Cloudera, Inc. All rights reserved. Key Benefits An analytic database designed for Hadoop High-Performance BI and SQL Analytics Flexibility for Data and Use Case Variety Cost-effective Scale for Today and Tomorrow Go Beyond SQL with an Open Architecture
  6. 6. 6© Cloudera, Inc. All rights reserved. Anatomy of an Analytic Database Cloudera Decoupled by Design Query Engine Storage Engine Catalog Query Engine (Impala) Catalog (HMS) Monolithic Analytic Database Modern Analytic Database Storage (Kudu) Storage (S3) Storage (HDFS)
  7. 7. 7© Cloudera, Inc. All rights reserved. Traditional Monolithic Analytic Databases No Cloud Elasticity or Cloud Storage Integration Rigid Data Model with Tightly Coupled Storage/Compute Limited to SQL with Data Movement Necessary Static Sizing ∞ COMPUTE STORE
  8. 8. 8© Cloudera, Inc. All rights reserved. Advantages of Our Approach Cloud-Native & On-Premise Go Beyond SQL • Open Architecture: Open formats and open storage • Shared data across SQL and non-SQL workloads Data Flexibility • Faster, more agile data acquisition • Data portability: Open formats and open storage Cost-Effective Scalability • Elastic scale on-prem or in the cloud • Cloud-native pay-per-use and transience • Proven at big data scale Hybrid • Runs across multi-cloud & on-prem • Multi-storage over S3, HDFS, Kudu, Isilon, DSSD, etcShared Data
  9. 9. 9© Cloudera, Inc. All rights reserved. Cost-Efficiencies & Flexibility in the Cloud Primary Analytic Database Patterns Only pay for what you need, when you need it ▪ Transient clusters ▪ Object storage centric ▪ Cloud-native deployment ETL Reduce Operating Costs New Insights, New Revenue BI/Analytics Explore and analyze all data, wherever it lives ▪ Long-running clusters ▪ Object storage or local storage ▪ Lift-and-shift deployment
  10. 10. 10© Cloudera, Inc. All rights reserved. Add Use Cases, Analytics, and Data On-Demand • Avoid the IT backlog with instant access to all data • On-demand clusters query directly on shared object storage Predictable Results Whenever You Want • Consistent query performance, even during peak times • Multi-tenancy via isolated clusters on shared data Just-in-Time Resources • Real-time capacity for your needs, as they change • Elastically grow/shrink your cluster via decoupled architecture Contention-Free ETL • ETL anytime without impacting other workloads or risking SLAs • Separate ETL clusters as-needed on shared data Benefits Across the Business ETL and BI/Analytics in the cloud
  11. 11. 11© Cloudera, Inc. All rights reserved. Demo Querying against cloud object stores
  12. 12. 12© Cloudera, Inc. All rights reserved. Deployment Patterns: ETL in the Cloud Object Storage Batch Cluster Transient Batch (most flexible) Spin up clusters as needed ● On-demand/spot instances ● Usage-based pricing ● Sized for workload ● Cluster per tenant/user Batch Cluster Batch Cluster Persistent Batch (most control) Persistent cluster(s) for frequent ETL ● Reserved instances ● Node-based pricing ● Grow/shrink ● Cluster per tenant group Persistent Cluster Batch Persistent Batch on HDFS (fastest) Top performance for frequent ETL ● Reserved instances ● Node-based pricing ● Grow/shrink ● Shared across tenant groups Batch Persistent Cluster Batch Batch Persistent Cluster HDFS Batch Batch
  13. 13. 13© Cloudera, Inc. All rights reserved. Deployment Patterns: BI/Analytics in the Cloud Three Architecture Options to Optimize Price/Performance Object Storage Transient Cluster Transient BI (infrequent usage) Spin up clusters when needed ● On-demand instances ● Usage-based pricing ● Grow/shrink ● Cluster per tenant or user Persistent BI (regular usage) Persistent clusters for BI any time ● Reserved instances ● Node-based pricing ● Grow/shrink ● Cluster per tenant group Persistent Cluster Persistent BI with Local Storage (fastest) Max speed for more regular workloads ● Reserved instances ● Node-based pricing ● Less frequent grow/shrink ● Shared cluster for shared local data Persistent Cluster HDFS and/or Kudu Persistent Cluster Transient Cluster Default Choice
  14. 14. 14© Cloudera, Inc. All rights reserved. Persistent BI on Object Storage Best for elasticity (and speed vs transient) ● This is usually the best choice ● Best when workloads are: o Flexible and changing o Frequent during most working days o Not scheduled for fixed hours ● Benefits include: o Predictable results readily available o Full multi-tenant isolation o Common data in shared object storage o Grow/shrink for TCO efficiency ● Tradeoffs: o Per node performance of object storage (use more, cheaper nodes) Object Storage Persistent BI (regular usage) Persistent clusters for ready availability ● Reserved instances ● Node-based pricing ● Grow/shrink ● Cluster per tenant group Persistent Cluster Persistent Cluster Default Choice
  15. 15. 15© Cloudera, Inc. All rights reserved. Persistent BI with Locally-Attached Storage Best performance for consistent workloads ● Best when workloads are: o Regular and consistent o Consistently querying common data o Tight SLAs for performance o Fast changing data (that needs Kudu) o Running without object storage (eg. Azure, GCE) ● Benefits include: o Faster performance per node on local data o Ability to query object storage for rest of data ● Tradeoffs: o Less elastic than object stored based clusters o Less isolation for multi-tenant workloads using same HDFS data o Cost if there are off-peak hours Object Storage Persistent BI with HDFS (fastest) Max speed for more regular workloads ● Reserved instances ● Node-based pricing ● Less frequent grow/shrink ● Shared cluster for shared HDFS data Persistent Cluster HDFS and/or Kudu
  16. 16. 16© Cloudera, Inc. All rights reserved. Transient BI on Object Storage Best TCO for infrequent usage Object Storage Cloudera Director ● Best when workloads are: o Infrequent or scheduled ● Benefits include: o Lowest TCO with clusters only when needed o Full multi-tenant isolation o Common data in shared object storage ● Tradeoffs: o Delay to spin-up clusters when needed o Capability of BI users to spin up clusters o Per node performance of object storage (use more, cheaper nodes) Shared HMS DB Transient Cluster Transient BI (infrequent usage) Spin up clusters when needed. ● On-demand instances ● Usage-based pricing ● Grow/shrink ● Cluster per tenant or user Transient Cluster
  17. 17. 17© Cloudera, Inc. All rights reserved. • Engine: Impala • Data Formats: • Use Parquet, Snappy, and 256 MB blocks for best I/O efficiency (especially on object storage) • Avoid small files to avoid performance issues (especially with object storage) • Metadata (Hive metastore): • Use local RDBMS for clusters that use locally-stored HDFS/Kudu data • Use local RDBMS for persisted clusters (or transient clusters with Kerberos or Sentry) • Use a shared RDBMS for transient single-tenant clusters without Kerberos or Sentry • Avoid sharing RDBMS for non-data sharing clusters • Deployment and Administration: • Use Cloudera Manager for persistent clusters • Use Cloudera Director for transient clusters • Deploy NLB for persistent clusters as usual (and only when needed for transience) • Monitor workloads with Cloudera Manager • Security: • Transient BI clusters: use VPC with strict security groups • Persistent BI clusters: (see next slide) Basic Guidelines for BI in the cloud
  18. 18. 18© Cloudera, Inc. All rights reserved. Security Guidelines for BI in the cloud Persistent clusters • Identity: • Tie the cluster to the corporate user directory (AD or LDAP) using Linux SSSD • Authentication: • Kerberos for internal CDH services • AD (LDAP) for BI user/tools with Impala • Authorization: • Use Sentry for authorization per BI cluster (shared RDBMS with Sentry is not yet available) • Encryption: • With AWS, use SSE-S3 server-side encryption • If you need HDFS-level encryption or keys outside AWS, then use Persisted Cluster with Local Storage using CM NOTE: if all users for a cluster have access to all data in the cluster, skip steps above and just use VPC with strict security groups
  19. 19. 19© Cloudera, Inc. All rights reserved. Real-World Use Cases
  20. 20. 20© Cloudera, Inc. All rights reserved. Helping Companies with a Global Value Chain Save Millions • 360° view of supply chain process in seconds with data from suppliers, manufacturing, equipment, field service, IoT and repair • Improved product quality by identifying and addressing supply chain issues in near-real time • $15-$25 million savings annually for Siemens clients • Costs 90% less per TB than RDBMS; 75% less per TB than Netezza CUSTOMER 360
  21. 21. 21© Cloudera, Inc. All rights reserved. Providing a complete view of consumer watching and buying habits • Helps customers optimize their ad spend for greater campaign ROI • Improves processing performance as data volumes double • Boosts agility and flexibility and reduces risk with hybrid and multi-cloud strategy CUSTOMER 360
  22. 22. 22© Cloudera, Inc. All rights reserved. Measure user interaction across the ecosystem, help direct R&D and development spend • Virtuous cycle: Identify features that facilitate sharing of content that drive new customers • Real-time streaming and batch data from product logs, web analytics, channel data and ERP • Impala connects to third-party data wrangling and BI tools for fast reporting
  23. 23. 23© Cloudera, Inc. All rights reserved. Next Steps Learn about Operational and Data Engineering Workloads in the Cloud www.cloudera.com/about-cloudera/events/webinars/cloud-webinar-series.html Get Started with Impala in the Cloud Check Out the Latest Benchmarks • www.cloudera.com/downloads • www.cloudera.com/documentation/e nterprise/latest/topics/impala_s3.html http://blog.cloudera.com/blog/2016/09/apache- impala-incubating-vs-amazon-redshift-s3- integration-elasticity-agility-and-cost- performance-benefits-on-aws/
  24. 24. 24© Cloudera, Inc. All rights reserved. Thank you

×