Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud

3 Things to Learn About:
*On-premises versus the cloud
*Design & benefits of real-time operational data in the cloud
*Best practices and architectural considerations

Published in: Software

  1. Operational Database in the Public Cloud
     Ryan Lippert | Cloudera Product Marketing
     © Cloudera, Inc. All rights reserved.
  3. What’s Driving Operations to the Cloud?
     Big data deployments in the cloud are accelerating; why?
     ● Increased Agility: End-user self-service
     ● Elasticity: Optimize infrastructure usage
     ● Lower Overall TCO
     ● Executive Mandate: Minimize on-premises datacenter footprint
  4. Overview: Cloudera’s Operational Database
  5. Cloudera’s Operational Database
     Build Data-Driven Applications to Deliver Real-Time Insights
     Operational Database Attributes
     Fast
     • Real-time model serving with <15 ms latency
     • Limitless concurrency (>100M updates/sec)
     Secure
     • Native encryption
     • Audit
     Easy
     • Stream ingest, processing, NoSQL, and real-time analytics together
     • Best-in-class management and cloud automation
  6. Cloudera’s Operational Database
     Layers: Visualization, Processing/Exploration, Storage, Unique Components
     • HBase: Fast/random reads and writes via a high-performance, distributed NoSQL data store
     • Kudu: Fast analytics on fast data with a relational structure
     • BI Partners: Integration with the leading BI tools
     • Cloudera Search: Faceted, text-based search for data exploration and democratization
     • Spark: Powerful and flexible processing, streaming, and SQL
     Unique components: Multi-Storage, Multi-Environment; Encryption, Key Trustee; Navigator; Storage & Governance
  7. Operational Database
     • Web-Scale Data Depot: Durable, low-latency storage for web applications, message stores, and mission-critical operational activities.
     • Complex Event Processing: Identifying meaningful events based on multiple data streams and taking action.
     • Model Scoring/Serving: Use data and current/past events to score and serve the likelihood of subsequent events.
  8. Web-Scale Data Depot
     Key Applications for Operational Database
     • Real-Time Data Access: Low latency and high concurrency enable broad-based access to real-time information, yielding informed decisions
     • Enterprise Data Apps: Build company-wide, easy-to-use apps that let employees or customers interact with pertinent data
     • IoT Data Ingestion and Collection: Ingest, process, and serve IoT data in real time to take advantage of instrumentation investments
     • Web-Scale Data Management: Store information from broad sets of customer interactions occurring online, in-app, or in-store
     Customer examples:
     • Ingests energy data from over 4 million homes and provides reports that help customers save millions
     • Serves real-time market data for over 40M instruments; ingests 2.5M transactions/second and serves 3.5M messages/second
  9. Complex Event Processing
     Key Applications for Operational Database
     • Cybersecurity and Advanced Persistent Threats (APT): Protect data with data; by keeping full-fidelity records of network activity, anomalies can be surfaced and thwarted
     • Network Health Monitoring: Maintain performance within an enterprise network by identifying and remedying problems in real time
     • IoT Predictive Maintenance: Use IoT sensor data from an unlimited number of sources to proactively predict and fix problems with physical equipment
     Customer examples:
     • Reduced APT detection time from hundreds of days to minutes; now scales to thousands of endpoints instead of hundreds
     • Remote diagnostics IoT platform reduces fleet maintenance on 180,000+ vehicles by 30-40%
  10. Model Scoring & Serving
      Key Applications for Operational Database
      • Cross-Sell/Up-Sell & Personalization: Leverage a long history of purchases among a broad population to create personalized offers, in real time, that shoppers are likely to act on
      • Fraud Prevention: Compare recent financial transactions/claims with a company-wide history of nefarious transactions to identify and prevent fraud in real time
      • Customer Profitability: Quickly identify high-value customers via individual characteristics that correlate with profitability; focus acquisition and retention on these segments
      Customer examples:
      • Lower cart abandonment; 3x higher email open rate; bounce rate decreased by 20%; time to update indexes cut from a day to 15 minutes
      • Can now provide customers with 300-400% higher CTR, 10x more return visits, and longer sessions
  11. Benefits of the Public Cloud: Cloudera’s Operational Database
  12. What’s Driving Operations to the Cloud?
      Big data deployments in the cloud are accelerating; why?
      ● Increased Agility: End-user self-service
      ● Elasticity: Optimize infrastructure usage
      ● Lower overall TCO
      ● Executive Mandate: Minimize on-premises datacenter footprint
  13. Advantages of Our Approach
      Cloud-Native & On-Premises
      Go Beyond SQL
      • Open Architecture: Open formats and open storage
      • Shared data across SQL and non-SQL workloads
      Data Flexibility
      • Faster, more agile data acquisition
      • Data portability: Open formats and open storage
      Cost-Effective Scalability
      • Elastic scale on-prem or in the cloud
      • Cloud-native pay-per-use and transience
      • Proven at big data scale
      Hybrid
      • Runs across multi-cloud & on-prem
      • Multi-storage over S3, HDFS, Kudu, Isilon, DSSD, etc.
      (Diagram label: Shared Data)
  14. Operational Database in the Cloud
      Public Cloud Benefits
      Cost Considerations
      • Low-cost backup and disaster recovery
      • Development and testing environments that are easy to deploy and decommission
      Convenience Considerations
      • Elastic growth for tightly provisioned workloads makes expansion easy and enables a lower-cost steady state
      • Fast and easy provisioning of additional clusters helps projects move quickly
  15. Operational Database Cloud Architecture
      Diagram components: Data Sources, Spark Streaming, Applications, long-lived Prod Cluster (Operational DB), temporary Dev/Test Cluster (Operational DB), Burst Batch Processing, Director (provision), CM (manage/provision), S3 [1], EBS [2], AWS Infrastructure
      Diagram note: Data copy for burst processing or read/write temp clusters
      Benefits
      1 Easy provisioning
      2 Dev/Test provisioning
      3 Burst processing of large amounts of data
      4 Low-cost backups
      [1] S3, Azure Blob Storage, etc.
      [2] EBS, Azure Premium Storage, etc.
  16. Easy Provisioning of New Clusters
      Operational Database in the Public Cloud: Easy Provisioning
      Business Challenge
      • On-premises installations can be slow to roll out, particularly for PoC engagements with long procurement cycles
      • Some organizations can take 3-6 months to execute this process, slowing developers and creating anchors to legacy technology
      Cloud-Enabled Solution
      • Cloudera enables customers to go to the public cloud with industry-leading software
      • Workloads can be moved across clouds or to on-premises clusters, preventing cloud lock-in
      Details
      • Cloudera provides the ability to quickly provision a new cluster for operational use cases in the public cloud
      • Rapid provisioning without the permanent cost of internal infrastructure helps make Cloudera a fast and easy choice for PoCs
      • Companies with Cloudera-trained employees can test and prove or disprove new use cases quickly, delivering more value to the business
  17. Easy Provisioning of New Clusters
      Operational Database in the Public Cloud
      Diagram labels: Application, Cloud Instances, Operational DB, Director (provision), Cloud Storage, Direct Attached Storage, Instance Storage, Fast Cloud Storage [1]
      [1] EBS, Azure Premium Storage, etc.
  18. Spark Streaming and Operational DB in the Cloud
      For real-time processing and serving architectures
      Diagram flow: Ingest Streaming Data → Spark Streaming → Operational DB → Deliver/Serve Data → Applications
      • Spark Streaming runs on a dedicated permanent cluster
      • Operational DB runs on a dedicated permanent cluster
      • Both clusters are in the same availability zone
  19. Creating Dev and Test Environments
      Operational Database in the Public Cloud: Development and Testing Environments
      Business Challenge
      • Development and testing environments are expensive to maintain and configure, difficult to secure with real data, and have different projects competing for a finite pool of resources
      Cloud-Enabled Solution
      • Cloudera in the public cloud enables development and testing environments to be provisioned quickly, securely, and for the required period of time
      Details
      • Public cloud offerings with Cloudera make it easy to quickly replicate a production instance of data to a testing environment
      • Test environments can be configured with all the security of production, without the risk of overloading critical infrastructure
      • Temporary instances mean environments are purpose-created and time-bound, which reduces competition for test/dev resources
  20. Creating Development and Testing Environments
      Fast, Easy, and Secure Development & Testing in the Cloud
      Diagram labels: Production Environment (production-ready data; production instance delivers data to users and applications), Application, Cloud Object Storage [1], Fast Cloud Storage [2], Dev/Test Environment, Secure Dev/Test Environment (development and testing)
      [1] S3, Azure Blob Storage, etc.
      [2] EBS, Azure Premium Storage, etc.
  21. Burst Provisioning for ETL
      Operational Database in the Public Cloud: Leverage Cloud for ETL
      Business Challenge
      • ETL is a difficult process that consumes a large amount of resources and can create bottlenecks, depending on the nature of batch processes or data spikes
      Cloud-Enabled Solution
      • Cloudera can leverage the elasticity of resources in the public cloud to help businesses handle large ETL jobs, whether or not they are anticipated
      Details
      • Unexpected surges in traffic, regular batch jobs that are growing in size, and other high-volume data ingestion issues can create bottlenecks that slow insight into the business; standard on-premises ETL may have a difficult time recovering from the surge, resulting in lost data
      • Public cloud instances of Cloudera enable additional ETL resources to be added temporarily, overcoming the deluge
  22. Burst Provisioning for ETL
      Keeping Your Operational Database Real-Time in the Cloud
      Diagram components: Data Sources, Operational DB, Cloud Storage, AWS EBS Instances, Application
      1 Data surge
      2 Data pushed to cloud storage
      3 Data to cloud instances for batch processing
      4 Transformed data returned to cloud storage
      5 Transformed data sent to HBase
      6 Data served to application
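The burst idea in slides 21 and 22 can be sketched as simple capacity arithmetic: given a backlog and a deadline, how many temporary workers are needed on top of the permanent ones? This is only an illustrative sketch. The function name, the per-worker throughput figure, and the example numbers are all assumptions, not values from the deck.

```python
import math


def extra_workers_needed(backlog_gb, deadline_hours, gb_per_worker_hour, permanent_workers):
    """Return how many temporary workers to add so the backlog clears by the deadline.

    All inputs are hypothetical planning figures, not measured values.
    """
    if deadline_hours <= 0 or gb_per_worker_hour <= 0:
        raise ValueError("deadline and per-worker throughput must be positive")
    # Workers required in total to process the backlog within the deadline.
    required = math.ceil(backlog_gb / (gb_per_worker_hour * deadline_hours))
    # Only the shortfall beyond the permanent cluster needs to be burst-provisioned.
    return max(0, required - permanent_workers)


# Hypothetical surge: 12 TB backlog, 4-hour window, 150 GB/hour per worker, 8 permanent workers.
print(extra_workers_needed(12_000, 4, 150, 8))
```

When the result is zero, the permanent cluster can absorb the surge and no temporary instances are needed.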
  23. Backup and Disaster Recovery
      Operational Database in the Public Cloud: Backup and Disaster Recovery
      Business Challenge
      • Businesses struggle to create backups of their critical data; challenges include geographical dispersion, maintenance costs, and backup frequency
      Cloud-Enabled Solution
      • Public clouds offer the ability to take frequent snapshots of the data within your CDH cluster
      Details
      • By snapshotting to a remote public cloud datacenter, companies can take advantage of geographically dispersed data copies
      • Cheap storage enables more frequent backups, so a more recent copy of data can be recovered in case of problems
      • Navigate issues associated with data sovereignty
  24. Backup and Disaster Recovery
      Reduce Risk by Backing Up Operational Database Data in the Public Cloud
      Diagram: an on-premises instance and a cloud instance each follow the same two steps against cloud object storage [1]:
      1 Snapshot of data
      2 Restore
      [1] S3, Azure Blob Storage, etc.
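The slides do not name the snapshot mechanism. Assuming the data lives in HBase, one plausible route is HBase's `ExportSnapshot` tool, which copies an existing snapshot to another filesystem. The sketch below only builds the command line and does not run it; the snapshot name, bucket URI, and mapper count are hypothetical.

```python
def export_snapshot_command(snapshot_name, destination_uri, mappers=4):
    """Build the argument list for HBase's ExportSnapshot tool.

    The snapshot must already exist (for example, created from the HBase shell).
    The destination URI is an assumption; any filesystem the cluster can write to works.
    """
    return [
        "hbase",
        "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot_name,
        "-copy-to", destination_uri,
        "-mappers", str(mappers),
    ]


# Hypothetical names: a snapshot called "orders_snap" and an example bucket.
cmd = export_snapshot_command("orders_snap", "s3a://example-backup-bucket/hbase")
print(" ".join(cmd))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.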
  25. CDH HBase on EBS vs. EMR HBase on S3
      EMR and S3 options for HBase
      - Amazon can run HBase on S3 and get consistency via a proprietary EMRFS connector
      - Cloudera leverages EBS for HBase cloud deployments
      Customer aim: performance and price
      - S3 saves on storage costs relative to EBS but increases compute costs, because more EC2 instances are needed to reach the same metrics
      - So with S3, storage costs go down but compute costs go up; few use cases can achieve low cost on both axes
      Qualitative customer concerns
      - From Cloudera: high availability, automated configuration, manageability (monitoring/alerts), and support (from Cloudera’s HBase developers)
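The storage-versus-compute trade-off on slide 25 can be made concrete with a small cost model. This is a sketch under stated assumptions: every price, node count, and data size below is made up for illustration and is not taken from the deck or from any price list.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def monthly_cost(nodes, usd_per_node_hour, stored_gb, usd_per_gb_month):
    """Split a month's bill into compute and storage, using hypothetical unit prices."""
    compute = nodes * usd_per_node_hour * HOURS_PER_MONTH
    storage = stored_gb * usd_per_gb_month
    return {"compute": compute, "storage": storage, "total": compute + storage}


# Hypothetical scenario: the object-store option stores data more cheaply
# but needs more instances to reach the same performance.
ebs_backed = monthly_cost(nodes=10, usd_per_node_hour=0.50, stored_gb=20_000, usd_per_gb_month=0.10)
s3_backed = monthly_cost(nodes=14, usd_per_node_hour=0.50, stored_gb=20_000, usd_per_gb_month=0.025)

for name, bill in (("EBS-backed", ebs_backed), ("S3-backed", s3_backed)):
    print(f"{name}: compute={bill['compute']:.0f} storage={bill['storage']:.0f} total={bill['total']:.0f}")
```

With these invented inputs the two totals land close together, which is the slide's point: savings on one axis tend to be offset on the other.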
  26. Instance Recommendations
      Easy provisioning of always-on clusters:
      Data Nodes:
      Model        vCPU   Mem (GiB)   Storage (GB)
      d2.xlarge    4      30.5        3 x 2000 HDD
      d2.2xlarge   8      61          6 x 2000 HDD
      d2.4xlarge   16     122         12 x 2000 HDD
      d2.8xlarge   36     244         24 x 2000 HDD
      Master Nodes:
      Model        vCPU   Mem (GiB)   Storage (GB)
      c3.8xlarge   32     60          2 x 320 SSD
      Snapshot Backups: S3
      Smaller versions are recommended to stagger the impact of full block reports and garbage collection. The master node memory should be sized in line with the cluster size; c3.8xlarge supports very large cluster sizes, but smaller master nodes are possible.
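To compare the data-node options above, a rough node count can be derived from raw disk per instance. The sketch below uses the storage figures from the table; the replication factor of 3 is the HDFS default, while the usable-disk fraction and the dataset size are assumptions chosen only for illustration.

```python
import math

# Raw disk per data node in GB, taken from the data-node table above.
D2_RAW_GB = {
    "d2.xlarge": 3 * 2000,
    "d2.2xlarge": 6 * 2000,
    "d2.4xlarge": 12 * 2000,
    "d2.8xlarge": 24 * 2000,
}


def data_nodes_needed(dataset_gb, model, replication=3, usable_fraction=0.75):
    """Estimate data nodes for a dataset; usable_fraction is an assumed headroom factor."""
    raw_needed = dataset_gb * replication / usable_fraction
    return math.ceil(raw_needed / D2_RAW_GB[model])


# Hypothetical 100 TB dataset sized against each instance type.
for model in D2_RAW_GB:
    print(model, data_nodes_needed(100_000, model))
```

The same capacity can be met with many small nodes or a few large ones; the slide's note favors the smaller instances to spread out block reports and garbage collection.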
  27. Instance Recommendations
      Transient clusters with permanent storage using Director:
      Data Nodes:
      Model        vCPU   Mem (GiB)   Storage
      C4.large     4      30.5        EBS (4000 Mbps dedicated)
      Master Nodes:
      Model        vCPU   Mem (GiB)   Storage (GB)
      c3.8xlarge   32     60          2 x 320 SSD
      Storage, throughput workloads (ETL, etc.):
      Volume Type   Volume Size        IOPS     Throughput
      st1           500 GiB – 16 TiB   500      800 MiB/s
      Storage, real-time workloads (HBase, etc.):
      Volume Type   Volume Size        IOPS     Throughput
      io1           4 GiB – 16 TiB     20,000   800 MiB/s
      These are default recommendations; they will vary based on the specifics of each use case. We recommend deploying a tier above the lowest one, sc1, because its throttling limits are reached quickly, which brings down the database.
  28. Instance Recommendations
      Always-on Spark Streaming cluster
      • Spark clusters have homogeneous nodes, i.e. no special “master” node
      Default (best balance of memory and compute):
      Model        vCPU   Mem (GiB)   Storage
      m4.2xlarge   8      32          EBS (1000 Mbps dedicated)
      Very memory-intensive workloads:
      Model        vCPU   Mem (GiB)   Storage
      m3.2xlarge   8      61          160 GB SSD
      Examples are workloads that cache RDDs/DataFrames or maintain in-memory state via the updateStateByKey(…) function.
      Very compute-intensive workloads:
      Model        vCPU   Mem (GiB)   Storage
      c4.2xlarge   8      15          EBS (1000 Mbps dedicated)
      Examples are workloads that perform compute-intensive machine learning operations to score incoming events.
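Why does `updateStateByKey` make a workload memory-intensive? The state for every key is carried from one micro-batch to the next. The sketch below emulates that behavior on plain Python dicts so it runs without Spark; it is not the Spark API itself, and the helper names and sensor keys are hypothetical.

```python
from collections import defaultdict


def update_state_by_key(batch, state, update_func):
    """Emulate one micro-batch of Spark Streaming's updateStateByKey on plain dicts."""
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)
    new_state = {}
    # Every key with new values or existing state is updated on each batch.
    for key in set(grouped) | set(state):
        updated = update_func(grouped.get(key, []), state.get(key))
        if updated is not None:  # returning None drops the key from the state
            new_state[key] = updated
    return new_state


def running_count(new_values, previous):
    """Update function: add this batch's values to the previous total."""
    return sum(new_values) + (previous or 0)


state = {}
batches = [[("sensor-a", 1), ("sensor-b", 1)], [("sensor-a", 1)]]
for batch in batches:
    state = update_state_by_key(batch, state, running_count)
print(state)
```

Because a key's state survives batches in which it receives no new values, the state only shrinks when the update function returns `None`, so memory grows with the number of live keys.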
  29. Next Steps
      • Get started with Cloudera in the cloud: www.cloudera.com/downloads
      • Learn more about Cloudera’s Operational DB: https://www.cloudera.com/solutions/operational-database.html
      • Learn about data engineering workloads in the cloud: www.cloudera.com/about-cloudera/events/webinars/cloud-webinar-series.html
  30. Thank You
