CLOUDERA ENTERPRISE 6.0 UPDATE:
GA AND BEYOND
2 © Cloudera, Inc. All rights reserved.
TODAY’S SPEAKERS
Matthew Schumpert
Product Management Director
mschumpert@cloudera.com
John Kennedy
Senior Manager
john.kennedy@cloudera.com
SUPPORTING BUSINESS OBJECTIVES
CONNECT PRODUCTS & SERVICES (IoT)
GROW BUSINESS
PROTECT BUSINESS
CLOUDERA
ENTERPRISE DATA
PLATFORM
The modern platform for
machine learning & analytics
optimized for the cloud
[Platform diagram]
WORKLOADS: DATA ENGINEERING, DATA SCIENCE, DATA WAREHOUSE, OPERATIONAL DATABASE, 3RD PARTY SERVICES
COMMON SERVICES: DATA CATALOG, GOVERNANCE, SECURITY, LIFECYCLE MANAGEMENT, CONTROL PLANE
STORAGE: HDFS, KUDU, Amazon S3, Microsoft ADLS
MODERN PLATFORM CAPABILITIES
UNIFIED
• Diverse analytics
• Shared experience
• Any environment
ENTERPRISE GRADE
• Secure
• Scalable
• Compliant
HYBRID
• Storage
• Compute
• Control
DEPLOYMENT FLEXIBILITY
PRIVATE
CLOUD
BARE METAL
SDX in EDH clusters
VIA CLOUDERA MANAGER
HDFS, KUDU
DATA
ENGINEERING
DATA
WAREHOUSE
DATA
SCIENCE
OPERATIONAL
DATABASE
HDFS, KUDU, S3, ADLS S3, ADLS
SDX Reference Architecture Altus SDX
VIA CLOUDERA ALTUS
INFRASTRUCTURE SERVICES
SHARED DATA CONTEXT SERVICES
Built for multi-function analytics anywhere
• Data Catalog: a comprehensive catalog of all data sets, spanning on-premises, cloud object stores, structured, unstructured, and semi-structured. Includes technical schemas from the Hive metastore, as well as business glossary definitions, classifications, and usage guidance
• Security: role-based access control applied consistently across the platform using Apache Sentry. Also includes full stack encryption and key management
• Governance: enterprise-grade auditing, lineage, and other governance capabilities applied universally across the platform with rich extensibility for partner integrations
• Lifecycle Management: comprehensive ingest-to-purge management of data set lifecycle activities
• Control Plane: multi-environment cluster provisioning, deployment, management, and troubleshooting
CLOUDERA 6 HIGHLIGHTS
INNOVATION
Building unified analytics applications is easier than ever, because our integrated,
multi-disciplinary distribution brings together the most capable and stable versions of open-source tools.
ENTERPRISE
QUALITY
Rather than wrangling purely open source projects, Cloudera’s enterprise customers trust
the quality control and safety that only a complete platform can offer.
PRODUCTIVITY
Enable the business to get answers more quickly, which improves data scientist and
business analyst productivity and optimizes resource utilization to accelerate analytics.
CLOUDERA 6 IS NOW GENERALLY AVAILABLE
A giant leap forward in our open source core
CLOUDERA MANAGER 6.0 • CLOUDERA NAVIGATOR 6.0 • CLOUDERA DIRECTOR 6.0
HADOOP 3.0 • HBASE 2.0 • HIVE 2.1 • PARQUET 1.9 • SPARK 2.2
SOLR 7.0 • SENTRY 2.0 • OOZIE 5.0 • AVRO 1.8 • KAFKA 1.0
FLUME 1.8 • HUE 4.2 • SQOOP 1.4
PARTNERS CERTIFIED ON CDH6
Arcadia Data provides the first native visual
analytics software that runs within modern data
platforms for optimal scale, performance, and
security.
Syncsort organizes data everywhere, to keep
the world working – the same data that powers
machine learning, AI and predictive analytics.
Zoomdata enables the fastest visual analytics
for big data. Immerse yourself in dynamic
visualizations that unfold the story in front of
you.
CLOUDERA
MANAGER 6
Fine-grained Admin Controls
Assign isolated administrative
privileges for each cluster under
management in order to improve
efficiency and reduce risk
Automated Wire Encryption
Reduce risk and administrative effort
by automatically configuring TLS wire
encryption for a wide variety of CDH
components
Scale
Manage up to 2,500 nodes with a
single Cloudera Manager instance
Upgrade from C5
Simplify upgrades from CDH5 with
pre-upgrade validations and
environment-specific upgrade docs
• Improve scale
• Improve efficiency
• Reduce risk
• Upgrade simplicity
SOLR 7
JSON Facet API
• Richer analytics capabilities and
finer-grained partitions
lead to deeper insights on
unstructured data
Streaming Expressions
• A new approach to processing
queries and indexes
• More powerful compute on the
entire matching data set: time
series, math functions, NLP
and much more
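As a rough illustration of the JSON Facet API named on this slide, the sketch below builds a request body with a terms facet and a nested aggregation. The field names (`category`, `price`), the facet labels, and the helper function are hypothetical and do not come from the slides; only the general request shape (`query`, `facet`, `type: terms`, an aggregation string such as `avg(price)`) is intended to follow Solr's JSON Facet API.

```python
import json


def build_facet_request(bucket_field: str, metric_field: str, limit: int = 5) -> dict:
    """Build a JSON Facet API request body (field names are hypothetical)."""
    return {
        "query": "*:*",  # match every document
        "limit": 0,      # return facet buckets only, no documents
        "facet": {
            "by_bucket": {
                "type": "terms",        # one bucket per distinct field value
                "field": bucket_field,
                "limit": limit,
                # Nested facet: an aggregation computed inside each bucket.
                "facet": {"avg_metric": f"avg({metric_field})"},
            }
        },
    }


request_body = build_facet_request("category", "price")
print(json.dumps(request_body, indent=2))
```

The nested `facet` entry is what gives the finer-grained partitions: each bucket can carry its own aggregations or further sub-facets.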
HBASE 2.0
Manageability
• New assignment manager
• Simpler replication configuration
• New CLI commands
• New compaction tool
• Improved metrics
Reliability
• Over 2,000 bug fixes
• Operational simplicity
Performance
• Avoid the Java heap for caching and
read paths
• Multi-threaded old file cleanup
• Concurrent prefetch of data
HIVE 2.1
Better Debugging
• Faster surfacing of issues leads to
tighter controls and enhanced
cluster stability
Parquet Vectorization
• 20% to 80% performance increase
API Standardization
• Elimination of costly app rewrites
increases developer trust and
efficiency
• Increased productivity
• Improved performance
• Enterprise readiness
HUE 4.2
Self Service Analytics
• Intelligent Table Discovery Wizards
• Index Creation Designers
• Query Design Assists & Hints
Seamless Business User
Experience
• 360-degree insight for structured AND
unstructured data
• Optimized UI look and feel: shorter time
to get started and get to answers for
non-technical users
POLL 1: WHICH C6 UPDATES ARE YOU LOOKING FORWARD TO
MOST?
Multiple choice, multiple answer
• Cloudera Manager 6
• Solr 7.0
• HBase 2.0
• Hive 2.1
• Hue 4.2
• Other
COMING SOON ...
HDFS ERASURE
CODING
Why
• Cut storage costs in half
Considerations
• Relative data temperature
• Relative availability of spare storage
capacity vs. spare network capacity
• Access to Intel CPUs with ISA-L
Typical usage
• Enable EC for cold directories
• Migrate data from hot to cold
directories over time using distcp
• Update CDH services to read
new directories (e.g. Hive Metastore)
Relative job performance with
EC
• Write-only jobs are faster because
there is less data to write
• Read-only jobs are about the same
• Typical job performance is slightly
faster with Erasure Coding
Relative reliability
• Supports up to 2 node failures
without data loss (just like 3x
replication)
• Parity can be increased via
configuration
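The "cut storage costs in half" claim can be checked with simple arithmetic. The sketch below compares raw storage per byte of user data for 3x replication against Reed-Solomon style layouts with k data units and m parity units. The specific layouts used, (6, 3) and (3, 2), are assumptions chosen for illustration; the slides do not name a policy.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data with N-way replication."""
    return Fraction(copies)


def ec_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per byte of user data with a (data, parity) layout."""
    return Fraction(data_units + parity_units, data_units)


three_way = replication_overhead(3)
rs_6_3 = ec_overhead(6, 3)  # assumed layout: survives the loss of any 3 units
rs_3_2 = ec_overhead(3, 2)  # assumed layout: survives the loss of any 2 units

print("3x replication overhead:", float(three_way))
print("(6,3) overhead:", float(rs_6_3), "saving factor:", float(three_way / rs_6_3))
print("(3,2) overhead:", float(rs_3_2), "saving factor:", float(three_way / rs_3_2))
```

Under these assumptions a (6, 3) layout stores 1.5 bytes per byte of data, exactly half of what 3x replication stores, while a (3, 2) layout saves somewhat less but needs fewer nodes per stripe.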
YARN
ENHANCEMENTS
Resource Types*
• Extend YARN’s view of
consumable resources per node
beyond vCores & Memory with
custom resource types
• Examples: GPUs, FPGAs
• Example: “Node with R licenses”
Oozie on YARN
• Improve Oozie runtime performance
• Simplify debugging
* Roadmap
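To make the resource types idea concrete, here is a conceptual sketch of scheduling against more than vCores and memory. It is not YARN's API: the dictionary model, the resource names (`gpu`, `r_license`), and both functions are hypothetical, chosen to mirror the examples on this slide.

```python
from typing import Dict

Resources = Dict[str, int]


def fits(request: Resources, available: Resources) -> bool:
    """True if every requested resource type is available in sufficient amount."""
    return all(available.get(name, 0) >= amount for name, amount in request.items())


def allocate(request: Resources, available: Resources) -> Resources:
    """Return the node's remaining resources after granting the request."""
    if not fits(request, available):
        raise ValueError("request does not fit on this node")
    return {name: amount - request.get(name, 0) for name, amount in available.items()}


# A node that advertises two custom resource types next to vCores and memory.
node = {"vcores": 16, "memory_mb": 65536, "gpu": 2, "r_license": 1}
job = {"vcores": 4, "memory_mb": 8192, "gpu": 1}

remaining = allocate(job, node)
print(remaining)
```

A request naming a resource type the node does not advertise (for example `{"fpga": 1}`) simply does not fit, which is how custom types steer work toward suitably equipped nodes.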
SOLR 7
SQL Interface
• Enables searching Solr indexed data
using SQL queries
• Deeper insight over combined structured
and unstructured data
Graph Query
• New execution framework allows more
powerful processing
CDCR
• There is no need for this Solr feature in
Cloudera Search as we have multiple
other more scalable options to replicate
data across DCs
* Roadmap
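A minimal sketch of what querying through a SQL interface could look like from a client, assuming upstream Solr's `/sql` request handler with its `stmt` parameter. The host, the collection name (`products`), and the helper function are hypothetical, and availability in Cloudera Search depends on the roadmap noted above. The code only builds the request; it does not send it.

```python
from urllib.parse import parse_qs, urlencode


def build_sql_request(base_url: str, collection: str, statement: str):
    """Return the URL and form-encoded body for a Solr SQL request."""
    url = f"{base_url.rstrip('/')}/{collection}/sql"
    body = urlencode({"stmt": statement})
    return url, body


statement = "SELECT category, count(*) FROM products GROUP BY category"
url, body = build_sql_request("http://localhost:8983/solr", "products", statement)

# Round-trip the body to confirm the statement survives URL encoding.
decoded = parse_qs(body)["stmt"][0]
print(url)
print(decoded)
```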
PRACTICALITIES ...
C6
DEPRECATIONS
Java versions
• Oracle JDK 1.7
Operating Systems
• Red Hat Enterprise Linux 5
• CentOS 5
• Oracle Linux 5 (both RHCK & UEK)
• SLES 11
• All Debian versions
(Ubuntu continues to be supported)
Databases
• Oracle 11g
• MySQL 5.0, 5.1
• PostgreSQL 8.1, 8.4
Cloudera Enterprise
• Cloudera’s Distribution of Kafka (CDK) 1.x
(includes Apache Kafka 0.8.x)
• Legacy Scala clients for Kafka
• Flume Receiver in Spark
• HBaseSink in Flume
(replaced by HBase2Sink)
• Multi Cloudera Manager Dashboard
• Kite Dataset API
• Crunch
• Hive’s
org.apache.hadoop.hive.ql.exec.UDF API
C6
REMOVALS
• DataFu
• Some Solr 4 features, data types, and
APIs are no longer supported in Solr 7
(we have a scan tool to help you
detect most common ones)
• Management of Key Trustee
Server without Cloudera Manager
• YARN Capacity Scheduler
• MapReduce Pipes
• Hue 3 Old interface and editor
• Sqoop 2
• Spark 1.x
• Flume AsyncHBaseSink
• All classes in com.cloudera.sqoop
packages
• Multi Cloudera Manager Dashboard
• Llama
• MapReduce 1
• Spark Standalone mode
• Mahout
• Whirr
• Old NameNode UI
• Navigator Encrypt File-Level
Encryption using eCryptfs
• Parquet Libraries under parquet.* Java
package: Renamed to
org.apache.parquet.*
• CDH Tarball Distribution
• CM Tarball Distribution
• Sentry Policy Files
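For the Parquet package rename listed above, application code that still imports from `parquet.*` needs its imports updated. The sketch below shows one way to rewrite Java import lines; the function name and the sample source are hypothetical, and it deliberately handles only `import` statements, not fully qualified class names used inline.

```python
import re

# Matches "import parquet." or "import static parquet." at the start of a line.
_OLD_IMPORT = re.compile(r"^(\s*import\s+(?:static\s+)?)parquet\.", re.MULTILINE)


def migrate_parquet_imports(java_source: str) -> str:
    """Rewrite old parquet.* imports to org.apache.parquet.*."""
    return _OLD_IMPORT.sub(r"\1org.apache.parquet.", java_source)


sample = (
    "import parquet.hadoop.ParquetWriter;\n"
    "import org.apache.parquet.schema.MessageType;\n"
    "import java.util.List;\n"
)
migrated = migrate_parquet_imports(sample)
print(migrated)
```

Imports that already use `org.apache.parquet` are left untouched, so the rewrite is safe to run more than once.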
C6 UPGRADE: REQUIREMENTS
• Upgrading to CM 6 requires no CDH downtime (rolling restart)
• Upgrading to CDH 6 requires full cluster downtime
• Manual rollback will be documented
• Automated downgrade not possible
• Upgrading from C6 Beta to C6 GA not supported
SUPPORTED PLATFORMS
OS Specific
INFRASTRUCTURE UPGRADES
Before C6 upgrade
1. Review your current versions of OS and JDK
2. Plan what your final state for OS versions and JDK needs to be
a. You need to be on JDK 8.
3. Execute the upgrade of OS and JDK on all hosts
4. Plan and then execute your Cloudera Manager upgrade
5. Plan and then execute your CDH upgrade
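Step 1 above, reviewing the JDK version on each host, can be partly automated. The sketch below parses a Java version string and checks it against the JDK 8 requirement from step 2a. The function names are hypothetical, and collecting the version string from each host (for example from `java -version` output) is left out.

```python
import re


def java_major_version(version: str) -> int:
    """Return the major Java version for strings like '1.8.0_181' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    # Legacy scheme: '1.7' means Java 7 and '1.8' means Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def ready_for_c6(version: str) -> bool:
    """True if the host runs JDK 8, as the slide requires."""
    return java_major_version(version) == 8


for reported in ("1.7.0_80", "1.8.0_181"):
    print(reported, "->", "ok" if ready_for_c6(reported) else "upgrade JDK first")
```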
KEY POINTS
Where to get more information on Upgrading
KEY POINTS
High-level guidance
KEY POINTS
Producing specific Upgrade steps for your setup
KEY POINTS
• Interactive UI produces specific technical steps for your upgrade path, all in one place
POLL 2: WHEN DO YOU EXPECT TO UPGRADE TO C6?
Multiple choice, single answer
• This month
• This quarter
• Next quarter
• Next year
• Once 6.x is available
• Don’t know
GET CLOUDERA ENTERPRISE 6 TODAY
THANK YOU

Cloudera Enterprise 6.0 Update GA and Beyond 9.25.18
