Cloudera Enterprise 6.0 Update GA and Beyond 9.25.18
2. 2 © Cloudera, Inc. All rights reserved.
TODAY’S SPEAKERS
Matthew Schumpert
Product Management Director
mschumpert@cloudera.com
John Kennedy
Senior Manager
john.kennedy@cloudera.com
3. 3 © Cloudera, Inc. All rights reserved.
SUPPORTING BUSINESS OBJECTIVES
CONNECT PRODUCTS &
SERVICES (IoT)
GROW BUSINESS PROTECT BUSINESS
4. 4 © Cloudera, Inc. All rights reserved.
CLOUDERA
ENTERPRISE DATA
PLATFORM
The modern platform for
machine learning & analytics
optimized for the cloud
WORKLOADS 3RD PARTY
SERVICES
DATA
ENGINEERING
DATA
SCIENCE
DATA
WAREHOUSE
OPERATIONAL
DATABASE
DATA CATALOG
GOVERNANCE SECURITY LIFECYCLE
MANAGEMENT
STORAGE
Microsoft
ADLS
COMMON SERVICES
HDFS
Amazon
S3
CONTROL
PLANE
KUDU
5. 5 © Cloudera, Inc. All rights reserved.
ENTERPRISE
GRADE
HYBRID
MODERN PLATFORM CAPABILITIES
UNIFIED
Diverse analytics
Shared experience
Any environment
Secure
Scalable
Compliant
Storage
Compute
Control
6. 6 © Cloudera, Inc. All rights reserved.
DEPLOYMENT FLEXIBILITY
PRIVATE
CLOUD
BARE METAL
SDX in EDH clusters
VIA CLOUDERA MANAGER
HDFS, KUDU
DATA
ENGINEERING
DATA
WAREHOUSE
DATA
SCIENCE
OPERATIONAL
DATABASE
HDFS, KUDU, S3, ADLS S3, ADLS
SDX Reference Architecture Altus SDX
VIA CLOUDERA ALTUS
INFRASTRUCTURE SERVICES
7. 7 © Cloudera, Inc. All rights reserved.
WORKLOADS 3RD PARTY
SERVICES
DATA
ENGINEERING
DATA
SCIENCE
DATA
WAREHOUSE
OPERATIONAL
DATABASE
DATA CATALOG
GOVERNANCE SECURITY LIFECYCLE
MANAGEMENT
STORAGE
Microsoft
ADLS
COMMON SERVICES
HDFS
Amazon
S3
CONTROL
PLANE
KUDU
• Data Catalog: a comprehensive catalog of all data sets, spanning on-premises,
cloud object stores, structured, unstructured, and semi-structured. Includes
technical schemas from the Hive metastore, as well as business glossary
definitions, classifications, and usage guidance
• Security: role-based access control applied consistently across the platform
using Apache Sentry. Also includes full stack encryption and key management
• Governance: enterprise-grade auditing, lineage, and other governance
capabilities applied universally across the platform with rich extensibility for
partner integrations
• Lifecycle Management: comprehensive ingest-to-purge management of data
set lifecycle activities
• Control Plane: multi-environment cluster provisioning, deployment,
management, and troubleshooting
SHARED DATA CONTEXT SERVICES
Built for multi-function analytics anywhere
8. 8 © Cloudera, Inc. All rights reserved.
CLOUDERA 6 HIGHLIGHTS
INNOVATION
Building unified analytics applications is easier than ever: our integrated, multi-disciplinary
distribution brings together the most capable and stable versions of open-source tools.
ENTERPRISE
QUALITY
Rather than wrangling purely open source projects, Cloudera’s enterprise customers trust
the quality control and safety that only a complete platform can offer.
PRODUCTIVITY
Enable the business to get answers more quickly, which improves data scientist and
business analyst productivity and optimizes resource utilization to accelerate analytics.
9. 10 © Cloudera, Inc. All rights reserved.
CLOUDERA 6 IS NOW GENERALLY AVAILABLE
A giant leap forward in our open source core
CLOUDERA MANAGER 6.0 • CLOUDERA NAVIGATOR 6.0 • CLOUDERA DIRECTOR 6.0
HADOOP 3.0 • HBASE 2.0 • HIVE 2.1 • PARQUET 1.9 • SPARK 2.2
SOLR 7.0 • SENTRY 2.0 • OOZIE 5.0 • AVRO 1.8 • KAFKA 1.0
FLUME 1.8 • HUE 4.2 • SQOOP 1.4
10. 11 © Cloudera, Inc. All rights reserved.
PARTNERS CERTIFIED ON CDH6
Arcadia Data provides the first native visual
analytics software that runs within modern data
platforms for optimal scale, performance, and
security.
Syncsort organizes data everywhere, to keep
the world working – the same data that powers
machine learning, AI and predictive analytics.
Zoomdata enables the fastest visual analytics
for big data. Immerse yourself in dynamic
visualizations that unfold the story in front of
you.
11. 12 © Cloudera, Inc. All rights reserved.
CLOUDERA
MANAGER 6
Fine-grained Admin Controls
Assign isolated administrative
privileges for each cluster under
management in order to improve
efficiency and reduce risk
Automated Wire Encryption
Reduce risk and administrative effort
by automatically configuring TLS wire
encryption for a wide variety of CDH
components
Scale
Manage up to 2,500 nodes with a
single Cloudera Manager instance
Upgrade from C5
Simplify upgrades from CDH5 with
pre-upgrade validations and
environment-specific upgrade docs
• Improve scale
• Improve efficiency
• Reduce risk
• Upgrade simplicity
12. 13 © Cloudera, Inc. All rights reserved.
SOLR 7
JSON Facet API
• Richer analytics capabilities &
finer-grained partitions
lead to deeper insights on
unstructured data
Streaming Expressions
• A new approach to processing
queries and indexes
• More powerful compute on the
entire matching data set: time
series, math functions, NLP
and much more
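A minimal sketch of what a JSON Facet API request body can look like, assuming a terms facet with a nested average. The field names `category` and `price` and the helper `build_facet_request` are hypothetical and only illustrate the request shape; the body would be sent to a collection's query endpoint.

```python
import json


def build_facet_request(query, field, metric_field, limit=5):
    """Build a JSON request body: a terms facet with a nested average metric."""
    return {
        "query": query,
        "limit": 0,  # ask for facet results only, no matching documents
        "facet": {
            "by_" + field: {
                "type": "terms",      # bucket documents by the values of `field`
                "field": field,
                "limit": limit,
                # nested facet: one aggregate computed inside every bucket
                "facet": {"avg_" + metric_field: f"avg({metric_field})"},
            }
        },
    }


# Hypothetical field names, used only for illustration.
body = build_facet_request("*:*", "category", "price")
print(json.dumps(body, indent=2))
```

Nesting a metric inside each bucket is what allows one request to return per-partition aggregates.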
13. 14 © Cloudera, Inc. All rights reserved.
HBASE 2.0
Manageability
• New assignment manager
• Simpler replication configuration
• New CLI commands
• New compaction tool
• Improved metrics
Reliability
• Over 2,000 bug fixes
• Operational simplicity
Performance
• Avoid Java heap for caching and
read paths
• Multi-threaded old file cleanup
• Concurrent prefetch of data
14. 15 © Cloudera, Inc. All rights reserved.
HIVE 2.1
Better Debugging
• Faster surfacing of issues leads to
tighter controls and enhanced
cluster stability
Parquet Vectorization
• 20% to 80% performance increase
API Standardization
• Elimination of costly app rewrites
increases developer trust and
efficiency
• Increased productivity
• Improved performance
• Enterprise readiness
15. 16 © Cloudera, Inc. All rights reserved.
HUE 4.2
Self Service Analytics
• Intelligent Table Discovery Wizards
• Index Creation Designers
• Query Design Assists & Hints
Seamless Business User
Experience
• 360 degree insight for structured AND
unstructured data
• Optimized UI look and feel - shorter time
to get started and get to answers for non-
technical users
16. 17 © Cloudera, Inc. All rights reserved.
POLL 1: WHICH C6 UPDATES ARE YOU LOOKING FORWARD TO
MOST?
Multiple choice, multiple answer
• Cloudera Manager 6
• Solr 7.0
• HBase 2.0
• Hive 2.1
• Hue 4.2
• Other
18. 19 © Cloudera, Inc. All rights reserved.
HDFS ERASURE
CODING
Why
• Cut storage costs in half
Considerations
• Relative data temperature
• Relative availability of spare storage
capacity vs. spare network capacity
• Access to Intel CPUs with ISA-L
Typical usage
• Enable EC for cold directories
• Migrate data from hot to cold
directories over time using distcp
• Update CDH services to read
new directories (e.g. Hive Metastore)
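The first two usage steps above can be sketched as a command plan. This is an illustration under stated assumptions, not a prescribed procedure: the paths are hypothetical, the policy name `RS-6-3-1024k` is assumed because the slide does not name one, and the commands are only printed, not executed.

```python
import posixpath
import shlex


def ec_migration_commands(hot_path, cold_root, policy="RS-6-3-1024k"):
    """Return the CLI commands, as argument lists, for moving one dataset to a cold EC directory."""
    # The copy target does not exist yet, so it inherits the policy of cold_root.
    target = posixpath.join(cold_root, posixpath.basename(hot_path.rstrip("/")))
    return [
        ["hdfs", "ec", "-enablePolicy", "-policy", policy],                   # make the policy usable
        ["hdfs", "dfs", "-mkdir", "-p", cold_root],                           # create the cold root
        ["hdfs", "ec", "-setPolicy", "-path", cold_root, "-policy", policy],  # new files under it use EC
        ["hadoop", "distcp", hot_path, target],                               # copy the dataset across
    ]


# Hypothetical paths, used only for illustration.
commands = ec_migration_commands("/data/hot/events_2017", "/data/cold")
for command in commands:
    print(shlex.join(command))
```

Copies between replicated and erasure-coded directories may need additional distcp checksum options, so check the documentation for your release. The last usage step, repointing services such as the Hive Metastore, is not covered by this sketch.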
Relative job performance with
EC
• Write-only jobs are faster because
there is less data to write
• Read-only jobs are about the same
• Typical job performance is slightly
faster with Erasure Coding
Relative reliability
• Supports up to 2 node failures
without data loss (just like 3x
replication)
• Parity can be increased via
configuration
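The storage and reliability claims above can be checked with simple arithmetic. The slide does not name a policy, so the Reed-Solomon layouts below (6 data + 3 parity, 3 data + 2 parity) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def replication_profile(factor):
    """Raw bytes stored per logical byte, and how many copies can be lost safely."""
    return {"overhead": float(factor), "failures_tolerated": factor - 1}


def erasure_coding_profile(data_units, parity_units):
    """Reed-Solomon striping: each group of data cells carries extra parity cells."""
    return {
        "overhead": (data_units + parity_units) / data_units,
        "failures_tolerated": parity_units,  # any `parity_units` cells of a group may be lost
    }


three_x = replication_profile(3)
rs_6_3 = erasure_coding_profile(6, 3)   # assumed layout: 6 data + 3 parity
rs_3_2 = erasure_coding_profile(3, 2)   # assumed layout: 3 data + 2 parity

for name, profile in [("3x replication", three_x), ("RS(6,3)", rs_6_3), ("RS(3,2)", rs_3_2)]:
    print(f"{name}: {profile['overhead']:.2f}x raw storage, "
          f"tolerates {profile['failures_tolerated']} failures")
```

Under these assumptions, a 6+3 layout stores 1.5 bytes per logical byte against 3 for triple replication, which is where the halving comes from. Adding parity units raises both the overhead and the number of failures tolerated.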
19. 20 © Cloudera, Inc. All rights reserved.
YARN
ENHANCEMENTS
Resource Types*
• Extend YARN’s view of
consumable resources per node
beyond vCores & Memory with
custom resource types
• Examples: GPUs, FPGAs
• Example: “Node with R licenses”
Oozie on YARN
• Improve Oozie runtime performance
• Simplify debugging
* Roadmap
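A conceptual model of the resource types idea, not YARN's API: each node advertises a vector of named resources, and a request fits only if every named resource is available. The node names and the `gpu` and `r-license` resource names are hypothetical.

```python
def fits(request, available):
    """True if every requested resource is available in sufficient quantity."""
    return all(available.get(name, 0) >= amount for name, amount in request.items())


# Hypothetical nodes: only node-b advertises the custom resources.
nodes = {
    "node-a": {"memory-mb": 65536, "vcores": 16},
    "node-b": {"memory-mb": 65536, "vcores": 16, "gpu": 2, "r-license": 1},
}

# A container request that needs one GPU in addition to memory and vCores.
request = {"memory-mb": 8192, "vcores": 4, "gpu": 1}

candidates = [name for name, free in nodes.items() if fits(request, free)]
print(candidates)
```

Treating a license as a countable resource is what lets the scheduler place a job on a "node with R licenses" without any special-case logic.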
20. 21 © Cloudera, Inc. All rights reserved.
SOLR 7
SQL Interface
• Enables searching Solr indexed data
using SQL queries
• Deeper insight over combined structured
and unstructured data
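A minimal sketch of preparing a SQL query for Solr, assuming the collection exposes the `/sql` request handler and accepts the statement in a `stmt` parameter. The base URL, the collection name `logs`, and the field `host` are hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import parse_qs, urlencode


def build_solr_sql_request(base_url, collection, statement):
    """Return the URL and form-encoded body for a SQL statement against one collection."""
    url = f"{base_url.rstrip('/')}/{collection}/sql"
    body = urlencode({"stmt": statement})  # the SQL text travels as a form field
    return url, body


# Hypothetical endpoint, collection, and field names.
statement = "SELECT host, count(*) FROM logs GROUP BY host"
url, body = build_solr_sql_request("http://localhost:8983/solr", "logs", statement)
print(url)
print(body)
```

The pair can then be POSTed with any HTTP client, for example `urllib.request`.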
Graph Query
• New execution framework allows more
powerful processing
CDCR (Cross Data Center Replication)
• There is no need for this Solr feature in
Cloudera Search as we have multiple
other more scalable options to replicate
data across DCs
* Roadmap
22. 23 © Cloudera, Inc. All rights reserved.
C6
DEPRECATIONS
Java versions
• Oracle JDK 1.7
Operating Systems
• Red Hat Enterprise Linux 5
• CentOS 5
• Oracle Linux 5 (both RHCK & UEK)
• SLES 11
• All Debian versions
(Ubuntu continues to be supported)
Databases
• Oracle 11g
• MySQL 5.0, 5.1
• PostgreSQL 8.1, 8.4
Cloudera Enterprise
• Cloudera’s Distribution of Kafka (CDK) 1.x
(includes Apache Kafka 0.8.x)
• Legacy Scala clients for Kafka
• Flume Receiver in Spark
• HBaseSink in Flume
(replaced by HBase2Sink)
• Multi Cloudera Manager Dashboard
• Kite Dataset API
• Crunch
• Hive’s
org.apache.hadoop.hive.ql.exec.UDF API
23. 24 © Cloudera, Inc. All rights reserved.
C6
REMOVALS
• DataFu
• Some Solr 4 features, data types, and
APIs are no longer supported in Solr 7
(we have a scan tool to help you
detect most common ones)
• Management of Key Trustee
Server without Cloudera Manager
• YARN Capacity Scheduler
• MapReduce Pipes
• Hue 3 old interface and editor
• Sqoop 2
• Spark 1.x
• Flume AsyncHBaseSink
• All classes in com.cloudera.sqoop
packages
• Multi Cloudera Manager Dashboard
• Llama
• MapReduce 1
• Spark Standalone mode
• Mahout
• Whirr
• Old NameNode UI
• Navigator Encrypt File-Level
Encryption using eCryptfs
• Parquet Libraries under parquet.* Java
package: Renamed to
org.apache.parquet.*
• CDH Tarball Distribution
• CM Tarball Distribution
• Sentry Policy Files
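For the Parquet package rename listed above, a small sketch of rewriting Java import statements from `parquet.*` to `org.apache.parquet.*`. The helper name is hypothetical, and the regex only touches `import` lines; fully qualified names elsewhere in code or configuration would need separate handling.

```python
import re

# Matches "import parquet." or "import static parquet." at the start of a line.
_OLD_PARQUET_IMPORT = re.compile(r"^(\s*import\s+(?:static\s+)?)parquet\.", re.MULTILINE)


def migrate_parquet_imports(java_source):
    """Rewrite old-style parquet.* imports to the org.apache.parquet.* package."""
    return _OLD_PARQUET_IMPORT.sub(r"\1org.apache.parquet.", java_source)


old_source = (
    "import parquet.hadoop.ParquetWriter;\n"
    "import org.apache.parquet.schema.MessageType;\n"
)
new_source = migrate_parquet_imports(old_source)
print(new_source)
```

Imports that already use the new package are left unchanged, so the rewrite is safe to run more than once.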
24. 25 © Cloudera, Inc. All rights reserved.
C6 UPGRADE: REQUIREMENTS
• Upgrading to CM 6 requires no CDH downtime (rolling restart)
• Upgrading to CDH 6 requires full cluster downtime
• Manual rollback will be documented
• Automated downgrade not possible
• Upgrading from C6 Beta to C6 GA not supported
25. 26 © Cloudera, Inc. All rights reserved.
SUPPORTED PLATFORMS
OS Specific
26. 27 © Cloudera, Inc. All rights reserved.
INFRASTRUCTURE UPGRADES
Before C6 upgrade
1. Review your current versions of OS and JDK
2. Plan what your final state for OS versions and JDK needs to be
a. You need to be on JDK 8.
3. Execute the upgrade of OS and JDK on all hosts
4. Begin planning and then execute your Cloudera Manager Upgrade
5. Begin planning and then execute your Cloudera CDH Upgrade.
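Step 1 above can be partly automated. The sketch below assumes you have already collected a JDK version string per host (for example from `java -version`); the host names and the helper functions are hypothetical.

```python
import re


def java_major_version(version_string):
    """Return the major Java version from strings such as '1.8.0_181' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if not match:
        raise ValueError(f"unrecognized Java version: {version_string!r}")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: '1.8.0_181' means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def on_jdk8(version_string):
    """True if the host already runs the JDK 8 that the slide requires."""
    return java_major_version(version_string) == 8


# Hypothetical inventory of hosts and their reported JDK versions.
inventory = {"worker-01": "1.8.0_181", "worker-02": "1.7.0_80"}
needs_upgrade = sorted(host for host, version in inventory.items() if not on_jdk8(version))
print(needs_upgrade)
```

Hosts in `needs_upgrade` would be handled in step 3, before the Cloudera Manager and CDH upgrades begin.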
27. 28 © Cloudera, Inc. All rights reserved.
KEY POINTS
Where to get more information on Upgrading
28. 29 © Cloudera, Inc. All rights reserved.
KEY POINTS
High-level guidance
29. 30 © Cloudera, Inc. All rights reserved.
KEY POINTS
Producing specific Upgrade steps for your setup
30. 31 © Cloudera, Inc. All rights reserved.
KEY POINTS
• Interactive UI produces specific technical steps for your upgrade path,
all in one place
31. 32 © Cloudera, Inc. All rights reserved.
POLL 2: WHEN DO YOU EXPECT TO UPGRADE TO C6?
Multiple choice, single answer
• This month
• This quarter
• Next quarter
• Next year
• Once 6.x is available
• Don’t know
32. 33 © Cloudera, Inc. All rights reserved.
GET CLOUDERA ENTERPRISE 6 TODAY