Stl meetup cloudera platform - january 2020
- 2. © 2019 Cloudera, Inc. All rights reserved. 2
DATA MANAGEMENT IS SPREAD ALL OVER
Where organizations manage data
Source: Harvard Business Review Analytic Services Survey – June 2019
On-premises: 47%
Private cloud: 32%
Hybrid cloud: 26%
Multi-cloud: 26%
Single cloud: 26%
- 4.
Up to 40%: shadow IT as a % of overall IT spend
- 7.
DATA TEAMS ARE HIGHLY SPECIALIZED
App Developers
Data Engineers
Compliance Mgrs.
Data Architects
BI Analysts
Data Scientists
Infrastructure Mgrs.
- 8.
SPECIALIZATION CREATES A DIVERSITY OF NEEDS
Teams: App Developers, Data Engineers, Compliance Mgrs., Data Architects, BI Analysts, Data Scientists, Infrastructure Mgrs.
Needs:
• Continuous availability, custom tooling
• Capacity guarantees to enable consistent SLAs
• Capacity on demand to support bursty workloads
• Latest tools and hardware, unpredictable capacity
• Single source of truth
• Privacy and verifiable audit
• Reliability, cost, & scale
- 9.
“ONE-SIZE-FITS-ALL” PITS THE BUSINESS VS. IT
VS
- 10.
“SHADOW IT” POINT SOLUTIONS LEAD TO CHAOS
??
- 11.
ANSWER: CENTRALIZE CONTROL + CUSTOMIZE ENVIRONMENTS
Centralized Data, Security, Governance and Management: Data Architects, Compliance Mgrs., Infrastructure Mgrs.
Custom environments: App Developers, Data Engineers, BI Analysts, Data Scientists
- 12.
A DATA PLATFORM OPTIMIZED FOR THE BEST OF BOTH
Cloudera Data Platform
SDX
Centralized Data, Security, Governance and Management: Data Architects, Compliance Mgrs., Infrastructure Mgrs.
Custom environments: App Developers, Data Engineers, BI Analysts, Data Scientists
- 14.
WHAT DOES ENTERPRISE IT NEED TO “SAY YES” TO THE
BUSINESS?
- 15.
ARCHITECTURE & TECHNOLOGY REQUIREMENTS
• Cloud experience
• Compute & storage
• Kubernetes & containers
• Streaming & ML/AI
- 17.
3 INITIAL CDP CLOUD SERVICES
Empowering Enterprise IT to deliver at the speed of business
• Data Hub
• Data Warehouse
• Machine Learning
- 19.
CLOUDERA DATA PLATFORM
THE POWER OF “AND”
• On-premises and public cloud
• Multi-cloud and multi-function
• Simple to use and secure by design
• Manual and automated
• Open and extensible
• For data engineers and data scientists
- 20.
NEWS
Recent News
• Open-source licensing
• Streams management
• CDP availability on AWS
• CDP availability on Azure
• CDP Data Center Edition
Coming Soon
• Workload XM on prem
• CDP Data Hub adding
Flow and Stream clusters
- 22.
ONE PLATFORM – TWO FORM FACTORS
CDP Public Cloud
(platform-as-a-service)
CDP On-Prem
(installable software)
- 23.
CDP PUBLIC CLOUD ARCHITECTURE
Management Console - A single pane of glass to manage one or more environments and the services that run within each environment
Environment - A logical encapsulation of a customer network and the services that run within that network (like an Azure virtual network)
Cluster - A distributed computing service that runs on VMs (Data Hub) or K8s (the experiences) and has access to the shared data lake
SDX - The data access control layer that sits on top of the backend object store and provides coherent data security and governance for all the applications running within the environment
Diagram labels: Management Console (Data Catalog, Workload Manager, Replication Manager); Environment (SDX; Data Hub, CDW, and CML clusters)
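The containment described on this slide (one Management Console, one or more environments, several clusters per environment sharing one data lake) can be sketched as a small data model. This is an illustrative sketch only: the class names, fields, and example values are hypothetical and are not a Cloudera API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cluster:
    name: str
    kind: str     # "Data Hub", "CDW", or "CML"
    runs_on: str  # "VMs" for Data Hub, "K8s" for the experiences (per the slide)


@dataclass
class Environment:
    name: str
    network: str    # the customer network this environment encapsulates
    data_lake: str  # shared data lake that every cluster in the environment can access
    clusters: List[Cluster] = field(default_factory=list)

    def add_cluster(self, name: str, kind: str) -> Cluster:
        # The slide states Data Hub runs on VMs and the experiences run on K8s.
        runs_on = "VMs" if kind == "Data Hub" else "K8s"
        cluster = Cluster(name, kind, runs_on)
        self.clusters.append(cluster)
        return cluster


@dataclass
class ManagementConsole:
    environments: List[Environment] = field(default_factory=list)

    def all_clusters(self) -> List[Tuple[str, str]]:
        # Single pane of glass: list every cluster across every environment.
        return [(env.name, c.name) for env in self.environments for c in env.clusters]


# Hypothetical example values.
env = Environment(name="prod-east", network="vnet-prod-east", data_lake="lake-prod-east")
env.add_cluster("etl", "Data Hub")
env.add_cluster("bi", "CDW")
console = ManagementConsole([env])
print(console.all_clusters())
```

Keeping the data lake on the environment rather than on each cluster mirrors the slide's point that clusters come and go while data, security, and governance stay in one place.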
- 24.
CDP ON-PREM ARCHITECTURE
Diagram labels: Management Console; Workload Manager; Data Catalog; Replication Manager; CDP Private Cloud; Container Cloud; Containers; Data Hub; CDW; CML; ...; CDP Data Center; Traditional Workloads; Servers; SDX; Storage
- 25.
BURST TO CLOUD
• Workload Manager identifies burstable workloads
• Replication Manager replicates targeted datasets to cloud (data, schema, policies, & lineage)
Diagram labels: CDH / HDP / CDP (Existing Apps, Existing Data, Existing Hardware); Management Console (Data Catalog, Workload Manager, Replication Manager); CDP Cloud Environment (SDX; Data Hub, CDW, and CML clusters)
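The two burst-to-cloud steps can be illustrated with a toy sketch. The slide does not say how Workload Manager decides that a workload is burstable, so the rule used here (peak demand exceeds spare on-prem capacity) is an assumption, and every name and number is hypothetical rather than taken from a Cloudera API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Workload:
    name: str
    peak_nodes: int      # hypothetical measure of peak demand
    datasets: List[str]  # datasets the workload reads


# The four things the slide says are replicated for each targeted dataset.
REPLICATED_PARTS: Tuple[str, ...] = ("data", "schema", "policies", "lineage")


def pick_burstable(workloads: List[Workload], spare_nodes: int) -> List[Workload]:
    """Assumed rule: a workload is burstable if its peak exceeds spare on-prem capacity."""
    return [w for w in workloads if w.peak_nodes > spare_nodes]


def replication_plan(burstable: List[Workload]) -> Dict[str, Tuple[str, ...]]:
    """Map each targeted dataset (listed once, in first-seen order) to the parts to copy."""
    plan: Dict[str, Tuple[str, ...]] = {}
    for workload in burstable:
        for dataset in workload.datasets:
            plan.setdefault(dataset, REPLICATED_PARTS)
    return plan


workloads = [
    Workload("nightly-etl", 10, ["sales"]),
    Workload("quarter-end-report", 80, ["sales", "ledger"]),
]
burst = pick_burstable(workloads, spare_nodes=20)
plan = replication_plan(burst)
print([w.name for w in burst], list(plan))
```

Only the datasets used by the selected workloads appear in the plan, which matches the slide's emphasis on replicating targeted datasets rather than the whole data lake.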
- 26.
CLOUDERA DATA PLATFORM – UNIQUE CAPABILITIES
Cloud
Optimized for IT & LoB
(hybrid, multi-function, SDX,
open, container-based
cloud experiences)
Cloud Burst
(supplement on-prem
capacity)
Intelligent Replication
(data, users, workloads)
Best of CDH & HDP
(Cloudera Runtime)
ENTERPRISE
DATA CLOUD
- 28.
CDP ON PREM
CDP DATA CENTER is the first step to private cloud
Diagram labels: Management Console; Workload Manager; Data Catalog; Replication Manager; ...; CDP Private Cloud; Containers; Data Hub; CDW; CML; ...; CDP Data Center; Traditional Workloads; Servers; SDX; Storage; Operations; Compute
- 29.
NEW FEATURES IN CDP DATA CENTER
New features for CDH 6 customers
Ranger 2.0
• Dynamic row filtering & column masking
• Attribute-based access control
• SparkSQL fine-grained access control
Atlas 2.0
• Advanced data discovery
• Improved performance and scalability
Hive 3
• Hive-on-Tez for better ETL performance
• ACID transactions
Ozone (Preview)
• 10x scalability of HDFS
Knox*
• Gateway-based SSO
Druid*
• Low-latency DataMart for real-time and aggregate data
Spark on Docker*
• Simplified dependency management
New features for HDP 3 customers
Cloudera Manager
• Virtual private clusters
• Automated wire encryption setup
• Fine-grained RBAC for administrators
• Streamlined maintenance workflows
Atlas 2.0
• Advanced data lineage
• Faceted search
Solr 7
• Relevance-based text search over unstructured data (text, pdf, .jpg, ...)
Impala
• Better fit for Data Mart migration use cases (interactive, BI style queries)
Hue
• Built-in SQL editor
Kudu
• Better performance for fast changing / updateable data
Better at-rest Encryption
• Key Trustee Server, NavEncrypt*
* In future release
- 30.
CDP DATA CENTER ROADMAP
CDP Data Center 7.0 (2H 2019) 1H 2020
• Cloudera Manager 7.0
• Hadoop 3.1
• Spark 2.4
• Hive 3.1
• Impala 3.2
• Oozie 5.1
• Hue 4.5
• Ranger 2.0
• Atlas 2.0
• Solr 7.4
• Tez 0.9
• HBase 2.2
• Phoenix 5.0
• Kudu 1.11
• Sqoop 1.4.7
• Parquet 1.10
• Avro 1.8
• ORC 1.5
• Zookeeper 3.5
• Kafka 2.3
• Key Trustee Server 7
• Ozone (Tech Preview)
• Livy
• Druid
• Ranger KMS
• Key HSM
• Navigator Encrypt
• Zeppelin
• Knox
• Accumulo
- 31.
UPGRADING AN EXISTING CLUSTER: OPTION A
Step 1: Upgrade an existing cluster to CDP Data Center, thus creating an SDX environment based on existing data
Step 2: Install CDP Private Cloud and use the Experiences to build new applications
Step 3: Use Workload Manager to intelligently migrate key workloads from the CDP Data Center cluster to the CDP Private Cloud Experiences
Diagram labels: CDH 5 / HDP 2 (Existing Apps, Existing Data, Existing Hardware); CDH 6 / HDP 3 (Existing Apps, Existing Data, Existing Hardware); CDP Data Center (SDX environment; Existing Apps, Existing Data, Existing Hardware); Upgrade arrows between the cluster boxes; CDP Private Cloud (Management Console; Data Hub, CDW, CML)
- 32.
UPGRADING AN EXISTING CLUSTER: OPTION B
Step 1: Install CDP Data Center on new hardware and use Replication Manager to replicate data, metadata, and policies from an existing cluster to create the SDX environment
Step 2: Install CDP Private Cloud and use the Experiences to build new applications
Step 3: Use Workload Manager to intelligently migrate key workloads from the CDH / HDP cluster to the CDP Private Cloud Experiences
Diagram labels: CDH / HDP (Existing Apps, Existing Data, Existing Hardware); Intelligent Replication (data, metadata, policies); CDP Data Center (SDX environment; New Data, New Hardware, No bare metal apps); Intelligent Replication (workloads); CDP Private Cloud (Management Console; Data Hub, CDW, CML)
- 35.
OUR CUSTOMERS ARE ASKING FOR AN ENTERPRISE DATA CLOUD
Hybrid, Multi-Cloud
• Move data and applications
without rewriting and
retraining
• Separate data management
strategy from infrastructure
strategy
• Manage all environments
from a single pane of glass
Multi-Function & Open
• Deploy one platform to
address current and future
workload needs
• Connect disparate
workload types to develop
Edge2AI applications on
one platform
• Open source and open
APIs
Secure & Governed
• Manage data security and
governance centrally
• Automate application
security at all layers
• Reduce time to value with
enterprise-grade
productivity tools
Cloud Experience
• Easy to use with self-serve
capabilities
• Elasticity and agility to meet
changing demands of
workloads and company
• Simple to manage and
maintain environments and
applications
- 37.
1) NOW EACH TEAM CAN CUSTOMIZE THEIR ENVIRONMENT
Business users can
• Upgrade software on their own schedule
• Customize software in isolation
• Control performance in isolation
• Scale resources dynamically to simplify
capacity planning
• Pause and resume their environments
without losing work
• All without losing the ability to collaborate
with other teams
IT users can
• Spin up custom clusters in 30 minutes
– Without recreating the data lake
– Without reconfiguring access rules
– Without reconfiguring users
– Without reconfiguring security
• Tune cluster internals for advanced use
cases
- 38.
2) NOW WE CAN RUN COST EFFECTIVELY IN THE CLOUD
Business users can
• Save money by unbundling infrastructure
into thin servers + object storage
• Save money by only paying for what they
use
• All without diminishing the operational
support provided by IT
IT users can
• Automate cluster lifecycle to support
ad-hoc and seasonal demand
• Troubleshoot cluster internals when things
go wrong
• Manage global footprint of 100s of
clusters without scaling support staff
- 39.
3) NOW WE CAN SAFELY ONBOARD 10X MORE USERS
Business users can
• Access platform via corporate SSO to
simplify login process
• Access only the data they require for their
work
• Leverage automated data profilers to
detect sensitive data
IT users can
• Federate authenticated users and groups
from the corporate identity provider
• Not have to worry about Kerberos, LDAP
• Comply with data privacy standards
– Deny access by default
– Control access at any granularity
– Configure once for all CDP services
• Comply with regulatory standards even
with clusters coming and going
• Troubleshoot workloads even with clusters
coming and going
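The "deny access by default" and "control access at any granularity" points can be illustrated with a minimal policy check. This is a sketch of the general technique only, not the Ranger or SDX policy model: the `Grant` type, the `is_allowed` function, and the group and table names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Grant:
    group: str
    table: str
    column: Optional[str] = None  # None means the grant covers the whole table


def is_allowed(grants: Iterable[Grant], user_groups: Set[str], table: str, column: str) -> bool:
    """Deny by default: return True only if an explicit grant matches the request."""
    for grant in grants:
        if (
            grant.group in user_groups
            and grant.table == table
            and grant.column in (None, column)
        ):
            return True
    return False  # no matching grant, so access is denied


# Hypothetical policy set, defined once and consulted by every service.
grants = {
    Grant("analysts", "orders"),             # table-level grant
    Grant("support", "customers", "email"),  # column-level grant
}

print(is_allowed(grants, {"support"}, "customers", "email"))
print(is_allowed(grants, {"support"}, "customers", "ssn"))
```

Because the function has no "allow" fallback, adding a new table or a new user group exposes nothing until someone writes a grant for it.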
- 40.
4) NOW WE CAN MIGRATE TO CLOUD WITHOUT STARTING OVER
Business users can
• Migrate to cloud without retraining and
rewriting
• Burst targeted workloads to elastic
infrastructure without waiting for full
migration
IT users can
• Support deployments across any
environment without retraining or rewriting
• Manage global deployments from a single
console
- 41.
HOW TIMES HAVE CHANGED
2008
SCALE 1 JOB TO
1000s OF SERVERS
2019
SCALE 1 PLATFORM TO
1000s OF USERS
- 42.
CDP HOME
A single login to access the full
platform, documentation, and
support - all controlled through
corporate SSO
- 43.
DATA
HUB
A familiar and highly customizable
cluster service optimized for the
separation of storage and compute
- 44.
DATA
WAREHOUSE
A data warehousing service
optimized for concurrency,
caching, and isolation
- 45.
MACHINE
LEARNING
A machine learning workspace
service to connect teams of data
scientists to enterprise data
- 46.
MANAGEMENT
CONSOLE
A single pane of glass to manage
100s of clusters all with different
lifecycles - across multiple
environments
- 47.
DATA
CATALOG
A centralized data stewardship
tool for searching, organizing,
securing, and governing data
across environments
- 48.
WORKLOAD
MANAGER
A centralized management tool
for analyzing and optimizing
workloads within and across
environments
- 49.
REPLICATION
MANAGER
A centralized management tool
for replicating and migrating data,
metadata, and policies between
environments
- 51.
CDH VS. CLOUDERA RUNTIME (1 of 2)
Component       | CDH 5.16 | CDH 6.2 | Runtime 7.x
Apache Accumulo | 1.7.2    | 1.9.0   | [Roadmap]
Apache Avro     | 1.7.6    | 1.8.2   | 1.8.2
Apache Flume    | 1.6.0    | 1.9.0   | [Removed]
Apache Hadoop   | 2.6.0    | 3.0.0   | 3.1
Apache HBase    | 1.2.0    | 2.1.1   | 2.2
HBase Indexer   | 1.5.0    | 1.5.0   | 1.5.0
Apache Hive     | 1.1.0    | 2.1.1   | 3.1
Hue             | 3.9.0    | 4.3.0   | 4.3
Apache Impala   | 2.12.0   | 3.2.0   | 3.2
Kite SDK        | 1.0.0    | 1.0.0   | 1.0.0
- 52.
CDH VS. CLOUDERA RUNTIME (2 of 2)
Component        | CDH 5.16 | CDH 6.2 | Runtime 7.0
Apache Kudu      | 1.7.0    | 1.9.0   | 1.11
Navigator        | 2.15     | 6.2     | [Integrated into Atlas]
Apache Oozie     | 4.1.0    | 5.1.0   | 5.1
Apache Parquet   | 1.5.0    | 1.9.0   | 1.10
Parquet-format   | 2.1.0    | 2.3.1   | 2.3.1
Apache Pig       | 0.12     | 0.17.0  | [Removed]
Apache Sentry    | 1.5.1    | 2.1.0   | [Replaced by Ranger]
Apache Solr      | 4.10.3   | 7.4.0   | 7.4
Apache Spark     | 1.6.0    | 2.4.0   | 2.4
Apache Sqoop     | 1.4.6    | 1.4.7   | 1.4.7
Apache ZooKeeper | 3.4.5    | 3.4.5   | 3.4.6
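When planning an upgrade, the rows that matter most are those where the Runtime column holds a bracketed note instead of a version number. The sketch below copies the Runtime 7.0 column from the "2 of 2" table and pulls out those rows; the dictionary and function names are hypothetical helpers, not part of any Cloudera tool.

```python
# Runtime 7.0 column copied from the "CDH VS. CLOUDERA RUNTIME (2 of 2)" table.
RUNTIME_7 = {
    "Apache Kudu": "1.11",
    "Navigator": "[Integrated into Atlas]",
    "Apache Oozie": "5.1",
    "Apache Parquet": "1.10",
    "Parquet-format": "2.3.1",
    "Apache Pig": "[Removed]",
    "Apache Sentry": "[Replaced by Ranger]",
    "Apache Solr": "7.4",
    "Apache Spark": "2.4",
    "Apache Sqoop": "1.4.7",
    "Apache ZooKeeper": "3.4.6",
}


def needs_attention(runtime: dict) -> dict:
    """Return components whose Runtime entry is a bracketed note rather than a version."""
    return {
        name: note.strip("[]")
        for name, note in runtime.items()
        if note.startswith("[")
    }


for component, note in needs_attention(RUNTIME_7).items():
    print(f"{component}: {note}")
```

The same helper works unchanged on the other three comparison tables once their Runtime columns are entered the same way.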
- 53.
HDP VS. CLOUDERA RUNTIME (1 of 2)
Component       | HDP 2.6.5     | HDP 3.1.4 | Runtime 7.x
Apache Accumulo | 1.7.0         | 1.7.0     | [Roadmap]
Apache Atlas    | 0.8.0         | 1.1.0     | 2.0.0
Apache Flume    | 1.5.2         | [Removed] | [Removed]
Apache Hadoop   | 2.7.3         | 3.1.1     | 3.1
Apache HBase    | 1.1.2         | 2.0.2     | 2.2
Apache Hive     | 1.2.1 / 2.1.0 | 3.1.0     | 3.1
Apache Knox     | 0.12          | 1.0.0     | 1.3
Apache Livy     | -             | 0.5.0     | 0.5
Apache Oozie    | 4.2.0         | 4.3.1     | 5.1
Apache Phoenix  | 4.7.0         | 5.0.0     | 5.0
- 54.
HDP VS. CLOUDERA RUNTIME (2 of 2)
Component        | HDP 2.6.5     | HDP 3.1.4 | Runtime 7.0
Apache Pig       | 0.16          | 0.16      | [Removed]
Apache Ranger    | 0.7.0         | 1.2.0     | 2.0
Apache Spark     | 1.6.3 / 2.3.2 | 2.3       | 2.4
Apache Sqoop     | 1.4.6         | 1.4.7     | 1.4.7
Apache Storm     | 1.1.0         | 1.2.1     | [Removed]
Apache TEZ       | 0.7.0         | 0.9.1     | 0.9
Apache Zeppelin  | 0.7.3         | 0.8.0     | 0.8
Apache ZooKeeper | 3.4.6         | 3.4.6     | 3.4.6