1. 1 © Hortonworks Inc. 2011–2018. All rights reserved
Running Enterprise Workloads with an Open
Source Hybrid Cloud Data Architecture
Srikanth Venkat
Senior Director, Product Management
Hortonworks Inc.
2.
Presenter
Srikanth Venkat
Senior Director of Product Management,
Hortonworks Inc.
Security & Governance portfolio products & services
Apache Ranger, Apache Atlas, Apache Knox, Platform Security, & Hortonworks DataPlane
Service – Data Steward Studio (DSS)
@srikvenk https://www.linkedin.com/in/srikanthvenkat/
3.
HDF HDP
Next Generation Data Problems
My Data Is Spread Across Multiple
Clusters and Data Sources
I Store & Analyze Data From
ERP/CRM Systems, IoT/Mobile
Devices, Social Media,
Geolocation, etc.
Some of my data is on-premises,
some is in the cloud. I move my data
from cloud to on-premises & vice
versa, and between different clouds
4.
Data Is Your Business
Focus on Your Data Strategy
●Consider how you store, manage and protect your data
●Data must be made known, discoverable, available, trusted and compliant
●Security and Governance of all data is paramount
●Stewardship, discovery, delivery and use of data is a key concern
Treat Your Data as a Strategic Asset
●Turn data into predictive and prescriptive analytics
●Enable self-service analytics to accelerate delivery of new business insights
●Build a solid foundation for higher value Data Science, ML and AI
●Data explosion is uncovering new possibilities – if you can seize them
The Next Generation of Data Problems Requires a Data Strategy
5. Balancing Enterprise Requirements for Hybrid Cloud Data Strategy
Line of Business practitioners vs Enterprise IT stakeholders
Time to Insight (Data Analysts, Data Engineers and Data Scientists)
●Access a Broad Set of Analytics Tools
●On-demand, Self-service Access
●Data Discovery, Provisioning and Deployment
●Global Data Access Transparent of Location
●Single Pane of Glass
Reduce Risk (Big Data Platform Owners)
●Consistent Security and Governance
●Manage Cloud and Shadow Spend
●Retain Data Context, Lineage and Visibility
●Operational Reliability, Portability
●Remain Cloud Agnostic
6. 6 © Hortonworks, Inc. 2011-2018. All rights reserved. Hortonworks confidential and proprietary information.
Global Data Management With Hortonworks
Globally Manage, Secure, Govern, Consume
Figure (architecture diagram), labels by block:
HORTONWORKS CONNECTION: Enterprise Support, Premier Support, Educational Services, Professional Services, Community Connection
HORTONWORKS PLATFORM SERVICES: Operational Services, SmartSense™
DATA SOURCES: Data Center (Exception Monitoring, 360 View of Operations, Cyber Security); Cloud (Telemetry – Connected Devices, Time Series); Edge (Sensors, Control Systems)
MODERN DATA USE CASES: EDW Optimization, Cybersecurity, Data Science, Advanced Analytics, IoT/Streaming Analytics
DATAPLANE SERVICE (DPS), Manage, Govern, Secure: Data Lifecycle Manager, Data Steward Studio, ISV Services
EXTENSIBLE SERVICES: IBM DSX*, Data Analytics Studio, Streams Messaging Manager
CONNECTED DATA PLATFORMS: Hortonworks Data Platform (HDP®), Data-at-Rest; Hortonworks DataFlow (HDF), Data-in-Motion
* Not available as a DPS module yet
7.
Hortonworks DataPlane Service Enables a Hybrid Architecture for
Global Data Management
From the edge, through movement, to rest
Hortonworks DataPlane Service
A foundational platform for the delivery of data solutions that will:
• Support enterprise hybrid deployment strategy and adoption of cloud
• Provide common metadata, security and governance across all deployments
• Simplify enterprise data asset management
• Support a variety of workloads
• Extend to new services: a services enablement layer for rapidly bringing new solutions to market
Figure labels: Hortonworks DataPlane Service (Manage, Secure, Govern); Multiple Clusters and Sources; Multi/Hybrid; Data at Rest: Hortonworks Data Platform; Data in Motion: Hortonworks DataFlow
8.
Forrester Calls It Data Fabric
“Bringing together disparate big data sources automatically, intelligently,
and securely and processing them in a big data platform technology, using
data lakes, Hadoop, and Apache Spark to deliver a unified, trusted, and
comprehensive view of customer and business data.”
9.
DataPlane Service Is the Global Data Fabric
Figure: groups of structured and unstructured clusters (Cluster 1 to Cluster 4) at several sites, including Data Center Dublin, Data Center Las Vegas and Data Center Bangkok. Labels: Shared Services, Connectivity, Application Portability.
10.
DataPlane Service (Applications)
Hortonworks DataPlane Service
• DLM – Data Lifecycle Manager
• DSS – Data Steward Studio
• DAS – Data Analytics Studio
• SMM – Streams Messaging Manager
Figure: the architecture diagram from slide 6, repeated. IBM DSX is marked as not available as a DPS module yet.
11.
What is Hortonworks DataPlane Service?
Hortonworks DataPlane Service
A platform with extensible data management services for:
• Addressing compliance and regulatory requirements for the enterprise
• Providing consistent security & governance across the data landscape
• Enabling centralized management of data assets
• Enabling responsible data sharing and collaboration
12.
The DPS Ecosystem
DPS PLATFORM
• Applications: Data Lifecycle Manager, Data Steward Studio*, Data Analytics Studio*, Streams Messaging Manager
• DataPlane Services: authentication, role-based access, service lifecycle management, cluster registration, cluster service discovery and access
HDP/HDF Cluster
• DLM Engine, Profiler Service, DAS Agent, SMM Agent
13.
Data Lifecycle Manager (DLM)
⬢ Manage the Data Lifecycle:
– Replication/failback to another cloud/on-prem site for Disaster Recovery
– Auto Tiering of hot/warm/cold data to cloud object storage/on-prem for TCO reduction
– Backup & Recover Critical Business Data
⬢ Maintain Common Security and Governance Policies Across Multiple Data Sources/Environments
Figure panels:
– Replication & Disaster Recovery: replication from Prod Cluster to Backup Cluster, with failback
– Auto Tiering: data moves between clusters by likelihood of use. P(use) high, cost $$$; P(use) medium, cost $$; P(use) low, cost $
– Backup & Restore: full backup followed by cumulative incremental backups (day 1, day 2, day 3); restore after an accidental delete
Status labels, in panel order: Generally Available, Coming Soon, Coming Soon
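The auto-tiering idea on this slide (hot, warm and cold data placed on storage of decreasing cost according to how likely the data is to be used) can be sketched in a few lines of Python. This is an illustration only, not how DLM implements tiering: the thresholds, the tier names and the `Dataset` type are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical thresholds on the probability that a dataset will be used.
HOT_THRESHOLD = 0.7
WARM_THRESHOLD = 0.3


@dataclass(frozen=True)
class Dataset:
    name: str
    p_use: float  # estimated probability of use, between 0.0 and 1.0


def choose_tier(dataset: Dataset) -> str:
    """Map a dataset to a storage tier by its probability of use."""
    if not 0.0 <= dataset.p_use <= 1.0:
        raise ValueError(f"p_use out of range for {dataset.name}: {dataset.p_use}")
    if dataset.p_use >= HOT_THRESHOLD:
        return "hot"   # P(use) high, cost $$$
    if dataset.p_use >= WARM_THRESHOLD:
        return "warm"  # P(use) medium, cost $$
    return "cold"      # P(use) low, cost $


def plan_tiering(datasets):
    """Group dataset names by the tier they should live on."""
    plan = {"hot": [], "warm": [], "cold": []}
    for dataset in datasets:
        plan[choose_tier(dataset)].append(dataset.name)
    return plan
```

In practice the probability of use would have to come from access statistics; here it is simply supplied by the caller.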
14.
Data Lifecycle Manager (DLM)
DLM 1.0 (GA Product)
DLM: Pair clusters and manage data replication flows
15.
DPS Platform: Data Lifecycle Manager (DLM)
DLM: Replicate between on-prem and cloud
16.
Data Lifecycle Manager (DLM)
DLM: Replication policies and instances
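One way to picture the difference between a replication policy and its instances is as a small data model: the policy says what is copied between a pair of clusters and how often, and each instance records one run of that policy. The sketch below is hypothetical; the class names, fields and statuses are assumptions for illustration and are not taken from the DLM product or its API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class ReplicationInstance:
    """One run of a policy (hypothetical model, not the DLM schema)."""
    run_id: int
    status: Status = Status.RUNNING


@dataclass
class ReplicationPolicy:
    """What to copy between a pair of clusters, and how often."""
    name: str
    source_cluster: str
    target_cluster: str
    dataset_path: str
    frequency_minutes: int
    instances: list = field(default_factory=list)

    def start_instance(self) -> ReplicationInstance:
        """Record a new run of this policy and return it."""
        instance = ReplicationInstance(run_id=len(self.instances) + 1)
        self.instances.append(instance)
        return instance

    def last_successful(self):
        """Return the most recent successful instance, or None."""
        for instance in reversed(self.instances):
            if instance.status is Status.SUCCEEDED:
                return instance
        return None
```

Keeping instances on the policy makes questions such as "when did this policy last succeed?" a simple lookup.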
17.
Data Analytics Studio (DAS)
Intuitive Query Tools
Full-featured auto-complete, direct download of results, quick data preview
18.
19.
Data Analytics Studio (DAS)
Enhance productivity through full-featured auto-complete, direct download of results and quick data preview
20.
Data Analytics Studio (DAS)
Self-optimize queries and storage based on a heuristic recommendation engine
21.
Data Analytics Studio (DAS)
DAS: The database heatmap lets you quickly discover which parts of your cluster are being utilized more
22.
Data Analytics Studio (DAS)
DAS: Heuristic recommendation engine
Fully self-service query and storage optimization
23.
Data Analytics Studio (DAS)
Built-in batch operations
No more scripting needed for day-to-day operations
24.
Hortonworks Streams Messaging Manager (SMM)
What is SMM?
• Kafka Management and Monitoring tool
• Single Monitoring Dashboard for all your Kafka Clusters across 4 entities
– Broker
– Producer
– Topic
– Consumer
• Supports multiple HDP and/or HDF Kafka Clusters
• REST as a First Class Citizen
• Delivered as a DataPlane Service
25.
DPS Platform: Streams Messaging Manager
SMM: Full visibility into all details of Kafka Clusters
26.
DPS Platform: Streams Messaging Manager
SMM: Detailed Views of specific Topics
27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Explore Metadata about the Topic in Atlas
Click on Atlas Link to see the
metadata of the topic
gateway-west-raw-sensors
in Atlas
28.
Traverse the flow of data across multiple Kafka Topics using SMM and Atlas Integration
Question
The topic has one active consumer, which is a NiFi consumer. Which Kafka topic, if any, is this NiFi flow consumer publishing events to?
Step 1
Click on the Atlas icon to see the lineage of the topic gateway-west-raw-sensors
Analysis
The NiFi app consumes from the gateway-west-raw-sensors topic and publishes events to a downstream Kafka topic called syndicate-geo-event-avro
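The question answered on this slide ("what is downstream of this topic?") is a graph traversal over lineage edges. The sketch below walks a plain Python dictionary breadth-first. It does not call Atlas or SMM; the two topic names come from the slide, while the `nifi-flow` node name and the edge layout are assumptions made for the example.

```python
from collections import deque

# Hypothetical lineage edges: each node maps to the nodes it feeds.
# Topic names are from the slide; "nifi-flow" is a made-up node name.
LINEAGE = {
    "gateway-west-raw-sensors": ["nifi-flow"],
    "nifi-flow": ["syndicate-geo-event-avro"],
    "syndicate-geo-event-avro": [],
}


def downstream(graph, start):
    """Return every node reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                order.append(successor)
                queue.append(successor)
    return order
```

Calling `downstream(LINEAGE, "gateway-west-raw-sensors")` lists the NiFi flow first and then the topic it publishes to, which mirrors the analysis on the slide. Reversing the edges would give upstream lineage in the same way.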
29.
DPS Platform: Streams Messaging Manager
SMM: All Producers and Consumers associated with a topic
30.
Data Steward Studio Overview
31.
Data Governance: It’s a team sport!
Data Curator: Implements business data requirements
Data Steward: Manages business requirements for data sharing
Sponsor: Champions data governance across the enterprise
Data Owner: Accountable for all data generated by an agency
Business Data SME: Supports the Data Steward in data-related activities
Data Council: Coordinates cross-agency data management activities
33.
Hortonworks DataPlane Service (DPS)
34.
Organize Your Data Assets as Collections
• Data Asset Collections: organizational construct for assets, based on a business definition, for grouping heterogeneous data
• Create asset collections and attach metadata
• Contextual attributes: Name, Description, Owner, Datalake
• System attributes: Created-on, Modified-on, Modified-by, Created-by, Version
• Search for assets using attribute facets or free text
• View personalized dashboard of asset collections
• Delete/update data asset collections
• Asset 360 view for assets in collection
Asset Collections
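The attributes and search behavior listed on this slide can be illustrated with a small data structure. This is a minimal sketch, assuming a collection is just a record with the contextual and system attributes named above; the `AssetCollection` class and the `search` helper are hypothetical and are not the Data Steward Studio API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AssetCollection:
    # Contextual attributes named on the slide.
    name: str
    description: str
    owner: str
    datalake: str
    # System attributes named on the slide (a subset, for brevity).
    created_by: str
    created_on: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    version: int = 1
    assets: list = field(default_factory=list)


def search(collections, text=None, **facets):
    """Filter collections by exact attribute facets and optional free text."""
    results = []
    for collection in collections:
        # Every facet must match the attribute of the same name exactly.
        if any(getattr(collection, key) != value for key, value in facets.items()):
            continue
        # Free text is matched case-insensitively against name and description.
        haystack = f"{collection.name} {collection.description}".lower()
        if text is not None and text.lower() not in haystack:
            continue
        results.append(collection)
    return results
```

For example, `search(collections, owner="alice")` applies a facet, while `search(collections, text="sensor")` applies free text; both can be combined in one call.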
35.
Discover and Fingerprint your Data Assets
• Computes a profile for data assets as they are ingested or created within the platform. Automatically determines column types based on data values
• Generates key metrics for the data in each column. Various visualizations (box plots, histograms, pie charts) can be used to view the metrics
• Persists profile information in the cluster
• As more data is added, profilers can be scheduled to run and update the profile metadata for the asset
Data Profiler
Column Statistical Profiler
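To make "determines column types based on data values" and "key metrics for each column" concrete, here is a minimal sketch of a column profiler in Python. It is not the profiler shipped with the product: the three-way type guess (int, float, string) and the choice of metrics are assumptions chosen to keep the example short.

```python
import statistics


def infer_type(values):
    """Guess a column type from its string values: int, float or string."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if not values:
        return "string"
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


def profile_column(values):
    """Compute a small profile: inferred type, counts and basic statistics."""
    non_null = [v for v in values if v not in ("", None)]
    profile = {
        "type": infer_type(non_null),
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    if profile["type"] in ("int", "float"):
        numbers = [float(v) for v in non_null]
        profile.update(min=min(numbers), max=max(numbers),
                       mean=statistics.mean(numbers))
    return profile
```

Re-running `profile_column` as new values arrive corresponds to the scheduled profile refresh described in the last bullet.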
36.
Know your Sensitive Data
• Automatically detect and profile sensitive & personal data
• Attach classification annotations for sensitivity
• Manual approval and curation of sensitive data classifications
• Leverage classification-based data protection
• Sensitive data dashboard on Asset 360
Sensitive Data Profiling
37.
Track your Sensitive Data
• IBAN (27 EU Countries)
• Credit Card Numbers
• Email
• Telephone (AMER, EU)
• IP Address
• URL
• Passport (12 EU Countries)
• National ID (19 EU Countries)
• Australian Driver's License
• Australian Passport
• Australian National ID
Sensitive Data Types
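Detection of a few of the listed types can be sketched with regular expressions plus a checksum. The example below covers only email addresses, IPv4 addresses and credit card numbers (validated with the Luhn checksum); the patterns, the label names and the 80% column threshold are assumptions for illustration, not the rules used by the product.

```python
import re

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
IPV4_RE = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")
CARD_CHARS_RE = re.compile(r"[0-9 -]+")


def is_ipv4(value):
    """True for dotted-quad IPv4 addresses with each part in 0..255."""
    match = IPV4_RE.match(value)
    return bool(match) and all(int(part) <= 255 for part in match.groups())


def luhn_valid(number):
    """Luhn checksum, the check digit scheme used by payment card numbers."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify(value):
    """Return a sensitivity label for one value, or None."""
    value = value.strip()
    if EMAIL_RE.match(value):
        return "email"
    if is_ipv4(value):
        return "ip_address"
    if CARD_CHARS_RE.fullmatch(value) and luhn_valid(value):
        return "credit_card"
    return None


def classify_column(values, threshold=0.8):
    """Tag a column when at least `threshold` of non-empty values share a label."""
    labels = [classify(v) for v in values if v.strip()]
    if not labels:
        return None
    best = max(set(labels), key=labels.count)
    if best is not None and labels.count(best) / len(labels) >= threshold:
        return best
    return None
```

A column-level threshold like this leaves room for the manual approval and curation step: the detector proposes a classification and a steward confirms it.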
38.
Track Your Data Asset – Lineage and Impact
• Consolidated upstream lineage and downstream impact
• Detailed click-through to asset properties
Data Lineage and Impact
39.
View Security Policies for your Data Assets
• View security policies on data assets
• View classification-based policies on assets
Security Policies
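A classification-based policy applies to every asset that carries a given classification tag, rather than to a named table or path. The sketch below shows that lookup in its simplest form. The `TagPolicy` type and its fields are hypothetical and deliberately much simpler than a real policy model such as Apache Ranger's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical tag-based policy: who may do what to data carrying a tag."""
    tag: str
    group: str
    access: str  # for example "select" or "mask"


def policies_for_asset(asset_tags, policies):
    """Return the policies whose tag is attached to the asset."""
    return [policy for policy in policies if policy.tag in asset_tags]
```

An asset tagged `{"PII", "EU"}` would pick up every policy written against either tag, which is what lets one policy protect data across many clusters.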
40.
Monitor Usage of your Data Assets
• Dashboard for access patterns and trends for each asset
• Examples:
• Count of Access Events
• Top N Users over Time
• Most recent trail of access audit events
Audit and Monitoring
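The three example metrics above can each be computed from a list of audit events. This is a minimal sketch, assuming each event is a `(user, asset, ISO date)` tuple; that event shape and the function names are assumptions for illustration, not the audit format used by the platform.

```python
from collections import Counter


def access_count(events, asset):
    """Count of access events for one asset."""
    return sum(1 for _, event_asset, _ in events if event_asset == asset)


def top_users(events, asset, n=3):
    """Top N users of an asset as (user, event count) pairs."""
    counts = Counter(user for user, event_asset, _ in events if event_asset == asset)
    return counts.most_common(n)


def recent_events(events, asset, limit=5):
    """Most recent access events for an asset, newest first."""
    matching = [event for event in events if event[1] == asset]
    # ISO dates sort correctly as plain strings.
    return sorted(matching, key=lambda event: event[2], reverse=True)[:limit]
```

Grouping the same events by date before counting would give the "over time" view of the top users.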