1 © Hortonworks Inc. 2011–2018. All rights reserved
Running Enterprise Workloads with an Open
Source Hybrid Cloud Data Architecture
Srikanth Venkat
Senior Director, Product Management
Hortonworks Inc.
Presenter
Srikanth Venkat, Senior Director of Product Management, Hortonworks Inc.
Security & Governance portfolio products and services: Apache Ranger, Apache Atlas, Apache Knox, Platform Security, and Hortonworks DataPlane Service – Data Steward Studio (DSS)
@srikvenk https://www.linkedin.com/in/srikanthvenkat/
Next Generation Data Problems
My data is spread across multiple clusters and data sources.
I store and analyze data from ERP/CRM systems, IoT and mobile devices, social media, geolocation, etc.
Some of my data is on-premises, some is in the cloud. I move my data from cloud to on-premises and vice versa, and between different clouds.
Data Is Your Business
Focus on Your Data Strategy
● Consider how you store, manage and protect your data
● Data must be made known, discoverable, available, trusted and compliant
● Security and governance of all data is paramount
● Stewardship, discovery, delivery and use of data is a key concern
Treat Your Data as a Strategic Asset
● Turn data into predictive and prescriptive analytics
● Enable self-service analytics to accelerate delivery of new business insights
● Build a solid foundation for higher-value Data Science, ML and AI
● Data explosion is uncovering new possibilities, if you can seize them
The Next Generation of Data Problems Requires a Data Strategy
Balancing Enterprise Requirements for Hybrid Cloud Data Strategy
Line of Business practitioners (Data Analysts, Data Engineers and Data Scientists) vs Enterprise IT stakeholders (Big Data Platform Owners)
Time to Insight
• Access a Broad Set of Analytics Tools
• On-demand, Self-service Access
• Data Discovery, Provisioning and Deployment
• Global Data Access Transparent of Location
• Single Pane of Glass
Reduce Risk
• Consistent Security and Governance
• Manage Cloud and Shadow Spend
• Retain Data Context, Lineage and Visibility
• Operational Reliability, Portability
• Remain Cloud Agnostic
Global Data Management With Hortonworks
Globally Manage, Secure, Govern, Consume
Hortonworks Connection: Enterprise Support, Premier Support, Educational Services, Professional Services, Community Connection
Hortonworks Platform Services: Operational Services, SmartSense™
Data sources
• Data center: Exception Monitoring, 360 View of Operations, Cyber Security
• Cloud: Telemetry – Connected Devices, Time Series
• Edge: Sensors, Control Systems
Modern data use cases: EDW Optimization, Cybersecurity, Data Science, Advanced Analytics, IoT/Streaming Analytics
DataPlane Service (DPS): Manage, Govern, Secure
• Data Lifecycle Manager
• Data Steward Studio
• ISV Services
• Extensible services: IBM DSX*, Data Analytics Studio, Streams Messaging Manager
Connected data platforms
• Hortonworks Data Platform (HDP®): data-at-rest
• Hortonworks DataFlow (HDF): data-in-motion
* Not available as a DPS module yet
Hortonworks DataPlane Service Enables a Hybrid Architecture for Global Data Management
From the edge, through movement, to rest
Hortonworks DataPlane Service: a foundational platform for the delivery of data solutions that will:
• Support enterprise hybrid deployment strategy and adoption of cloud
• Provide common metadata, security and governance across all deployments
• Simplify enterprise data asset management
• Support a variety of workloads
• Be extensible to new services: a services enablement layer for rapidly bringing new solutions to market
Diagram: Hortonworks DataPlane Service (manage, secure, govern) over multiple clusters and sources, multihybrid, covering data at rest (Hortonworks Data Platform) and data in motion (Hortonworks DataFlow)
Forrester Calls It Data Fabric
“Bringing together disparate big data sources automatically, intelligently,
and securely and processing them in a big data platform technology, using
data lakes, Hadoop, and Apache Spark to deliver a unified, trusted, and
comprehensive view of customer and business data.”
DataPlane Service Is the Global Data Fabric
Diagram: clusters holding structured or unstructured data at several sites, including Data Center Dublin, Data Center Las Vegas and Data Center Bangkok, with the labels Shared Services, Connectivity and Application Portability
DataPlane Service (Applications)
Hortonworks DataPlane Service
• DLM – Data Lifecycle Manager
• DSS – Data Steward Studio
• DAS – Data Analytics Studio
• SMM – Streams Messaging Manager
What is Hortonworks DataPlane Service?
Hortonworks DataPlane Service: a platform with extensible data management services for:
• Addressing compliance and regulatory requirements for the enterprise
• Providing consistent security and governance across the data landscape
• Enabling centralized management of data assets
• Responsible data sharing and collaboration
The DPS Ecosystem
DPS Platform applications: Data Lifecycle Manager, Data Steward Studio*, Data Analytics Studio*, Streams Messaging Manager
DataPlane Services: authentication, role-based access, service lifecycle management, cluster registration, cluster service discovery and access
HDP/HDF cluster components: DLM Engine, Profiler Service, DAS Agent, SMM Agent
Data Lifecycle Manager (DLM)
⬢ Manage the data lifecycle:
– Replication/failback to another cloud or on-prem site for disaster recovery
– Auto-tiering of hot/warm/cold data to cloud object storage or on-prem for TCO reduction
– Backup and recovery of critical business data
⬢ Maintain common security and governance policies across multiple data sources and environments
Replication & disaster recovery (generally available): replication from the prod cluster to the backup cluster, with failback and restore
Auto-tiering (coming soon): data moves between clusters by probability of use: high P(use) at cost $$$, medium P(use) at cost $$, low P(use) at cost $
Backup & restore (coming soon): a full backup followed by cumulative incremental backups over days 1 to 3, used to recover from an accidental delete
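The backup timeline on this slide (a full backup, cumulative incremental backups, then a restore after an accidental delete) can be illustrated with a minimal sketch. This is not DLM code: the `Backup` class, the dict-based stand-in for a file system and the paths are all hypothetical. It assumes the usual meaning of "cumulative incremental", where each incremental holds every change since the last full backup, so a restore needs only the full backup plus the most recent incremental.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup record: a day label plus what it captured."""
    day: int
    files: dict                                 # path -> content captured
    deleted: set = field(default_factory=set)   # paths removed since the full backup


def full_backup(day, state):
    # A full backup copies the entire state.
    return Backup(day=day, files=dict(state))


def cumulative_incremental(day, full, state):
    # Cumulative: everything that differs from the FULL backup,
    # not just what changed since the previous incremental.
    changed = {p: c for p, c in state.items() if full.files.get(p) != c}
    deleted = set(full.files) - set(state)
    return Backup(day=day, files=changed, deleted=deleted)


def restore(full, latest_incremental=None):
    # Restore = full backup, then overlay the most recent cumulative incremental.
    state = dict(full.files)
    if latest_incremental is not None:
        for path in latest_incremental.deleted:
            state.pop(path, None)
        state.update(latest_incremental.files)
    return state


# Day 1: full backup. Days 2 and 3: cumulative incrementals.
day1 = {"/data/orders": "v1", "/data/users": "v1"}
full = full_backup(1, day1)

day2 = {"/data/orders": "v2", "/data/users": "v1"}
inc2 = cumulative_incremental(2, full, day2)

day3 = {"/data/orders": "v2", "/data/users": "v1", "/data/events": "v1"}
inc3 = cumulative_incremental(3, full, day3)

# Accidental delete after day 3, then restore from full + latest incremental.
live = dict(day3)
del live["/data/orders"]
restored = restore(full, inc3)
```

Note that `inc2` is never needed for the restore. That is the trade-off of cumulative incrementals: each one grows over time, but recovery touches only two backups.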
DLM 1.0 (GA Product)
DLM: Pair clusters and manage data replication flows
Data Lifecycle Manager (DLM)
DLM: Replicate between on-prem and cloud
DPS Platform: Data Lifecycle Manager (DLM)
DLM: Replication policies and instances
Data Lifecycle Manager (DLM)
Intuitive Query Tools
Full-featured auto-complete, direct download of results, quick data preview
Data Analytics Studio (DAS)
Enhance productivity through full-featured auto-complete, direct download of results and quick data preview
Data Analytics Studio (DAS)
Self-optimize queries and storage with a heuristic recommendation engine
Data Analytics Studio (DAS)
DAS: Data Analytics Studio provides a database heatmap to quickly discover which parts of your cluster are utilized most
Data Analytics Studio (DAS)
DAS: Heuristic recommendation engine
Fully self-service query and storage optimization
Data Analytics Studio (DAS)
Built-in batch operations
No more scripting needed for day-to-day operations
Data Analytics Studio (DAS)
Hortonworks Streams Messaging Manager (SMM)
What is SMM?
• Kafka management and monitoring tool
• Single monitoring dashboard for all your Kafka clusters across 4 entities:
– Broker
– Producer
– Topic
– Consumer
• Supports multiple HDP and/or HDF Kafka clusters
• REST as a first-class citizen
• Delivered as a DataPlane Service
SMM: Full visibility into all details of Kafka clusters
DPS Platform: Streams Messaging Manager
SMM: Detailed views of specific topics
DPS Platform: Streams Messaging Manager
Explore Metadata about the Topic in Atlas
Click on Atlas Link to see the
metadata of the topic
gateway-west-raw-sensors
in Atlas
Traverse the flow of data across multiple Kafka Topics using SMM and Atlas Integration
Question
The topic has one active consumer, which is a NiFi consumer. Which Kafka topic, if any, is this NiFi flow consumer publishing events to?
Step 1
Click on the Atlas icon to see the lineage of the topic gateway-west-raw-sensors
Analysis
The NiFi app consumes from the gateway-west-raw-sensors topic and publishes events to a downstream Kafka topic called syndicate-geo-event-avro
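The question on this slide amounts to a downstream traversal of a lineage graph. A minimal sketch follows, assuming the lineage is available as a plain adjacency mapping. This is not the Atlas API: the `LINEAGE` structure, the "NiFi flow" label and the function names are hypothetical, and only the two topic names come from the slide.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the entities listed as its value.
LINEAGE = {
    "gateway-west-raw-sensors": ["NiFi flow"],
    "NiFi flow": ["syndicate-geo-event-avro"],
    "syndicate-geo-event-avro": [],
}

# Which entities in the graph are Kafka topics (as opposed to processes).
TOPICS = {"gateway-west-raw-sensors", "syndicate-geo-event-avro"}


def downstream(graph, start):
    """Return every entity reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def downstream_topics(graph, start, topics):
    """Keep only the downstream entities that are Kafka topics."""
    return [n for n in downstream(graph, start) if n in topics]


result = downstream_topics(LINEAGE, "gateway-west-raw-sensors", TOPICS)
```

The `seen` set keeps the traversal safe if a flow ever publishes back to a topic it also consumes from.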
SMM: All producers and consumers associated with a topic
DPS Platform: Streams Messaging Manager
Data Steward Studio Overview
Data Governance: It’s a team sport!
Implements business data
requirements
Data Curator
Data Steward
Manages business requirements
for data sharing
Sponsor
Champions data governance
across enterprise
Data Owner
Accountable for all data
generated by an agency
Supports the Data Steward in
data related activities
Business Data SME
Coordinate cross-agency data
management activities
Data Council
Goals
Hortonworks DataPlane Service (DPS)
Organize Your Data Assets as Collections
• Data asset collections: an organizational construct for grouping heterogeneous data assets based on a business definition
• Create asset collections and attach metadata
• Contextual attributes: Name, Description, Owner, Datalake
• System attributes: Created-on, Modified-on, Modified-by, Created-by, Version
• Search for assets using attribute facets or free text
• View a personalized dashboard of asset collections
• Delete/update data asset collections
• Asset 360 view for assets in a collection
Asset Collections
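As a rough illustration of the attributes listed above, an asset collection could be modeled as a small record with a facet and free-text search over it. This is a sketch, not the DSS data model: the `AssetCollection` class, the `search` helper, the sample values, and how system attributes get populated are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc)


@dataclass
class AssetCollection:
    # Contextual attributes named on the slide.
    name: str
    description: str
    owner: str
    datalake: str
    # System attributes named on the slide; how they are set is an assumption.
    created_by: str = ""
    modified_by: str = ""
    created_on: datetime = field(default_factory=_now)
    modified_on: datetime = field(default_factory=_now)
    version: int = 1
    assets: list = field(default_factory=list)  # hypothetical: asset names


def search(collections, text="", **facets):
    """Filter by exact attribute facets, then by case-insensitive free text."""
    hits = []
    for c in collections:
        if any(getattr(c, k) != v for k, v in facets.items()):
            continue
        haystack = f"{c.name} {c.description}".lower()
        if text.lower() in haystack:
            hits.append(c)
    return hits


# Hypothetical sample collections.
collections = [
    AssetCollection("Customer PII", "Tables holding customer contact data",
                    owner="alice", datalake="dublin", created_by="alice"),
    AssetCollection("Sensor Feeds", "Raw IoT sensor events",
                    owner="bob", datalake="las-vegas", created_by="bob"),
]

by_owner = search(collections, owner="alice")
by_text = search(collections, text="sensor")
```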
Discover and Fingerprint your Data Assets
• Computes a profile for data assets as they are ingested or created within the platform. Automatically determines column types based on data values
• Generates key metrics for the data in each column. Various visualizations (box plots, histograms, pie charts) can be used to view the metrics
• Persists profile information in the cluster
• As more data is added, profilers can be scheduled to run and update the profile metadata for the asset
Data Profiler
Column Statistical Profiler
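The two profiling steps described above, inferring a column's type from its values and then computing per-column metrics, can be sketched in a few lines. This is an illustration only, not the DSS profiler: the function names, the three-way type guess and the chosen metrics are assumptions.

```python
import statistics


def infer_type(values):
    """Guess a column type from its string values: int, float, or string."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


def profile_column(values):
    """Compute a small set of metrics for one column of raw string values."""
    non_null = [v for v in values if v not in ("", None)]
    col_type = infer_type(non_null) if non_null else "unknown"
    profile = {
        "type": col_type,
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    if col_type in ("int", "float"):
        nums = [float(v) for v in non_null]
        profile.update(
            min=min(nums),
            max=max(nums),
            mean=statistics.mean(nums),
            median=statistics.median(nums),
        )
    return profile


# Hypothetical sample columns.
ages = profile_column(["34", "41", "", "29", "41"])
names = profile_column(["ana", "li", "ana"])
```

Re-running `profile_column` on a schedule as rows are added corresponds to the last bullet: the profile is recomputed and the stored metadata replaced.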
Know your Sensitive Data
• Automatically detect and
profile sensitive & personal
data
• Attach classification
annotations for sensitivity
• Manual approval and curation
of sensitive data
classifications
• Leverage classification based
data protection
• Sensitive data dashboard on
Asset 360
Sensitive Data Profiling
Track your Sensitive Data
• IBAN (27 EU Countries)
• Credit Card Numbers
• Email
• Telephone (AMER, EU)
• IP Address
• URL
• Passport (12 EU Countries)
• National ID (19 EU Countries)
• Australian Driver's License
• Australian Passport
• Australian National ID
Sensitive Data Types
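To make the idea of detecting these types concrete, here is a minimal sketch covering three of them (credit card numbers, email, IP address). It does not reflect how DSS detects sensitive data: the regular expressions, the label names and the 0.8 column threshold are assumptions chosen for illustration. The Luhn checksum and Python's `ipaddress` module are standard.

```python
import ipaddress
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose
CARD_RE = re.compile(r"^\d{13,19}$")


def luhn_ok(digits):
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value):
    """Return a sensitivity label for one value, or None."""
    value = value.strip()
    compact = value.replace(" ", "").replace("-", "")
    if CARD_RE.match(compact) and luhn_ok(compact):
        return "credit_card"
    if EMAIL_RE.match(value):
        return "email"
    try:
        ipaddress.ip_address(value)
        return "ip_address"
    except ValueError:
        return None


def classify_column(values, threshold=0.8):
    """Label a column when at least `threshold` of its values share one label."""
    labels = [classify(v) for v in values]
    for label in {lab for lab in labels if lab}:
        if labels.count(label) / len(labels) >= threshold:
            return label
    return None
```

A column-level threshold like this leaves room for the manual approval and curation step: borderline columns stay unlabeled until a steward reviews them.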
Track Your Data Asset – Lineage and Impact
• Consolidated Upstream lineage and
downstream impact
• Detailed click-through to asset properties
Data Lineage and Impact
View Security Policies for your Data Assets
• View security policies on
data assets
• View classification based
policies on assets
Security Policies
Monitor Usage of your Data Assets
• Dashboard for access patterns and
trends for each asset
• Examples:
• Count of Access Events
• Top N Users over Time
• Most recent trail of access audit
events
Audit and Monitoring
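The two example metrics named above, count of access events and top N users over time, reduce to simple aggregations over audit records. A minimal sketch, assuming audit events are available as (date, user, asset) tuples. The event data, field layout and function names are hypothetical, not the format of any real audit log.

```python
from collections import Counter

# Hypothetical audit events: (ISO date, user, asset).
EVENTS = [
    ("2018-06-01", "alice", "sales.orders"),
    ("2018-06-01", "bob", "sales.orders"),
    ("2018-06-02", "alice", "sales.orders"),
    ("2018-06-02", "alice", "hr.employees"),
    ("2018-06-03", "carol", "sales.orders"),
]


def access_count(events, asset):
    """Count of access events for one asset."""
    return sum(1 for _, _, a in events if a == asset)


def top_n_users(events, asset, n, start=None, end=None):
    """Top N users of an asset inside an optional inclusive date window."""
    users = Counter(
        user
        for day, user, a in events
        if a == asset
        and (start is None or day >= start)
        and (end is None or day <= end)
    )
    return users.most_common(n)
```

ISO-formatted date strings sort lexicographically in date order, which is why plain string comparison is enough for the window filter here.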
Thank you!

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

Running Enterprise Workloads with an open source Hybrid Cloud Data Architecture

  • 1. Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture. Srikanth Venkat, Senior Director, Product Management, Hortonworks Inc. © Hortonworks Inc. 2011–2018. All rights reserved
  • 2. Presenter: Srikanth Venkat, Senior Director of Product Management, Hortonworks Inc. Security & Governance portfolio products and services: Apache Ranger, Apache Atlas, Apache Knox, Platform Security, and Hortonworks DataPlane Service Data Steward Studio (DSS). @srikvenk https://www.linkedin.com/in/srikanthvenkat/
  • 3. Next Generation Data Problems (HDF™, HDP®). My data is spread across multiple clusters and data sources. I store and analyze data from ERP/CRM systems, IoT/mobile devices, social media, geolocation, etc. Some of my data is on-premise, some is in the cloud. I move my data from cloud to on-premise and vice versa, and between different clouds.
  • 4. Data Is Your Business. Focus on Your Data Strategy: consider how you store, manage and protect your data; data must be made known, discoverable, available, trusted and compliant; security and governance of all data are paramount; stewardship, discovery, delivery and use of data are key concerns. Treat Your Data as a Strategic Asset: turn data into predictive and prescriptive analytics; enable self-service analytics to accelerate delivery of new business insights; build a solid foundation for higher-value data science, ML and AI; the data explosion is uncovering new possibilities, if you can seize them. The next generation of data problems requires a data strategy.
  • 5. Balancing Enterprise Requirements for Hybrid Cloud Data Strategy: Line of Business practitioners (Data Analysts, Data Engineers and Data Scientists) vs Enterprise IT stakeholders (Big Data Platform Owners). Time to Insight: access a broad set of analytics tools; on-demand, self-service access; data discovery, provisioning and deployment; global data access transparent of location; single pane of glass. Reduce Risk: consistent security and governance; manage cloud and shadow spend; retain data context, lineage and visibility; operational reliability and portability; remain cloud agnostic.
  • 6. Global Data Management With Hortonworks: Globally Manage, Secure, Govern, Consume. [Diagram] Data sources: Data Center (Exception Monitoring, 360 View of Operations, Cyber Security); Cloud (Telemetry – Connected Devices, Time Series); Edge (Sensors, Control Systems). Modern data use cases: EDW Optimization, Cybersecurity, Data Science, Advanced Analytics, IoT/Streaming Analytics. DataPlane Service (DPS), Manage, Govern, Secure: Data Lifecycle Manager, Data Steward Studio; ISV services: IBM DSX*; extensible services: Data Analytics Studio, Streams Messaging Manager. Connected data platforms: Hortonworks Data Platform (HDP®) for data-at-rest, Hortonworks DataFlow (HDF) for data-in-motion. Hortonworks Connection: Enterprise Support, Premier Support, Educational Services, Professional Services, Community Connection. Hortonworks Platform Services: Operational Services, SmartSense™. * Not available as a DPS module yet. © Hortonworks, Inc. 2011-2018. All rights reserved. Hortonworks confidential and proprietary information.
  • 7. Hortonworks DataPlane Service Enables a Hybrid Architecture for Global Data Management: from the edge, through movement, to rest. Hortonworks DataPlane Service is a foundational platform for the delivery of data solutions that will: support enterprise hybrid deployment strategy and adoption of cloud; provide common metadata, security and governance across all deployments; simplify enterprise data asset management; support a variety of workloads; be extensible to new services, as a services enablement layer for rapidly bringing new solutions to market. [Diagram] Hortonworks DataPlane Service (Manage, Secure, Govern) over multiple clusters and sources, multi-hybrid; data at rest: Hortonworks Data Platform; data in motion: Hortonworks DataFlow.
  • 8. Forrester Calls It Data Fabric: “Bringing together disparate big data sources automatically, intelligently, and securely and processing them in a big data platform technology, using data lakes, Hadoop, and Apache Spark to deliver a unified, trusted, and comprehensive view of customer and business data.”
  • 9. Data Plane Service is the Global Data Fabric. [Diagram: groups of structured and unstructured clusters, including data centers in Dublin, Las Vegas and Bangkok, with the labels Shared Services, Connectivity and Application Portability.]
  • 10. DataPlane Service (Applications). Hortonworks DataPlane Service: DLM (Data Lifecycle Manager); DSS (Data Steward Studio); DAS (Data Analytics Studio); SMM (Streams Messaging Manager). [Diagram repeated from slide 6, with the data sources (Data Center, Cloud, Edge) and the DataPlane Service (DPS) layer called out: Manage, Govern, Secure; Data Lifecycle Manager, Data Steward Studio; extensible services: Data Analytics Studio, Streams Messaging Manager. IBM DSX is marked as not available as a DPS module yet.]
  • 11. What is Hortonworks DataPlane Service? Hortonworks DataPlane Service is a platform with extensible data management services for: addressing compliance and regulatory requirements for the enterprise; providing consistent security and governance across the data landscape; enabling centralized management of data assets; responsible data sharing and collaboration.
  • 12. The DPS Ecosystem. DataPlane services: Data Lifecycle Manager, Data Steward Studio*, Data Analytics Studio*, Streams Messaging Manager. DPS Platform: authentication, role-based access, service lifecycle management, cluster registration, cluster service discovery and access. HDP/HDF cluster: DLM Engine, Profiler Service, DAS Agent, SMM Agent.
  • 13. Data Lifecycle Manager (DLM). Manage the data lifecycle: replication/failback to another cloud or on-prem site for disaster recovery; auto-tiering of hot/warm/cold data to cloud object storage or on-prem for TCO reduction; backup and recovery of critical business data. Maintain common security and governance policies across multiple data sources and environments. [Diagram] Replication & Disaster Recovery (Generally Available): data moves between clusters, with replication and failback between a prod cluster and a backup cluster. Auto Tiering (Coming Soon): P(use) high, cost $$$; P(use) medium, cost $$; P(use) low, cost $. Backup & Restore (Coming Soon): a full backup followed by cumulative incremental backups (day 1, day 2, day 3), with restore after an accidental delete.
  • 14. Data Lifecycle Manager (DLM). DLM 1.0 (GA product): pair clusters and manage data replication flows.
  • 15. Data Lifecycle Manager (DLM) on the DPS Platform. DLM: replicate between on-prem and cloud.
  • 16. Data Lifecycle Manager (DLM). DLM: replication policies and instances.
  • 17. Data Analytics Studio (DAS). Intuitive query tools: full-featured auto-complete, direct download of results, quick data preview.
  • 18. 18 © Hortonworks, Inc. 2011-2018. All rights reserved. Hortonworks confidential and proprietary information.
  • 19. Data Analytics Studio (DAS). Enhance productivity through full-featured auto-complete, direct download of results, and quick data preview.
  • 20. Data Analytics Studio (DAS). Self-optimize queries and storage based on a heuristic recommendation engine.
  • 21. Data Analytics Studio (DAS). DAS provides a database heatmap, so you can quickly see which parts of your cluster are utilized most.
  • 22. Data Analytics Studio (DAS). Heuristic recommendation engine: fully self-service query and storage optimization.
  • 23. Data Analytics Studio (DAS). Built-in batch operations: no more scripting needed for day-to-day operations.
  • 24. Hortonworks Streams Messaging Manager (SMM). What is SMM? A Kafka management and monitoring tool; a single monitoring dashboard for all your Kafka clusters across 4 entities (broker, producer, topic, consumer); supports multiple HDP and/or HDF Kafka clusters; REST as a first-class citizen; delivered as a DataPlane Service.
  • 25. Streams Messaging Manager (SMM) on the DPS Platform. SMM: full visibility into all details of Kafka clusters.
  • 26. Streams Messaging Manager (SMM) on the DPS Platform. SMM: detailed views of specific topics.
  • 27. Explore Metadata about the Topic in Atlas. Click on the Atlas link to see the metadata of the topic gateway-west-raw-sensors in Atlas. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 28. Traverse the flow of data across multiple Kafka topics using SMM and Atlas integration. Question: the topic has one active consumer, which is a NiFi consumer. Which Kafka topic, if any, is this NiFi flow consumer publishing events to? Step 1: click on the Atlas icon to see the lineage of the topic gateway-west-raw-sensors. Analysis: the NiFi app consumes from the gateway-west-raw-sensors topic and publishes events to a downstream Kafka topic called syndicate-geo-event-avro.
  • 29. Streams Messaging Manager (SMM) on the DPS Platform. SMM: all producers and consumers associated with a topic.
  • 30. Data Steward Studio Overview
  • 31. Data Governance: It’s a team sport! Data Curator: implements business data requirements. Data Steward: manages business requirements for data sharing. Sponsor: champions data governance across the enterprise. Data Owner: accountable for all data generated by an agency. Business Data SME: supports the Data Steward in data-related activities. Data Council: coordinates cross-agency data management activities.
  • 32. Goals
  • 33. Hortonworks DataPlane Service (DPS)
  • 34. Asset Collections. Organize Your Data Assets as Collections: data asset collections are an organizational construct for assets, based on business definition, for grouping heterogeneous data; create asset collections and attach metadata (contextual attributes: Name, Description, Owner, Datalake; system attributes: Created-on, Modified-on, Modified-by, Created-by, Version); search for assets using attribute facets or free text; view a personalized dashboard of asset collections; delete/update data asset collections; Asset 360 view for assets in a collection.
  • 35. Data Profiler: Column Statistical Profiler. Discover and Fingerprint Your Data Assets: computes a profile for data assets as they are ingested or created within the platform, and automatically determines column types based on data values; generates key metrics for the data in each column, viewable through various visualizations (box plots, histograms, pie charts); persists profile information in the cluster; as more data is added, profilers can be scheduled to run and update the profile metadata for the asset.
  • 36. Sensitive Data Profiling. Know Your Sensitive Data: automatically detect and profile sensitive and personal data; attach classification annotations for sensitivity; manual approval and curation of sensitive data classifications; leverage classification-based data protection; sensitive data dashboard on Asset 360.
  • 37. Sensitive Data Types. Track Your Sensitive Data: IBAN (27 EU countries); credit card numbers; email; telephone (AMER, EU); IP address; URL; passport (12 EU countries); national ID (19 EU countries); Australian driver's license; Australian passport; Australian national ID.
  • 38. Data Lineage and Impact. Track Your Data Asset, Lineage and Impact: consolidated upstream lineage and downstream impact; detailed click-through to asset properties.
  • 39. Security Policies. View Security Policies for Your Data Assets: view security policies on data assets; view classification-based policies on assets.
  • 40. Audit and Monitoring. Monitor Usage of Your Data Assets: dashboard for access patterns and trends for each asset. Examples: count of access events; top N users over time; most recent trail of access audit events.
  • 41. Thank you!
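
As a rough illustration of the sensitive data profiling idea on slides 36 and 37, the Python sketch below tags a column as email, credit card or IP address when most of its values match a simple detector. It is not how Data Steward Studio is implemented. The function names, the regular expressions and the 0.8 threshold are all assumptions made for this example, and it covers only three of the listed data types.

```python
import ipaddress
import re

# Hypothetical detectors; a production profiler would be far more thorough.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
CARD_RE = re.compile(r"^\d{13,19}$")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value))


def is_credit_card(value: str) -> bool:
    digits = re.sub(r"[ -]", "", value)  # tolerate spaces and hyphens
    return bool(CARD_RE.match(digits)) and luhn_ok(digits)


def is_ip_address(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


DETECTORS = {
    "email": is_email,
    "credit_card": is_credit_card,
    "ip_address": is_ip_address,
}


def classify_column(values, threshold=0.8):
    """Return the tags whose detector matches at least `threshold` of the values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    tags = []
    for name, detector in DETECTORS.items():
        ratio = sum(detector(v) for v in non_empty) / len(non_empty)
        if ratio >= threshold:
            tags.append(name)
    return tags


if __name__ == "__main__":
    print(classify_column(["a@b.com", "c@d.org"]))
    print(classify_column(["10.0.0.1", "192.168.1.20"]))
```

Tags produced this way would still need the manual approval and curation step the slides describe before being used for classification-based protection.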