1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Top Three Big Data Governance Issues and How Apache Atlas Resolves Them for the Enterprise
June 28, 2016
Apache Atlas
Disclaimer
This document may contain product features and technology directions that are under development, may be
under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation
project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release
through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache
Software Foundation community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a contractual
commitment, promise or obligation from Hortonworks to deliver these features in any generally available
product.
Product features and technology directions are subject to change, and must not be included in contracts,
purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon it
when making purchasing decisions.
Atlas Data Governance
Organizations need data governance to understand their information and to answer
questions such as:
• What do we know about our information?
• Where did this data come from and who can use it?
• Does this data adhere to company policies and rules?
Vision - Enterprise Data Governance Across Platforms
[Figure: structured and unstructured sources (traditional RDBMS, MPP appliances, individual projects), their metadata, and the data lake]
Atlas: Metadata Truth in Hadoop
• Data Management along the entire data lifecycle, with integrated provenance and lineage capability
• Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities
• Interoperable Solutions across the Hadoop ecosystem, through a common metadata store
Apache Atlas Overview
Atlas Data Governance
Data governance practices provide a holistic approach to managing,
improving and leveraging information to help you gain insight and build
confidence in business decisions and operations.
Atlas helps customers discover information about data objects, their
meaning, location, characteristics, and usage.
Atlas timeline: from DGI to present
• Dec 2014: DGI group kickoff (Global Financial Company)
• May 2015: Apache Atlas incubation
• July 2015: HDP 2.3 foundation GA release; first kickoff to GA in 7 months
• Jan 2016: HDP 2.4 with Kafka/Storm, Sqoop, Falcon, tag-based security
• Summer 2016: HDP 2.5 with Business Catalog, AD integration, versioning
* DGI: Data Governance Initiative
Key Benefits:
• Co-Dev = Built for real customer use cases
• Faster & Safer = Customers know the business + HWX knows Hadoop
Big Data Management Through Metadata
Management Scalability
Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprises have siloed data and metadata stores that collide in the data lake. This is compounded by the ability to retain data over very large windows (years). Can traditional EDW tools manage 100 million entities effectively, with room to grow?
Metadata Tools
Scalable, decoupled, decentralized management driven through metadata is the only viable solution. This allows quick integration with automation and other metamodels.
Tags for Management, Discovery and Security
Proper metadata is the foundation for business taxonomy, stewardship,
attribute based security and self-service.
Key Benefits:
Modern data lakes need new ways to govern because:
• Cost – The traditional ratio of staff to data size is not possible
• Diversity – The only way to manage the velocity of new datasets
• Agility – Quick change based on tags / taxonomy
High Level Architecture: 4 Key points
[Figure: Atlas architecture: core (type system, repository, search DSL) with graph DB and search backends, REST API, Kafka messaging framework, and a bridge with connectors for Hive, Storm, Falcon, Sqoop and custom components]
1. Data Lineage: Only product that captures lineage across Hadoop components at the platform level.
2. Agile Data Modeling: The type system allows custom metadata structures in a hierarchical taxonomy.
3. REST API: Modern, flexible access to Atlas services, HDP components, UI & external tools.
4. Exchange: Leverage existing metadata / models by importing them from current tools. Export metadata to downstream systems.
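The "Agile Data Modeling" point says the type system lets custom metadata structures form a hierarchy. Below is a minimal sketch of that idea. It assumes a simple model in which each type declares its own attributes and optional supertypes, and a subtype inherits every ancestor's attributes. The `TypeDef` and `TypeRegistry` classes and the type names (`DataSet`, `Table`, `CustomerTable`) are hypothetical illustrations. They are not the Atlas type-system API.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """Hypothetical metadata type: its own attributes plus optional supertypes."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> type name
    supertypes: list = field(default_factory=list)


class TypeRegistry:
    """Minimal in-memory type system (illustrative only, not the Atlas API)."""

    def __init__(self) -> None:
        self._types = {}

    def register(self, typedef: TypeDef) -> None:
        # Supertypes must be registered before their subtypes.
        for parent in typedef.supertypes:
            if parent not in self._types:
                raise ValueError(f"unknown supertype {parent!r} for {typedef.name!r}")
        self._types[typedef.name] = typedef

    def all_attributes(self, name: str) -> dict:
        """Collect inherited attributes first, then the type's own (which may override)."""
        typedef = self._types[name]
        resolved = {}
        for parent in typedef.supertypes:
            resolved.update(self.all_attributes(parent))
        resolved.update(typedef.attributes)
        return resolved

    def is_a(self, name: str, ancestor: str) -> bool:
        """True if `name` equals `ancestor` or inherits from it."""
        if name == ancestor:
            return True
        return any(self.is_a(p, ancestor) for p in self._types[name].supertypes)


registry = TypeRegistry()
registry.register(TypeDef("DataSet", {"name": "string", "owner": "string"}))
registry.register(TypeDef("Table", {"columns": "array<string>"}, ["DataSet"]))
registry.register(TypeDef("CustomerTable", {"retention_days": "int"}, ["Table"]))
print(registry.all_attributes("CustomerTable"))
# {'name': 'string', 'owner': 'string', 'columns': 'array<string>', 'retention_days': 'int'}
```

Resolving parents before the type's own attributes lets a subtype refine an inherited attribute definition. This is one plausible design choice for such a sketch.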
Governance Ready Certification Program
[Figure: partner capability areas: discovery, tagging, prep/cleanse, ETL, governance, BPM, self-service, visualization]
Choice: Customers choose the features they want to deploy, à la carte versus vendor lock-in
Curated & Fast: A selected group of vendor partners provides rich, complementary and complete features ready to deploy
Agile: Low switching costs, faster deployment and innovation
Centralized: Common SLA & common open metadata store
Flexibility: Interoperability of products through Atlas metadata
Safe: HDP at the core to provide stability and interoperability
Governance Ready Certification Program
Completed:
• Waterline
• Dataguise
• Attivo
Next:
• SAP ILM, VORA
• IBM IGC
Work in progress:
• Collibra
• Alation
• Meta Integration (Miti)
• Paxata
• Syncsort
• Trifacta
Near Term Roadmap:
Summer 2016
Summer 2016 Release Summary
• Dynamic Access Policies
• Cross component lineage
• Enterprise Readiness
• Business Catalog
Differentiator
Differentiator
Differentiator
Dynamic Access Policy
Apache Ranger + Atlas Integration
Summary of Dynamic Access Policies
• Basic tag policy – PII example. Permission is mapped to a reusable tag, not to a resource
• Geo-based policy – Policy based on IP address mappings. Rule enforcement is dynamically geo-aware.
• Time-based policy – Time-bound data access for resource management and compliance reporting
• Prohibitions – Prevention of toxic combinations of Hive tables or columns that may pose a risk together.
Key Benefits:
New scalable, metadata-based security paradigm
Dynamic, real-time policy
Automatically adapts to changes in metadata
Centralized policy that is simple to manage
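The four policy types above can be illustrated with a small, self-contained evaluator. This is a hedged sketch under simplifying assumptions and does not reproduce Ranger's evaluation algorithm:

* Every tag on a resource must be satisfied by at least one tag policy written for that tag.
* Geo rules are approximated with IP network ranges.
* Time rules are approximated with an expiry timestamp.
* Prohibitions are column combinations that may not appear together in one query.

All class, group and column names are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from ipaddress import ip_address, ip_network
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class AccessRequest:
    user_groups: FrozenSet[str]
    resource_tags: FrozenSet[str]           # tags attached to the requested resource
    client_ip: str
    when: datetime
    columns: FrozenSet[str] = frozenset()   # columns touched by the query


@dataclass(frozen=True)
class TagPolicy:
    tag: str
    allowed_groups: FrozenSet[str]
    allowed_networks: Tuple[str, ...] = ()  # geo-style rule; empty means any address
    expires: Optional[datetime] = None      # time-based rule; None means no expiry


def policy_permits(policy: TagPolicy, req: AccessRequest) -> bool:
    if not policy.allowed_groups & req.user_groups:
        return False
    if policy.allowed_networks:
        addr = ip_address(req.client_ip)
        if not any(addr in ip_network(net) for net in policy.allowed_networks):
            return False
    if policy.expires is not None and req.when >= policy.expires:
        return False
    return True


def is_allowed(req: AccessRequest, policies, prohibited=()) -> bool:
    # Prohibitions: deny if the query touches every column of a toxic combination.
    if any(combo <= req.columns for combo in prohibited):
        return False
    # Each tag must be satisfied by at least one policy for that tag.
    # Untagged resources are not restricted by tag policies in this sketch.
    return all(
        any(policy_permits(p, req) for p in policies if p.tag == tag)
        for tag in req.resource_tags
    )


policies = [TagPolicy("PII", frozenset({"hr_analysts"}), ("10.0.0.0/8",), datetime(2016, 12, 31))]
prohibited = [frozenset({"name", "diagnosis"})]
base = AccessRequest(frozenset({"hr_analysts"}), frozenset({"PII"}), "10.1.2.3",
                     datetime(2016, 7, 1), frozenset({"email"}))

print(is_allowed(base, policies, prohibited))                                        # True
print(is_allowed(replace(base, client_ip="192.168.1.5"), policies, prohibited))      # False: outside network
print(is_allowed(replace(base, when=datetime(2017, 1, 1)), policies, prohibited))    # False: expired
print(is_allowed(replace(base, columns=frozenset({"name", "diagnosis"})), policies, prohibited))  # False
```

The policy references the tag "PII" rather than any specific table. Any resource that later receives that tag is therefore evaluated by the same rule.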
How does Atlas work with Ranger at scale?
Atlas provides: Metadata
• Business classification (taxonomy): Company > HR > Driver
• Hierarchy with inheritance of attributes to child objects: the sensitive “PII” tag of department HR will be inherited by the group HR > Driver
• Atlas will notify Ranger of changes via a Kafka topic
[Figure: Apache Atlas connected to Hive, Ranger, Falcon, Kafka and Storm]
Atlas provides the metadata tag to create policies
Ranger provides: Access & Entitlements
• Ranger will cache tags and asset mappings for performance
• Ranger will have policies based on tags instead of roles
• Example: PII = <group>. A single policy like this can work for many assets.
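The inheritance rule above works like this: a tag on Company > HR also applies to Company > HR > Driver. That amounts to walking from a term up to the root of the taxonomy and collecting tags along the way. Below is a minimal sketch that assumes terms are identified by their full path string. The `Taxonomy` class is hypothetical and only models the behavior described on this slide.

```python
from typing import Dict, Optional, Set


class Taxonomy:
    """Hypothetical business taxonomy where child terms inherit their ancestors' tags."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}
        self._tags: Dict[str, Set[str]] = {}

    def add_term(self, path: str) -> None:
        """Add a term such as 'Company > HR > Driver', creating missing ancestors."""
        parts = [p.strip() for p in path.split(">")]
        parent = None
        for i in range(len(parts)):
            term = " > ".join(parts[: i + 1])
            self._parent.setdefault(term, parent)
            self._tags.setdefault(term, set())
            parent = term

    def tag(self, term: str, tag: str) -> None:
        self._tags[term].add(tag)

    def effective_tags(self, term: str) -> Set[str]:
        """Union of the term's own tags and every ancestor's tags."""
        tags: Set[str] = set()
        node: Optional[str] = term
        while node is not None:
            tags |= self._tags[node]
            node = self._parent[node]
        return tags


taxonomy = Taxonomy()
taxonomy.add_term("Company > HR > Driver")
taxonomy.add_term("Company > Finance")
taxonomy.tag("Company > HR", "PII")
print(sorted(taxonomy.effective_tags("Company > HR > Driver")))  # ['PII']
print(sorted(taxonomy.effective_tags("Company > Finance")))      # []
```

A policy engine could cache the result of `effective_tags` per asset and refresh it only when tags change. That is the role the slide assigns to Ranger's tag cache.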
Scalable Access Control – Reusable Tag Policy
[Figure: a single admin group assigns one policy on the Atlas tag PII to user groups (AD, Linux); many stewards apply tags to resources (files, tables, topologies), so the policy covers ANY asset tagged PII]
• Single point of enforcement and audit
• All future tagging is covered by the existing policy
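Keying a policy on a tag rather than on a resource means a newly tagged file, table or topology is covered immediately, with no new policy. Below is a minimal sketch of that property and of a single audit point. The `TagPolicyStore` class and the asset identifiers (`hive:hr.drivers`, `storm:driver-events`) are hypothetical. The sketch also assumes that a tag without any policy denies access.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


class TagPolicyStore:
    """Hypothetical single enforcement point: policies are keyed by tag, not by resource."""

    def __init__(self) -> None:
        self._groups_by_tag: Dict[str, Set[str]] = {}
        self._tags_by_asset: DefaultDict[str, Set[str]] = defaultdict(set)
        self.audit_log: List[Tuple[str, str, bool]] = []

    def set_policy(self, tag: str, groups: Set[str]) -> None:
        """Admin: one policy per tag, regardless of asset type."""
        self._groups_by_tag[tag] = set(groups)

    def tag_asset(self, asset: str, tag: str) -> None:
        """Steward: tagging an asset is all it takes to bring it under policy."""
        self._tags_by_asset[asset].add(tag)

    def check(self, group: str, asset: str) -> bool:
        # Untagged assets are not restricted here; a tag with no policy denies access.
        tags = self._tags_by_asset.get(asset, set())
        allowed = all(group in self._groups_by_tag.get(t, set()) for t in tags)
        self.audit_log.append((group, asset, allowed))  # every decision audited in one place
        return allowed


store = TagPolicyStore()
store.set_policy("PII", {"hr_stewards"})
store.tag_asset("hive:hr.drivers", "PII")
print(store.check("analysts", "hive:hr.drivers"))      # False
print(store.check("hr_stewards", "hive:hr.drivers"))   # True

# A Storm topology tagged later is covered without writing a new policy.
store.tag_asset("storm:driver-events", "PII")
print(store.check("analysts", "storm:driver-events"))  # False
print(len(store.audit_log))                            # 3
```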
Automatic update of policies – active protection
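[Figure: Atlas keeps tags, assets and entities in its metastore and publishes metadata update notifications through its notification framework to Kafka topics; on the Ranger side, the Atlas client subscribes to the topic, gets metadata updates, and refreshes the resource cache used by the policy decision point (PDP)]
• Message durability
• Optimized for speed
• Event-driven updates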
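The event-driven flow above is publish on change, subscribe, and update a local cache. It can be sketched with Python's standard `queue.Queue` standing in for a Kafka topic. This is an assumption made to keep the example self-contained: an in-memory queue provides none of Kafka's durability. The event format and class names are hypothetical and do not reproduce the actual Atlas notification messages.

```python
import json
import queue
from typing import Dict, Set

# Stand-in for a Kafka topic; unlike Kafka, queue.Queue offers no durability.
topic: "queue.Queue[str]" = queue.Queue()


def publish_tag_change(asset: str, tags: Set[str]) -> None:
    """Metadata-store side: emit an event carrying the asset's full, current tag set."""
    topic.put(json.dumps({"asset": asset, "tags": sorted(tags)}))


class ResourceTagCache:
    """Policy-engine side: a local cache kept current by consuming events."""

    def __init__(self) -> None:
        self.tags_by_asset: Dict[str, Set[str]] = {}
        self.events_applied = 0

    def poll(self, source: "queue.Queue[str]") -> None:
        """Apply every pending event without blocking."""
        while True:
            try:
                raw = source.get_nowait()
            except queue.Empty:
                return
            event = json.loads(raw)
            self.tags_by_asset[event["asset"]] = set(event["tags"])
            self.events_applied += 1

    def tags_for(self, asset: str) -> Set[str]:
        return self.tags_by_asset.get(asset, set())


cache = ResourceTagCache()
publish_tag_change("hive:hr.drivers", {"PII"})
publish_tag_change("hive:hr.drivers", {"PII", "SENSITIVE"})
cache.poll(topic)
print(sorted(cache.tags_for("hive:hr.drivers")))  # ['PII', 'SENSITIVE']
print(cache.events_applied)                       # 2
```

Each event carries the full tag set rather than a delta. Replaying the same event therefore leaves the cache unchanged, which keeps recovery simple in a sketch like this.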
Hadoop Cross Component
Data Lineage
Apache Atlas Component Integration
• Cross-component dataset lineage. Centralized location for all metadata inside HDP
• Single interface point for metadata exchange with platforms outside of HDP
[Figure: Apache Atlas integrations with Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi and HBase, grouped by availability: HDP 2.3, HDP 2.5, and beyond HDP 2.5]
Summary of Data Lineage
Users in the upcoming release of HDP 2.5 will be able to track lineage across the following components using Atlas:
• Sqoop – Import from and export to relational databases, and any additional package that leverages Sqoop. ATLAS-184, SQOOP-2609
• Hive – Dataset lineage with entity versioning (including schema changes). ATLAS-75, ATLAS-183, ATLAS-492
• Kafka/Storm – IoT event-level processing, such as syslogs or sensor data. ATLAS-181, ATLAS-183, STORM-1381
• Falcon – Data lifecycle at the Feed and Process entity level for replication and repeating workflows. Tracks periodicity, throttling, and eviction. ATLAS-69, FALCON-1570
Key Benefits:
Enterprises need open solutions, not a single-application vendor
More native connectors than anyone else, with more coming
Hardened metadata infrastructure
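Tracing a dataset back through the components listed above, such as a Sqoop import feeding a Hive table, is a reverse traversal of a graph whose nodes are datasets and processes. Below is a minimal breadth-first sketch. It assumes each process is recorded as a pair of input and output dataset lists. The dataset and process names are hypothetical, and Atlas's own lineage model and API are not reproduced here.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Hypothetical lineage records: process name -> (input datasets, output datasets)
PROCESSES: Dict[str, Tuple[List[str], List[str]]] = {
    "sqoop_import_orders": (["mysql:sales.orders"], ["hive:staging.orders"]),
    "hive_ctas_daily_revenue": (
        ["hive:staging.orders", "hive:ref.currency"],
        ["hive:mart.daily_revenue"],
    ),
    "storm_enrich_clicks": (["kafka:clickstream"], ["hive:staging.clicks"]),
}


def producers_by_output(processes):
    """Index: dataset -> list of (process, inputs) that wrote it."""
    index = defaultdict(list)
    for name, (inputs, outputs) in processes.items():
        for out in outputs:
            index[out].append((name, inputs))
    return index


def upstream(dataset: str, processes) -> List[str]:
    """All datasets the given dataset was derived from, nearest first (BFS)."""
    index = producers_by_output(processes)
    seen, order = {dataset}, []
    pending = deque([dataset])
    while pending:
        current = pending.popleft()
        for _process, inputs in index.get(current, []):
            for src in inputs:
                if src not in seen:
                    seen.add(src)
                    order.append(src)
                    pending.append(src)
    return order


def original_sources(dataset: str, processes) -> List[str]:
    """Upstream datasets that no recorded process produced: the first places to inspect."""
    index = producers_by_output(processes)
    return [d for d in upstream(dataset, processes) if d not in index]


print(upstream("hive:mart.daily_revenue", PROCESSES))
# ['hive:staging.orders', 'hive:ref.currency', 'mysql:sales.orders']
print(original_sources("hive:mart.daily_revenue", PROCESSES))
# ['hive:ref.currency', 'mysql:sales.orders']
```

The `seen` set prevents revisiting a dataset that feeds several processes, so the traversal terminates even if the recorded lineage contains cycles.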
Expanded Native Connector: Dataset Lineage
[Figure: RDBMS sources, Sqoop with the Teradata connector, Apache Kafka and a custom activity reporter feeding the metadata repository]
• Any process using Sqoop is covered
• No other tool tracks IoT out of the box
Enterprise Readiness
Security/Enterprise Readiness
• Highly reliable and scalable components
• Authorization with AD via Ranger
• Rolling upgrade support (HDP 2.5+)
• Business continuity (BC) and disaster recovery (DR) capabilities
• 5x performance improvement over the previous version
Enterprise Readiness:
Scalable and Highly Reliable Components
[Figure: Atlas architecture (type system, repository, search DSL, REST API, graph DB, search, Kafka messaging framework, and a bridge with Hive, Storm, Falcon, Sqoop and custom connectors) backed by SolrCloud, a Kafka quorum, and HBase]
Business Taxonomy (Catalog)
Key Concepts
Business Taxonomy (Catalog)
The practice and science of classification of things or
concepts, including the principles that underlie such
classification. The business organization model is
hierarchical, making it authoritative with no duplication.
Data Lineage (Provenance)
Data lineage is defined as a data life cycle that includes the
data's origins and where it moves over time. It describes
what happens to data as it goes through diverse processes. It
helps provide visibility into the analytics pipeline and
simplifies tracing errors back to their sources.
Tags: Traits vs. Labels vs. Business Taxonomy
Atlas tags are authoritative and prevent duplication.
A tag can span different parts of the business taxonomy: the tag
PII can be used in HR as well as in Finance or Sales.
Benefits:
A view of data assets organized by business language
Impact analysis, compliance, acceptable use
Common tags across Hadoop components
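The distinction drawn on this slide can be sketched as follows: each asset has exactly one authoritative place in the taxonomy, while a single tag such as PII can be applied across branches (HR, Finance, Sales). This is a minimal illustration under that assumption. The `BusinessCatalog` class and the asset names are hypothetical and are not the Atlas Business Catalog API.

```python
from typing import Dict, Set


class BusinessCatalog:
    """Hypothetical catalog: one taxonomy term per asset, any number of tags."""

    def __init__(self) -> None:
        self.term_of: Dict[str, str] = {}
        self.tags_of: Dict[str, Set[str]] = {}
        self.tag_names: Set[str] = set()

    def define_tag(self, name: str) -> None:
        # Each tag name is defined once and then reused across the taxonomy.
        if name in self.tag_names:
            raise ValueError(f"tag {name!r} already defined")
        self.tag_names.add(name)

    def place(self, asset: str, term: str) -> None:
        """Assign the asset its single, authoritative taxonomy location."""
        self.term_of[asset] = term
        self.tags_of.setdefault(asset, set())

    def tag(self, asset: str, tag: str) -> None:
        if tag not in self.tag_names:
            raise KeyError(f"undefined tag {tag!r}")
        self.tags_of.setdefault(asset, set()).add(tag)

    def find(self, tag: str) -> Dict[str, str]:
        """Every asset carrying `tag`, with the taxonomy term it lives under."""
        return {a: self.term_of.get(a, "(unplaced)")
                for a, tags in self.tags_of.items() if tag in tags}


catalog = BusinessCatalog()
catalog.define_tag("PII")
catalog.place("hive:hr.drivers", "Company > HR")
catalog.place("hive:finance.payroll", "Company > Finance")
catalog.place("hive:sales.leads", "Company > Sales")
catalog.tag("hive:hr.drivers", "PII")
catalog.tag("hive:finance.payroll", "PII")
print(catalog.find("PII"))
# {'hive:hr.drivers': 'Company > HR', 'hive:finance.payroll': 'Company > Finance'}
```

Re-placing an asset overwrites its term instead of adding a second one. That is how this sketch keeps the taxonomy free of duplicates, while tags remain many-to-many.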
Taxonomy Benefits:
• Search / Discovery – Business catalog of conceptual, logical and physical assets
• Security – Dynamic, metadata-based access control
Research: Focused on Hadoop
We conduct open-ended user interviews so that we can learn more about who our users are and what their needs are. This helps us validate whether or not we’re solving the right problem.
Usability Testing
We test our prototype in InVision, a click-through prototyping tool that allows users to interact with static mockups.
Principal Roles & Activities
• Data Steward – Curator, responsible
for catalog veracity
• Data Scientist – Analyst, primary
consumer of Business Catalog
• Administrator – Role management
only
• Data Engineer – Data ingress and
egress, semantic data quality
• 50%–80%+ of time spent looking for data
• Profit center
• Primary user of Atlas
• Enables scientist
Goal: < 25% of time spent finding data = empowering scientists to spend their time uncovering insights, faster
Atlas Value
• Designed for Hadoop at the platform level, not the application level
• High-confidence data in Hadoop for regulated verticals
• Compliance and business objectives aligned to data organization
• Faster discovery for analysts – reduce time to value
• Agile and adaptable – ensures information is current through native connectors
• Dynamic protection with Ranger through simple, audited policies
Additional Atlas Sessions
• Extend Governance in Hadoop with the Atlas Ecosystem:
integrations with partners Waterline, Trifacta and Attivo:
Thursday 4:10PM @ Room 210A
• BOF: Apache Knox and Apache Ranger provide Hadoop security
while Atlas provides a Hadoop metadata store and enterprise
compliance. Come learn and discuss security & governance
innovations and future directions.
Thursday 5-7 PM @ Room 210A
Learn More:
• Hortonworks links: http://hortonworks.com/solutions/security-and-governance/
• Tutorials: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Recently uploaded

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise

  • 1. Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise. Apache Atlas, June 28, 2016. © Hortonworks Inc. 2011 – 2016. All Rights Reserved.
  • 2. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. Atlas Data Governance: Organizations need data governance to understand their information and to answer questions such as: • What do we know about our information? • Where did this data come from, and who can use it? • Does this data adhere to company policies and rules?
  • 4. Vision: Enterprise Data Governance Across Platforms [Diagram: structured and unstructured data in traditional RDBMS and MPP appliances, each with its own metadata, alongside several projects and a data lake.] Atlas: Metadata Truth in Hadoop • Data Management along the entire data lifecycle with integrated provenance and lineage capability • Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities • Interoperable Solutions across the Hadoop ecosystem through a common metadata store
  • 5. Apache Atlas Overview
  • 6. Atlas Data Governance: Data governance practices provide a holistic approach to managing, improving and leveraging information to help you gain insight and build confidence in business decisions and operations. Atlas helps customers discover information about data objects, their meaning, location, characteristics, and usage.
  • 7. Atlas timeline: from DGI to present • Dec 2014: DGI group kickoff (Global Financial Company) • May 2015: Apache Atlas incubation • July 2015: HDP 2.3 Foundation GA release; first kickoff to GA in 7 months • Jan 2016: HDP 2.4 (Kafka/Storm, Sqoop, Falcon, tag-based security) • Summer 2016: HDP 2.5 (Business Catalog, AD integration, versioning). *DGI: Data Governance Initiative. Key Benefits: • Co-Dev = Built for real customer use cases • Faster & Safer = Customers know business + HWX knows Hadoop
  • 8. Big Data Management Through Metadata Management. Scalability: Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprises have siloed data and metadata stores that collide in the data lake. This is compounded by very large time windows (years). Can traditional EDW tools manage 100 million entities effectively, with room to grow? Metadata Tools: Scalable, decoupled, decentralized management driven through metadata is the only viable solution. This allows quick integration with automation and other metamodels. Tags for Management, Discovery and Security: Proper metadata is the foundation for business taxonomy, stewardship, attribute-based security and self-service. Key Benefits: Modern data lakes need new ways to govern because: • Cost – Traditional staff-to-data-size ratios are not possible • Diversity – The only way to manage the velocity of new datasets • Agility – Quick change based on tags / taxonomy
  • 9. High Level Architecture: 4 Key points [Diagram: type system, repository, search DSL and bridge; Hive, Storm, Falcon, Custom, Kafka and Sqoop connectors; REST API; graph DB; search; messaging framework] 1. Data Lineage: Only product that captures lineage across Hadoop components at the platform level. 2. Agile Data Modeling: Type system allows custom metadata structures in a hierarchical taxonomy. 3. REST API: Modern, flexible access to Atlas services, HDP components, UI & external tools. 4. Exchange: Leverage existing metadata / models by importing them from current tools; export metadata to downstream systems.
  • 10. Governance Ready Certification Program [Partner categories: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization] • Choice: Customers choose the features they want to deploy, à la carte rather than vendor lock-in • Curated & Fast: A selected group of vendor partners provides rich, complementary and complete features ready to deploy • Agile: Low switching costs, faster deployment and innovation • Centralized: Common SLA and common open metadata store • Flexibility: Interoperability of products through Atlas metadata • Safe: HDP at the core provides stability and interoperability
  • 11. Governance Ready Certification Program • Completed: Waterline, Dataguise, Attivo • Next: SAP ILM, VORA; IBM IGC • Work in progress: Collibra, Alation, Meta Integration (Miti), Paxata, Syncsort, Trifacta
  • 12. Near Term Roadmap: Summer 2016
  • 13. Summer 2016 Release Summary • Dynamic Access Policies • Cross component lineage • Enterprise Readiness • Business Catalog (three of these items are marked as differentiators)
  • 14. Dynamic Access Policy: Apache Ranger + Atlas Integration
  • 15. Summary of Dynamic Access Policies • Basic tag policy – PII example. Permission is mapped to a reusable tag, not to a resource • Geo-based policy – Policy based on IP address mappings; rule enforcement is dynamically geo-aware • Time-based policy – Timer for data access, for resource management and compliance reporting • Prohibitions – Prevention of toxic combinations of Hive tables or columns that may pose a risk together. Key Benefits: • New scalable metadata-based security paradigm • Dynamic, real-time policy • Automatically updates on changes in metadata • Centralized and simple-to-manage policy
  • 16. How does Atlas work with Ranger at scale? [Diagram: Apache Atlas connected to Hive, Ranger, Falcon, Kafka and Storm; Atlas provides the metadata tags used to create policies.] Atlas provides metadata: • Business classification (taxonomy): Company > HR > Driver • Hierarchy with inheritance of attributes to child objects: the sensitive “PII” tag of department HR will be inherited by the group HR > Driver • Atlas will notify Ranger of changes via a Kafka topic. Ranger provides access and entitlements: • Ranger will cache tags and asset mappings for performance • Ranger will have policies based on tags instead of roles • Example: PII = <group>. A single such policy can work for many assets.
  • 17. Scalable Access Control – Reusable Tag Policy [Diagram: user groups (AD, Linux) are granted access through the Atlas tag PII, which covers ANY asset tagged PII (files, tables, topologies)] • Single admin group + many stewards assigning tags • Single point of enforcement and audit • All future tagging is covered by existing policy
  • 18. Automatic update of policies – active protection [Diagram: the Atlas metastore (tags, assets, entities) publishes metadata updates through a notification framework of Kafka topics; in Ranger, an Atlas client subscribes to the topic, gets the metadata updates and refreshes the PDP resource cache] • Event-driven updates • Message durability • Optimized for speed
  • 19. Hadoop Cross Component Data Lineage
  • 20. Apache Atlas Component Integration • Cross-component dataset lineage; a centralized location for all metadata inside HDP • Single interface point for metadata exchange with platforms outside of HDP [Diagram: Apache Atlas integrations with Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi and HBase, grouped by release: HDP 2.3, HDP 2.5, and beyond HDP 2.5]
  • 21. Summary of Data Lineage: Users of the upcoming HDP 2.5 release will be able to track lineage across the following components using Atlas: • Sqoop – Import from and export to relational databases, plus any additional package that leverages Sqoop (ATLAS-184, SQOOP-2609) • Hive – Dataset lineage with entity versioning, including schema changes (ATLAS-75, ATLAS-183, ATLAS-492) • Kafka/Storm – IoT event-level processing, such as syslogs or sensor data (ATLAS-181, ATLAS-183, STORM-1381) • Falcon – Data lifecycle at the Feed and Process entity level for replication and repeating workflows; tracks periodicity, throttling and eviction (ATLAS-69, FALCON-1570). Key Benefits: • Enterprises need open solutions, not a single app vendor • More native connectors than anyone else, with more coming • Hardened metadata infrastructure
  • 22. Expanded Native Connector: Dataset Lineage [Diagram: Sqoop with the Teradata Connector, Apache Kafka, a custom activity reporter, the metadata repository and an RDBMS] • Any process using Sqoop is covered • No other tool tracks IoT out of the box
  • 23. Summer 2016 Release Summary • Dynamic Access Policies • Cross component lineage • Enterprise Readiness • Business Catalog (three of these items are marked as differentiators)
  • 24. Enterprise Readiness
  • 25. Security/Enterprise Readiness • Highly reliable and scalable components • Authorization with AD via Ranger • Rolling upgrade support, HDP 2.5+ • BC & DR capabilities • 5x performance improvement over the previous version
  • 26. Enterprise Readiness: Scalable and Highly Reliable Components [Diagram: the high-level architecture (type system, repository, search DSL, bridge with Hive, Storm, Falcon, Custom, Kafka and Sqoop connectors, REST API, graph DB, search, messaging framework) annotated with SolrCloud, a Kafka quorum and HBase]
  • 27. Summer 2016 Release Summary • Dynamic Access Policies • Cross component lineage • Enterprise Readiness • Business Catalog (three of these items are marked as differentiators)
  • 28. Business Taxonomy (Catalog)
  • 29. Key Concepts • Business Taxonomy (Catalog): The practice and science of classifying things or concepts, including the principles that underlie such classification. The business organization model is hierarchical, which makes it authoritative with no duplication. • Data Lineage (Provenance): Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources. • Tags: Traits vs. Labels vs. Business Taxonomy: Atlas has tags that are authoritative and prevent duplication. A tag can span different parts of the business taxonomy; a PII tag can be used in HR as well as in Finance or Sales. Benefits: • A view of data assets organized by business language • Impact analysis, compliance, acceptable use • Common tags across Hadoop components
  • 30. Taxonomies Benefits: • Search / Discovery – Business catalog of conceptual, logical and physical assets • Security – Dynamic metadata-based access control
  • 31. Research: Focused on Hadoop. We conduct open-ended user interviews so that we can learn more about who our users are and what their needs are. This helps us validate whether or not we’re solving the right problem.
  • 32. Usability Testing: We test our prototype in InVision, a click-through prototyping tool that allows users to interact with static mockups.
  • 33. Principal Roles & Activities • Data Steward – Curator, responsible for catalog veracity • Data Scientist – Analyst, primary consumer of the Business Catalog • Administrator – Role management only • Data Engineer – Data ingress and egress, semantic data quality • 50%-80%+ of time spent looking for data • Profit Center • Primary User of Atlas • Enables Scientist. Goal: < 25% of time spent on finding data, empowering scientists to spend their time uncovering insights faster
  • 34. Atlas Value • Designed for Hadoop at the platform level, not the application level • High-confidence data in Hadoop for regulated verticals • Compliance and business objectives aligned to data organization • Faster discovery for analysts, reducing time to value • Agile and adaptable: native connectors keep information current • Dynamic protection with Ranger through simple, audited policies
  • 35. Additional Atlas Sessions • Extend Governance in Hadoop with the Atlas Ecosystem: integrations with partners Waterline, Trifacta and Attivo. Thursday 4:10 PM @ Room 210A • BOF: Apache Knox and Apache Ranger provide Hadoop security, while Atlas provides a Hadoop metadata store and enterprise compliance. Come learn about and discuss security & governance innovations and future directions. Thursday 5-7 PM @ Room 210A
  • 36. Learn More: • Hortonworks links: http://hortonworks.com/solutions/security-and-governance/ • Tutorials: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview
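Slide 9 says the Atlas type system allows custom metadata structures in a hierarchical taxonomy. The Python sketch below illustrates that idea under stated assumptions: `TypeDef`, `TypeRegistry` and `all_attributes` are hypothetical names invented for this example, not Atlas classes or REST payloads, and the attribute sets are simplified. It only shows the general pattern of a subtype inheriting attributes from its supertypes.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """Hypothetical metadata type: its own attributes plus the names of its supertypes."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)  # attribute name -> type name
    supertypes: list[str] = field(default_factory=list)


class TypeRegistry:
    """In-memory registry that resolves attributes inherited through supertypes."""

    def __init__(self) -> None:
        self._types: dict[str, TypeDef] = {}

    def register(self, typedef: TypeDef) -> None:
        # A supertype must be registered before any subtype that extends it.
        for parent in typedef.supertypes:
            if parent not in self._types:
                raise KeyError(f"unknown supertype: {parent}")
        self._types[typedef.name] = typedef

    def all_attributes(self, name: str) -> dict[str, str]:
        """Return the attributes of `name`, including everything inherited from supertypes."""
        typedef = self._types[name]
        merged: dict[str, str] = {}
        for parent in typedef.supertypes:
            merged.update(self.all_attributes(parent))
        merged.update(typedef.attributes)  # the subtype's own definitions take precedence
        return merged


# Illustrative types only; the attribute sets are simplified and hypothetical.
registry = TypeRegistry()
registry.register(TypeDef("DataSet", {"name": "string", "owner": "string"}))
registry.register(TypeDef("hive_table", {"db": "string", "columns": "array<string>"}, ["DataSet"]))
print(sorted(registry.all_attributes("hive_table")))  # ['columns', 'db', 'name', 'owner']
```

Resolving inheritance at lookup time, rather than copying attributes into each subtype at registration, means a later change to a supertype is seen by every subtype in this sketch.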
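Slide 15 lists four kinds of dynamic access policy: a basic tag policy, a geo-based policy, a time-based policy, and prohibitions. The following is a minimal, hypothetical Python sketch of how such conditions might combine into one access decision. It is not Ranger's policy engine or API; `TagPolicy`, `AccessRequest`, `is_allowed` and all column names are illustrative assumptions, and the sketch assumes the client's country has already been resolved from its IP address.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TagPolicy:
    """Hypothetical tag-based policy: applies to every asset carrying `tag`."""
    tag: str
    allowed_groups: set[str]
    allowed_countries: Optional[set[str]] = None  # geo-based condition; None means anywhere
    expires_at: Optional[datetime] = None         # time-based condition; None means no expiry


@dataclass
class AccessRequest:
    user_groups: set[str]
    country: str       # assumed to be resolved from the client IP address beforehand
    when: datetime
    columns: set[str]  # fully qualified columns touched by the query


def policy_permits(policy: TagPolicy, req: AccessRequest) -> bool:
    """Check the group, geo and time conditions of a single policy."""
    if not policy.allowed_groups & req.user_groups:
        return False
    if policy.allowed_countries is not None and req.country not in policy.allowed_countries:
        return False
    if policy.expires_at is not None and req.when >= policy.expires_at:
        return False
    return True


def is_allowed(req: AccessRequest,
               column_tags: dict[str, set[str]],
               policies: list[TagPolicy],
               prohibited: list[frozenset[str]]) -> bool:
    # Prohibitions: deny "toxic combinations" of columns requested together.
    if any(combo <= req.columns for combo in prohibited):
        return False
    # Every tag on every requested column needs at least one permitting policy.
    for column in req.columns:
        for tag in column_tags.get(column, set()):
            if not any(p.tag == tag and policy_permits(p, req) for p in policies):
                return False
    # Untagged columns are not restricted here; resource-based policies would cover them.
    return True


policies = [TagPolicy("PII", {"hr_analysts"}, {"US"}, datetime(2016, 12, 31))]
column_tags = {"hr.employees.ssn": {"PII"}}
prohibited = [frozenset({"hr.employees.zip", "hr.employees.birth_date"})]

ok = AccessRequest({"hr_analysts"}, "US", datetime(2016, 6, 28), {"hr.employees.ssn"})
print(is_allowed(ok, column_tags, policies, prohibited))  # True
```

Because the policy names a tag rather than a table, tagging another column "PII" brings it under the same rule without editing the policy, which is the reuse the slide describes.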
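Slide 16 describes tag inheritance through the business taxonomy: a “PII” tag on the HR department is inherited by HR > Driver. A small Python sketch of that inheritance rule follows. `TaxonomyTerm` and `effective_tags` are hypothetical names, and computing inherited tags on demand by walking up the parent chain is one possible design, not necessarily how Atlas stores them.

```python
from typing import Optional


class TaxonomyTerm:
    """Hypothetical business-taxonomy term; tags set on a term also apply to its descendants."""

    def __init__(self, name: str, parent: Optional["TaxonomyTerm"] = None) -> None:
        self.name = name
        self.parent = parent
        self.own_tags: set[str] = set()

    def path(self) -> str:
        return self.name if self.parent is None else f"{self.parent.path()} > {self.name}"

    def effective_tags(self) -> set[str]:
        # Union of this term's own tags with everything inherited from its ancestors.
        inherited = self.parent.effective_tags() if self.parent is not None else set()
        return inherited | self.own_tags


company = TaxonomyTerm("Company")
hr = TaxonomyTerm("HR", parent=company)
driver = TaxonomyTerm("Driver", parent=hr)
hr.own_tags.add("PII")  # a steward tags the HR term once

print(driver.path(), driver.effective_tags())  # Company > HR > Driver {'PII'}
```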
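Slide 18 shows Atlas publishing metadata updates to Kafka topics and a Ranger-side Atlas client that subscribes to them and refreshes a resource cache. The Python sketch below mimics that event-driven flow, using the standard library's `queue.Queue` as an in-process stand-in for a Kafka topic. `TagEvent` and `TagCache` are hypothetical names, and the message shape is an assumption, not Atlas's notification format.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEvent:
    """Hypothetical metadata-change message standing in for an Atlas notification."""
    action: str    # "ADD" or "REMOVE"
    resource: str  # e.g. a Hive table name
    tag: str


class TagCache:
    """Subscriber-side cache of resource -> tags, changed only by consumed events."""

    def __init__(self, topic: queue.Queue) -> None:
        self._topic = topic
        self._tags: dict[str, set[str]] = {}

    def poll(self) -> int:
        """Apply all pending events without blocking; return how many were applied."""
        applied = 0
        while True:
            try:
                event = self._topic.get_nowait()
            except queue.Empty:
                return applied
            if event.action == "ADD":
                self._tags.setdefault(event.resource, set()).add(event.tag)
            elif event.action == "REMOVE":
                self._tags.get(event.resource, set()).discard(event.tag)
            else:
                continue  # unknown actions are skipped in this sketch
            applied += 1

    def tags_for(self, resource: str) -> set[str]:
        return set(self._tags.get(resource, set()))  # copy, so callers cannot mutate the cache


topic = queue.Queue()  # in-process stand-in for a Kafka topic
cache = TagCache(topic)
topic.put(TagEvent("ADD", "hr.employees", "PII"))
topic.put(TagEvent("ADD", "sales.orders", "PII"))
topic.put(TagEvent("REMOVE", "sales.orders", "PII"))
print(cache.poll(), cache.tags_for("hr.employees"), cache.tags_for("sales.orders"))  # 3 {'PII'} set()
```

Unlike a Kafka topic, `queue.Queue` offers no message durability or replay, which the slide lists as a property of the real notification framework; the sketch only illustrates that the cache changes in response to events rather than by re-reading the metastore.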
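Slides 21 and 29 describe dataset lineage across components and its use for tracing errors back to their sources and for impact analysis. The Python sketch below models lineage as a graph in which process nodes sit between their input and output datasets, then walks it breadth-first upstream (provenance) or downstream (impact). `LineageGraph` and every dataset and process name are hypothetical; this is not the Atlas lineage API.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical dataset-level lineage graph; processes sit between input and output datasets."""

    def __init__(self) -> None:
        self._down: dict[str, set[str]] = defaultdict(set)
        self._up: dict[str, set[str]] = defaultdict(set)

    def add_process(self, process: str, inputs: list[str], outputs: list[str]) -> None:
        for src in inputs:   # dataset -> process edges
            self._down[src].add(process)
            self._up[process].add(src)
        for dst in outputs:  # process -> dataset edges
            self._down[process].add(dst)
            self._up[dst].add(process)

    @staticmethod
    def _walk(start: str, edges: dict[str, set[str]]) -> list[str]:
        """Breadth-first traversal returning reachable nodes (excluding start) in visit order."""
        seen, order, todo = {start}, [], deque([start])
        while todo:
            node = todo.popleft()
            for nxt in sorted(edges.get(node, ())):  # sorted for deterministic output
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    todo.append(nxt)
        return order

    def upstream(self, node: str) -> list[str]:
        """Provenance: every process and dataset that fed into `node`."""
        return self._walk(node, self._up)

    def downstream(self, node: str) -> list[str]:
        """Impact analysis: every process and dataset derived from `node`."""
        return self._walk(node, self._down)


g = LineageGraph()
g.add_process("sqoop_import_customers", ["mysql.sales.customers"], ["hive.raw.customers"])
g.add_process("hive_ctas_clean", ["hive.raw.customers"], ["hive.curated.customers"])
g.add_process("storm_enrich", ["kafka.sensor_events", "hive.curated.customers"], ["hbase.enriched_events"])

print(g.upstream("hbase.enriched_events"))
# ['storm_enrich', 'hive.curated.customers', 'kafka.sensor_events', 'hive_ctas_clean',
#  'hive.raw.customers', 'sqoop_import_customers', 'mysql.sales.customers']
```

Treating processes as first-class nodes keeps the record of how each dataset was produced, which is what lets the upstream walk cross from Hive back through a Sqoop import to the source RDBMS table.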

Editor's Notes

  1. 4
  2. How fast? 7 months!
  3. Apache Atlas is the only open source project created to solve the governance challenge in the open. The founding members of the project include all the members of the data governance initiative and others from the Hadoop community. The core functionality defined by the project includes the following: Data Classification – create an understanding of the data within Hadoop and provide a classification of this data to external and internal sources; Centralized Auditing – provide a framework to capture and report on access to and modifications of data within Hadoop; Search & Lineage – allow pre-defined and ad hoc exploration of data and metadata while maintaining a history of how a data source or explicit data was constructed; Security and Policy Engine – implement engines to protect and rationalize data access according to compliance policy.
  4. Which vendors would you be interested in?
  5. The point of Atlas is to leverage metadata to drive exchange, agility and scalability in the HDP governance solution. The paradigm shift is this: in a true data lake, a multi-tenant environment with 10K+ objects, conventional management of entitlement and enforcement will not work, and new patterns must be used. One group cannot both understand the data and manage policy efficiently; the domain is too large. These activities must be decoupled. The data stewards curate the data (tagging) because they are the SMEs, and the policy team creates a policy once, based on tags (access rules). In our thinking, this is the ONLY scalable solution. We have it, and CDH does not.
  6. Apache Atlas is a low-level service, like YARN. It will be common to the whole HDP platform, providing core metadata services and enriching the whole HDP stack. We start with Hive in HDP 2.3, will extend to Ranger and Falcon in M10, and will continue with Kafka and Storm by the end of 2015. Yellow + Atlas = governance features.
  7. Show: clearly identify customer metadata. Change: add a customer classification example (Aetna) so the use-case story has continuity. Use DX procedures for diagnosis. Bring metadata from external systems into Hadoop and keep it together.
  8. Show: clearly identify customer metadata. Change: add a customer classification example (Aetna) so the use-case story has continuity. Use DX procedures for diagnosis. Bring metadata from external systems into Hadoop and keep it together.
  9. Show: clearly identify customer metadata. Change: add a customer classification example (Aetna) so the use-case story has continuity. Use DX procedures for diagnosis. Bring metadata from external systems into Hadoop and keep it together.
  10. Apache Atlas is the only open source project created to solve the governance challenge in the open. The founding members of the project include all the members of the data governance initiative and others from the Hadoop community. The core functionality defined by the project includes the following: Data Classification – create an understanding of the data within Hadoop and provide a classification of this data to external and internal sources; Centralized Auditing – provide a framework to capture and report on access to and modifications of data within Hadoop; Search & Lineage – allow pre-defined and ad hoc exploration of data and metadata while maintaining a history of how a data source or explicit data was constructed; Security and Policy Engine – implement engines to protect and rationalize data access according to compliance policy.
  11. - Learn who our users are and what their needs are, to validate whether we are solving the right problem - Open-ended, half-hour discussions about processes, challenges and current tools - We record the interviews so that we can focus on the conversation and analyze them afterward
  12. - Test our prototype in InVision - A click-through prototyping tool - Walk users through scenarios and watch how they respond - Remind our participants that we aren't testing them, we're testing the design, and encourage thinking aloud
  13. Was the product well understood? Is the product something they would use? Where is the value?
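Editor's note 5 argues that classification and policy must be decoupled: data stewards attach tags to data, and the policy team writes one access rule per tag instead of one per object. The following is a minimal Python sketch of that idea, under stated assumptions: the entity names, tags, roles, and the `is_allowed` helper are all hypothetical and do not reflect the actual Atlas or Ranger APIs.

```python
from typing import Dict, Set

# HYPOTHETICAL catalog: data stewards (the SMEs) attach classification tags to entities.
# Entity names and tags are illustrative only, not real Atlas types.
ENTITY_TAGS: Dict[str, Set[str]] = {
    "hive_table.claims": {"PII", "HEALTH"},
    "hive_table.members": {"PII"},
    "hive_table.web_logs": set(),
}

# HYPOTHETICAL tag-based policies: the policy team writes one rule per tag,
# listing the roles allowed to read data that carries that tag.
TAG_POLICIES: Dict[str, Set[str]] = {
    "PII": {"data_steward", "compliance"},
    "HEALTH": {"compliance"},
}


def is_allowed(role: str, entity: str) -> bool:
    """Allow access only if the role satisfies the policy of every tag on the entity.

    Entities missing from the catalog are denied, and a tag with no policy
    denies everyone (deny by default). Untagged cataloged entities are open.
    """
    if entity not in ENTITY_TAGS:
        return False
    return all(role in TAG_POLICIES.get(tag, set()) for tag in ENTITY_TAGS[entity])


# A steward tags a new table; the existing HEALTH rule covers it with no new policy.
ENTITY_TAGS["hive_table.prescriptions"] = {"HEALTH"}

if __name__ == "__main__":
    print(is_allowed("compliance", "hive_table.prescriptions"))  # True
    print(is_allowed("data_steward", "hive_table.claims"))       # False: lacks HEALTH
    print(is_allowed("analyst", "hive_table.web_logs"))          # True: untagged
```

Because the rules key on tags rather than on individual tables, the number of policies grows with the number of classifications, not with the 10K+ objects in the lake, which is the scalability argument made in note 5.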