1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Enterprise Data Classification and Provenance
Apache Atlas
Shwetha Shivalingamurthy
Suma Shivaprasad
Disclaimer
This document may contain product features and technology directions that are under development, may be
under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation
project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release
through Apache, however, technical feasibility, market demand, user feedback and the overarching Apache
Software Foundation community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a contractual
commitment, promise or obligation from Hortonworks to deliver these features in any generally available
product.
Product features and technology directions are subject to change, and must not be included in contracts,
purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon it
when making purchasing decisions.
Agenda
• Demo
• Big Data Governance
• Overview of Atlas
• Atlas architecture
• Features and Roadmap
Demo use case – Ad network
• Matches advertiser demand with ad space supply from publishers
• Billing based on ad impressions/ad engagement
• Enables targeting, tracking and reporting of ad impressions
• Typical reports/queries:
– Mismatch of demand and supply
– Country-wise and OS-wise reports
– Top advertisers/publishers
Data landscape
[Diagram: ad servers; user, ad, impression, click and billing logs; metadata; summaries; traditional warehouse]
Data governance requirements
• Cross-platform lineage – impact analysis, forensics, discovery
• Asset search
• Common Business Terms
• Compliance
Demo (HDP 2.5)
• Technical and business metadata
• Cross-component lineage
• Creating views
• Creating tags
• Entity deletes
• Search using tags and attributes
• Entity audit
• Business catalog – find assets
• Flexible model, external lineage ingest
Data Governance Aspects
• Data discovery and tagging
• Metadata management
• Data lineage/provenance
• Access management
• Data security & privacy
• Data quality
• Compliance and audit
• Data wrangling
• Data lifecycle management
• Data integration
Data governance refers to the processes, methods and tools used in an enterprise for effective control of the availability, usability, integrity and security of data.
Enterprise Data Governance: Apache Atlas
Data Management
along the entire data lifecycle, with integrated provenance and lineage capability
• Cross-component lineage
Modeling with Metadata
enables a comprehensive business metadata vocabulary with enhanced tagging and attribute capabilities
• Common business language
• Hierarchically organized, no duplicates
Interoperable Solutions
across the Hadoop ecosystem, through a common metadata store
• Combine and exchange metadata
[Diagram: ATLAS METADATA at the center, linked to Kafka, Storm, Sqoop, Hive, Falcon, Ranger, custom and partner integrations, and to structured traditional RDBMS metadata and MPP appliances]
Background: DGI Community becomes Apache Atlas
• Dec 2014: DGI group kickoff (Global Financial Company)
• May 2015: Apache Atlas incubation
• Jul 2015: HDP 2.3 / Apache Atlas 0.5 foundation release
• Aug 2016: HDP 2.5 / Apache Atlas 0.7 release
* DGI: Data Governance Initiative
Key Benefits:
• Co-Dev = built for real customer use cases
• Faster & Safer = customers know the business + HWX knows Hadoop
• Code contributors: Hortonworks, IBM, Aetna, Merck, Target
Architecture
Atlas Type System
• Defines model – schema of metadata
• Flexible and powerful to define any model/custom types
• Supports inheritance
• Types
• Primitive types – bool, integer types, string, date, enum
• Collections - array, map
• Struct – set of attributes
• Class – Identifiable struct, hierarchy
• Trait – set of attributes, hierarchy
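To make the meta-types above concrete, here is a minimal Python sketch of a type registry in which classes and traits inherit attributes from their supertypes. The `TypeDef`, `AttributeDef` and `TypeRegistry` names are hypothetical; this is not Atlas's type-system API. The example assumes, as the Hive model suggests, that `hive_table` extends `DataSet` and therefore inherits its required `name` attribute.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDef:
    """One attribute of a type, e.g. name: string, required."""
    name: str
    type_name: str
    required: bool = True


@dataclass
class TypeDef:
    """A struct, class or trait definition (hypothetical structure)."""
    name: str
    meta_type: str                                   # "struct", "class" or "trait"
    attributes: list = field(default_factory=list)
    supertypes: list = field(default_factory=list)   # names of parent types


class TypeRegistry:
    """Hypothetical registry that resolves inherited attributes; not Atlas's API."""

    def __init__(self):
        self._types = {}

    def register(self, type_def):
        for parent in type_def.supertypes:
            if parent not in self._types:
                raise ValueError(f"unknown supertype {parent!r}")
        self._types[type_def.name] = type_def

    def all_attributes(self, type_name):
        """Attributes inherited from supertypes, overridden by the type's own."""
        type_def = self._types[type_name]
        attrs = {}
        for parent in type_def.supertypes:
            attrs.update(self.all_attributes(parent))
        for attr in type_def.attributes:
            attrs[attr.name] = attr
        return attrs

    def missing_required(self, type_name, values):
        """Names of required attributes (own or inherited) absent from `values`."""
        return [name for name, attr in self.all_attributes(type_name).items()
                if attr.required and name not in values]


registry = TypeRegistry()
registry.register(TypeDef("DataSet", "class", [AttributeDef("name", "string")]))
registry.register(TypeDef("hive_table", "class",
                          [AttributeDef("createTime", "date")],
                          supertypes=["DataSet"]))
registry.register(TypeDef("PII", "trait"))           # a trait with no attributes

print(sorted(registry.all_attributes("hive_table")))                          # ['createTime', 'name']
print(registry.missing_required("hive_table", {"createTime": "2015-01-01"}))  # ['name']
```

Resolving attributes recursively keeps each type definition small: a subtype only declares what it adds, and validation automatically covers inherited required attributes.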
Hive Model
• DataSet (metaType: ClassType)
– name: string, required
• hive_db (metaType: ClassType)
– name: string, required
– createTime: date, required
– parameters: map<string,string>, optional
• hive_table (metaType: ClassType)
– db: hive_db, required
– createTime: date, required
– columns: array<hive_column>, required
• hive_column (metaType: ClassType)
– name: string, required
– type: string, required
Relationships: hive_table extends DataSet; hive_table references one hive_db (db) and 0..n hive_column (columns)
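A rough Python rendering of the Hive model, using dataclasses in place of Atlas ClassTypes. It is an illustration only: the Python class names are hypothetical, and it assumes `hive_table` extends `DataSet` (inheriting `name`) while `hive_db` and `hive_column` declare their own `name`. Required attributes become constructor arguments without defaults; the optional `parameters` map gets an empty default.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataSet:
    name: str                                        # required


@dataclass
class HiveDb:
    name: str                                        # required
    create_time: datetime                            # required
    parameters: dict = field(default_factory=dict)   # optional map<string,string>


@dataclass
class HiveColumn:
    name: str                                        # required
    type: str                                        # required, e.g. "string"


@dataclass
class HiveTable(DataSet):          # assumption: hive_table extends DataSet and inherits `name`
    db: HiveDb                     # reference to exactly one hive_db
    create_time: datetime          # required
    columns: list                  # references to 0..n HiveColumn


raw = HiveDb("rawlogs", datetime(2015, 1, 1, 10, 0))
impressions = HiveTable(
    name="impressions",
    db=raw,
    create_time=datetime(2015, 1, 1, 10, 0),         # made-up value for illustration
    columns=[HiveColumn("adv_id", "string"), HiveColumn("user_id", "string")],
)
print(impressions.db.name, [c.name for c in impressions.columns])
# rawlogs ['adv_id', 'user_id']
```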
Entities
Instances of types
• Guid 1, type hive_db: name = rawlogs, createTime = 2015-01-01 10:00
• Guid 2, type hive_table: name = impressions, db → Guid 1, columns → Guid 3, Guid 4
• Guid 3, type hive_column: name = adv_id, type = string
• Guid 4, type hive_column: name = user_id, type = string
• Traits attached to entities: EXPIRES_ON (time: March 2016) and PII
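The entities above can be sketched as GUID-keyed records with traits attached to individual instances. This is a hypothetical in-memory store, not how Atlas persists entities, and the placement of the traits (EXPIRES_ON on the table, PII on `user_id`) is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    guid: int
    type_name: str
    attributes: dict                                 # references are stored as GUIDs
    traits: dict = field(default_factory=dict)       # trait name -> trait attribute values


class EntityStore:
    """Hypothetical in-memory store keyed by GUID."""

    def __init__(self):
        self._by_guid = {}

    def add(self, entity):
        self._by_guid[entity.guid] = entity
        return entity

    def get(self, guid):
        return self._by_guid[guid]

    def add_trait(self, guid, trait_name, **trait_values):
        self._by_guid[guid].traits[trait_name] = trait_values

    def with_trait(self, trait_name):
        return [e for e in self._by_guid.values() if trait_name in e.traits]


store = EntityStore()
store.add(Entity(1, "hive_db", {"name": "rawlogs", "createTime": "2015-01-01 10:00"}))
store.add(Entity(2, "hive_table", {"name": "impressions", "db": 1, "columns": [3, 4]}))
store.add(Entity(3, "hive_column", {"name": "adv_id", "type": "string"}))
store.add(Entity(4, "hive_column", {"name": "user_id", "type": "string"}))

# Assumed placement of the two traits from the example.
store.add_trait(2, "EXPIRES_ON", time="March 2016")
store.add_trait(4, "PII")

table = store.get(2)
print(store.get(table.attributes["db"]).attributes["name"])        # rawlogs
print([e.attributes["name"] for e in store.with_trait("PII")])     # ['user_id']
```

Because a trait is attached to an instance rather than to a type, the same PII trait can later be added to any column, table or file without changing the Hive model.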
Graph Engine
• Graph Database
• Titan with storage backed by HBase
• Types and Entities are translated to the Graph Model
• Classes, Structs and Traits map to a vertex
• Relationships are mapped as edges
• Rich relationships between metadata objects
• Indexing and Search
• Indexing based on type annotations
• External indexing – Titan backed by Solr
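A minimal sketch of the translation described above: each entity and each trait instance becomes a vertex, and each reference attribute becomes a labelled edge. The input layout, the `type_name` property and the `guid/trait` vertex ids are hypothetical conventions chosen for this example; Atlas's actual vertex properties and edge labels in Titan differ.

```python
def to_property_graph(entities):
    """Translate entities into vertices and labelled edges.

    entities: guid -> {"type": str, "attrs": dict, "refs": dict, "traits": dict},
    where "refs" maps an attribute name to one guid or a list of guids.
    Returns (vertices, edges): vertices map id -> properties, and edges are
    (out_vertex_id, label, in_vertex_id) tuples.
    """
    vertices, edges = {}, []
    for guid, ent in entities.items():
        vertices[guid] = {"type_name": ent["type"], **ent.get("attrs", {})}
        for label, target in ent.get("refs", {}).items():
            for target_guid in (target if isinstance(target, list) else [target]):
                edges.append((guid, label, target_guid))
        for trait, values in ent.get("traits", {}).items():
            trait_vertex = f"{guid}/{trait}"          # each trait instance gets its own vertex
            vertices[trait_vertex] = {"type_name": trait, **values}
            edges.append((guid, trait, trait_vertex))
    return vertices, edges


def out_neighbours(edges, vertex, label):
    """Follow outgoing edges with the given label."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


entities = {
    1: {"type": "hive_db", "attrs": {"name": "rawlogs"}},
    2: {"type": "hive_table", "attrs": {"name": "impressions"},
        "refs": {"db": 1, "columns": [3, 4]}},
    3: {"type": "hive_column", "attrs": {"name": "adv_id", "type": "string"}},
    4: {"type": "hive_column", "attrs": {"name": "user_id", "type": "string"},
        "traits": {"PII": {}}},
}
vertices, edges = to_property_graph(entities)
print([vertices[v]["name"] for v in out_neighbours(edges, 2, "columns")])  # ['adv_id', 'user_id']
print(out_neighbours(edges, 4, "PII"))                                     # ['4/PII']
```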
Titan property graph model
Graph Search with Gremlin
saturn = g.V.has('name', 'saturn').next()
hercules = saturn.as('x').in('father').loop('x') { it.loops < 3 }.next()
hercules.outE('battled').has('time', T.gt, 1).inV.name
cerberus
hydra
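For readers without a Gremlin console, the same traversal can be mimicked in plain Python. The data below is an assumed subset of Titan's "Graph of the Gods" sample graph (saturn is jupiter's father, jupiter is hercules's father, and hercules battled three monsters at times 1, 2 and 12); only the two-hop `in('father')` walk and the `time > 1` filter are reproduced.

```python
# Assumed subset of Titan's "Graph of the Gods" sample data.
father = {"jupiter": "saturn", "hercules": "jupiter"}                      # child -> father
battled = {"hercules": [("nemean", 1), ("hydra", 2), ("cerberus", 12)]}   # (monster, time)


def in_father(names):
    """Like Gremlin's in('father'): everyone whose father is in `names`."""
    return [child for child, dad in father.items() if dad in names]


def repeat_in_father(start, hops):
    """Apply in('father') `hops` times, mirroring the two hops saturn -> jupiter -> hercules."""
    frontier = [start]
    for _ in range(hops):
        frontier = in_father(frontier)
    return frontier


hercules = repeat_in_father("saturn", 2)[0]
monsters = sorted(name for name, time in battled[hercules] if time > 1)
print(hercules, monsters)   # hercules ['cerberus', 'hydra']
```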
Search
Find relevant assets based on their attributes and associations with business terms
DSL with SQL-like syntax based on the type system
from $type is $trait where $clause select|has $attributes, repeat
Examples
• Select the columns of the hive_table named "impressions" in the db named "raw":
hive_column where table.name="impressions", table.db.name="raw"
• Select all columns of Hive tables that are tagged as "PII":
hive_column is PII
Full text search
'(rawlogs) AND hive'
'(rawlogs OR supply*) AND hive'
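The two DSL examples can be read as ordinary filters over entities and their references. The sketch below evaluates equivalent filters over hypothetical in-memory dictionaries; it does not parse the DSL and is not how Atlas executes queries. The `clicks` table and `click_ts` column are made up to show a non-matching row.

```python
# Hypothetical entities: a "raw" database with two tables and three columns.
db_raw = {"name": "raw"}
impressions = {"name": "impressions", "db": db_raw}
clicks = {"name": "clicks", "db": db_raw}
columns = [
    {"name": "adv_id", "table": impressions, "traits": set()},
    {"name": "user_id", "table": impressions, "traits": {"PII"}},
    {"name": "click_ts", "table": clicks, "traits": set()},
]


def lookup(entity, path):
    """Follow a dotted reference path such as "table.db.name"."""
    for part in path.split("."):
        entity = entity[part]
    return entity


def where(entities, conditions):
    """Keep entities for which every path in `conditions` equals its value."""
    return [e for e in entities
            if all(lookup(e, path) == value for path, value in conditions.items())]


def is_trait(entities, trait):
    """Keep entities that carry the given trait."""
    return [e for e in entities if trait in e["traits"]]


# hive_column where table.name="impressions", table.db.name="raw"
matches = where(columns, {"table.name": "impressions", "table.db.name": "raw"})
print([c["name"] for c in matches])                      # ['adv_id', 'user_id']

# hive_column is PII
print([c["name"] for c in is_trait(columns, "PII")])     # ['user_id']
```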
Features and Roadmap
Apache Atlas Component Integration & Lineage
• Cross-component dataset lineage; centralized location for all metadata inside HDP
• Single interface point for metadata exchange with platforms outside of HDP
[Diagram: Apache Atlas integrations with Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi, HBase, partner and custom components; legend: HDP 2.3, HDP 2.5, Beyond HDP 2.5, HDP 2.5 External]
Business Catalog for Ease of Use
• Organize data assets along business terms
– Authoritative: hierarchical taxonomy creation
– Agile modeling: model conceptual, logical and physical assets
– Definition and assignment of tags like PII (Personally Identifiable Information)
• Comprehensive features for compliance
– Multiple user profiles, including Data Steward and Business Analyst
– Object auditing to track "who did it"
– Metadata versioning to track "what did they do"
• Faster insight (roadmap)
– Data Quality tab for profiling and sampling
– User comments
Key Benefits:
• Organize data assets along business terms
• Compliance features
• Faster insight
Apache Ranger: Introduction
Centralized authorization and auditing across Hadoop components
• HDFS, Hive, HBase, Knox, Storm, YARN, Kafka, Solr, …
• Audit logs to: Solr, HDFS, RDBMS, Log4j, …
Resource based security
• Policies for a specific set of resources
• Requires revision of policies as resources get added/moved
Classification based security
• Policies for classifications, not for specific resources
• A single policy protects resources in multiple components
• As the classification of a resource changes, the appropriate policies are applied automatically
• Enables separation of duties between resource classification and security policies
Scalable Access Control – Reusable Tag Policy
Not scalable: policies map user groups (AD, Linux) directly to individual resources (files, tables, topologies).
Scalable: a single admin group assigns a policy to an Atlas tag such as PII, and that policy covers ANY asset tagged PII (files, tables, topologies); many stewards apply the tags.
• Single point of enforcement and audit
• All future tagging is covered by the existing policy
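The scalability argument can be illustrated with a toy access check in which policies are keyed by tag rather than by resource. Everything here is hypothetical (resource names, the `allowed_groups` field, allow-by-default for untagged resources) and it is not Ranger's policy engine; the point is that tagging a new asset brings it under the existing PII policy without editing any policy.

```python
# Hypothetical data: stewards maintain tags (Atlas side); one admin group maintains policies (Ranger side).
resource_tags = {
    "hdfs:/data/raw/users.csv": {"PII"},
    "hive:raw.impressions.user_id": {"PII"},
    "hive:raw.impressions.adv_id": set(),
}
tag_policies = {"PII": {"allowed_groups": {"us_hr"}}}


def is_allowed(user_groups, resource):
    """Deny when any tag on the resource has a policy that none of the user's groups satisfies.

    Untagged resources are allowed here only to keep the sketch short; a real
    deployment would still evaluate resource-based policies for them.
    """
    for tag in resource_tags.get(resource, set()):
        policy = tag_policies.get(tag)
        if policy is not None and not (policy["allowed_groups"] & user_groups):
            return False
    return True


analyst, hr = {"us_employee"}, {"us_hr"}
print(is_allowed(analyst, "hive:raw.impressions.user_id"))  # False
print(is_allowed(hr, "hive:raw.impressions.user_id"))       # True

# A steward tags a new asset; the existing PII policy covers it with no policy change.
resource_tags["kafka:topic/clickstream"] = {"PII"}
print(is_allowed(analyst, "kafka:topic/clickstream"))       # False
```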
Open: Governance Ready Certification Program
• Choice: customers choose the features they want to deploy, à la carte versus vendor lock-in
• Curated & Fast: a selected group of vendor partners provides rich, complementary and complete features ready to deploy
• Agile: low switching costs, faster deployment and innovation
• Centralized: common SLA and common open metadata store
• Flexibility: interoperability of products through Atlas metadata
• Safe: HDP at the core provides stability and interoperability
Completed:
• Waterline
• Dataguise
• Attivio
• Trifacta
Pending:
• Collibra
• Alation
• Meta Integration (Miti)
• Paxata
• Syncsort
• Talend
Roadmap…
• Multi-tenancy
• Titan 1.x Migration
• Hive Column Level Lineage
Summary
• Designed for Hadoop at the platform level, not the application level
• High-confidence data in Hadoop for regulated verticals
• Compliance and business objectives aligned to data organization
• Faster discovery for analysts – reduced time to value
• Agile and adaptable – native connectors keep information current
• Dynamic protection with Ranger through simple, audited policies
Learn More:
• Apache Incubator link: http://atlas.incubator.apache.org/
• Hortonworks links: http://hortonworks.com/solutions/security-and-governance/
• https://community.hortonworks.com/spaces/64/governance-lifecycle-track.html?topics=Atlas&type=question
• Atlas Technical User Guide: http://atlas.incubator.apache.org/AtlasTechnicalUserGuide.pdf
Questions
Backup
Dynamic Access Policy
Apache Ranger + Atlas Integration
How does Atlas work with Ranger at scale?
Atlas provides: Metadata
• Business classification (taxonomy): Company > HR > Driver
• Hierarchy with inheritance of attributes to child objects: the sensitive "PII" tag of the HR department is inherited by the group HR > Driver
• Atlas notifies Ranger of changes via a Kafka topic
[Diagram: Apache Atlas, connected to Hive, Ranger, Falcon, Kafka and Storm, provides the metadata tags used to create policies]
Ranger provides: Access & Entitlements
• Ranger caches tags and asset mappings for performance
• Ranger policies are based on tags instead of roles
• Example: PII = <group>. A single policy like this can cover many assets.
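Tag inheritance down a business taxonomy amounts to walking from a term up to the root and collecting tags along the way. A minimal sketch, assuming a simple child-to-parent map; `Finance` is a hypothetical sibling added only to show a term that should not inherit HR's tag.

```python
# Hypothetical taxonomy from the example: Company > HR > Driver, plus a sibling, Finance.
parent = {"Company": None, "HR": "Company", "Driver": "HR", "Finance": "Company"}
direct_tags = {"HR": {"PII"}}


def effective_tags(term):
    """Tags assigned to a term plus all tags inherited from its ancestors."""
    tags = set()
    while term is not None:
        tags |= direct_tags.get(term, set())
        term = parent[term]
    return tags


print(effective_tags("Driver"))    # {'PII'}
print(effective_tags("Finance"))   # set()
```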
Automatic update of policies – active protection
[Diagram: the Atlas metastore (tags, assets, entities) publishes metadata-update notifications through the notification framework (Kafka topics, message durability). On the Ranger side, the Atlas client subscribes to the topic and gets metadata updates, which feed the policy decision point (PDP) and its resource cache (optimized for speed). Result: event-driven updates.]
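The event-driven flow can be sketched with a thread and an in-process queue standing in for the Kafka topic. This is a simplification under stated assumptions: `queue.Queue` provides none of Kafka's message durability, and the event fields (`op`, `resource`, `tag`) are hypothetical rather than Atlas's notification message format. It shows the design choice on the slide: the PDP answers lookups from a local cache kept current by events, instead of querying Atlas on every request.

```python
import queue
import threading

topic = queue.Queue()          # stand-in for the Kafka topic (no durability)
tag_cache = {}                 # resource -> tags, held by Ranger's PDP
cache_lock = threading.Lock()


def consume(stop_marker=None):
    """Atlas-client role: apply tag-change events to the cache until stop_marker arrives."""
    while True:
        event = topic.get()
        if event is stop_marker:
            break
        with cache_lock:
            tags = tag_cache.setdefault(event["resource"], set())
            if event["op"] == "add":
                tags.add(event["tag"])
            elif event["op"] == "remove":
                tags.discard(event["tag"])


def is_tagged(resource, tag):
    """PDP lookup served from the local cache, without a call to Atlas."""
    with cache_lock:
        return tag in tag_cache.get(resource, set())


consumer = threading.Thread(target=consume)
consumer.start()
# Atlas role: publish metadata updates as they happen.
topic.put({"op": "add", "resource": "hive:raw.impressions.user_id", "tag": "PII"})
topic.put({"op": "add", "resource": "hive:raw.tax_2010", "tag": "EXPIRES_ON"})
topic.put({"op": "remove", "resource": "hive:raw.tax_2010", "tag": "EXPIRES_ON"})
topic.put(None)                # stop marker, only needed to end the demo
consumer.join()

print(is_tagged("hive:raw.impressions.user_id", "PII"))   # True
print(is_tagged("hive:raw.tax_2010", "EXPIRES_ON"))       # False
```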
Apache Ranger: Authorization and Auditing
[Diagram: enterprise users; Ranger Administration Portal; Ranger Policy Store; Ranger Audit Store (Solr, HDFS, RDBMS, Log4j); Ranger plugins inside Hadoop components: HDFS, Hive Server2, HBase, Knox, Storm, YARN, Kafka, Solr]
Big Data Governance
Current Landscape
• Opaque data spread across a variety of data stores – HDFS, S3, data warehouses
• Schemas are hardly sufficient – Hive Metastore, Avro, data warehouse
• Platform tools like Ranger and Falcon solve parts of the problem
Need for Data Governance
Organizations need data governance to understand their information and answer questions such as:
• What do we know about our information?
• Where did this data come from and how is it being used?
• Does this data adhere to company policies and rules?
• Need for effective control and consumption of data
Atlas helps customers discover information about data objects: their meaning, location, characteristics and usage.
Business Taxonomy
Business Taxonomy (Catalog)
The practice and science of classifying things or concepts, including the principles that underlie such classification. The business organization model is hierarchical, which makes it authoritative with no duplication.
Tags: Traits vs. Labels vs. Business Taxonomy
Atlas tags are authoritative and prevent duplication. A tag can span different parts of the business taxonomy: a PII tag can be used in HR as well as in Finance or Sales.
Benefits:
• A view of data assets organized by business language
• Compliance and acceptable use – dynamic, metadata-based access control
• Common taxonomy across Hadoop components
Principal Roles & Activities in an Enterprise
• Data Steward – curator, responsible for data classification (associating business taxonomy and tags) and access policies
• Data Scientist – analyst, primary consumer of the business taxonomy
• Administrator/Operations – role management, data lifecycle management (archival, retention)
• Data Engineer – data ingress and egress, semantic data quality
[Diagram callouts: 50% to 80%+ of time spent looking for data; profit center; primary user of Atlas; enables the scientist]
Goal: < 25% of time spent on finding data = empowering scientists to spend their time uncovering insights, faster
Data Governance Use Cases: Impact Analysis
• HortonAdNetwork – a large ad network with an international footprint and multiple publishers and advertisers across several countries
• Complex ETL jobs and data pipelines process real-time ad network data from several different sources on various data processing platforms
• No easy way to determine the root cause when something is off the charts
• Data analysts need effective data provenance tools for impact and root cause analysis
• Cross-component lineage is a must
• Data Lineage (Provenance)
Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes, provides visibility into the analytics pipeline, and simplifies tracing errors back to their sources.
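Impact and root-cause analysis over lineage reduce to graph reachability: walk forward from a dataset to find everything derived from it, or walk the reversed graph to find everything that feeds it. The dataset names below are hypothetical; in Atlas the edges would come from the process entities (Sqoop imports, Hive queries and so on) that link input and output datasets.

```python
from collections import deque

# Hypothetical lineage edges: source dataset -> datasets derived from it.
lineage = {
    "ad_server_logs": ["hive:raw.impressions", "hive:raw.clicks"],
    "hive:raw.impressions": ["hive:reports.country_summary", "hive:reports.top_advertisers"],
    "hive:raw.clicks": ["hive:reports.top_advertisers"],
    "hive:reports.country_summary": ["warehouse.summaries"],
}


def reachable(graph, start):
    """Breadth-first search: every node reachable from `start` by following edges."""
    seen, todo = set(), deque([start])
    while todo:
        for nxt in graph.get(todo.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def reverse(graph):
    """Flip edge direction so the walk goes from outputs back to inputs."""
    flipped = {}
    for src, dsts in graph.items():
        for dst in dsts:
            flipped.setdefault(dst, []).append(src)
    return flipped


# Impact analysis: what is affected if hive:raw.impressions is bad?
impacted = reachable(lineage, "hive:raw.impressions")
# Root-cause analysis: what feeds hive:reports.top_advertisers?
sources = reachable(reverse(lineage), "hive:reports.top_advertisers")
print(sorted(impacted))
print(sorted(sources))
```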
Data Governance Use Cases: Compliance
• HortoniaBank – mid-size bank expanding from the US to international markets
• 2 customer tables owned by BH: 50K customer records each, with 38 fields (PII, PHI, PCI and non-sensitive data)
– us_customers: USA person data only
– ww_customers: multi-language, multi-country, localized person data
• 1 data set of prospects leased from a data broker
– tax_2010: the data lease has already expired!
User / Group / Access Privileges
• joe_analyst / us_employee: US data only; non-sensitive data only, the rest forbidden depending on sensitivity
• kate_hr / us_hr: US data only; all sensitive data (PCI, PII, PHI)
Tag Based Policies
• US HR team members can see all original data (PCI, PII, …)
• Analysts are prohibited from viewing PII data in any of the tables
• Everyone except operations/admin is prohibited from accessing tax_2010 after the specified date: the EXPIRES_ON policy turns off access on the configured expiry date
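The two tag-based rules for HortoniaBank can be sketched as a single access check. The user groups come from the table above; the expiry date, the `expiry_date` key and the `operations`/`admin` group names are assumptions, and the PII rule is simplified to "only us_hr may read PII" rather than modelling every sensitivity class.

```python
from datetime import date

# Hypothetical tag assignments for the HortoniaBank example; the expiry date is made up.
resource_tags = {
    "us_customers": {"PII": {}},
    "tax_2010": {"EXPIRES_ON": {"expiry_date": date(2016, 1, 1)}},
}
OPS_GROUPS = {"operations", "admin"}   # assumed group names


def can_read(user_groups, resource, today):
    """Apply the two tag-based rules to one read request."""
    tags = resource_tags.get(resource, {})
    expires = tags.get("EXPIRES_ON")
    if expires is not None and today >= expires["expiry_date"] and not (user_groups & OPS_GROUPS):
        return False   # access turns off on the configured expiry date
    if "PII" in tags and "us_hr" not in user_groups:
        return False   # simplified: only US HR may see PII
    return True


print(can_read({"us_employee"}, "tax_2010", date(2015, 12, 31)))    # True
print(can_read({"us_employee"}, "tax_2010", date(2016, 1, 1)))      # False
print(can_read({"admin"}, "tax_2010", date(2016, 6, 1)))            # True
print(can_read({"us_employee"}, "us_customers", date(2016, 6, 1)))  # False
print(can_read({"us_hr"}, "us_customers", date(2016, 6, 1)))        # True
```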
Expanded Native Connector: Dataset Lineage
[Diagram: Sqoop with Teradata Connector; RDBMS; Custom Activity Reporter; Apache Kafka; Metadata Repository]
• Any process using Sqoop is covered
• No other tool tracks this out of the box

 
Apache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop componentsApache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop components
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
 
Building a data-driven authorization framework
Building a data-driven authorization frameworkBuilding a data-driven authorization framework
Building a data-driven authorization framework
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystem
 

Plus de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Plus de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Dernier

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 

Dernier (20)

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 

Enterprise Data Classification and Provenance with Apache Atlas

  • 1. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
    Enterprise Data Classification and Provenance: Apache Atlas
    Shwetha Shivalingamurthy, Suma Shivaprasad
  • 2. Disclaimer
    This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document's description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. Agenda
    • Demo
    • Big Data Governance
    • Overview of Atlas
    • Atlas architecture
    • Features and Roadmap
  • 4. Demo use case: Ad network
    • Matches advertiser demand with ad space supply from publishers
    • Billing based on ad impressions/ad engagement
    • Enables targeting, tracking and reporting of ad impressions
    • Typical reports/queries:
      • Mismatch of demand and supply
      • Country/OS-wise reports
      • Top advertisers/publishers
  • 5. Data landscape
    [Diagram: traditional warehouse, ad servers and users; data flows labeled ad impression, click and billing logs, metadata, and summaries]
  • 6. Data governance requirements
    • Cross-platform lineage: impact analysis, forensics, discovery
    • Asset search
    • Common business terms
    • Compliance
  • 7. Demo (HDP 2.5)
    • Technical and business metadata
    • Cross-component lineage
    • Creating views
    • Creating tags
    • Entity deletes
    • Search using tags and attributes
    • Entity audit
    • Business catalog: finding assets
    • Flexible model, external lineage ingest
  • 8. Data Governance
    Data governance refers to the processes, methods and tools used in an enterprise to effectively control the availability, usability, integrity, and security of data.
    Data governance aspects:
    • Data discovery and tagging
    • Metadata management
    • Data lineage/provenance
    • Access management
    • Data security & privacy
    • Data quality
    • Compliance and audit
    • Data wrangling
    • Data lifecycle management
    • Data integration
  • 9. Enterprise Data Governance: Apache Atlas
    • Data management along the entire data lifecycle, with integrated provenance and lineage capability
      • Cross-component lineage
    • Modeling with metadata enables a comprehensive business metadata vocabulary with enhanced tagging and attribute capabilities
      • Common business language
      • Hierarchically organized, with no duplicates
    • Interoperable solutions across the Hadoop ecosystem, through a common metadata store
      • Combine and exchange metadata
    [Diagram: the Atlas metadata store connected to Hive, Sqoop, Storm, Kafka, Falcon and Ranger, to structured sources (traditional RDBMS metadata, MPP appliances), and to custom and partner integrations]
  • 10. Background: DGI community becomes Apache Atlas
    • Dec 2014: DGI (Data Governance Initiative) group kickoff (global financial company)
    • May 2015: Apache Atlas incubation
    • Jul 2015: HDP 2.3 / Apache Atlas 0.5 foundation release
    • Aug 2016: HDP 2.5 / Apache Atlas 0.7 release
    Key benefits:
    • Co-development: built for real customer use cases
    • Faster and safer: customers know the business, Hortonworks knows Hadoop
    • Code contributors: Hortonworks, IBM, Aetna, Merck, Target
  • 11. Architecture
  • 12. Architecture
  • 13. Atlas Type System
    • Defines the model: the schema of metadata
    • Flexible and powerful enough to define any model or custom types
    • Supports inheritance
    • Types:
      • Primitive types: bool, integer types, string, date, enum
      • Collections: array, map
      • Struct: a set of attributes
      • Class: an identifiable struct, with hierarchy
      • Trait: a set of attributes, with hierarchy
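To make the inheritance rule concrete, here is a minimal Python sketch of a type registry in the spirit of this slide: class and trait types carry attribute definitions and may extend super types, and an instance is checked against the merged attribute set. The names `AttributeDef`, `TypeDef` and `TypeRegistry` and the validation rules are hypothetical illustrations, not the Atlas type-system API; the sample attributes follow the Hive model slide.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDef:
    name: str
    type_name: str
    required: bool = True


@dataclass
class TypeDef:
    # meta_type is an illustrative label: "class", "struct" or "trait"
    name: str
    meta_type: str
    attributes: list = field(default_factory=list)
    super_types: list = field(default_factory=list)


class TypeRegistry:
    """Hypothetical registry: resolves inherited attributes and validates instances."""

    def __init__(self):
        self.types = {}

    def register(self, typedef):
        for parent in typedef.super_types:
            if parent not in self.types:
                raise ValueError(f"unknown super type {parent!r}")
        self.types[typedef.name] = typedef

    def all_attributes(self, type_name):
        # Merge super types first so a subtype can override an inherited attribute.
        typedef = self.types[type_name]
        merged = {}
        for parent in typedef.super_types:
            merged.update(self.all_attributes(parent))
        merged.update({a.name: a for a in typedef.attributes})
        return merged

    def validate(self, type_name, values):
        missing = [name for name, a in self.all_attributes(type_name).items()
                   if a.required and name not in values]
        if missing:
            raise ValueError(f"{type_name} is missing required attributes {missing}")


registry = TypeRegistry()
registry.register(TypeDef("DataSet", "class", [AttributeDef("name", "string")]))
registry.register(TypeDef("hive_table", "class",
                          [AttributeDef("db", "hive_db"),
                           AttributeDef("createTime", "date"),
                           AttributeDef("columns", "array<hive_column>")],
                          super_types=["DataSet"]))
registry.register(TypeDef("PII", "trait"))

print(sorted(registry.all_attributes("hive_table")))
```

Resolving inheritance at lookup time keeps each type definition small; a `hive_table` instance that omits the inherited `name` attribute fails validation just like one missing `db`.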
  • 14. Hive Model
    • DataSet (metaType: ClassType)
      • name: string, required
    • hive_db (metaType: ClassType)
      • name: string, required
      • createTime: date, required
      • parameters: map<string,string>, optional
    • hive_table (metaType: ClassType; extends DataSet)
      • db: hive_db, required (references hive_db)
      • createTime: date, required
      • columns: array<hive_column>, required (references 0..n hive_column)
    • hive_column (metaType: ClassType)
      • name: string, required
      • type: string, required
  • 15. Entities: instances of types
    • rawlogs (Type: hive_db, Guid: 1, createTime: 2015-01-01 10:00)
    • impressions (Type: hive_table, Guid: 2); db: rawlogs; columns: adv_id, user_id
    • adv_id (Type: hive_column, Guid: 3, type: string)
    • user_id (Type: hive_column, Guid: 4, type: string)
    • Traits attached in the diagram: EXPIRES_ON (Time: March 2016) and PII
  • 16. Graph Engine
    • Graph database
      • Titan, with storage backed by HBase
    • Types and entities are translated to the graph model
      • Classes, structs and traits map to vertices
      • Relationships are mapped as edges
      • Rich relationships between metadata objects
    • Indexing and search
      • Indexing based on type annotations
      • External indexing: Titan backed by Solr
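A rough sketch of the mapping described above, using plain Python dictionaries in place of Titan and HBase: each entity and each trait instance becomes a vertex, references become labeled edges, and a small attribute index stands in for the Solr-backed index. The data mirrors the entities slide (rawlogs, impressions, adv_id, user_id); attaching the PII trait to `user_id` is an assumption for illustration, and none of this is the Atlas or Titan API.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: vertices hold properties, edges carry a label."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> property dict
        self.out_edges = defaultdict(list)   # vertex id -> [(label, target id)]
        self.index = defaultdict(set)        # (property, value) -> vertex ids

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        for key, value in props.items():
            self.index[(key, value)].add(vid)

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def lookup(self, key, value):
        return self.index.get((key, value), set())

    def out(self, vid, label):
        return [dst for lbl, dst in self.out_edges[vid] if lbl == label]


g = PropertyGraph()
g.add_vertex(1, type="hive_db", name="rawlogs", createTime="2015-01-01 10:00")
g.add_vertex(2, type="hive_table", name="impressions")
# "datatype" holds the column's own type attribute so it does not clash with "type".
g.add_vertex(3, type="hive_column", name="adv_id", datatype="string")
g.add_vertex(4, type="hive_column", name="user_id", datatype="string")
g.add_edge(2, "db", 1)
g.add_edge(2, "column", 3)
g.add_edge(2, "column", 4)
# Trait instance as its own vertex, linked to the entity it classifies (assumed: user_id).
g.add_vertex("t1", type="PII")
g.add_edge(4, "trait", "t1")

table = next(iter(g.lookup("name", "impressions")))
columns = [g.vertices[c]["name"] for c in g.out(table, "column")]
print(columns)  # ['adv_id', 'user_id']
```

The index lookup finds the starting vertex without a scan, and the edges then answer relationship questions, which is the division of labor the slide describes between the search index and the graph.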
  • 17. Titan property graph model: graph search with Gremlin
        saturn = g.V.has('name','saturn').next()
        hercules = saturn.as('x').in('father').loop('x'){it.loops < 3}.next()
        hercules.outE('battled').has('time', T.gt, 1).inV.name
        ==> cerberus
        ==> hydra
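For readers without a Gremlin console, the sketch below re-creates the three traversal steps in plain Python over a small hand-built subset of Titan's "Graph of the Gods" sample graph; the vertex names, edge labels and battle times are assumptions taken from that sample dataset. In TinkerPop 2 a loop condition of `it.loops < 3` repeats the step twice, which is what walks from saturn to his grandson hercules, so the helper below takes the number of hops explicitly.

```python
# Edges as (source, label, target, properties); assumed subset of the sample graph.
EDGES = [
    ("jupiter", "father", "saturn", {}),
    ("hercules", "father", "jupiter", {}),
    ("hercules", "battled", "nemean", {"time": 1}),
    ("hercules", "battled", "hydra", {"time": 2}),
    ("hercules", "battled", "cerberus", {"time": 12}),
]


def in_vertices(vertex, label):
    """Vertices with a `label` edge pointing at `vertex` (Gremlin's in(label))."""
    return [src for src, lbl, dst, _ in EDGES if dst == vertex and lbl == label]


def repeat_in(start, label, hops):
    """Follow in(label) `hops` times, like a TinkerPop 2 loop of that many passes."""
    frontier = [start]
    for _ in range(hops):
        frontier = [v for f in frontier for v in in_vertices(f, label)]
    return frontier


hercules = repeat_in("saturn", "father", 2)[0]
# Equivalent of outE('battled').has('time', T.gt, 1).inV.name
opponents = [dst for src, lbl, dst, props in EDGES
             if src == hercules and lbl == "battled" and props.get("time", 0) > 1]
print(hercules, opponents)  # hercules ['hydra', 'cerberus']
```

The Python result order follows the edge list; Gremlin's output order depends on the backend, which is why the slide lists cerberus first.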
  • 18. Search
    Find relevant assets based on their attributes and their associations with business terms.
    • DSL with SQL-like syntax based on the type system:
      from $type is $trait where $clause select|has $attributes, repeat
    • Examples:
      • Select the columns of the hive_table named "impressions" whose db is named "raw":
        hive_column where table.name = "impressions", table.db.name = "raw"
      • Select all columns from Hive tables that are tagged as "PII":
        hive_column is 'PII'
    • Full-text search:
      • (rawlogs) AND hive
      • (rawlogs OR supply*) AND hive
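The two DSL examples can be read as filters over entities. The Python sketch below evaluates equivalent predicates over a few in-memory entities. It is a hypothetical illustration of the query semantics (dotted paths follow references between entities, `is` checks attached traits), not Atlas's DSL parser or REST API, and the sample entities, including the PII tags, are made up.

```python
ENTITIES = {
    1: {"type": "hive_db", "name": "raw", "traits": set()},
    2: {"type": "hive_table", "name": "impressions", "db": 1, "traits": set()},
    3: {"type": "hive_column", "name": "adv_id", "table": 2, "traits": set()},
    4: {"type": "hive_column", "name": "user_id", "table": 2, "traits": {"PII"}},
    5: {"type": "hive_table", "name": "clicks", "db": 1, "traits": set()},
    6: {"type": "hive_column", "name": "user_id", "table": 5, "traits": {"PII"}},
}


def resolve(entity, path):
    """Follow a dotted path such as 'table.db.name' through guid references."""
    *refs, attr = path.split(".")
    for ref in refs:
        entity = ENTITIES[entity[ref]]
    return entity[attr]


def search(type_name, where=None, trait=None):
    """Rough equivalent of: <type_name> [is <trait>] [where path = value, ...]."""
    where = where or {}
    hits = []
    for guid, entity in ENTITIES.items():
        if entity["type"] != type_name:
            continue
        if trait is not None and trait not in entity["traits"]:
            continue
        if all(resolve(entity, path) == value for path, value in where.items()):
            hits.append(guid)
    return hits


# hive_column where table.name = "impressions", table.db.name = "raw"
print(search("hive_column", where={"table.name": "impressions", "table.db.name": "raw"}))  # [3, 4]
# hive_column is PII
print(search("hive_column", trait="PII"))  # [4, 6]
```

The point of the trait query is that it crosses tables: both `user_id` columns match because they share a classification, not because they share a parent.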
  • 19. Features and Roadmap
  • 20. Apache Atlas Component Integration & Lineage
    • Cross-component dataset lineage; a centralized location for all metadata inside HDP
    • Single interface point for metadata exchange with platforms outside of HDP
    [Diagram: Atlas integrations with Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi, HBase, partner and custom sources, grouped by release: HDP 2.3, HDP 2.5, beyond HDP 2.5, and external]
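Cross-component lineage can be pictured as datasets connected through process entities, each with inputs and outputs, and impact analysis as a traversal downstream from one dataset. A minimal Python sketch follows; the process and dataset names (a Sqoop import, a Storm topology, a Hive CTAS query) are hypothetical examples in the spirit of the ad-network demo, not captured Atlas metadata.

```python
from collections import deque

# Hypothetical process entities: each consumes input datasets and produces outputs.
PROCESSES = [
    {"name": "sqoop_import_advertisers", "inputs": ["mysql.advertisers"],
     "outputs": ["hive.raw.advertisers"]},
    {"name": "storm_impression_topology", "inputs": ["kafka.impressions"],
     "outputs": ["hive.raw.impressions"]},
    {"name": "hive_ctas_daily_report",
     "inputs": ["hive.raw.impressions", "hive.raw.advertisers"],
     "outputs": ["hive.reports.top_advertisers"]},
]


def downstream(dataset):
    """Return every dataset derived, directly or transitively, from `dataset`."""
    impacted, queue = set(), deque([dataset])
    while queue:
        current = queue.popleft()
        for proc in PROCESSES:
            if current in proc["inputs"]:
                for out in proc["outputs"]:
                    if out not in impacted:
                        impacted.add(out)
                        queue.append(out)
    return impacted


print(sorted(downstream("kafka.impressions")))
# ['hive.raw.impressions', 'hive.reports.top_advertisers']
```

Because the lineage spans a Kafka topic, a Hive table and a report, a single breadth-first walk answers "what breaks if this feed changes?" across components.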
  • 21. Business Catalog for Ease of Use
    • Organize data assets along business terms
      • Authoritative: hierarchical taxonomy creation
      • Agile modeling: model conceptual, logical and physical assets
      • Definition and assignment of tags like PII (Personally Identifiable Information)
    • Comprehensive features for compliance
      • Multiple user profiles, including Data Steward and Business Analyst
      • Object auditing to track "who did it"
      • Metadata versioning to track "what did they do"
    • Faster insight (roadmap)
      • Data Quality tab for profiling and sampling
      • User comments
    Key benefits: organize data assets along business terms; compliance features; faster insight
  • 22. Apache Ranger: Introduction
    Centralized authorization and auditing across Hadoop components
    • HDFS, Hive, HBase, Knox, Storm, YARN, Kafka, Solr, ...
    • Audit logs to: Solr, HDFS, RDBMS, Log4j, ...
    Resource-based security
    • Policies for a specific set of resources
    • Policies must be revised as resources get added or moved
    Classification-based security
    • Policies for classifications, not for specific resources
    • A single policy protects resources in multiple components
    • As the classification of a resource changes, the appropriate policies are applied automatically
    • Enables separation of duties between resource classification and security policies
  • 23. Scalable Access Control: Reusable Tag Policy
    • Not scalable: user groups (AD, Linux) are granted access to individual resources (files, tables, topologies)
    • Scalable: user groups are granted access through an Atlas tag (e.g. PII), which covers ANY asset tagged PII (files, tables, topologies)
      • A single admin group assigns policies; many stewards tag assets
      • Single point of enforcement and audit
      • All future tagging is covered by the existing policy
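The scalability argument can be illustrated with a small Python sketch contrasting the two policy styles. It is a simplified model assumed for illustration (the policy classes, group names and resource identifiers are hypothetical), not Ranger's policy engine: a resource-based policy must enumerate its resources, while a tag-based policy is evaluated against whatever tags a resource currently carries.

```python
from dataclasses import dataclass


@dataclass
class ResourcePolicy:
    resources: set        # explicit resource list; must be edited as resources change
    allowed_groups: set


@dataclass
class TagPolicy:
    tag: str              # applies to any asset currently carrying this tag
    allowed_groups: set


# Hypothetical resource -> tags mapping, as classification metadata would supply it.
resource_tags = {
    "hive:raw.impressions.user_id": {"PII"},
    "hive:raw.impressions.adv_id": set(),
}


def allowed_by_resource_policy(resource, group, policy):
    return resource in policy.resources and group in policy.allowed_groups


def allowed_by_tag_policy(resource, group, policy, tags):
    return policy.tag in tags.get(resource, set()) and group in policy.allowed_groups


pii_resource_policy = ResourcePolicy({"hive:raw.impressions.user_id"}, {"compliance"})
pii_tag_policy = TagPolicy("PII", {"compliance"})

# A steward later classifies a new Kafka topic as PII; no policy is edited.
resource_tags["kafka:user_events"] = {"PII"}

print(allowed_by_tag_policy("kafka:user_events", "compliance", pii_tag_policy, resource_tags))  # True
print(allowed_by_resource_policy("kafka:user_events", "compliance", pii_resource_policy))  # False
```

The tag policy covers the newly classified topic with no administrative change, while the resource policy would need another entry for every new asset.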
  • 24. Open: Governance Ready Certification Program
    • Choice: customers choose the features they want to deploy, à la carte rather than vendor lock-in
    • Curated & fast: a selected group of vendor partners provides rich, complementary and complete features, ready to deploy
    • Agile: low switching costs, faster deployment and innovation
    • Centralized: common SLA and common open metadata store
    • Flexibility: interoperability of products through Atlas metadata
    • Safe: HDP at the core provides stability and interoperability
    Completed: Waterline, Dataguise, Attivio, Trifacta
    Pending: Collibra, Alation, Meta Integration (Miti), Paxata, Syncsort, Talend
  • 25. Roadmap
    • Multi-tenancy
    • Titan 1.x migration
    • Hive column-level lineage
  • 26. Summary
    • Designed for Hadoop at the platform level, not the application level
    • High-confidence data in Hadoop for regulated verticals
    • Compliance and business objectives aligned to data organization
    • Faster discovery for analysts, reducing time to value
    • Agile and adaptable: native connectors keep information current
    • Dynamic protection with Ranger through simple, audited policies
  • 27. Learn More:
      • Apache Incubator link: http://atlas.incubator.apache.org/
      • Hortonworks links: http://hortonworks.com/solutions/security-and-governance/
      • https://community.hortonworks.com/spaces/64/governance-lifecycle-track.html?topics=Atlas&type=question
      • Atlas Technical User Guide: http://atlas.incubator.apache.org/AtlasTechnicalUserGuide.pdf
  • 28. Questions
  • 29. Backup
  • 30. Dynamic Access Policy: Apache Ranger + Atlas Integration
  • 31. How does Atlas work with Ranger at scale?
      Atlas provides: Metadata
      • Business classification (taxonomy): Company > HR > Driver
      • Hierarchy with inheritance of attributes to child objects: the sensitive "PII" tag of department HR is inherited by group HR > Driver
      • Atlas notifies Ranger of changes via a Kafka topic
      [Diagram: Apache Atlas with Hive, Ranger, Falcon, Kafka and Storm]
      Atlas provides the metadata tags used to create policies
      Ranger provides: Access & Entitlements
      • Ranger caches tags and asset mappings for performance
      • Ranger policies are based on tags instead of roles
      • Example: PII = <group>. A single such policy can cover many assets.
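The inheritance described on this slide (a PII tag on HR applying to HR > Driver) amounts to taking the union of a term's own tags with the tags of all its ancestors. A minimal sketch, assuming a hypothetical dictionary-based taxonomy with dotted term names; Atlas's own type system and storage are not modeled here.

```python
# Hypothetical taxonomy: each term maps to its parent term (None for the root).
PARENT = {
    "Company": None,
    "Company.HR": "Company",
    "Company.HR.Driver": "Company.HR",
    "Company.Finance": "Company",
}

# Tags a data steward attached directly to a term (hypothetical).
DIRECT_TAGS = {
    "Company.HR": {"PII"},
}


def effective_tags(term):
    """Union of the term's own tags and every ancestor's tags."""
    tags = set()
    while term is not None:
        tags |= DIRECT_TAGS.get(term, set())
        term = PARENT[term]
    return tags


print(effective_tags("Company.HR.Driver"))
print(effective_tags("Company.Finance"))
```

Because the policy keys on the tag, a new child term under HR is protected as soon as it is created, with no policy change.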
  • 32. Automatic update of policies – active protection
      Atlas: metastore (tags, assets, entities) and a notification framework (Kafka topics)
      Ranger: Atlas client (subscribes to the topic, gets metadata updates), policy decision point (PDP), resource cache
      • Metadata updates delivered as notifications
      • Message durability
      • Optimized for speed
      • Event-driven updates
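The event-driven flow on this slide can be simulated without a broker. In this sketch, Python's `queue.Queue` stands in for the Kafka topic, and the JSON message shape (`op`, `resource`, `tag`), the `TagCache` class and the `publish` helper are hypothetical; they are not the actual Atlas notification format or Ranger classes. The idea shown is that the decision point reads tags from a local cache kept current by notifications, instead of querying Atlas on every access check.

```python
import json
import queue

# Stand-in for the Kafka topic carrying Atlas metadata notifications.
# A real deployment would use a Kafka consumer; a queue keeps this runnable.
notifications = queue.Queue()


class TagCache:
    """Hypothetical Ranger-side cache of resource -> tags, updated by events."""

    def __init__(self):
        self.tags_by_resource = {}

    def apply(self, event):
        tags = self.tags_by_resource.setdefault(event["resource"], set())
        if event["op"] == "ADD_TAG":
            tags.add(event["tag"])
        elif event["op"] == "REMOVE_TAG":
            tags.discard(event["tag"])

    def drain(self, topic):
        """Apply every pending event; return how many were processed."""
        count = 0
        while True:
            try:
                message = topic.get_nowait()
            except queue.Empty:
                return count
            self.apply(json.loads(message))
            count += 1


def publish(topic, op, resource, tag):
    # Atlas side (hypothetical): serialize a metadata change onto the topic.
    topic.put(json.dumps({"op": op, "resource": resource, "tag": tag}))


cache = TagCache()
publish(notifications, "ADD_TAG", "hive:default.us_customers", "PII")
publish(notifications, "ADD_TAG", "hive:default.us_customers", "PCI")
publish(notifications, "REMOVE_TAG", "hive:default.us_customers", "PCI")
processed = cache.drain(notifications)
print(processed, cache.tags_by_resource["hive:default.us_customers"])
```

Ordered delivery matters here: replaying the same three events in a different order could leave a different tag set, which is one reason a durable, ordered log such as a Kafka topic partition suits this role.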
  • 33. Apache Ranger: Authorization and Auditing
      [Architecture diagram: enterprise users and the Ranger Administration Portal; Ranger Policy Store and Ranger Audit Store (RDBMS, Solr, HDFS, Log4j); a Ranger plugin in each Hadoop component: HDFS, HiveServer2, HBase, Knox, Storm, YARN, Kafka, Solr]
  • 34. Big Data Governance
      Current Landscape
      • Opaque data in a variety of data stores: HDFS, S3, data warehouses
      • Schema alone is hardly sufficient: Hive Metastore, Avro, data warehouse
      • Platform tools like Ranger and Falcon solve parts of the problem
      Need for Data Governance
      Organizations need data governance to understand their information and answer questions such as:
      • What do we know about our information?
      • Where did this data come from and how is it being used?
      • Does this data adhere to company policies and rules?
      • Need for effective control and consumption of data
      Atlas helps customers discover information about data objects: their meaning, location, characteristics, and usage.
  • 35. Business Taxonomy
      Business Taxonomy (Catalog)
      The practice and science of classifying things or concepts, including the principles that underlie such classification. The business organization model is hierarchical, making it authoritative with no duplication.
      Tags: Traits vs. Labels vs. Business Taxonomy
      Atlas tags are authoritative and prevent duplication. A tag can span different parts of the business taxonomy: a PII tag can be used in HR as well as in Finance or Sales.
      Benefits
      • A view of data assets organized in business language
      • Compliance and acceptable use: dynamic, metadata-based access control
      • Common taxonomy across Hadoop components
  • 36. Principal Roles & Activities in an Enterprise
      • Data Steward: curator, responsible for data classification; associates business taxonomy and tagging; access policies
      • Data Scientist: analyst, primary consumer of the business taxonomy
      • Administrator/Operations: role management, data lifecycle management (archival, retention)
      • Data Engineer: data ingress and egress, semantic data quality
      • 50%–80%+ of time spent looking for data
      • Profit center
      • Primary user of Atlas
      • Enables the scientist
      Goal: < 25% of time spent finding data, empowering scientists to spend their time uncovering insights faster
  • 37. Data Governance Use Cases: Impact Analysis
      • HortonAdNetwork: a large ad network with an international footprint, with multiple publishers and advertisers across several countries
      • Complex ETL jobs and data pipelines process real-time ad network data from several different sources and various data processing platforms
      • No easy way to determine the root cause when something is off the charts
      • Data analysts need effective data provenance tools for impact/root-cause analysis
      • Cross-component lineage is a must
      Data Lineage (Provenance)
      Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources.
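Impact and root-cause analysis over lineage reduce to graph reachability: walk downstream from a suspect process to find what it affects, or upstream from a bad output to find where it could have come from. A minimal sketch using breadth-first search over a hypothetical ad-network lineage graph; the node names are illustrative and are not Atlas entity or type names.

```python
from collections import deque

# Hypothetical lineage graph: each edge points from an input to its consumer.
LINEAGE = {
    "kafka:ad_impressions": ["storm:enrich_impressions"],
    "storm:enrich_impressions": ["hive:impressions_enriched"],
    "hive:impressions_enriched": ["hive_query:country_report", "hive_query:billing"],
    "hive_query:country_report": ["hive:country_os_report"],
    "hive_query:billing": ["hive:advertiser_billing"],
    "rdbms:publishers": ["sqoop:import_publishers"],
    "sqoop:import_publishers": ["hive:publishers"],
    "hive:publishers": ["hive_query:billing"],
}


def reachable(graph, start):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen, frontier = set(), deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def reverse(graph):
    """Flip every edge so the same search walks upstream."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


# Impact analysis: what is affected if the enrichment job misbehaves?
impact = reachable(LINEAGE, "storm:enrich_impressions")
# Root-cause analysis: where could a bad billing number have come from?
sources = reachable(reverse(LINEAGE), "hive:advertiser_billing")
print(sorted(impact))
print(sorted(sources))
```

The root-cause walk crosses from Hive back through Sqoop to the RDBMS, which is why the slide calls cross-component lineage a must.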
  • 38. Data Governance Use Cases: Compliance
      • HortoniaBank: a mid-size bank expanding from the US to international markets
      • 2 customer tables owned by BH: 50K customer records each, with 38 fields (PII, PHI, PCI and non-sensitive data)
        – us_customers: US person data only
        – ww_customers: multi-language, multi-country, localized person data
      • 1 data set of prospects leased from a data broker
        – tax_2010: the data lease has already expired!
  • 39. User Group Access Privileges
      User          Group         Access privileges
      joe_analyst   us_employee   US data only; non-sensitive data only; the rest is forbidden depending on sensitivity
      kate_hr       us_hr         US data only; all sensitive data (PCI, PII, PHI)
      Tag-Based Policies
      • US HR team members can see all original data (PCI, PII, ...)
      • Analysts are prohibited from viewing PII data in any of the tables
      • Anyone except operations/admin is prohibited from accessing tax_2010 after the specified date: the Expires_on policy turns off access on the configured expiry date
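The HortoniaBank rules can be sketched as a small evaluation function. Everything here is a hypothetical simplification, not Ranger's policy language: the column tags, the `us_only` flag, the expiry date for tax_2010 (the slide gives none), the `ops_admin` user and `admin` group, and the assumption that admins are otherwise unrestricted.

```python
from datetime import date

SENSITIVE = {"PII", "PHI", "PCI"}

# Hypothetical table-level attributes.
TABLES = {
    "us_customers": {"us_only": True, "expires_on": None},
    "ww_customers": {"us_only": False, "expires_on": None},
    "tax_2010": {"us_only": True, "expires_on": date(2016, 6, 30)},  # hypothetical date
}

# Hypothetical column-level tags (a tiny subset of the 38 fields).
COLUMN_TAGS = {
    ("us_customers", "state"): set(),
    ("us_customers", "ssn"): {"PII"},
    ("us_customers", "card_number"): {"PCI"},
    ("ww_customers", "state"): set(),
    ("tax_2010", "income"): set(),
}

USER_GROUP = {"joe_analyst": "us_employee", "kate_hr": "us_hr", "ops_admin": "admin"}


def allowed(user, table, column, today):
    group = USER_GROUP[user]
    info = TABLES[table]
    # Expires_on: from the expiry date onward, only operations/admin keep access.
    if info["expires_on"] is not None and today >= info["expires_on"]:
        return group == "admin"
    if group == "admin":
        return True  # assumption: admins are otherwise unrestricted
    # Both US groups are limited to US data.
    if not info["us_only"]:
        return False
    # Analysts may not read columns tagged PII, PHI or PCI; US HR may.
    if group == "us_employee":
        return not (COLUMN_TAGS[(table, column)] & SENSITIVE)
    return group == "us_hr"


before = date(2016, 1, 1)
after = date(2016, 7, 1)
print(allowed("joe_analyst", "us_customers", "ssn", before))
print(allowed("kate_hr", "us_customers", "ssn", before))
print(allowed("kate_hr", "tax_2010", "income", after))
```

Checking the expiry rule first means no other grant can reopen access to a lapsed lease, which mirrors the slide's "turns off access" wording.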
  • 40. Expanded Native Connector: Dataset Lineage
      [Diagram components: Sqoop, Teradata Connector, Apache Kafka, Custom Activity Reporter, Metadata Repository, RDBMS]
      • Any process using Sqoop is covered
      • No other tool tracks this out of the box

Editor's Notes

  1. Inventory, publisher, site, supply; advertiser, demand
  2. Was the product well understood? Is the product something they would use? Where is the value?
  3. 9
  4. How fast? 7 months!
  5. Show: clearly identify customer metadata. Change: add a customer classification example (Aetna) so the use-case story has continuity. Use DX procedures for diagnosis. Bring metadata from external systems into Hadoop and keep it together.
  6. Collibra: business process workflow + mapping to regulations in various countries and standards
     Alation: socializing of analytics; SQL / traditional EDW based
     Meta Integration (Miti)
     Paxata: wrangling
     Syncsort: ETL, specializing in traditional systems and mainframe
     Talend: ETL, metadata management
     Attivio: ingestion / discovery
  7. Show: clearly identify customer metadata. Change: add a customer classification example (Aetna) so the use-case story has continuity. Use DX procedures for diagnosis. Bring metadata from external systems into Hadoop and keep it together.
  8. Was the product well understood? Is the product something they would use? Where is the value?
  9. Make sure audits are demoed for both policy denials and acceptances
  10. Show: clearly identify customer metadata. Change: add a customer classification example (Aetna) so the use-case story has continuity. Use DX procedures for diagnosis. Bring metadata from external systems into Hadoop and keep it together.