Contenu connexe Similaire à Data Governance - Atlas 7.12.2015 (20) Data Governance - Atlas 7.12.20151. Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas: Data Governance
July 2015
Partner Solutions
2. Page 2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Agenda
Overview
• Enterprise Goals
• Data Governance
Initative
Demo
• Example: Sqoop
• Walk through step
• Search Tables / Tags
Atlas
• Feature tour
• Roadmap
• UI Tour
3. Page 3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Enterprise Data Governance Goals
GOAL: Provide a common approach to
data governance across all systems
and data within the organization
• Transparent
Governance standards & protocols must be
clearly defined and available to all
• Reproducible
Recreate the relevant data landscape at a
point in time
• Auditable
All relevant events and assets but be
traceable with appropriate historical lineage
• Consistent
Compliance practices must be consistent
ETL/DQ
BPM
Business
Analytics
Visualization
& Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Governance
Framework
4. Page 4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Governance Initiative for Hadoop
ETL/DQ
BPM
Business
Analytics
Visualization
& Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Data Governance Initiative
Common
Governance
Framework
1 ° ° ° ° ° ° °
° ° ° ° ° ° ° °
° °
° °
°
°
ApachePig
ApacheHive
ApacheHBase
ApacheAccumulo
ApacheSolr
ApacheSpark
ApacheStorm
TWO Requirements
1. Hadoop must snap in to
the existing frameworks
and be a good citizen
2. Hadoop must also provide
governance within its own
stack of technologies
A group of companies dedicated to meeting
these requirements in the open
Major
Bank
5. Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Overview
We Do Hadoop
6. Page 6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Vision
Metadata Services
• Flexible Knowledge Store
• Business Catalog / Operational Data
• Search & Proscriptive Lineage
• Centralized location for all metadata within HDP
• Interface point for Metadata Exchange with platforms
outside of HDP.
Metadata will enrich every component
• Hive – Complete lineage, every HiveQL tracked
• Ranger – Tag or Attribute security ABAC
• Falcon – Business Taxonomy
Apache Atlas
Hive
Ranger
Falcon
Kafka
Storm
7. Page 7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Capabilities: Overview
Data Classification
• Import or define taxonomy business-oriented annotations for data
• Define, annotate, and automate capture of relationships between data sets and underlying
elements including source, target, and derivation processes
• Export metadata to third-party systems
Centralized Auditing
• Capture security access information for every application, process, and interaction with data
• Capture the operational information for execution, steps, and activities
Search & Lineage (Browse)
• Pre-defined navigation paths to explore the data classification and audit information
• Text-based search features locates relevant data and audit event across Data Lake quickly
and accurately
• Browse visualization of data set lineage allowing users to drill-down into operational, security,
and provenance related information
Security & Policy Engine
• Rationalize compliance policy at runtime based on data classification schemes
• Advanced definition of policies for preventing data derivation based on classification (i.e. re-
identification)
Apache Atlas
Knowledge Store
Audit Store
ModelsType-System
Policy RulesTaxonomies
Policy Engine
Data Lifecycle
Management
Security
REST API
Services
Search Lineage Exchange
Healthcare
HIPAA
HL7
Financial
SOX
Dodd-Frank
Custom
CWM
Retail
PCI
PII
Other
8. Page 8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Load Wrapper
Sample Use Case: ETL Offload
RDMS
Business
Catalog
Metadata
Hive:
Landing
Hive:
CTAS
Traditional
EDW
New ETL
Hadoop
Atlas
Sqoop
Reporter
via REST
API
9. Page 9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hive Integration
Apache Atlas
Hive Bridge
(Client)
Hive Hook
(Post-execution)
REST API
10. Page 10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Governance Ready Certification Program
Curated group of vendor partners to provide
rich & complete features
Customers choose features that they want to
deploy – a la carte.
Low switching costs !
HDP at core to provide stability and
interoperability
Discovery
Tagging
Prep /
Cleanse
ETL
Governance
BPM
Self
Service
Visual-
ization
11. Page 11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
• ASF MVP (May) – Preview Core Metadata Services: Type
system, API’s, basic UI, Hive connecter
• HDP 2.3 (July) - GA Core Metadata Services. Preview
Metadata Business Glossary
• M10 – (Sept) – Preview ABAC with Ranger integration and
Preview Sqoop component connector
• M20 – Preview Kafka, Storm connectors, Gov Ready
Certification program, Preview row level & Column masking.
• HDP 2.4 (Q4’15) GA all preview features
11
High Level Roadmap
12. Page 12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Architecture
13. Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
High Level Architecture
Type System
Repository
Search DSL
Bridge
Hive Storm OthersSqoop
REST API
Titan / HBase
Solr/Elastic
14. Page 14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
Technology Stack
• Knowledge Store
o Titan Graph DB
• Pluggable Search Backend
o Elastic search
o Solr
• Rules Engine
o TBD
• Audit Store
o YARN ATS - Time series DB
• Java 1.7
• Dashboard
o TBD
15. Page 15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
Admin
GET: /admin/stack
GET: /admin/version
Entity
GET: /entities/definition/{guid}
POST: /entities/submit/{typeName}
GET: /entities/list/{entityType}
Metadata Discovery
GET: /discovery/search/gremlin/{gremlinQuery}
GET: /discovery/search/relationships/{guid}
GET: /discovery/search/fullText?text=<query>
GET: /discovery/getIndexedFields
Rexster
GET: /graph/vertices/{id}
GET: /graph/vertices/properties/{id}
GET: /graph/vertices
GET: /graph/vertices/{id}/{direction}
GET: /graph/edges/{id}
Types
POST: /types/submit/{typeName}
GET: /types/definition/{typeName}
GET: /types/list
Hive Lineage
GET: /bridge/hive/{id}
GET: /bridge/hive
POST: /bridge/hive
15
APIs: Examples
16. Page 16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Type System – Overview of Types
• Class
• Struct
• Trait
• Primitives
• Collections
• Map
• Array
• Instances (Entity)
• Referenceable
17. Page 17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Type System – Data Types
18. Page 18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
_class("Column") {!
"name" ~ (string, required)!
"dataType" ~ (string, required)!
"sd" ~ ("StorageDesc", required)!
}!
!
_class("Table", List()) {!
"name" ~ (string, required, indexed)!
"db" ~ ("DB", required)!
"sd" ~ ("StorageDesc", required)!
}!
!
_trait("Dimension") {}!
_trait("PII") {}!
_trait("Metric") {}!
_trait("ETL") {}!
_trait("JdbcAccess") {}!
!
_class("DB") {!
"name" ~ (string, required,
indexed, unique)!
"owner" ~ (string)!
"createTime" ~ (int)!
}!
!
_class("StorageDesc") {!
"inputFormat" ~ (string,
required)!
"outputFormat" ~ (string,
required)!
}!
19. Page 19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Repository
• Graph Database
• Titan with storage backed by HBase
• Types and instances are mapped to the Graph DB
• Classes, Structs and Traits map to a vertex
• Relationships are mapped as edges
• Search - plugin enabled
• Indexing based on type annotations
• Solr
• Elastic search
20. Page 20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Search
• DSL with SQL Like Syntax
• from $type is $trait where $clause select|has $attributes loop $loopExpression withPath, repeat
• Examples
• from DB
• DB where name="Reporting" select name, owner
• DB has name
• DB is JdbcAccess
• Column where Column is a PII
• Table where name="sales_fact", columns
• Table where name="sales_fact", columns as column select column.name, column.dataType,
column.comment
• Full-text search
21. Page 21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Lineage
• Uses Search DSL Loop expression
• Everything results in search
• Named Queries
• inputs
• Table where (name = "sales_fact_monthly_mv") as src loop (LoadProcess->outputTable
inputTables) as dest select src.name as src_name, dest.name as dest_name withPath
• outputs
• Table where (name = "sales_fact") as src loop (LoadProcess->inputTables outputTables) as dest
select src.name as src_name, dest.name as dest_name withPath
• schema
• Table where name="sales_fact", columns
22. Page 22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hive Integration
Apache Atlas
Hive Bridge
(Client)
Hive Hook
(Post-execution)
REST API
23. Page 23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Screens
24. Page 24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
24
25. Page 25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
26. Page 26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
27. Page 27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
28. Page 28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
29. Page 29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Demo Atlas
30. Page 30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Atlas UI demostration
Search DSL
• Type – DB, Table, Column
• Tag - PII
• Keyword
Results
• Details
• Schema
• Lineage
Coming Features
31. Page 31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Ingestion Demo Objective
• Show Lineage with Sqoop Ingestion of data
• Custom process instrumention
• Use the Hive Hook CTAS Operation
• Atlas Follow Lineage
• Metadata Model in Atlas
• The Open Framework
• Create Custom Types
• Create Custom Process
• Sample Codes
32. Page 32 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Setup
• Source System
• MySQL Database
• DRIVERS
• TIMESHEET
• Destination System
• Single Node HDP 2.3 (Tech Preview)
• Apache Atlas
33. Page 33 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Steps to Create Metadata
• Create a Atlas Client Instance
• Create Type Definitions
– Class Types
– Attributes
– List the Types
• Instantiate Entities
• - Create Entities (Class Type)
• - Search the Types
• Create Process
• Create DataSet Type
• Create Process Type
• Connect a Process Lineage
34. Page 34 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Attribute Definition
• Name
• Data Type
• Multiplicity
• Composite
• isIndexable
• ReverseAttribute
35. Page 35 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Questions and Answers
36. Page 36 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
• HDP 2.3 Preview Sandbox VM:
– http://hortonworks.com/hdp/whats-new/
• Apache Atlas:
– http://atlas.incubator.apache.org/
– http://incubator.apache.org/projects/atlas.html
– https://git-wip-us.apache.org/repos/asf/incubator-atlas.git
• Partner Workshops
– http://hortonworks.com/partners/learn/
• More to come with official GA release of HDP 2.3
36
Atlas Resources
37. Page 37 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Thank you !