SlideShare une entreprise Scribd logo
1  sur  37
Télécharger pour lire hors ligne
Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas: Data Governance
July 2015
Partner Solutions
Page 2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Agenda
Overview
•  Enterprise Goals
•  Data Governance
Initative
Demo
•  Example: Sqoop
•  Walk through step
•  Search Tables / Tags
Atlas
•  Feature tour
•  Roadmap
•  UI Tour
Page 3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Enterprise Data Governance Goals
GOAL: Provide a common approach to
data governance across all systems
and data within the organization
•  Transparent
Governance standards & protocols must be
clearly defined and available to all
•  Reproducible
Recreate the relevant data landscape at a
point in time
•  Auditable
All relevant events and assets but be
traceable with appropriate historical lineage
•  Consistent
Compliance practices must be consistent
ETL/DQ
BPM
Business
Analytics
Visualization
& Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Governance
Framework
Page 4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Governance Initiative for Hadoop
ETL/DQ
BPM
Business
Analytics
Visualization
& Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Data Governance Initiative
Common
Governance
Framework
1 ° ° ° ° ° ° °
° ° ° ° ° ° ° °
° °
° °
°
°
ApachePig
ApacheHive
ApacheHBase
ApacheAccumulo
ApacheSolr
ApacheSpark
ApacheStorm
TWO Requirements
1.  Hadoop must snap in to
the existing frameworks
and be a good citizen
2.  Hadoop must also provide
governance within its own
stack of technologies
A group of companies dedicated to meeting
these requirements in the open
Major
Bank
Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Overview
We Do Hadoop
Page 6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Vision
Metadata Services
•  Flexible Knowledge Store
•  Business Catalog / Operational Data
•  Search & Proscriptive Lineage
•  Centralized location for all metadata within HDP
•  Interface point for Metadata Exchange with platforms
outside of HDP.
Metadata will enrich every component
•  Hive – Complete lineage, every HiveQL tracked
•  Ranger – Tag or Attribute security ABAC
•  Falcon – Business Taxonomy
Apache Atlas
Hive
Ranger
Falcon
Kafka
Storm
Page 7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Capabilities: Overview
Data Classification
•  Import or define taxonomy business-oriented annotations for data
•  Define, annotate, and automate capture of relationships between data sets and underlying
elements including source, target, and derivation processes
•  Export metadata to third-party systems
Centralized Auditing
•  Capture security access information for every application, process, and interaction with data
•  Capture the operational information for execution, steps, and activities
Search & Lineage (Browse)
•  Pre-defined navigation paths to explore the data classification and audit information
•  Text-based search features locates relevant data and audit event across Data Lake quickly
and accurately
•  Browse visualization of data set lineage allowing users to drill-down into operational, security,
and provenance related information
Security & Policy Engine
•  Rationalize compliance policy at runtime based on data classification schemes
•  Advanced definition of policies for preventing data derivation based on classification (i.e. re-
identification)
Apache Atlas
Knowledge Store
Audit Store
ModelsType-System
Policy RulesTaxonomies
Policy Engine
Data Lifecycle
Management
Security
REST API
Services
Search Lineage Exchange
Healthcare
HIPAA
HL7
Financial
SOX
Dodd-Frank
Custom
CWM
Retail
PCI
PII
Other
Page 8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Load Wrapper
Sample Use Case: ETL Offload
RDMS
Business
Catalog
Metadata
Hive:
Landing
Hive:
CTAS
Traditional
EDW
New ETL
Hadoop
Atlas
Sqoop
Reporter
via REST
API
Page 9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hive Integration
Apache Atlas
Hive Bridge
(Client)
Hive Hook
(Post-execution)
REST API
Page 10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Governance Ready Certification Program
Curated group of vendor partners to provide
rich & complete features
Customers choose features that they want to
deploy – a la carte.
Low switching costs !
HDP at core to provide stability and
interoperability
Discovery
Tagging
Prep /
Cleanse
ETL
Governance
BPM
Self
Service
Visual-
ization
Page 11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
•  ASF MVP (May) – Preview Core Metadata Services: Type
system, API’s, basic UI, Hive connecter
•  HDP 2.3 (July) - GA Core Metadata Services. Preview
Metadata Business Glossary
•  M10 – (Sept) – Preview ABAC with Ranger integration and
Preview Sqoop component connector
•  M20 – Preview Kafka, Storm connectors, Gov Ready
Certification program, Preview row level & Column masking.
•  HDP 2.4 (Q4’15) GA all preview features
11
High Level Roadmap
Page 12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Architecture
Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
High Level Architecture
Type System
Repository
Search DSL
Bridge
Hive Storm OthersSqoop
REST API
Titan / HBase
Solr/Elastic
Page 14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
Technology Stack
•  Knowledge Store
o  Titan Graph DB
•  Pluggable Search Backend
o  Elastic search
o  Solr
•  Rules Engine
o  TBD
•  Audit Store
o  YARN ATS - Time series DB
•  Java 1.7
•  Dashboard
o  TBD
Page 15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
Admin
GET: /admin/stack
GET: /admin/version
Entity
GET: /entities/definition/{guid}
POST: /entities/submit/{typeName}
GET: /entities/list/{entityType}
Metadata Discovery
GET: /discovery/search/gremlin/{gremlinQuery}
GET: /discovery/search/relationships/{guid}
GET: /discovery/search/fullText?text=<query>
GET: /discovery/getIndexedFields
Rexster
GET: /graph/vertices/{id}
GET: /graph/vertices/properties/{id}
GET: /graph/vertices
GET: /graph/vertices/{id}/{direction}
GET: /graph/edges/{id}
Types
POST: /types/submit/{typeName}
GET: /types/definition/{typeName}
GET: /types/list
Hive Lineage
GET: /bridge/hive/{id}
GET: /bridge/hive
POST: /bridge/hive
15
APIs: Examples
Page 16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Type System – Overview of Types
•  Class
•  Struct
•  Trait
•  Primitives
•  Collections
•  Map
•  Array
•  Instances (Entity)
•  Referenceable
Page 17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Type System – Data Types
Page 18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
_class("Column") {!
"name" ~ (string, required)!
"dataType" ~ (string, required)!
"sd" ~ ("StorageDesc", required)!
}!
!
_class("Table", List()) {!
"name" ~ (string, required, indexed)!
"db" ~ ("DB", required)!
"sd" ~ ("StorageDesc", required)!
}!
!
	
  
_trait("Dimension") {}!
_trait("PII") {}!
_trait("Metric") {}!
_trait("ETL") {}!
_trait("JdbcAccess") {}!
!
_class("DB") {!
"name" ~ (string, required,
indexed, unique)!
"owner" ~ (string)!
"createTime" ~ (int)!
}!
!
_class("StorageDesc") {!
"inputFormat" ~ (string,
required)!
"outputFormat" ~ (string,
required)!
}!
Page 19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Repository
•  Graph Database
•  Titan with storage backed by HBase
•  Types and instances are mapped to the Graph DB
•  Classes, Structs and Traits map to a vertex
•  Relationships are mapped as edges
•  Search - plugin enabled
•  Indexing based on type annotations
•  Solr
•  Elastic search
Page 20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Search
•  DSL with SQL Like Syntax
•  from $type is $trait where $clause select|has $attributes loop $loopExpression withPath, repeat
•  Examples
•  from DB
•  DB where name="Reporting" select name, owner
•  DB has name
•  DB is JdbcAccess
•  Column where Column is a PII
•  Table where name="sales_fact", columns
•  Table where name="sales_fact", columns as column select column.name, column.dataType,
column.comment
•  Full-text search
Page 21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Lineage
•  Uses Search DSL Loop expression
•  Everything results in search
•  Named Queries
•  inputs
•  Table where (name = "sales_fact_monthly_mv") as src loop (LoadProcess->outputTable
inputTables) as dest select src.name as src_name, dest.name as dest_name withPath
•  outputs
•  Table where (name = "sales_fact") as src loop (LoadProcess->inputTables outputTables) as dest
select src.name as src_name, dest.name as dest_name withPath
•  schema
•  Table where name="sales_fact", columns
Page 22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hive Integration
Apache Atlas
Hive Bridge
(Client)
Hive Hook
(Post-execution)
REST API
Page 23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Screens
Page 24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
24
Page 25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page 26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page 27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page 28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page 29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Demo Atlas
Page 30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Atlas UI demostration
Search DSL
•  Type – DB, Table, Column
•  Tag - PII
•  Keyword
Results
•  Details
•  Schema
•  Lineage
Coming Features
Page 31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Ingestion Demo Objective
•  Show Lineage with Sqoop Ingestion of data
•  Custom process instrumention
•  Use the Hive Hook CTAS Operation
•  Atlas Follow Lineage
•  Metadata Model in Atlas
•  The Open Framework
•  Create Custom Types
•  Create Custom Process
•  Sample Codes
Page 32 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Setup
•  Source System
•  MySQL Database
•  DRIVERS
•  TIMESHEET
•  Destination System
•  Single Node HDP 2.3 (Tech Preview)
•  Apache Atlas
Page 33 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Steps to Create Metadata
•  Create a Atlas Client Instance
•  Create Type Definitions
–  Class Types
–  Attributes
–  List the Types
•  Instantiate Entities
•  - Create Entities (Class Type)
•  - Search the Types
•  Create Process
•  Create DataSet Type
•  Create Process Type
•  Connect a Process Lineage
Page 34 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Attribute Definition
•  Name
•  Data Type
•  Multiplicity
•  Composite
•  isIndexable
•  ReverseAttribute
Page 35 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Questions and Answers
Page 36 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
•  HDP 2.3 Preview Sandbox VM:
–  http://hortonworks.com/hdp/whats-new/
•  Apache Atlas:
–  http://atlas.incubator.apache.org/
–  http://incubator.apache.org/projects/atlas.html
–  https://git-wip-us.apache.org/repos/asf/incubator-atlas.git
•  Partner Workshops
–  http://hortonworks.com/partners/learn/
•  More to come with official GA release of HDP 2.3
36
Atlas Resources
Page 37 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Thank you !

Contenu connexe

Tendances

GDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache AtlasGDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache Atlas
DataWorks Summit
 

Tendances (20)

Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetup
 
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasGDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
 
Apache Atlas: Governance for your Data
Apache Atlas: Governance for your DataApache Atlas: Governance for your Data
Apache Atlas: Governance for your Data
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun Murthy
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
GDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache AtlasGDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache Atlas
 
Apache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop componentsApache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop components
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasPartner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Apache Hadoop Crash Course
Apache Hadoop Crash CourseApache Hadoop Crash Course
Apache Hadoop Crash Course
 
Unleashing the power of apache atlas with apache - virtual dataconnector
Unleashing the power of apache atlas with apache  - virtual dataconnectorUnleashing the power of apache atlas with apache  - virtual dataconnector
Unleashing the power of apache atlas with apache - virtual dataconnector
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
 

Similaire à Data Governance - Atlas 7.12.2015

Similaire à Data Governance - Atlas 7.12.2015 (20)

Data Governance Initiative
Data Governance InitiativeData Governance Initiative
Data Governance Initiative
 
Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
 
Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoop
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
One Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and GovernanceOne Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and Governance
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 

Plus de Hortonworks

Plus de Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Dernier

Dernier (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Data Governance - Atlas 7.12.2015

  • 1. Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas: Data Governance July 2015 Partner Solutions
  • 2. Page 2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Agenda Overview •  Enterprise Goals •  Data Governance Initative Demo •  Example: Sqoop •  Walk through step •  Search Tables / Tags Atlas •  Feature tour •  Roadmap •  UI Tour
  • 3. Page 3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Enterprise Data Governance Goals GOAL: Provide a common approach to data governance across all systems and data within the organization •  Transparent Governance standards & protocols must be clearly defined and available to all •  Reproducible Recreate the relevant data landscape at a point in time •  Auditable All relevant events and assets but be traceable with appropriate historical lineage •  Consistent Compliance practices must be consistent ETL/DQ BPM Business Analytics Visualization & Dashboards ERP CRM SCM MDM ARCHIVE Governance Framework
  • 4. Page 4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Governance Initiative for Hadoop ETL/DQ BPM Business Analytics Visualization & Dashboards ERP CRM SCM MDM ARCHIVE Data Governance Initiative Common Governance Framework 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ApachePig ApacheHive ApacheHBase ApacheAccumulo ApacheSolr ApacheSpark ApacheStorm TWO Requirements 1.  Hadoop must snap in to the existing frameworks and be a good citizen 2.  Hadoop must also provide governance within its own stack of technologies A group of companies dedicated to meeting these requirements in the open Major Bank
  • 5. Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Overview We Do Hadoop
  • 6. Page 6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Vision Metadata Services •  Flexible Knowledge Store •  Business Catalog / Operational Data •  Search & Proscriptive Lineage •  Centralized location for all metadata within HDP •  Interface point for Metadata Exchange with platforms outside of HDP. Metadata will enrich every component •  Hive – Complete lineage, every HiveQL tracked •  Ranger – Tag or Attribute security ABAC •  Falcon – Business Taxonomy Apache Atlas Hive Ranger Falcon Kafka Storm
  • 7. Page 7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Capabilities: Overview Data Classification •  Import or define taxonomy business-oriented annotations for data •  Define, annotate, and automate capture of relationships between data sets and underlying elements including source, target, and derivation processes •  Export metadata to third-party systems Centralized Auditing •  Capture security access information for every application, process, and interaction with data •  Capture the operational information for execution, steps, and activities Search & Lineage (Browse) •  Pre-defined navigation paths to explore the data classification and audit information •  Text-based search features locates relevant data and audit event across Data Lake quickly and accurately •  Browse visualization of data set lineage allowing users to drill-down into operational, security, and provenance related information Security & Policy Engine •  Rationalize compliance policy at runtime based on data classification schemes •  Advanced definition of policies for preventing data derivation based on classification (i.e. re- identification) Apache Atlas Knowledge Store Audit Store ModelsType-System Policy RulesTaxonomies Policy Engine Data Lifecycle Management Security REST API Services Search Lineage Exchange Healthcare HIPAA HL7 Financial SOX Dodd-Frank Custom CWM Retail PCI PII Other
  • 8. Page 8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Load Wrapper Sample Use Case: ETL Offload RDMS Business Catalog Metadata Hive: Landing Hive: CTAS Traditional EDW New ETL Hadoop Atlas Sqoop Reporter via REST API
  • 9. Page 9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Hive Integration Apache Atlas Hive Bridge (Client) Hive Hook (Post-execution) REST API
  • 10. Page 10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Governance Ready Certification Program Curated group of vendor partners to provide rich & complete features Customers choose features that they want to deploy – a la carte. Low switching costs ! HDP at core to provide stability and interoperability Discovery Tagging Prep / Cleanse ETL Governance BPM Self Service Visual- ization
  • 11. Page 11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 •  ASF MVP (May) – Preview Core Metadata Services: Type system, API’s, basic UI, Hive connecter •  HDP 2.3 (July) - GA Core Metadata Services. Preview Metadata Business Glossary •  M10 – (Sept) – Preview ABAC with Ranger integration and Preview Sqoop component connector •  M20 – Preview Kafka, Storm connectors, Gov Ready Certification program, Preview row level & Column masking. •  HDP 2.4 (Q4’15) GA all preview features 11 High Level Roadmap
  • 12. Page 12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Architecture
  • 13. Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved High Level Architecture Type System Repository Search DSL Bridge Hive Storm OthersSqoop REST API Titan / HBase Solr/Elastic
  • 14. Page 14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 Technology Stack •  Knowledge Store o  Titan Graph DB •  Pluggable Search Backend o  Elastic search o  Solr •  Rules Engine o  TBD •  Audit Store o  YARN ATS - Time series DB •  Java 1.7 •  Dashboard o  TBD
  • 15. Page 15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 Admin GET: /admin/stack GET: /admin/version Entity GET: /entities/definition/{guid} POST: /entities/submit/{typeName} GET: /entities/list/{entityType} Metadata Discovery GET: /discovery/search/gremlin/{gremlinQuery} GET: /discovery/search/relationships/{guid} GET: /discovery/search/fullText?text=<query> GET: /discovery/getIndexedFields Rexster GET: /graph/vertices/{id} GET: /graph/vertices/properties/{id} GET: /graph/vertices GET: /graph/vertices/{id}/{direction} GET: /graph/edges/{id} Types POST: /types/submit/{typeName} GET: /types/definition/{typeName} GET: /types/list Hive Lineage GET: /bridge/hive/{id} GET: /bridge/hive POST: /bridge/hive 15 APIs: Examples
  • 16. Page 16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Type System – Overview of Types •  Class •  Struct •  Trait •  Primitives •  Collections •  Map •  Array •  Instances (Entity) •  Referenceable
  • 17. Page 17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Type System – Data Types
  • 18. Page 18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved _class("Column") {! "name" ~ (string, required)! "dataType" ~ (string, required)! "sd" ~ ("StorageDesc", required)! }! ! _class("Table", List()) {! "name" ~ (string, required, indexed)! "db" ~ ("DB", required)! "sd" ~ ("StorageDesc", required)! }! !   _trait("Dimension") {}! _trait("PII") {}! _trait("Metric") {}! _trait("ETL") {}! _trait("JdbcAccess") {}! ! _class("DB") {! "name" ~ (string, required, indexed, unique)! "owner" ~ (string)! "createTime" ~ (int)! }! ! _class("StorageDesc") {! "inputFormat" ~ (string, required)! "outputFormat" ~ (string, required)! }!
  • 19. Page 19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Repository •  Graph Database •  Titan with storage backed by HBase •  Types and instances are mapped to the Graph DB •  Classes, Structs and Traits map to a vertex •  Relationships are mapped as edges •  Search - plugin enabled •  Indexing based on type annotations •  Solr •  Elastic search
  • 20. Page 20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Search •  DSL with SQL Like Syntax •  from $type is $trait where $clause select|has $attributes loop $loopExpression withPath, repeat •  Examples •  from DB •  DB where name="Reporting" select name, owner •  DB has name •  DB is JdbcAccess •  Column where Column is a PII •  Table where name="sales_fact", columns •  Table where name="sales_fact", columns as column select column.name, column.dataType, column.comment •  Full-text search
  • 21. Page 21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Lineage •  Uses Search DSL Loop expression •  Everything results in search •  Named Queries •  inputs •  Table where (name = "sales_fact_monthly_mv") as src loop (LoadProcess->outputTable inputTables) as dest select src.name as src_name, dest.name as dest_name withPath •  outputs •  Table where (name = "sales_fact") as src loop (LoadProcess->inputTables outputTables) as dest select src.name as src_name, dest.name as dest_name withPath •  schema •  Table where name="sales_fact", columns
  • 22. Page 22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Hive Integration Apache Atlas Hive Bridge (Client) Hive Hook (Post-execution) REST API
  • 23. Page 23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Screens
  • 24. Page 24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 24
  • 25. Page 25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 26. Page 26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 27. Page 27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 28. Page 28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 29. Page 29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Demo Atlas
  • 30. Page 30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Atlas UI demostration Search DSL •  Type – DB, Table, Column •  Tag - PII •  Keyword Results •  Details •  Schema •  Lineage Coming Features
  • 31. Page 31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ingestion Demo Objective •  Show Lineage with Sqoop Ingestion of data •  Custom process instrumention •  Use the Hive Hook CTAS Operation •  Atlas Follow Lineage •  Metadata Model in Atlas •  The Open Framework •  Create Custom Types •  Create Custom Process •  Sample Codes
  • 32. Page 32 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Setup •  Source System •  MySQL Database •  DRIVERS •  TIMESHEET •  Destination System •  Single Node HDP 2.3 (Tech Preview) •  Apache Atlas
  • 33. Page 33 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Steps to Create Metadata •  Create a Atlas Client Instance •  Create Type Definitions –  Class Types –  Attributes –  List the Types •  Instantiate Entities •  - Create Entities (Class Type) •  - Search the Types •  Create Process •  Create DataSet Type •  Create Process Type •  Connect a Process Lineage
  • 34. Page 34 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Attribute Definition •  Name •  Data Type •  Multiplicity •  Composite •  isIndexable •  ReverseAttribute
  • 35. Page 35 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Questions and Answers
  • 36. Page 36 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 •  HDP 2.3 Preview Sandbox VM: –  http://hortonworks.com/hdp/whats-new/ •  Apache Atlas: –  http://atlas.incubator.apache.org/ –  http://incubator.apache.org/projects/atlas.html –  https://git-wip-us.apache.org/repos/asf/incubator-atlas.git •  Partner Workshops –  http://hortonworks.com/partners/learn/ •  More to come with official GA release of HDP 2.3 36 Atlas Resources
  • 37. Page 37 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Thank you !