Mandy Chessell CBE FREng CEng FBCS
Distinguished Engineer, Master Inventor
Analytics Chief Data Office
mandy_chessell@uk.ibm.com
18th April 2018
Good analytics needs good data and
that needs good metadata
Apache Atlas as an open innovation platform for metadata management and governance
Agenda
▪ Why is metadata so important today?
▪ What is the challenge?
▪ Building an open ecosystem
▪ Apache Atlas and the specifics
▪ ODPi Data Governance PMC
▪ Progress report and call to action
The perils of reusing data …
Open Data Site
Data Lake
Employee Directory
Callie Quartile uses (1) open data
from the local government registrar
and (2) data from the employee
directory to (3) create a birthday
card service for the company.
Callie Quartile
Data Scientist
Happy Birthday
But it's not my birthday
Unfortunately, the obvious date in the registrar record was the date the birth was registered, not the date of birth. Date of birth was not published in the open data.
Callie needed better information about
the open data to realise she had the
wrong data.
Metadata should bring as much information about the data sets to Callie’s data science as is known collectively by the organization.
Employee Directory
Columns: Name, Band, Job Title
Data Set Name: Employee Directory
Description: Core attributes describing all employees of OCO pharmaceuticals created from a daily extract from Kenexa.
Owner: Penny Payer
Status:
Last accessed: 6th May 2016
Records: 3488
Last Update: 1st May 2016
Contents: Structure …, Contents …, Lineage …
Column: Band
Description: Position reference number for non-exempt employees. The value ranges from 01 to 06, where 01 is the most senior and 06 is the most junior.
Type: String
Classification: Public
Classification Ranges:
Confidentiality: Public, Confidential, Sensitive
Confidence: Authoritative
Retention: Indefinitely
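As a rough illustration, the data set description above could be held as a small typed record. This is a minimal sketch, not an Apache Atlas structure: the class and field names (DataSetDescription, ColumnDescription and so on) are assumptions chosen to mirror the labels on the slide.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ColumnDescription:
    """Metadata for one column (hypothetical structure)."""
    name: str
    description: str
    data_type: str
    classification: str


@dataclass
class DataSetDescription:
    """Metadata for one data set (hypothetical structure)."""
    name: str
    description: str
    owner: str
    records: int
    last_update: date
    last_accessed: date
    columns: list[ColumnDescription] = field(default_factory=list)

    def column(self, name: str) -> ColumnDescription:
        """Return the description of the named column."""
        for col in self.columns:
            if col.name == name:
                return col
        raise KeyError(name)


# Values copied from the slide.
employee_directory = DataSetDescription(
    name="Employee Directory",
    description="Core attributes describing all employees of OCO "
                "pharmaceuticals created from a daily extract from Kenexa.",
    owner="Penny Payer",
    records=3488,
    last_update=date(2016, 5, 1),
    last_accessed=date(2016, 5, 6),
    columns=[
        ColumnDescription(
            name="Band",
            description="Position reference number for non-exempt employees, "
                        "from 01 (most senior) to 06 (most junior).",
            data_type="String",
            classification="Public",
        )
    ],
)

print(employee_directory.column("Band").classification)
```

With a record like this, a data scientist such as Callie could check the owner, freshness and column meaning before deciding whether the data set fits her project.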
Different personas need different services
Callie Quartile
Data Scientist
Jules Keeper
Chief Data Officer
Find data
Understand data
Manage analytics models
Build data strategy
Define governance program
Monitor progress
Different personas need different services
Faith Broker
HR and Privacy Officer
Gary Geeke
IT
Locate personal data
Ensure protection of personal data
Understand employee needs
Maintain “safe” IT Infrastructure
Build and deploy “good” APIs and services
Locate and resolve issues fast
Different personas need different services
Tanya Tidie
Clinical Trials Administrator
Ivor Padlock
Chief Security Officer
Maintain accurate patient records
Catalog clinical trials data
Demonstrate good data management practices
Understand risks to organization
Set up protection
Monitor for suspicious activity
Scope of metadata for a data driven organization
Glossary Collaboration
Governance
Models and
Reference Data
Metadata
Discovery
Lineage Data Assets
Base Types, Systems
and Infrastructure
Curation
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
I know
I wonder what this means
Scared to share
Faith Broker
Business Team
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
Faith Broker has been doing some simple analysis
on the HR data of the company. She wants to share
this data with Callie Quartile to do some detailed
work. However, she does not want Callie to see the
sensitive personal information in the record.
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 XXXXX XXX 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 XXXXX XXX 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 XXXXX XXX 27 Code St Harlem NY 1 3
Callie Quartile
Data Scientist
Business metadata
Structural metadata for a data store
Using glossary function for semantic processing
EMPNAME EMPNO JOBCODE SALARY
EMPLOYEE
RECORD
Employee
Work Location
Annual Salary
Job Title
Employee Id
Employee Name
Hourly Pay Rate
Manager Compensation Plan
HAS-A
HAS-A
HAS-A
HAS-A
HAS-A
HAS-A
IS-A IS-A
Sensitive
IS-A
Data
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
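The idea of masking data by following links from columns to glossary terms can be sketched as below. This is an illustration only: the column-to-term mapping and the choice of which terms count as Sensitive Data are assumptions (the diagram's arrows did not survive extraction), and mask_record is a hypothetical helper, not an Apache Atlas API.

```python
# Assumed links from structural metadata (columns) to business metadata (terms).
COLUMN_TO_TERM = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}

# Assumed IS-A links: glossary terms classified as Sensitive Data.
SENSITIVE_TERMS = {"Annual Salary", "Hourly Pay Rate"}


def mask_record(record: dict[str, str]) -> dict[str, str]:
    """Replace the value of every column whose glossary term is sensitive."""
    masked = {}
    for column, value in record.items():
        term = COLUMN_TO_TERM.get(column)
        if term in SENSITIVE_TERMS:
            masked[column] = "X" * len(value)
        else:
            masked[column] = value
    return masked


# Illustrative values only.
record = {
    "EMPNAME": "Callie Quartile",
    "EMPNO": "328080",
    "JOBCODE": "Data Scientist",
    "SALARY": "56944",
}
print(mask_record(record))
```

The masking rule never names a column directly. It depends only on the glossary classification, so a newly catalogued column linked to a sensitive term is masked without any change to the rule.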
Why do we need metadata?
▪ Metadata enables data to be used outside of the application that created it.
• Analytics and decision making
• New business applications
• Reporting and compliance
▪ Metadata describes the format and content of data, allowing people to judge which data set to use for a new project.
• Structure
• Meaning
• Origin
• Valid values and quality
• Usage and ownership
• Regulations and classifications that apply
• <more>
▪ Metadata describes the business context and classification of data, allowing automated governance processes to operate.
Today’s reality
▪ Many data platforms do not have metadata support
▪ Proprietary tools support a range of data sources and governance actions
• No vendor supports everything you need, yet each assumes all tools come from its own suite
• Each tool starts “empty”, requiring effort to populate metadata
• Each tool operates as if it is the only tool
• No integration or interoperability between metadata repositories from different vendors
▪ Creating an enterprise data catalogue takes expensive effort
Manual metadata capture
Automatic metadata capture
What needs to change?
Open and Unified Metadata
A new manifesto for metadata and governance
▪ Metadata management must be automated
▪ Metadata management must become ubiquitous
▪ Metadata must become open and remotely accessible
▪ Metadata should be used to drive the governance of data
The discovery, maintenance and use of metadata has to be an integral part of all tools that access, change and move information.
Open metadata management ecosystem
▪ Peer-to-peer network of repositories
▪ Metadata stored and managed close to its source
▪ Each repository/tool brings unique value
▪ Open, extensible metadata structures for metadata exchange and federation, extending coverage of the types of resources that need to be described
▪ Open source infrastructure, sharing the cost of development and maintenance between vendors
▪ Support for open standards where available
Collaboration Space Metadata
Analytics Platform Metadata
Application Metadata
Cloud SaaS platform Metadata
Hadoop Platform Metadata
Apache Atlas
http://atlas.apache.org/
▪ Apache Atlas has just graduated to become a top-level project.
▪ It began as an open source incubator project on 5th May 2015 to deliver an open source governance capability focused primarily on the Hadoop platform.
▪ Apache Atlas is designed to localize operational governance to the operating data platform, such as Hadoop.
▪ At its heart is a type-agnostic metadata store that can be accessed through RESTful interfaces.
We see Apache Atlas as the reference implementation for open metadata and governance, for vendors to pick up and use, or to test their integration against.
Being open source allows all vendors to enrich and enhance the standard.
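To make the RESTful access concrete, here is a minimal sketch that builds (but does not send) a basic search request against the Atlas v2 REST API. The host, port, credentials and the hive_table type name are assumptions for a local test install, and build_basic_search is a hypothetical helper, so check the REST documentation for your Atlas version before relying on it.

```python
import base64
import urllib.parse
import urllib.request


def build_basic_search(base_url: str, type_name: str, user: str, password: str,
                       limit: int = 10) -> urllib.request.Request:
    """Build a GET request for the Atlas v2 basic search endpoint."""
    query = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    url = f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    })


# Assumed values for a local test install; adjust for a real deployment.
request = build_basic_search("http://localhost:21000", "hive_table",
                             "admin", "admin")
print(request.full_url)

# Sending the request needs a running Atlas server, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```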
Apache Atlas today
Updates to Apache Atlas
▪ Automation
• Capture of metadata from data platforms, data movement engines and data protection engines.
• Exception management and stewardship
▪ Business Value
• Specialized services for key data roles such as CDO, Data Scientist, Developer, DevOps Operator, Asset Owner, Applications
▪ Connectivity
• Metadata Highway offering open metadata exchange, linking and federation between heterogeneous metadata repositories.
Taking guidance from existing metadata standards
▪ Well-defined
▪ Complementary
▪ Integrating
▪ Decoupled
https://www.w3.org/TR/vocab-dcat/
Instance representations in the graph
Open metadata meta-types, types and instances
«relationship»
DataContentForDataSet
*
*
dataContent
supportedDataSets
«entity»
DataSet
createTime : date
modifiedTime : date
«entity»
DataStore
«entity»
Asset
«entity»
GlossaryTerm
«entity»
Referenceable
description : string
expression : string
status : TermAssignmentStatus
confidence : int
steward : string
source : string
«relationship»
SemanticAssignment
*
*
assignedElements
meaning
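The entity and relationship types named in the diagram can be sketched as plain classes. This is a reading of a flattened diagram, so several things are assumptions: the inheritance chain (DataSet is an Asset, Asset and GlossaryTerm are Referenceable), the qualified_name attribute, and attaching confidence, steward and status to SemanticAssignment. It is not the Atlas type system API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Referenceable:
    qualified_name: str  # hypothetical identifying attribute


@dataclass(frozen=True)
class Asset(Referenceable):
    pass


@dataclass(frozen=True)
class DataSet(Asset):
    pass


@dataclass(frozen=True)
class GlossaryTerm(Referenceable):
    pass


@dataclass(frozen=True)
class SemanticAssignment:
    """Relationship linking an assigned element to its meaning."""
    assigned_element: Referenceable
    meaning: GlossaryTerm
    confidence: int
    steward: str
    status: str  # stands in for TermAssignmentStatus; values are not listed on the slide


def meanings_of(element: Referenceable,
                assignments: list[SemanticAssignment]) -> list[GlossaryTerm]:
    """Return every glossary term assigned to the given element."""
    return [a.meaning for a in assignments if a.assigned_element == element]


# Illustrative instances only.
employee_table = DataSet("datalake.EMPLOYEE")
annual_salary = GlossaryTerm("glossary.AnnualSalary")
assignment = SemanticAssignment(employee_table, annual_salary,
                                confidence=90, steward="Faith Broker",
                                status="Proposed")
print(meanings_of(employee_table, [assignment]))
```

Keeping the relationship as its own object, rather than a field on the entity, lets it carry attributes such as confidence and steward.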
Open metadata type model summary
0 Base Types, Systems and Infrastructure
1 Collaboration
2 Data Assets
3 Glossary
4 Governance
5 Models and Reference Data
6 Metadata Discovery
7 Lineage
Open metadata type model summary
Policy Metadata (Principles,
Regulations, Standards,
Approaches, Rule Specifications,
Roles and Metrics)
Governance
Actions and
Processes
Augmentation
Mapping
Implementation
Business Objects and
Relationships, Taxonomies
and Ontologies
Business Attributes
Organization
Teaming Metadata
(people profiles,
communities, projects,
notebooks, …)
Models and Schemas
4
3
1
5
Physical Asset Descriptions
(Data stores, APIs,
models and components)
Asset Collections
(Sets, Typed Sets, Type
Organized Sets)
Information Views
Rights
Management
Reference Data
Feedback Metadata
(tags, comments, ratings, …)
Classification Schemes
Classification
Strategy Subject Area Definition
Campaigns and Projects
Rollout
2
Discovery
Metadata (profile data,
technical classification, data
classification,
data quality assessment, …)
Augmentation
Instrument
Association
Information Process
Instrumentation (design lineage)
6
7
O-DEF
O-BDL
Connectors
Basic Types, Infrastructure and Systems
Access
0
More detail here …
https://cwiki.apache.org/confluence/display/ATLAS/Building+out+the+Open+Metadata+Typesystem
Metadata and governance digital platform
Open Metadata and Governance
Reporting Platform
ETL Platform
Analytics Platform
Virtualization Platform
Governance Platform
Data Platform
Types of tools that may integrate with an open metadata repository
▪ BI and visualization tools
• locating data assets and related information about them; defining reports and publishing their metadata; viewing lineage
▪ Data Science tool
• wanting to find out about data assets available and manage user lineage of transformations and analytics models; may also manage metadata for analytics models
▪ API developer tool
• wanting to understand proper data structures and data meaning to use for APIs, plus additional governance requirements that need to be implemented by the API because of the data it exchanges
▪ Counter-fraud tools
• ad hoc analysis of logs and error reports, setting up rules
▪ Curator/owner tool
• for managing the curation of assets, providing access, verifying use of assets, reviewing discovery results and exceptions, approving change requests
▪ Glossary tool
• for subject matter experts and information architects to share expertise about a particular subject area; may also define structures and related reference data
▪ Enterprise architect tools
• defining the data landscape and related systems
▪ DevOps tools
• conformance to policies and standards in development
• metadata capture at deployment
• validation of deployment platform requirements
▪ Data integration engine
• locating appropriate data and component assets, log design lineage, log operational lineage
▪ Information Virtualisation tools
• locate appropriate data assets, build views and publish them, add design lineage, log operational lineage
▪ Governance tools
• setting up and monitoring governance program, data quality, …
▪ Stewardship tools
• reviewing assigned exceptions, making data changes and requesting approval
▪ Information security tools
• setting up data access policies and enforcement
▪ Auditor tools
• view compliance reports and validate policies and policy implementations
Open Metadata Access Services
Project Management
Community Profile
Asset Catalog
Stewardship Action
Information View
Governance Program
Information Process
Subject Area
Connected Asset Discovery
Governance Engine
Information Protection
Developer
Data Platform
Asset Owner
Information Landscape
Data Science
DevOps
Asset Consumer
Information Infrastructure
OMAS service instance
Both call API and notifications
Inside the server
Open Metadata and Governance (OMAG) Server
Open Metadata Access Services (OMAS)
OMRS Topic Connector
OMRS Cohort Registry Store Connector
OMRS Archive Connector
OMRS AuditLog Connector
OMRS Event Mapper Connector
OMRS Repository Connector
Server Configuration
OMAS REST APIs and Topics
OMAG Administration REST APIs
OMRS Repository REST APIs
Open Metadata Repository Services (OMRS)
Administration
Enterprise Repository Services
Local Repository Services
Cohort Services
Integration patterns
https://cwiki.apache.org/confluence/display/ATLAS/Integrating+into+the+Open+Metadata+and+Governance+Ecosystem
IBM Information Governance Catalog
Apache Atlas
Caller Pattern
▪ A metadata tool can access the consumer-specific APIs to work with metadata.
▪ The Access Layer handles the calls to metadata repositories connected to the metadata highway.
Native Pattern
▪ Native implementation of the open metadata and governance APIs
▪ Apache Atlas is a native implementation of the open metadata and governance APIs.
Adapter Pattern
▪ Simple components plug into a repository proxy to connect an existing metadata repository.
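The adapter idea can be sketched in a few lines: a proxy exposes one common call, and a small connector translates it into whatever the existing repository already offers. All names here (RepositoryConnector, LegacyCatalog, RepositoryProxy, get_entity) are hypothetical and chosen for illustration; they are not the OMRS interfaces.

```python
from typing import Protocol


class RepositoryConnector(Protocol):
    """Common call the proxy expects every plugged-in connector to offer."""
    def get_entity(self, guid: str) -> dict: ...


class LegacyCatalog:
    """Stand-in for an existing metadata repository with its own API."""
    def __init__(self) -> None:
        self._rows = {"42": ("EMPLOYEE", "table")}

    def lookup(self, key: str) -> tuple[str, str]:
        return self._rows[key]


class LegacyCatalogConnector:
    """Adapter: translates the common call into the legacy API."""
    def __init__(self, catalog: LegacyCatalog) -> None:
        self._catalog = catalog

    def get_entity(self, guid: str) -> dict:
        name, kind = self._catalog.lookup(guid)
        return {"guid": guid, "name": name, "type": kind}


class RepositoryProxy:
    """Presents the common API and delegates to the plugged-in connector."""
    def __init__(self, connector: RepositoryConnector) -> None:
        self._connector = connector

    def get_entity(self, guid: str) -> dict:
        return self._connector.get_entity(guid)


proxy = RepositoryProxy(LegacyCatalogConnector(LegacyCatalog()))
print(proxy.get_entity("42"))
```

The existing repository is left untouched. Only the connector knows its native API, which keeps the integration effort small.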
Plug-in Pattern
▪ Open Connector Framework (OCF)
• Connectors to data, analytics etc.
▪ Open Discovery Framework (ODF)
• Metadata discovery services
▪ Governance Action Framework (GAF)
• Stewardship services for triage and remediation of exceptions
IBM Unified Governance
Simple cohort
Cohort A
Chief Data Office
Data Lake
Systems of Record
Multiple Cohorts
Cohort A
Cohort B
Chief Data Office
Data Lake
Systems of Record
Mobile Apps
Data Lake
Systems of Record
Marketing
First server
Establishing contact
Federated queries
Caching metadata for availability and performance
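The cohort slides above are titles only, so the following is a loose sketch of the general idea rather than the actual cohort protocol: repositories register with a cohort, a query fans out to every member, and results are cached so a repeat query does not have to reach every repository again. Repository, Cohort, register and federated_find are hypothetical names, and the simple result cache is an assumption that may differ from how the slides' caching works.

```python
class Repository:
    """Stand-in for one metadata repository in the cohort."""
    def __init__(self, name: str, entities: list[str]) -> None:
        self.name = name
        self._entities = entities

    def find(self, text: str) -> list[str]:
        return [e for e in self._entities if text.lower() in e.lower()]


class Cohort:
    """Registers members and answers queries across all of them."""
    def __init__(self) -> None:
        self.members: list[Repository] = []
        self._cache: dict[str, list[tuple[str, str]]] = {}

    def register(self, repository: Repository) -> None:
        self.members.append(repository)
        self._cache.clear()  # cached answers may now be incomplete

    def federated_find(self, text: str) -> list[tuple[str, str]]:
        if text not in self._cache:
            results = []
            for repository in self.members:
                results.extend((repository.name, hit)
                               for hit in repository.find(text))
            self._cache[text] = results
        return self._cache[text]


# Illustrative members and contents only.
cohort = Cohort()
cohort.register(Repository("Data Lake", ["Employee Directory"]))
cohort.register(Repository("Systems of Record",
                           ["Employee Payroll", "Supplier List"]))
print(cohort.federated_find("employee"))
```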
ODPi - co-creation with practitioners
• Compliance assistance and certification for vendors
• Subject matter experts sharing best practices and co-creating content packs
https://github.com/odpi/data-governance
How ODPi Helps
Data Governance Professionals
• Your governance program is based on established practices and definitions
• Allows a broader range of tools in your organization
• Automated governance processes protect and manage your data
Vendors
• Your metadata offerings will deliver value faster as they tap into metadata collected by other vendors’ tools.
• ODPi packages extend your metadata system’s and tools’ capabilities
• Conformance tests minimize your effort in being compliant with key standards and regulations.
• Customers have increased confidence in your tools and services due to ODPi certification.
Summary
▪ Big data is creating new opportunities and requirements that need new types of systems. Data Lakes are just one part of this story.
▪ Metadata is critical to make the best use of this data for the widest range of scenarios.
▪ Most organizations use tools and platforms from many vendors.
▪ Open standards have had limited take-up.
▪ Can we use open source to create a digital platform that allows vendors to take advantage of metadata from a broader ecosystem?
• Open Metadata and Governance defines the standards
• Apache Atlas provides the reference implementation
• ODPi helps to build the ecosystem
Call to action – how can you help?
▪ Contribute directly to the Apache Atlas and/or ODPi Data Governance projects.
• There are many features that still need to be developed.
▪ Encourage your vendors/partners and projects internal to your organization to embrace the Open Metadata and Governance standards, to grow the ecosystem of data and processing that is assured by metadata and governance capability.
https://cwiki.apache.org/confluence/display/ATLAS/Atlas+Projects
Questions?

Contenu connexe

Tendances

Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...Impetus Technologies
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
DAS Slides: Building a Future-State Data Architecture Plan - Where to Begin?
DAS Slides: Building a Future-State Data Architecture Plan - Where to Begin?DAS Slides: Building a Future-State Data Architecture Plan - Where to Begin?
DAS Slides: Building a Future-State Data Architecture Plan - Where to Begin?DATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profilingShailja Khurana
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?confluent
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?DATAVERSITY
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Measuring Data Quality with DataOps
Measuring Data Quality with DataOpsMeasuring Data Quality with DataOps
Measuring Data Quality with DataOpsSteven Ensslen
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 

Tendances (20)

Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea...
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
DAS Slides: Building a Future-State Data Architecture Plan - Where to Begin?
DAS Slides: Building a Future-State Data Architecture Plan - Where to Begin?DAS Slides: Building a Future-State Data Architecture Plan - Where to Begin?
DAS Slides: Building a Future-State Data Architecture Plan - Where to Begin?
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profiling
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Measuring Data Quality with DataOps
Measuring Data Quality with DataOpsMeasuring Data Quality with DataOps
Measuring Data Quality with DataOps
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 

Similaire à Apache Atlas Open Innovation Platform

The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...DataWorks Summit
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...DataWorks Summit
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4Nigel Jones
 
Manage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositoryManage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositorySynaltic Group
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo
 
Apache Atlas Open Innovation Platform

  • 1. Mandy Chessell CBE FREng CEng FBCS, Distinguished Engineer, Master Inventor, Analytics Chief Data Office. mandy_chessell@uk.ibm.com. 18th April 2018. Good analytics needs good data and that needs good metadata.
  • 2. Apache Atlas as an open innovation platform for metadata management and governance. Agenda:
      - Why is metadata so important today?
      - What is the challenge?
      - Building an open ecosystem
      - Apache Atlas and the specifics
      - ODPI Data Governance PMC
      - Progress report and call to action
  • 3. The perils of reusing data … Callie Quartile (Data Scientist) uses (1) open data from the local government registrar and (2) data from the employee directory to (3) create a birthday card service for the company. [Diagram labels: Open Data Site, Data Lake, Employee Directory.]
  • 4. The perils of reusing data … "Happy Birthday." "But it's not my birthday." Unfortunately, the obvious date in the registrar record was the registration-of-birth date, not the date of birth. Date of birth was not published in the open data. Callie needed better information about the open data to realise she had the wrong data. [Diagram labels: Open Data Site, Data Lake, Employee Directory, Callie Quartile (Data Scientist).]
  • 5. Metadata should bring as much information about the data sets to Callie’s data science as is known collectively by the organization.
      Employee Directory (columns: Name, Band, Job Title)
      Data Set Name: Employee Directory
      Description: Core attributes describing all employees of OCO pharmaceuticals, created from a daily extract from Kenexa.
      Owner: Penny Payer
      Status:
      Last accessed: 6th May 2016
      Records: 3488
      Last Update: 1st May 2016
      Contents: Structure …, Contents …, Lineage …
      Column: Band
      Classification Ranges: Confidentiality: Public, Confidential, Sensitive. Confidence: Authoritative. Retention: Indefinitely.
      Tabs: Characteristics, Lineage, Description
      Description: Position reference number for non-exempt employees. The value ranges from 01 to 06, where 01 is the most senior and 06 is the most junior.
      Type: String
      Classification: Public
  • 6. Different personas need different services
      - Callie Quartile, Data Scientist: find data; understand data; manage analytics models
      - Jules Keeper, Chief Data Officer: build data strategy; define governance program; monitor progress
  • 7. Different personas need different services
      - Faith Broker, HR and Privacy Officer: locate personal data; ensure protection of personal data; understand employee needs
      - Gary Geeke, IT: maintain “safe” IT infrastructure; build and deploy “good” APIs and services; locate and resolve issues fast
  • 8. Different personas need different services
      - Tanya Tidie, Clinical Trials Administrator: maintain accurate patient records; catalog clinical trials data; demonstrate good data management practices
      - Ivor Padlock, Chief Security Officer: understand risks to organization; set up protection; monitor for suspicious activity
  • 9. Scope of metadata for a data driven organization
      - Glossary
      - Collaboration
      - Governance
      - Models and Reference Data
      - Metadata Discovery
      - Lineage
      - Data Assets
      - Base Types, Systems and Infrastructure
  • 10. Curation. Sample records:
      00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
      00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
      00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
      Speech bubbles: "I know" and "I wonder what this means"
  • 11. Scared to share. Faith Broker (Business Team) has been doing some simple analysis on the HR data of the company. She wants to share this data with Callie Quartile (Data Scientist) to do some detailed work. However, she does not want Callie to see the sensitive personal information in the record.
      Original records:
      00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
      00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
      00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
      Records as shared with Callie:
      00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 XXXXX XXX 27 Code St Harlem NY 1 3
      00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 XXXXX XXX 27 Code St Harlem NY 1 3
      00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 XXXXX XXX 27 Code St Harlem NY 1 3
  • 12. Using glossary function for semantic processing
      - Structural metadata for a data store: EMPLOYEE RECORD with columns EMPNAME, EMPNO, JOBCODE, SALARY
      - Business metadata: glossary terms Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager and Compensation Plan, connected by HAS-A and IS-A relationships, with an IS-A link to Sensitive
      - Data: 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
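The idea on slides 11 and 12 (columns are linked to glossary terms, terms carry classifications, and classifications drive masking) can be illustrated with a short Python sketch. This is not Apache Atlas code. The mapping tables, the choice of "Annual Salary" as the Sensitive term, and the `is_sensitive` and `mask_record` helpers are all assumptions invented for the illustration.

```python
# Hypothetical links from structural metadata (columns) to glossary terms.
COLUMN_TO_TERM = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}

# Hypothetical IS-A links from glossary terms to classifications.
# Which term is Sensitive is an assumption, not taken from the slide.
TERM_CLASSIFICATIONS = {
    "Annual Salary": {"Sensitive"},
}


def is_sensitive(column: str) -> bool:
    """A column is sensitive if its glossary term IS-A Sensitive."""
    term = COLUMN_TO_TERM.get(column)
    return "Sensitive" in TERM_CLASSIFICATIONS.get(term, set())


def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive values replaced by X's."""
    return {
        column: "X" * len(str(value)) if is_sensitive(column) else value
        for column, value in record.items()
    }


record = {
    "EMPNAME": "Callie Quartile",
    "EMPNO": "3280807432",
    "JOBCODE": "Data Scientist",
    "SALARY": "56944",
}
masked = mask_record(record)
```

Because the masking rule is attached to the glossary term rather than to the column name, any other data store whose columns link to the same term would be masked by the same logic without further configuration.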
  • 13. Why do we need metadata?  Metadata enables data to be used outside of the application that created it. • Analytics and decision making • New business applications • Reporting and compliance  Metadata describes the format and content of data allowing people to judge which data set to use for a new project • Structure • Meaning • Origin • Valid values and quality • Usage and ownership • Regulations and classifications that apply • <more>  Metadata describes the business context and classification of data allowing automated governance processes to operate.
  • 14. Today’s reality  Many data platforms do not have metadata support  Proprietary tools support a range of data sources and governance actions • No vendor supports everything you need, yet each assumes all tools come from its own suite • Each tool starts “empty”, requiring effort to populate metadata • Each tool operates as if it is the only tool • No integration/interoperability of metadata repositories from different vendors  Expensive efforts to create an enterprise data catalogue
  • 15. Today’s reality
  • 16. Manual metadata capture
  • 17. Automatic metadata capture
  • 18. What needs to change? Open and Unified Metadata
  • 19. A new manifesto for metadata and governance  Metadata management must be automated  Metadata management must become ubiquitous  Metadata must become open and remotely accessible  Metadata should be used to drive the governance of data The discovery, maintenance and use of metadata has to be an integral part of all tools that access, change and move information.
  • 20. Open metadata management ecosystem  Peer-to-peer network of repositories  Metadata stored and managed close to its source  Each repository/tool brings unique value.  Open, extensible metadata structures for metadata exchange and federation – extending coverage of the types of resources that need to be described.  Open source infrastructure sharing cost of development and maintenance between vendors  Support for open standards where available [Figure labels: Collaboration Space Metadata, Analytics Platform Metadata, Application Metadata, Cloud SaaS Platform Metadata, Hadoop Platform Metadata]
  • 21. Apache Atlas http://atlas.apache.org/  Apache Atlas has just graduated to become a top-level project.  It began as an incubator open source project on 5th May 2015 to deliver an open source governance capability focused primarily on the Hadoop platform.  Apache Atlas is designed to localize operational governance to the operating data platform such as Hadoop.  At its heart is a type-agnostic metadata store that can be accessed through RESTful interfaces. We see Apache Atlas as the reference implementation for open metadata and governance, for vendors to pick up and use, or to test their integration against. Being open source allows all vendors to enrich and enhance the standard.
  • 22. Apache Atlas today
  • 23. Updates to Apache Atlas  Automation • Capture of metadata from data platforms, data movement engines and data protection engines. • Exception management and stewardship  Business Value • Specialized services for key data roles such as CDO, Data Scientist, Developer, DevOps Operator, Asset Owner, Applications  Connectivity • Metadata Highway offering open metadata exchange, linking and federation between heterogeneous metadata repositories.
  • 24. Taking guidance from existing metadata standards  Well-defined  Complementary  Integrating  Decoupled https://www.w3.org/TR/vocab-dcat/
  • 25. Instance representations in the graph
  • 26. Open metadata meta-types, types and instances «relationship» DataContentForDataSet * * dataContent supportedDataSets «entity» DataSet createTime : date modifiedTime : date «entity» DataStore «entity» Asset «entity» GlossaryTerm «entity» Referenceable description : string expression : string status : TermAssignmentStatus confidence : int steward : string source : string «relationship» SemanticAssignment * * assignedElements meaning
  • 27. Open metadata type model summary Glossary Collaboration Governance Models and Reference Data Metadata Discovery Lineage Data Assets 4 3 1 5 2 6 7 Base Types, Systems and Infrastructure 0
  • 28. Open metadata type model summary Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics) Governance Actions and Processes Augmentation Mapping Implementation Business Objects and Relationships, Taxonomies and Ontologies Business Attributes Organization Teaming Metadata (people profiles, communities, projects, notebooks, …) Models and Schemas 4 3 1 5 Physical Asset Descriptions (Data stores, APIs, models and components) Asset Collections (Sets, Typed Sets, Type Organized Sets) Information Views Rights Management Reference Data Feedback Metadata (tags, comments, ratings, …) Classification Schemes Classification Strategy Subject Area Definition Campaigns and Projects Rollout 2 Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …) Augmentation Instrument Association Information Process Instrumentation (design lineage) 6 7 O-DEF O-BDL Connectors Basic Types, Infrastructure and Systems Access 0
  • 29. More detail here … https://cwiki.apache.org/confluence/display/ATLAS/Building+out+the+Open+Metadata+Typesystem
  • 30. Metadata and governance digital platform Open Metadata and Governance Reporting Platform ETL Platform Analytics Platform Virtualization Platform Governance Platform Data Platform
  • 31. Types of tools that may integrate with an open metadata repository  BI and visualization tools • locating data assets and related information about them; defining reports and publishing their metadata; viewing lineage  Data Science tool • wanting to find out about data assets available and manage user lineage of transformations and analytics models – may also manage metadata for analytics models  API developer tool • wanting to understand proper data structures and data meaning to use for APIs – plus additional governance requirements that need to be implemented by API because of the data it exchanges.  Counter-fraud tools • ad hoc analysis of logs and error reports, setting up rules  Curator/owner tool • for managing the curation of assets, providing access, verifying use of assets, reviewing discovery results and exceptions, approving change requests.  Glossary tool • for subject matter experts and information architects to share expertise about a particular subject area – may also define structures and related reference data  Enterprise architect tools • defining the data landscape and related systems.  DevOps tools • conformance to policies and standards in development • metadata capture at deployment • validation of deployment platform requirements  Data integration engine • locating appropriate data and component assets, log design lineage, log operational lineage  Information Virtualisation tools • locate appropriate data assets, build views and publish them, add design lineage, log operational lineage  Governance tools • setting up and monitoring governance program, data quality, …  Stewardship tools • reviewing assigned exceptions, making data changes and requesting approval  Information security tools • setting up data access policies and enforcement  Auditor tools • view compliance reports and validate policies and policy implementations
  • 32. Open Metadata Access Services Project Management Community Profile Asset Catalog Stewardship Action Information View Governance Program Information Process Subject Area Connected Asset Discovery Governance Engine Information Protection Developer Data Platform Asset Owner Information Landscape Data Science DevOps Asset Consumer Information Infrastructure
  • 33. OMAS service instance Both call API and notifications
  • 34. Inside the server Open Metadata and Governance (OMAG) Server Open Metadata Access Services (OMAS) OMRS Topic Connector OMRS Cohort Registry Store Connector OMRS Archive Connector OMRS AuditLog Connector OMRS Event Mapper Connector OMRS Repository Connector Server Configuration OMAS REST APIs and Topics OMAG Administration REST APIs OMRS Repository REST APIs Open Metadata Repository Services (OMRS)
  • 35. Inside the server Open Metadata and Governance (OMAG) Server Open Metadata Access Services (OMAS) OMRS Topic Connector OMRS Cohort Registry Store Connector OMRS Archive Connector OMRS AuditLog Connector OMRS Event Mapper Connector OMRS Repository Connector Server Configuration OMAS REST APIs and Topics OMAG Administration REST APIs OMRS Repository REST APIs Administration Enterprise Repository Services Local Repository Services Cohort Services
  • 36. Integration patterns https://cwiki.apache.org/confluence/display/ATLAS/Integrating+into+the+Open+Metadata+and+Governance+Ecosystem IBM Information Governance Catalog Apache Atlas
  • 37. Caller Pattern  A metadata tool can access the consumer-specific APIs to work with metadata.  The Access Layer handles the calls to metadata repositories connected to the metadata highway
  • 38. Native Pattern  Native implementation of the open metadata governance APIs  Apache Atlas is a native implementation of the open metadata and governance APIs.
  • 39. Adapter Pattern  Simple components plug into a repository proxy to connect in an existing metadata repository.
  • 40. Plug-in Pattern  Open Connector Framework (OCF) • Connectors to data, analytics etc  Open Discovery Framework (ODF) • Metadata discovery services  Governance action Framework (GAF) • Stewardship services for triage and remediation of exceptions
  • 41. IBM Unified Governance
  • 42. Simple cohort Cohort A Chief Data Office Data Lake Systems of Record
  • 43. Multiple Cohorts Cohort B Cohort A Chief Data Office Data Lake Systems of Record Mobile Apps Data Lake Systems of Record Marketing
  • 44. First server
  • 45. Establishing contact
  • 46. Federated queries
  • 47. Caching metadata for availability and performance
  • 48. ODPi - co-creation with practitioners • Compliance assistance and certification for vendors • Subject matter experts sharing best practices and co-creating content packs https://github.com/odpi/data-governance
  • 49. How ODPi Helps. Data Governance Professionals: • Your governance program is based on established practices and definitions • Allows a broader range of tools in your organization • Automated governance processes protect and manage your data. Vendors: • Your metadata offerings will deliver value faster as they tap into metadata collected by other vendors’ tools. • ODPi packages extend your metadata system’s and tools’ capabilities. • Conformance tests minimize your effort in being compliant with key standards and regulations. • Customers have increased confidence in your tools and services due to ODPi certification.
  • 50. Summary  Big data is creating new opportunities and requirements that need new types of systems. Data Lakes are just one part of this story.  Metadata is critical to make the best use of this data for the widest range of scenarios.  Most organizations use tools and platforms from many vendors.  Open standards have had limited take-up  Can we use open source to create a digital platform that allows vendors to take advantage of metadata from a broader ecosystem? • Open Metadata and Governance defines the standards • Apache Atlas provides the reference implementation • ODPi helps to build the ecosystem
  • 51. Call to action – how can you help?  Direct contribution to the Apache Atlas and/or ODPi Data Governance projects. • There are many features that still need to be developed.  Encouraging your vendors/partners and projects internal to your organization to embrace the Open Metadata and Governance standards to grow the ecosystem of data and processing that is assured by metadata and governance capability.
  • 52. https://cwiki.apache.org/confluence/display/ATLAS/Atlas+Projects
  • 53. Questions?

Editor's notes

  1. Business metadata describes the data that the business needs, what it means and how it should be classified and protected. Structural metadata describes how the data is actually stored and labelled in the data store. The linkage between the business and technical metadata allows our technology to switch between these two perspectives. For example, a request for data expressed in business terminology can be translated into a query for data from a data store. An integration engine copying data into a sandbox can discover which fields the business classifies as sensitive and then mask these values dynamically.
  2. AUTOMATED: Metadata is created by the application at the same time as the data is created, in a standard manner that is easily consumable by everyone with the necessary permissions. For a photograph, for example, the device that took the picture, the name of the picture, the settings the picture was taken at, the location geotag and so on are all captured automatically at the time the data is created.
  3. The maintenance of metadata must be automated to scale to the sheer volumes and variety of data involved in modern business. Metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop, so that the processing engines on these platforms can rely on its availability and build capability around it. Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata and standard interfaces for manipulating metadata. Metadata should be used to drive the governance of data and create a business-friendly logical interface to the data landscape. Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
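
The linkage described in note 1 (business terms assigned to fields in a data store, and sensitivity classifications attached to those terms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Apache Atlas API: `GlossaryTerm`, `GLOSSARY`, `FIELD_MEANING`, `build_query` and `mask_record` are hypothetical names, the field-to-term assignments are inferred from the labels on slide 12, and treating Annual Salary as the sensitive term is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    """A business term; `sensitive` stands in for an IS-A Sensitive Data link."""
    name: str
    sensitive: bool = False


# Hypothetical glossary. Marking Annual Salary as sensitive is an assumption.
GLOSSARY = {
    term.name: term
    for term in (
        GlossaryTerm("Employee Name"),
        GlossaryTerm("Employee Id"),
        GlossaryTerm("Job Title"),
        GlossaryTerm("Annual Salary", sensitive=True),
    )
}

# Assumed semantic assignments: structural field name -> business term name.
FIELD_MEANING = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}


def fields_for(term_name: str) -> list[str]:
    """Return the structural fields whose assigned meaning is the given term."""
    return [f for f, t in FIELD_MEANING.items() if t == term_name]


def build_query(table: str, term_names: list[str]) -> str:
    """Translate a request expressed in business terms into a simple query."""
    columns = [f for name in term_names for f in fields_for(name)]
    if not columns:
        raise ValueError("no fields are assigned to the requested terms")
    return f"SELECT {', '.join(columns)} FROM {table}"


def mask_record(record: dict[str, str]) -> dict[str, str]:
    """Copy a record, replacing the characters of sensitive values with X."""
    masked = {}
    for field_name, value in record.items():
        term = GLOSSARY.get(FIELD_MEANING.get(field_name, ""))
        if term is not None and term.sensitive:
            value = "".join(c if c.isspace() else "X" for c in value)
        masked[field_name] = value
    return masked


if __name__ == "__main__":
    print(build_query("EMPLOYEE_RECORD", ["Employee Name", "Annual Salary"]))
    print(mask_record({"EMPNAME": "Callie Quartile", "SALARY": "56944"}))
```

The point of the design is that neither function hard-codes which fields are sensitive: both consult the metadata, so reclassifying a term in the glossary changes the masking behaviour without touching the integration code.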