Learn how Apache Atlas is being enhanced to provide a universal open metadata and governance platform for all data processing across the enterprise. With open metadata, multiple metadata repositories, potentially from different vendors, can operate collaboratively to create an enterprise catalog of data that can be located, understood, used and governed. In this talk we will provide a detailed description of the extensions to the type system, new APIs, the connector framework, the metadata discovery framework, the governance action framework and the interoperability that we are adding to Apache Atlas. We will show examples of these features in operation: (1) how metadata is discovered and gathered into Apache Atlas, (2) how applications and tools access metadata, (3) how enforcement engines such as Apache Ranger keep synchronized with the latest governance requirements and (4) how to build an adapter that allows other vendors' metadata repositories to exchange metadata with Apache Atlas repositories. We will also explain how these features can be deployed together to support the Hadoop platform, and the enterprise beyond. This session will be presented by Nigel Jones (IBM) and Ferd Scheepers (ING Chief Information Architect).
Speaker:
Nigel Jones, Software Architect, IBM Analytics Group, IBM
Apache Atlas & Open Metadata for Data Governance
1. Apache Atlas & Open Metadata
Dataworks Sydney 2017
Nigel Jones,
Software Architect
IBM
Ferd Scheepers
Chief Information Architect
ING
2.
Open Metadata and Governance will allow…
… metadata to be captured when the data is created, moved with the data and
be augmented and processed by any of the vendor tools.
3. Open Metadata and Governance consists of:
1. Standardized, extensible set of metadata types
2. Metadata exchange APIs and notifications
3. Frameworks for automated governance
Open Metadata and Governance will allow you to have:
1. An enterprise data catalogue that lists all of your data, where it is located, its origin (lineage),
owner, structure, meaning, classification and quality
2. New data tools (from any vendor) connect to your data catalogue out of the box
3. Metadata being added automatically to the catalogue as new data is created and analysed
4. Subject matter experts collaborating around the data
5. Automated governance processes protect and manage your data
What is Open Metadata and Governance?
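The catalogue attributes listed on this slide (location, origin, owner, structure, meaning, classification and quality) can be pictured as one record per data asset. The following is a minimal sketch in Python, not the Apache Atlas type system: the class `CatalogueEntry`, its field names and the helper `find_by_classification` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CatalogueEntry:
    """Hypothetical record for one data asset in an enterprise data catalogue."""
    name: str
    location: str                                             # where the data is located
    owner: str
    origin: List[str] = field(default_factory=list)          # lineage: upstream asset names
    structure: Dict[str, str] = field(default_factory=dict)  # field name -> data type
    meaning: str = ""                                         # business description
    classifications: Set[str] = field(default_factory=set)
    quality_score: Optional[float] = None                     # None means not yet assessed


def find_by_classification(catalogue: List[CatalogueEntry], label: str) -> List[CatalogueEntry]:
    """Return every entry that carries the given classification label."""
    return [entry for entry in catalogue if label in entry.classifications]


# Illustrative data only; these assets are made up for the example.
catalogue = [
    CatalogueEntry(
        name="employee_records",
        location="hdfs://cluster/hr/employees",
        owner="HR",
        origin=["payroll_export"],
        structure={"EMPNO": "int", "SALARY": "decimal"},
        meaning="One row per employee",
        classifications={"sensitive"},
    ),
    CatalogueEntry(name="office_locations", location="hdfs://cluster/ref/offices", owner="Facilities"),
]
```

Calling `find_by_classification(catalogue, "sensitive")` on this sample data selects only the `employee_records` entry, which is the kind of lookup an automated governance process would need.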
4.
Positioning of Apache Atlas for Open Metadata
Open and
Unified Metadata
Metadata
repository
Apache Atlas
Metadata
repository
IBM
Metadata
repository
SAS
Open Metadata Repository Service
OMRS
Open Metadata Access Service
OMAS
Components defined
and being developed
by Open Metadata &
Governance project
Metadata
highway
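The positioning above, with several vendor repositories behind one open repository service, can be illustrated with a small sketch. It is not the OMRS API: `MetadataRepository`, `InMemoryRepository`, `federated_search` and the `home_repository` key are hypothetical names, and the in-memory lists stand in for real Apache Atlas or vendor repositories.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MetadataRepository(ABC):
    """Hypothetical common interface that each repository adapter would implement."""

    @abstractmethod
    def search(self, text: str) -> List[Dict[str, str]]:
        """Return the entities whose name contains the given text."""


class InMemoryRepository(MetadataRepository):
    """Stand-in for a real repository; holds its entities in a plain list."""

    def __init__(self, name: str, entities: List[Dict[str, str]]) -> None:
        self.name = name
        self._entities = entities

    def search(self, text: str) -> List[Dict[str, str]]:
        needle = text.lower()
        # Tag each hit with the repository it came from so callers can trace it.
        return [
            {**entity, "home_repository": self.name}
            for entity in self._entities
            if needle in entity["name"].lower()
        ]


def federated_search(repositories: List[MetadataRepository], text: str) -> List[Dict[str, str]]:
    """Ask every repository the same question and merge the answers."""
    results: List[Dict[str, str]] = []
    for repository in repositories:
        results.extend(repository.search(text))
    return results


# Illustrative repositories and entities, invented for this example.
repositories = [
    InMemoryRepository("atlas", [{"name": "customer_table"}, {"name": "orders_table"}]),
    InMemoryRepository("vendor_a", [{"name": "Customer Glossary Term"}]),
]
matches = federated_search(repositories, "customer")
```

The design point is that the caller only sees the shared interface, so adding another vendor's repository means writing one more adapter rather than changing the tools that consume metadata.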
5. • Apache Atlas provides an open community for developing the reference implementation
for open metadata and governance. In essence Apache Atlas delivers 2 main
capabilities:
• it plays the role of a metadata repository (graph database) for a metadata end-user tool
• and, it plays the important role of delivering the federated/unified metadata layer
across the entire landscape of an enterprise
• The software development governance from the Apache Software Foundation (ASF)
creates confidence that the technology will be maintained and enhanced as appropriate
in an equitable manner.
Role of Apache Atlas
6. … because Apache is mostly focused on development and we are missing a governance
body for managing the adoption of and compliance to the Open Metadata and Governance
standards. We envision the following roles for ODPI:
1. Be an advocate of the Open Metadata and Governance standards, make them visible
and their value understood.
2. Facilitate discussions around the Open Metadata and Governance standards evolution,
maintenance and development.
3. Test and sign off on compliance of vendor offerings with the Open Metadata and Governance
standards.
Doing all of this under the Apache Atlas flag is not enough…
7. 1. Hands-on Community members:
• ING
• IBM
• Hortonworks
2. Companies we have had conversations with:
• CIBC
• SAS
• Microsoft
• Oracle
• Informatica
• Waterline
• RBC
• DBS
Who is in?
8. 1. Ambition level:
• End of September 2017: Open Metadata working demo.
• Mid-December November 2017: first version of user access.
• Google for Data
2. Next steps:
• End of Q2 2018: production ready version of Virtual Data
Connector.
Timeline and next steps
15. Common Core Data model
Data Assets Governance Lineage
Glossary Collaboration
Models &
Reference
Data
Base Types,
Systems &
Infrastructure
Metadata
Discovery
https://cwiki.apache.org/confluence/display/ATLAS/Building+out+the+Open+Metadata+Typesystem
17. Open APIs - OMRS
Metadata Highway
Adapter
Plugin
Open Connector
Framework
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258803
18. Open APIs - OMAS
OMRS
Governance
Engine
OMAS
Glossary
OMAS
Asset OMAS
Information
View OMAS
++......
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258799
19. OMAS – detail
Project List
Metadata Service
Data/Asset
Community Metadata
Service
Landscape Definition
Metadata Service
Asset Catalog
Metadata Service
Classification and Mapping
Metadata Service
Information View
Metadata Service
Connector Directory
Metadata Service
Governance Definitions
Metadata Service
Information Process
Metadata Service
Glossary and Taxonomy
Metadata Service
Asset
Metadata Service
Discovery
Metadata Service
Governance Action
Metadata Service
Roles and Access
Metadata Service
Models and Schema
Metadata Service
Connector
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258799
20. New glossary function for semantic processing
Business metadata (glossary terms): Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager, Compensation Plan
Structural metadata for a data store: EMPLOYEE RECORD with fields EMPNAME, EMPNO, JOBCODE, SALARY
Glossary terms are connected by HAS-A and IS-A relationships; a Sensitive Data label also appears in the diagram.
Sample record: 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
https://cwiki.apache.org/confluence/display/ATLAS/Area+3+-+Glossary
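One way to read this slide is that each physical field of the EMPLOYEE RECORD is tied to a glossary term, and governance labels attached to a term then apply to the field. The sketch below illustrates that idea only. The specific field-to-term pairings and the choice of "Annual Salary" as a sensitive term are assumptions made for the example, not something the slide states, and `describe_column` and `sensitive_columns` are hypothetical helpers rather than Apache Atlas functions.

```python
from typing import Dict, List, Optional, Set

# Assumed pairing of physical field names with glossary terms (illustrative only).
COLUMN_TO_TERM: Dict[str, str] = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}

# Assumed set of glossary terms labelled as sensitive data (illustrative only).
SENSITIVE_TERMS: Set[str] = {"Annual Salary"}


def describe_column(column: str) -> Dict[str, object]:
    """Look up the business meaning of a field and whether it is sensitive."""
    term: Optional[str] = COLUMN_TO_TERM.get(column)
    return {"column": column, "term": term, "sensitive": term in SENSITIVE_TERMS}


def sensitive_columns(columns: List[str]) -> List[str]:
    """Return the fields whose glossary term carries the sensitive label."""
    return [c for c in columns if COLUMN_TO_TERM.get(c) in SENSITIVE_TERMS]


record_fields = ["EMPNAME", "EMPNO", "JOBCODE", "SALARY"]
to_protect = sensitive_columns(record_fields)
```

Because the label lives on the glossary term rather than on the field, any other data store whose fields map to the same term would pick up the same handling without being tagged separately.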
26. How can I get involved?
Discuss: Mailing List
Document, Explain: Wiki
Report, Design: Jira
Face to face
Code
Vendors!
https://cwiki.apache.org/confluence/display/ATLAS/Getting+Involved