Egeria and Graphs
Graham Wallis, August 2020
Graham Wallis is an open-source developer and maintainer on the ODPi Egeria project. He has worked with
graph-related technologies for about 5 years, so he doesn’t have all the answers but hopes you find this
presentation interesting and useful.
Metadata, Sharing & Automation
An example of open, standardized metadata
• In a commercial setting, metadata is used to describe:
• database records and schemas, files and file formats, documents, models, …
• systems, applications, processes such as ETL, archiving, analytics, …
• business concepts as glossaries of terms and their semantic assignments
• In typical commercial organizations:
• the data landscape is vast and distributed
• data is dispersed across multiple data lakes managed by different parts of an organization
• multiple tools from different vendors are used to load, access and manage the data
• multiple tools are used to analyze the data
Commercial metadata and governance
Today’s reality – separate tools, disjointed metadata
• Organizations need a business-friendly logical interface to the data landscape. This implies that
the organization develops a common business vocabulary or glossary.
• Organizations need governance of data to be driven by the metadata, requiring that the metadata
is accurate and up-to-date.
• The maintenance of metadata must be automated to scale to the volumes and variety of data
involved in modern business.
• The metadata must be available across different tools and platforms so that processing engines
can build capability around it.
• Wherever possible, discovery and maintenance of metadata must be an integral part of tools that
access, change and move information.
• Metadata access must become open and remotely accessible so that tools from different vendors
can work with metadata located on different platforms.
• This implies unique identifiers for metadata elements, some level of standardization in the types
and formats for metadata and standard interfaces for accessing and manipulating metadata.
Commercial metadata and governance
The ODPi Egeria project
• ODPi Egeria is an open source project dedicated to making metadata open and automatically
exchanged between tools and data platforms
• Egeria provides an Apache 2.0 licensed platform to enable users and vendors to create an open
ecosystem for metadata
• Egeria arose from several years’ work by Mandy Chessell (IBM), Ferd Scheepers (ING Bank) and others,
on data lakes, data governance & common information models
• Egeria is hosted by the Linux Foundation ODPi project (Open Data Platform Initiative): egeria.odpi.org
• The code is on GitHub: github.com/odpi/egeria
• The Egeria community includes IBM, ING Bank, Manta and SAS plus contributions and interest from
other organizations and individuals.
Egeria Project & Community
Egeria enables exchange of metadata between tools from different vendors
[Figure: open and unified metadata shared across Development, DevOps and Data Science tools]
Egeria Servers and Cohorts
[Figure: Egeria servers connected through two cohorts; some servers have an Egeria repository and one supports an external tool/repository]
A server may have a repository or may support a
given tool or external repository.
A server may join multiple cohorts.
Graphs in Metadata
Graphs in Metadata
[Figure: business metadata (glossary terms Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager, Compensation Plan and Sensitive Data, linked by HAS-A and IS-A relationships) alongside structural metadata for a data store (EMPLOYEE RECORD with columns EMPNAME, EMPNO, JOBCODE, SALARY)]
• The interconnected nature of metadata forms a graph
• The business concepts associated with the data form a graph of terms and classifications
Graphs in Metadata
• Different tools or databases give rise to graphs at both business and technical levels
Querying across graphs…
• Enterprise integration and queries require that we can query across graphs and
between business and technical metadata
Parallels between graphs…
• The graph of artifacts in a Discovery Analysis Report mirrors the graph of schema elements
• As seen from the foregoing examples (of different tools, business and technical metadata, discovery analysis
reports) there are many graph-like structures in metadata
• Egeria is therefore based on graphs and graph-like approaches; it includes a graph repository and
graph-based tooling
• The Open Metadata Types form graphs - an entity type inheritance graph and a graph of the possible
relationship types for an entity type
• We also see graphs in glossary structure (glossary, terms, categories) as well as in the semantic assignment of
glossary terms to metadata instances
• Metadata instances (entities, relationships and classifications) are organized as graphs and can be queried
using graph traversals
Graphs in Egeria
• Within the Egeria integration UI:
• The Type Explorer can be used to visualize entity type inheritance and entity type relationship graphs
• The Repository Explorer can be used to explore graphs of entities and relationships across repositories
• The Admin UI shows the deployed topology of Egeria platforms, servers and cohorts
Egeria UI graph visualizations
• Egeria can transparently federate metadata from multiple repositories, giving rise to a distributed graph
• Entities in different repositories can be related by a relationship in either repository or a further repository
• Entities and relationships in different repositories can be queried and traversed as if they were collocated
• Egeria’s federation capability avoids the need to move or copy metadata
• Ownership remains with the current owner
• There is no duplication, or risk of updates being applied to a copy of the metadata
• Egeria can create a local reference copy of a remote instance, as a locally cached copy, but ownership of the
metadata remains with the tool and repository that created it. Updates are only permitted on the owner’s
original, not on the copies
• When an Egeria user accesses a remote instance, the Egeria server will register interest in the remote
instance
• If the remote instance is modified or deleted, any registered Egeria servers receive events, delivered to the
access services that triggered the interest
• Ownership of an instance can be transferred if necessary
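The ownership rule for reference copies can be illustrated with a small, self-contained sketch. This is not Egeria code: the `Repository` and `Instance` classes, their method names and the collection ids are hypothetical, chosen only to show a cached copy that records its home metadata collection and rejects updates anywhere but on the owner's original.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A metadata instance as held by one repository (hypothetical model)."""
    guid: str
    home_collection_id: str      # id of the metadata collection that owns the instance
    properties: dict = field(default_factory=dict)


class Repository:
    """Toy repository that distinguishes home instances from reference copies."""

    def __init__(self, collection_id):
        self.collection_id = collection_id
        self.instances = {}

    def add_local(self, guid, **properties):
        instance = Instance(guid, self.collection_id, dict(properties))
        self.instances[guid] = instance
        return instance

    def save_reference_copy(self, original):
        # Cache a copy; the home collection id still names the owner.
        self.instances[original.guid] = Instance(
            original.guid, original.home_collection_id, dict(original.properties))

    def update(self, guid, **changes):
        instance = self.instances[guid]
        if instance.home_collection_id != self.collection_id:
            raise PermissionError("updates are only permitted on the owner's original")
        instance.properties.update(changes)


server1 = Repository("collection-1")
server2 = Repository("collection-2")
term = server2.add_local("term-guid", name="Annual Salary")
server1.save_reference_copy(term)
server2.update("term-guid", name="Annual Salary (gross)")   # allowed: server2 is the owner
```

In this sketch the copy on `server1` stays unchanged until it is refreshed; in Egeria that refresh would be driven by the events described above.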
Egeria federation (a distributed graph)
Egeria distributed graph model
[Figure: a Database Column entity on OMAG Server 1 and a Glossary Term entity on OMAG Server 2]
• A pair of entities may be stored in separate servers
Egeria distributed graph model – using reference copies
[Figure: OMAG Server 1 holds the Database Column and a reference copy of the Glossary Term, linked by a relationship labelled "Meaning"; OMAG Server 2 holds the original Glossary Term]
• One entity could be replicated to the other server, as a ‘reference copy’
• The original Glossary Term on OMAG Server 2 is still the authoritative instance; the copy cannot be updated
• A relationship could be defined between the local DB column and the reference copy of the Glossary Term
Egeria distributed graph model – using reference copies
[Figure: OMAG Server 1 holds the Database Column and OMAG Server 2 holds the Glossary Term; OMAG Server 3 holds reference copies of both, linked by a relationship labelled "Meaning"]
• Alternatively, both entities could be replicated to a third server, as reference copies
• The originals are still the authoritative instances
• A relationship could be defined between the local reference copies
Egeria distributed graph model – using entity proxies
[Figure: OMAG Server 3 holds entity proxies for the Database Column on OMAG Server 1 and the Glossary Term on OMAG Server 2, linked by a relationship labelled "Meaning"]
• Instead of replication, the third server could relate the original entities using entity proxies
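A rough sketch of the entity-proxy idea follows. The class names, fields, guids and the `HOMES` lookup table are all hypothetical; the point is only that the third server stores a relationship plus two lightweight stubs, and the full entities are fetched from their home repositories when needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityProxy:
    """Stub standing in for an entity whose full detail lives elsewhere."""
    guid: str
    type_name: str
    home_collection_id: str


@dataclass(frozen=True)
class Relationship:
    guid: str
    type_name: str
    end1: EntityProxy
    end2: EntityProxy


# Hypothetical homes for the full entities, keyed by metadata collection id.
HOMES = {
    "server-1": {"col-guid": {"type": "Database Column", "name": "SALARY"}},
    "server-2": {"term-guid": {"type": "Glossary Term", "name": "Annual Salary"}},
}


def resolve(proxy):
    """Fetch the full entity from its home; the proxy only records where it lives."""
    return HOMES[proxy.home_collection_id][proxy.guid]


# Server 3 stores only the relationship and two proxies, with no entity copies.
meaning = Relationship(
    "rel-guid", "Meaning",
    EntityProxy("col-guid", "Database Column", "server-1"),
    EntityProxy("term-guid", "Glossary Term", "server-2"),
)
```

Compared with reference copies, this keeps less data on the third server, at the cost of a remote lookup whenever the full entity is required.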
The Egeria Graph Repository
Egeria OMRS Repositories
[Figure: search, Open Metadata Access Services and Open Metadata Repository Services]
• Egeria includes a choice of metadata repositories, which can be used as additional metadata stores that can plug
functional gaps between other tools and repositories and can provide local access
• One of the Egeria repositories is a graph repository, which lends itself to the types of queries we saw earlier
Egeria Open Metadata Repository Services (OMRS)
• The OMRS defines a protocol and a set of connectors
• The Enterprise Connector performs cohort-wide operations, including issuing
queries to the cohort. When metadata is replicated from another server, it
can use the local connector and repository to cache that metadata for
availability and performance
• The Local Connector performs local operations and provides a default
Event Mapper that enables events relating to local operations to be sent
to the cohort
• The Repository Connector interfaces to a specific repository – and
optionally, may be accompanied by a custom Event Mapper
• Egeria provides two built-in repositories and there are connectors to
other repositories
• The interface to a repository connector is the MetadataCollection API,
described on the next slide
[Figure: the cohort, OMRS Enterprise Connector, OMRS Local Connector & Event Mapper, MetadataCollection API, OMRS Repository Connector and repository]
The OMRSMetadataCollection interface
• The interface to an Egeria repository is the OMRSMetadataCollection interface
• It includes groups of operations:
• Group 1: Identification of the metadata repository - metadataCollectionId
• Group 2: Type definitions (types, attributes) - add, find, get, remove, …
• Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
• Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
• Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
• Group 6: Maintenance of reference (replica) copies – save, purge, refresh,…
Egeria Local Graph Repository
• The Egeria distribution includes a persistent repository and a non-persistent repository
• The persistent repository is a graph repository built on JanusGraph, an open-source graph database project, hosted by the
Linux Foundation
• http://janusgraph.org
• http://github.com/janusgraph/janusgraph
• The built-in graph repository provides an OMAG Server with a persistent metadata store and is built using Egeria’s ‘plugin’
pattern
• The graph repository can store instances of metadata owned by the local server
• It can also store reference copies of metadata instances replicated to the local server
• It also supports relationship instances that refer to entity proxy instances
• Other graph databases are available, and Egeria’s pluggable connector architecture enables the creation of repository
connectors for different databases.
• The Conformance Test Suite provides a set of automated tests that can be run against a repository to assess whether it
correctly implements the Egeria types and interfaces
Anatomy of the local graph repository
[Figure: components of an OMAG Server using the local graph repository: OMAS access services, OMRS Enterprise Connector with OMRS topics (in/out) to the cohort, OMRS Local Connector & Event Mapper, OMRS Graph Connector (Apache TinkerPop, JanusGraph management) and the graph metadata store (JanusGraph persistence and search)]
Graph Repository configurations
• The first release of the Egeria Graph Repository used BerkeleyDB and Lucene as embedded persistence
and indexing backends. This provides a relatively simple quick-start configuration, especially good for
development and testing and sufficient for some production uses.
• In production it may be desirable (or essential) to use a different persistence backend (e.g. Cassandra) or
indexing backend (e.g. Elastic).
• ING Bank extended the configuration of the Graph Repository to enable the use of (remote) Cassandra and
Elastic services.
• Discussions have started about work to add a remote JanusGraph Server configuration in order to provide
an HA option.
Graph Repository components
• GraphOMRSRepositoryConnector - implements the open connector framework interface
• GraphOMRSRepositoryConnectorProvider – implements the mechanism for brokering a connector
• GraphOMRSMetadataCollection – top level interface supporting type and instance operations
• GraphOMRSMetadataStore – implements the MetadataCollection using a graph database
• GraphOMRSGraphFactory – creation, schema, indexing - encapsulates JanusGraph-specifics
• Mappers – convert between OMRS objects and graph vertices and edges
• GraphOMRSEntityMapper
• GraphOMRSRelationshipMapper
• GraphOMRSClassificationMapper
• Plus various utility classes – error codes, audit logging, constants and utility methods
https://github.com/odpi/egeria/
See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/open-metadata-collection-store-connectors/graph-repository-connector
To use the Egeria Graph Repository
• Configure the OMAG Server with repository-mode = ‘local-graph-repository’
• e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
• Start the OMRS instance in the server
• e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
• If using the embedded configuration of Berkeley DB for persistence and Lucene for indexing,
when OMRS starts, the graph repository auto-creates a JanusGraph database – including:
• Persistence backend
• Search backend
• Graph schema
• Search indexes
• If using alternative backends for persistence or indexing, ensure that they are correctly configured
and available before starting the OMAG Server.
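The two admin calls above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are made up, the requests are sent with an empty body, and no TLS or authentication handling is shown, so a real platform may need more than this. Only the URL layout comes from the examples above.

```python
import urllib.request

BASE_URL = "http://localhost:8080"   # platform address used in the examples above


def admin_url(username, servername, suffix):
    """Build an admin-services URL for the given user and server."""
    return (f"{BASE_URL}/open-metadata/admin-services"
            f"/users/{username}/servers/{servername}/{suffix}")


def post(url):
    """Issue an HTTP POST with an empty body (an assumption) and return the status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def enable_graph_repository(username, servername):
    """Select the local graph repository, then start the server instance."""
    post(admin_url(username, servername, "local-repository/mode/local-graph-repository"))
    post(admin_url(username, servername, "instance"))
```

Calling `enable_graph_repository("alice", "server1")` (both names are placeholders) would issue the two POSTs in order against a running platform.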
Graph Schema
The MetadataCollection interface is the formal interface to an Egeria repository.
Whilst it is possible to look at the graph directly (e.g. using the Gremlin console),
please don’t rely on the schema – it is likely to evolve
Type data:
• The Graph Repository does not store type definitions
• It delegates all type operations to the Repository Content Manager
Instance data:
• The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
• vertices for entities and classifications
• edges for relationships and classifiers
Instance representations in the OMRS
[Figure: a relationship instance connects two entity instances, each of which may carry classification instances; every instance has attributes, which may be primitives, enums or collections]
Graph mapping – vertices and edges
[Figure: mapping from OMRS instance representation to graph schema element]
• Classification instance: vertex with label : “classification” and properties
• Entity instance: vertex with label : “entity” and properties
• Relationship instance: edge with label : “relationship” and properties
• Link between an entity and its classification: edge with label : “classifier” and properties
[Figure: example graph in which two entity vertices are joined by a relationship edge and each entity vertex is joined to its classification vertices by classifier edges; every vertex and edge carries properties]
Local instances, reference copies and proxies
• The graph contains one vertex per entity – whether the entity is local, a reference copy or a proxy
• If the entity has an associated classification, the classification is stored as a vertex, with an edge from the
entity vertex to the classification vertex
• The graph contains one edge per relationship – whether the relationship is local or a reference copy
• Reference Copies
• The metadataCollectionId core attribute is set to the ‘guid’ of the home repository
• Entity Proxy objects
• Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a
proxy
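The mapping described above can be mimicked with a tiny in-memory graph. This is an illustrative sketch, not the JanusGraph schema: the labels come from the slides, but the class names and the property names (`guid`, `metadataCollectionId`, `isProxy`, `name`) are assumptions, and the real schema is likely to differ.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str                     # "entity" or "classification"
    properties: dict


@dataclass
class Edge:
    label: str                     # "relationship" or "classifier"
    out_vertex: Vertex
    in_vertex: Vertex
    properties: dict = field(default_factory=dict)


class TinyGraph:
    def __init__(self):
        self.vertices = []
        self.edges = []

    def add_entity(self, guid, metadata_collection_id, is_proxy=False, classifications=()):
        # One vertex per entity, whether local, reference copy or proxy.
        entity = Vertex("entity", {
            "guid": guid,
            "metadataCollectionId": metadata_collection_id,  # home repository
            "isProxy": is_proxy,                             # hypothetical property name
        })
        self.vertices.append(entity)
        for name in classifications:
            # Each classification is its own vertex, linked by a classifier edge.
            classification = Vertex("classification", {"name": name})
            self.vertices.append(classification)
            self.edges.append(Edge("classifier", entity, classification))
        return entity

    def add_relationship(self, guid, end1, end2):
        # One edge per relationship.
        edge = Edge("relationship", end1, end2, {"guid": guid})
        self.edges.append(edge)
        return edge


graph = TinyGraph()
column = graph.add_entity("col-guid", "collection-1", classifications=["Sensitive Data"])
term = graph.add_entity("term-guid", "collection-2", is_proxy=True)
graph.add_relationship("rel-guid", column, term)
```

Here `term` is stored as a proxy whose home is a different metadata collection, while `column` is a local entity with one classification.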
Metadata Collection ‘graph-query’ methods
• There are 4 sub-graph query methods:
• getRelatedEntities() - optional
• Returns the entity and its immediate neighbors
• getEntityNeighborhood() - optional
• Returns the entity and its neighbors up to the depth specified by
the ‘level’ parameter
• getLinkingEntities() - optional
• Returns the relationships and intermediate entities that connect
the specified pair of entities
• getRelationshipsForEntity() - mandatory
• Returns relationships associated with entity, optionally filtered
by relationship type and status
[Figure: neighborhood of an entity explored to level = 2]
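To make the ‘level’ parameter concrete, here is a minimal breadth-first sketch of a neighborhood query over an in-memory list of relationships. It is not the repository's implementation; the function name, the tuple layout and the sample data are assumptions chosen for illustration.

```python
from collections import deque


def entity_neighborhood(relationships, start, level):
    """Return (entities, relationships) within `level` hops of `start`.

    `relationships` is an iterable of (relationship_id, end1, end2) tuples.
    """
    adjacency = {}
    for rel_id, end1, end2 in relationships:
        adjacency.setdefault(end1, []).append((rel_id, end2))
        adjacency.setdefault(end2, []).append((rel_id, end1))

    seen_entities = {start}
    seen_relationships = set()
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == level:
            continue                       # do not expand beyond the requested depth
        for rel_id, neighbor in adjacency.get(entity, []):
            seen_relationships.add(rel_id)
            if neighbor not in seen_entities:
                seen_entities.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen_entities, seen_relationships


RELATIONSHIPS = [
    ("r1", "Employee", "Annual Salary"),
    ("r2", "Annual Salary", "SALARY"),
    ("r3", "SALARY", "EMPLOYEE RECORD"),
]
entities, rels = entity_neighborhood(RELATIONSHIPS, "Employee", level=2)
```

With `level=1` the same function returns only the immediate neighbors, which is the behaviour described for getRelatedEntities().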
Graph Repository – supported functions
• The Graph Repository supports most of the OMRS MetadataCollection API, including:
• Save and purge of reference copies
• Use of entity proxies
• Delete and restore as well as purge – delete is a soft, restorable delete; purge is permanent
• Re-type of instances
• Re-identify of instances
• Re-home of instances
• The four ‘graph queries’ – described on the previous slide
• The ‘find’ methods – find..ByProperty, find..ByPropertyValue, findEntityByClassification
• The Graph Repository does not (yet) support:
• Historic queries – find methods that specify an asOfTime parameter
• Undo of previous instance updates
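The difference between delete, restore and purge listed above can be sketched with a toy store. The class and its methods are hypothetical and deliberately simplified; they only illustrate a soft, restorable delete alongside a permanent purge.

```python
class InstanceStore:
    """Toy store showing soft delete, restore and purge (not the Egeria code)."""

    def __init__(self):
        self._instances = {}          # guid -> {"properties": ..., "deleted": bool}

    def add(self, guid, properties):
        self._instances[guid] = {"properties": dict(properties), "deleted": False}

    def get(self, guid):
        record = self._instances.get(guid)
        if record is None or record["deleted"]:
            raise KeyError(guid)      # deleted instances are hidden from normal reads
        return record["properties"]

    def delete(self, guid):
        self._instances[guid]["deleted"] = True      # soft delete: data is retained

    def restore(self, guid):
        record = self._instances[guid]
        if not record["deleted"]:
            raise ValueError("only a deleted instance can be restored")
        record["deleted"] = False

    def purge(self, guid):
        del self._instances[guid]                    # permanent removal
```

After `delete`, a `restore` brings the instance back unchanged; after `purge`, the guid is unknown to the store.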
• Egeria project website: egeria.odpi.org
• GitHub: github.com/odpi/egeria
• Slack: https://slack.odpi.org/
More information…

TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Recently uploaded (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Egeria and graphs

  • 1. Egeria and Graphs
      Graham Wallis, August 2020
      Graham Wallis is an open-source developer and maintainer on the ODPi Egeria project. He has worked with graph-related technologies for about 5 years, so he doesn’t have all the answers but hopes you find this presentation interesting and useful.
  • 2. Metadata, Sharing & Automation
  • 3. An example of open, standardized metadata
  • 4. Commercial metadata and governance
      • In a commercial setting, metadata is used to describe:
          • database records and schemas, files and file formats, documents, models, …
          • systems, applications, processes such as ETL, archiving, analytics, …
          • business concepts as glossaries of terms and their semantic assignments
      • In typical commercial organizations:
          • the data landscape is vast and distributed
          • data is dispersed across multiple data lakes managed by different parts of an organization
          • multiple tools from different vendors are used to load, access and manage the data
          • multiple tools are used to analyze the data
  • 5. Today’s reality – separate tools, disjointed metadata
  • 6. Commercial metadata and governance
      • Organizations need a business-friendly logical interface to the data landscape. This implies that the organization develops a common business vocabulary or glossary.
      • Organizations need governance of data to be driven by the metadata, requiring that the metadata is accurate and up-to-date.
      • The maintenance of metadata must be automated to scale to the volumes and variety of data involved in modern business.
      • The metadata must be available across different tools and platforms so that processing engines can build capability around it.
      • Wherever possible, discovery and maintenance of metadata must be an integral part of tools that access, change and move information.
      • Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms.
      • This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata and standard interfaces for accessing and manipulating metadata.
  • 7. The ODPi Egeria project
  • 8. Egeria Project & Community
      • ODPi Egeria is an open source project dedicated to making metadata open and automatically exchanged between tools and data platforms
      • Egeria provides an Apache 2.0 licensed platform to enable users and vendors to create an open ecosystem for metadata
      • Egeria arose from several years’ work by Mandy Chessell (IBM), Ferd Scheepers (ING Bank) and others, on data lakes, data governance & common information models
      • Egeria is hosted by the Linux Foundation ODPi project (Open Data Platform Initiative): egeria.odpi.org
      • The code is on GitHub: github.com/odpi/egeria
      • The Egeria community includes IBM, ING Bank, Manta and SAS plus contributions and interest from other organizations and individuals.
  • 9. Today’s reality – separate tools, disjointed metadata
  • 10. Egeria enables exchange of metadata between tools from different vendors
      [Diagram labels: Open and Unified Metadata; Development; DevOps; Data Science]
  • 11. Egeria Servers and Cohorts
      • A server may have a repository or may support a given tool or external repository.
      • A server may join multiple cohorts.
      [Diagram: Egeria Servers joined into two Cohorts; some servers have an Egeria Repository, one supports an External Tool/Repository; Applications connect to the servers]
  • 13. Graphs in Metadata
      • The interconnected nature of metadata forms a graph
      • The business concepts associated with the data form a graph of terms and classifications
      [Diagram: business metadata terms (Employee, Employee Id, Employee Name, Job Title, Work Location, Manager, Compensation Plan, Annual Salary, Hourly Pay Rate, Sensitive Data) linked by HAS-A and IS-A relationships, alongside structural metadata for a data store: EMPLOYEE RECORD with fields EMPNAME, EMPNO, JOBCODE, SALARY]
  • 14. Graphs in Metadata
      • Different tools or databases give rise to graphs at both business and technical levels
  • 15. Querying across graphs… • Enterprise integration and queries require that we can query across graphs and between business and technical metadata
  • 16. Parallels between graphs… • The graph of artifacts in a Discovery Analysis Report mirrors the graph of schema elements
  • 17. Graphs in Egeria
      • As seen from the foregoing examples (of different tools, business and technical metadata, discovery analysis reports) there are many graph-like structures in metadata
      • Egeria is therefore based on graphs and graph-like approaches; it includes a graph repository and graph-based tooling
      • The Open Metadata Types form graphs - an entity type inheritance graph and a graph of the possible relationship types for an entity type
      • We also see graphs in glossary structure (glossary, terms, categories) as well as in the semantic assignment of glossary terms to metadata instances
      • Metadata instances (entities, relationships and classifications) are organized as graphs and can be queried using graph traversals
  • 18. Egeria UI graph visualizations
      • Within the Egeria integration UI:
          • The Type Explorer can be used to visualize entity type inheritance and entity type relationship graphs
          • The Repository Explorer can be used to explore graphs of entities and relationships across repositories
          • The Admin UI shows the deployed topology of Egeria platforms, servers and cohorts
  • 19. Egeria federation (a distributed graph)
      • Egeria can transparently federate metadata from multiple repositories, giving rise to a distributed graph
      • Entities in different repositories can be related by a relationship in either repository or a further repository
      • Entities and relationships in different repositories can be queried and traversed as if they were collocated
      • Egeria’s federation capability avoids the need to move or copy metadata
          • Ownership remains with the current owner
          • There is no duplication, or risk of updates being applied to a copy of the metadata
      • Egeria can create a local reference copy of a remote instance, as a locally cached copy, but ownership of the metadata remains with the tool and repository that created it. Updates are only permitted on the owner’s original, not on the copies
      • When an Egeria user accesses a remote instance, the Egeria server will register interest in the remote instance
          • If the remote instance is modified or deleted, any registered Egeria servers receive events, delivered to the access services that triggered the interest
      • Ownership of an instance can be transferred if necessary
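The ownership rule for reference copies can be illustrated with a minimal Python sketch. It is not Egeria code: the `Instance` and `Repository` classes and their method names are hypothetical, chosen only to show a cached copy that keeps its owner's collection id and rejects updates anywhere but at its home repository.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    """Illustrative metadata instance (hypothetical, not an Egeria class)."""
    guid: str
    home_collection_id: str  # id of the repository that owns the instance
    name: str
    version: int = 1


class Repository:
    """Illustrative repository holding local instances and reference copies."""

    def __init__(self, collection_id):
        self.collection_id = collection_id
        self.store = {}  # guid -> Instance

    def add_local(self, guid, name):
        inst = Instance(guid, self.collection_id, name)
        self.store[guid] = inst
        return inst

    def save_reference_copy(self, inst):
        # A reference copy keeps the owner's collection id, never ours.
        if inst.home_collection_id == self.collection_id:
            raise ValueError("instance is homed here; not a reference copy")
        self.store[inst.guid] = inst

    def update(self, guid, name):
        current = self.store[guid]
        if current.home_collection_id != self.collection_id:
            raise PermissionError("updates are only permitted on the owner's original")
        updated = replace(current, name=name, version=current.version + 1)
        self.store[guid] = updated
        return updated


owner = Repository("collection-2")
cache = Repository("collection-1")
term = owner.add_local("term-guid", "Salary")
cache.save_reference_copy(term)
```

Calling `cache.update("term-guid", ...)` raises `PermissionError`, while the same call on `owner` succeeds and increments the version.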
  • 20. Egeria distributed graph model
      • A pair of entities may be stored in separate servers
      [Diagram: a Database Column on OMAG Server 1 and a Glossary Term on OMAG Server 2]
  • 21. Egeria distributed graph model – using reference copies
      • One entity could be replicated to the other server, as a ‘reference copy’
      • The original Glossary Term on OMAG Server 2 is still the authoritative instance; the copy cannot be updated
      • A relationship could be defined between the local DB column and the reference copy of the Glossary Term
      [Diagram: OMAG Server 1 holds the Database Column and a Reference Copy of the Glossary Term, linked by a Meaning relationship; OMAG Server 2 holds the original Glossary Term]
  • 22. Egeria distributed graph model – using reference copies
      • Alternatively, both entities could be replicated to a third server, as reference copies
      • The originals are still the authoritative instances
      • A relationship could be defined between the local reference copies
      [Diagram: OMAG Server 1 holds the Database Column, OMAG Server 2 holds the Glossary Term, OMAG Server 3 holds copies of both linked by a Meaning relationship]
  • 23. Egeria distributed graph model – using entity proxies
      • Instead of replication, the third server could relate the original entities using entity proxies
      [Diagram: OMAG Server 3 holds an Entity Proxy for the Database Column (OMAG Server 1) and for the Glossary Term (OMAG Server 2), linked by a Meaning relationship]
  • 24. The Egeria Graph Repository
  • 25. Egeria OMRS Repositories
      • Egeria includes a choice of metadata repositories, which can be used as additional metadata stores that can plug functional gaps between other tools and repositories and can provide local access
      • One of the Egeria repositories is a graph repository, which lends itself to the types of queries we saw earlier
      [Diagram labels: Search; Open Metadata Access Services; Open Metadata Repository Services]
  • 26. Egeria Open Metadata Repository Services (OMRS)
      • The OMRS defines a protocol and a set of connectors
      • The Enterprise Connector performs cohort-wide operations – this includes issuing queries to the cohort and when metadata is replicated from another server it can use the local connector and repository to cache it for availability and performance
      • The Local Connector performs local operations and provides a default Event Mapper that enables events relating to local operations to be sent to the cohort
      • The Repository Connector interfaces to a specific repository – and optionally, may be accompanied by a custom Event Mapper
      • Egeria provides two built-in repositories and there are connectors to other repositories
      • The interface to a repository connector is the MetadataCollection API, described on the next slide
      [Diagram: Cohort; OMRS Enterprise Connector; OMRS Local Connector & Event Mapper; MetadataCollection API; OMRS Repository Connector; Repository]
  • 27. The OMRSMetadataCollection interface
      • The interface to an Egeria repository is the OMRSMetadataCollection interface
      • It includes groups of operations:
          • Group 1: Identification of the metadata repository - metadataCollectionId
          • Group 2: Type definitions (types, attributes) - add, find, get, remove, …
          • Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
          • Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
          • Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
          • Group 6: Maintenance of reference (replica) copies – save, purge, refresh, …
  • 28. Egeria Local Graph Repository
      • The Egeria distribution includes a persistent repository and a non-persistent repository
      • The persistent repository is a graph repository built on JanusGraph, an open-source graph database project, hosted by the Linux Foundation
          • http://janusgraph.org
          • http://github.com/janusgraph/janusgraph
      • The built-in graph repository provides an OMAG Server with a persistent metadata store and is built using Egeria’s ‘plugin’ pattern
      • The graph repository can store instances of metadata owned by the local server
      • It can also store reference copies of metadata instances replicated to the local server
      • It also supports relationship instances that refer to entity proxy instances
      • Other graph databases are available, and Egeria’s pluggable connector architecture enables the creation of repository connectors for different databases.
      • The Conformance Test Suite provides a set of automated tests that can be run against a repository to assess whether it correctly implements the Egeria types and interfaces
  • 29. Anatomy of the local graph repository
      [Diagram: an OMAG Server containing the OMAS access services, the OMRS Enterprise Connector, the OMRS Local Connector & Event Mapper and the OMRS Graph Connector; OMRS topics (in, out) connect the server to the Cohort; the Graph Metadata Store is built on JanusGraph (with JanusGraph Management and Apache Tinkerpop) and its persistence and search backends]
  • 30. Graph Repository configurations
      • The first release of the Egeria Graph Repository used BerkeleyDB and Lucene as embedded persistence and indexing backends. This provides a relatively simple quick-start configuration, especially good for development and testing and sufficient for some production uses.
      • In production it may be desirable (or essential) to use a different persistence backend (e.g. Cassandra) or indexing backend (e.g. Elastic).
      • ING Bank added to the configuration of the Graph Repository to enable the use of (remote) Cassandra and Elastic services.
      • Discussions have started about work to add a remote JanusGraph Server configuration in order to provide an HA option.
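As a rough illustration of the two configurations, the sketch below lists the kind of JanusGraph properties that select embedded versus remote backends. The property keys are standard JanusGraph settings; the directories and host names are placeholders, and how Egeria itself passes such settings to the graph repository is not covered by the slides, so treat this as an assumption-laden sketch rather than Egeria configuration.

```python
# Embedded quick-start style: BerkeleyDB for persistence, Lucene for indexing.
# Directory values are placeholders, not Egeria defaults.
EMBEDDED = {
    "storage.backend": "berkeleyje",
    "storage.directory": "./data/graph/berkeley",
    "index.search.backend": "lucene",
    "index.search.directory": "./data/graph/lucene",
}

# Remote services: Cassandra (CQL) for persistence, Elasticsearch for indexing.
# Host names are placeholders.
REMOTE = {
    "storage.backend": "cql",
    "storage.hostname": "cassandra.example.com",
    "index.search.backend": "elasticsearch",
    "index.search.hostname": "elastic.example.com",
}


def to_properties(config):
    """Render a configuration dict in key=value form, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


print(to_properties(EMBEDDED))
```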
  • 31. Graph Repository components
      • GraphOMRSRepositoryConnector - implements the open connector framework interface
      • GraphOMRSRepositoryConnectorProvider – implements the mechanism for brokering a connector
      • GraphOMRSMetadataCollection – top level interface supporting type and instance operations
      • GraphOMRSMetadataStore – implements the MetadataCollection using a graph database
      • GraphOMRSGraphFactory – creation, schema, indexing - encapsulates JanusGraph-specifics
      • Mappers – convert between OMRS objects and graph vertices and edges
          • GraphOMRSEntityMapper
          • GraphOMRSRelationshipMapper
          • GraphOMRSClassificationMapper
      • Plus various utility classes – error codes, audit logging, constants and utility methods
      • https://github.com/odpi/egeria/ See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/open-metadata-collection-store-connectors/graph-repository-connector
  • 32. To use the Egeria Graph Repository
      • Configure the OMAG Server with repository-mode = ‘local-graph-repository’
          • e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
      • Start the OMRS instance in the server
          • e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
      • If using the embedded configuration of Berkeley DB for persistence and Lucene for indexing, when OMRS starts, the graph repository auto-creates a JanusGraph database – including:
          • Persistence backend
          • Search backend
          • Graph schema
          • Search indexes
      • If using alternative backends for persistence or indexing, ensure that they are correctly configured and available before starting the OMAG Server.
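The two admin calls can be prepared from Python with the standard library. This sketch only builds the requests and does not send them; the URLs follow the pattern shown on the slide, while the user name `alice` and server name `server1` are made-up placeholders.

```python
from urllib.parse import quote
from urllib.request import Request

# Base URL as shown on the slide (platform assumed to run on localhost:8080).
BASE = "http://localhost:8080/open-metadata/admin-services"


def admin_url(user, server, suffix):
    """Build an admin-services URL for the given user and server."""
    return f"{BASE}/users/{quote(user, safe='')}/servers/{quote(server, safe='')}/{suffix}"


def graph_repository_requests(user, server):
    """Return the two POST requests in the order they should be issued."""
    return [
        # 1. Select the local graph repository for this server.
        Request(admin_url(user, server, "local-repository/mode/local-graph-repository"),
                method="POST"),
        # 2. Start the server instance (and with it the OMRS).
        Request(admin_url(user, server, "instance"), method="POST"),
    ]


for request in graph_repository_requests("alice", "server1"):
    print(request.get_method(), request.full_url)
```

Each request could then be passed to `urllib.request.urlopen` against a running platform; any authentication or TLS settings a real deployment needs are outside this sketch.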
  • 33. Graph Schema
      • The MetadataCollection interface is the formal interface to an Egeria repository. Whilst it is possible to look at the graph directly (e.g. using Gremlin console), please don’t rely on the schema – it is likely to evolve
      • Type data:
          • The Graph Repository does not store type definitions
          • It delegates all type operations to the Repository Content Manager
      • Instance data:
          • The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
              • vertices for entities and classifications
              • edges for relationships and classifiers
  • 34. Instance representations in the OMRS
      [Diagram: a Relationship Instance connecting two Entity Instances, each Entity Instance with a Classification Instance; every instance has Attributes made up of Primitives, Enums and Collections]
  • 35. Graph mapping – vertices and edges
      [Diagram: OMRS instance representation mapped to graph schema element. Entity Instance → vertex with label “entity”; Classification Instance → vertex with label “classification”; Relationship Instance → edge with label “relationship”; an edge with label “classifier” is also defined. Instance Attributes (Primitives, Enums, Collections) become Properties]
  • 36. Graph mapping – vertices and edges
      [Diagram: the instances from the earlier slide shown as a graph: entity vertices, classification vertices, a relationship edge and classifier edges, each with Properties]
  • 37. Local instances, reference copies and proxies
      • The graph contains one vertex per entity – whether the entity is local, a reference copy or a proxy
      • If the entity has an associated classification, the classification is stored as a vertex, with an edge from the entity vertex to the classification vertex
      • The graph contains one edge per relationship – whether the relationship is local or a reference copy
      • Reference Copies
          • The metadataCollectionId core attribute is set to the ‘guid’ of the home repository
      • Entity Proxy objects
          • Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a proxy
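The mapping rules above can be pictured with a small in-memory model. This is a sketch only: `TinyGraph` and the property name `is_proxy` are hypothetical, and the real JanusGraph schema is explicitly subject to change, so nothing here should be read as the stored layout.

```python
class TinyGraph:
    """Toy property graph: labelled vertices and labelled edges (illustrative)."""

    def __init__(self):
        self.vertices = {}  # vertex id -> {"label": str, "props": dict}
        self.edges = []     # (label, from vertex id, to vertex id, props)

    def add_entity(self, guid, metadata_collection_id, is_proxy=False, classifications=()):
        # One vertex per entity, whether local, reference copy or proxy.
        self.vertices[guid] = {
            "label": "entity",
            "props": {
                "guid": guid,
                "metadataCollectionId": metadata_collection_id,  # home repository
                "is_proxy": is_proxy,  # hypothetical name for the Boolean flag
            },
        }
        for name in classifications:
            # Each classification is its own vertex, reached by a classifier edge.
            classification_id = f"{guid}/{name}"
            self.vertices[classification_id] = {
                "label": "classification",
                "props": {"name": name},
            }
            self.edges.append(("classifier", guid, classification_id, {}))

    def add_relationship(self, guid, end1, end2, metadata_collection_id):
        # One edge per relationship, whether local or a reference copy.
        props = {"guid": guid, "metadataCollectionId": metadata_collection_id}
        self.edges.append(("relationship", end1, end2, props))

    def count(self, label):
        vertex_count = sum(1 for v in self.vertices.values() if v["label"] == label)
        edge_count = sum(1 for e in self.edges if e[0] == label)
        return vertex_count + edge_count


graph = TinyGraph()
graph.add_entity("column-1", "collection-1", classifications=["Sensitive"])
graph.add_entity("term-1", "collection-2", is_proxy=True)  # homed elsewhere
graph.add_relationship("rel-1", "column-1", "term-1", "collection-1")
```

Here the glossary term is held as a proxy whose `metadataCollectionId` points at its home repository, while the relationship that refers to it is owned locally.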
  • 38. Metadata Collection ‘graph-query’ methods
      • There are 4 sub-graph query methods:
          • getRelatedEntities() - optional
              • Returns the entity and its immediate neighbors
          • getEntityNeighborhood() - optional
              • Returns the entity and its neighbors up to the depth specified by the ‘level’ parameter
          • getLinkingEntities() - optional
              • Returns the relationships and intermediate entities that connect the specified pair of entities
          • getRelationshipsForEntity() - mandatory
              • Returns relationships associated with entity, optionally filtered by relationship type and status
      [Diagram annotation: level = 2]
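The idea behind a depth-limited neighborhood query can be sketched as a breadth-first traversal. The function below is an illustration in plain Python, not the Egeria implementation: its name, its tuple-based input and its return shape are all assumptions made for the example.

```python
from collections import deque


def entity_neighborhood(relationships, start, level):
    """Return (entity guids, relationship guids) within `level` hops of `start`.

    `relationships` is an iterable of (relationship_guid, end1_guid, end2_guid).
    """
    adjacency = {}
    for rel, end1, end2 in relationships:
        adjacency.setdefault(end1, []).append((rel, end2))
        adjacency.setdefault(end2, []).append((rel, end1))

    depth = {start: 0}  # entity guid -> number of hops from the start entity
    found_relationships = set()
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if depth[entity] == level:
            continue  # do not expand beyond the requested depth
        for rel, other in adjacency.get(entity, []):
            found_relationships.add(rel)
            if other not in depth:
                depth[other] = depth[entity] + 1
                queue.append(other)
    return set(depth), found_relationships


# A chain a - b - c - d; with level = 2, entity d and relationship r3 are out of range.
chain = [("r1", "a", "b"), ("r2", "b", "c"), ("r3", "c", "d")]
entities, rels = entity_neighborhood(chain, "a", 2)
```

With `level = 1` the same function returns only the immediate neighbors, which corresponds to the simpler related-entities style of query.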
  • 39. Graph Repository – supported functions
      • The Graph Repository supports most of the OMRS MetadataCollection API, including:
          • Save and purge of reference copies
          • Use of entity proxies
          • Delete and restore as well as purge – delete is a soft, restorable delete; purge is permanent
          • Re-type of instances
          • Re-identify of instances
          • Re-home of instances
          • The four ‘graph queries’ – described on the previous slide
          • The ‘find’ methods – find..ByProperty, find..ByPropertyValue, findEntityByClassification
      • The Graph Repository does not (yet) support:
          • Historic queries – find methods that specify an asOfTime parameter
          • Undo of previous instance updates
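The difference between delete, restore and purge can be shown with a toy store. The `InstanceStore` class below is hypothetical and much simpler than a real repository; it only demonstrates that a soft delete hides an instance but keeps it restorable, while a purge removes it for good.

```python
class InstanceStore:
    """Toy store contrasting soft delete with permanent purge (illustrative)."""

    def __init__(self):
        self._items = {}  # guid -> {"value": ..., "deleted": bool}

    def add(self, guid, value):
        self._items[guid] = {"value": value, "deleted": False}

    def get(self, guid):
        item = self._items.get(guid)
        if item is None or item["deleted"]:
            return None  # unknown or soft-deleted instances are not returned
        return item["value"]

    def delete(self, guid):
        # Soft delete: the instance is hidden but still stored.
        self._items[guid]["deleted"] = True

    def restore(self, guid):
        # Undo a soft delete; raises KeyError if the instance was purged.
        self._items[guid]["deleted"] = False

    def purge(self, guid):
        # Permanent removal: nothing is left to restore.
        del self._items[guid]


store = InstanceStore()
store.add("e1", "Employee")
store.delete("e1")
hidden = store.get("e1")       # None while soft-deleted
store.restore("e1")
restored = store.get("e1")     # visible again after restore
store.purge("e1")
```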
  • 40. More information…
      • Egeria project website: egeria.odpi.org
      • GitHub: github.com/odpi/egeria
      • Slack: https://slack.odpi.org/