Enterprise knowledge graphs use semantic technologies like RDF, RDF Schema, and OWL to represent knowledge as a graph consisting of concepts, classes, properties, relationships, and entity descriptions. They address the "variety" aspect of big data by facilitating integration of heterogeneous data sources using a common data model. Key benefits include providing background knowledge for various applications and enabling intra-organizational data sharing through semantic integration. Challenges include ensuring data quality, coherence, and managing updates across the knowledge graph.
2. The three Big Data "V"s – Variety is often neglected
Source: Gesellschaft für Informatik
Sören Auer 2
3. Linked Data Principles
Addressing the neglected third V (Variety)
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can
look them up on the web
3. When a URI is looked up, return a description of
the thing (in RDF format)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
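Principle 3 can be sketched in a few lines: dereferencing a URI should yield an RDF description of the thing it identifies. The URIs and the one-triple description below are invented for illustration; a real server would answer an HTTP request with content negotiation.

```python
# Minimal sketch of Linked Data principle 3: looking up a URI returns an
# RDF description (here in N-Triples). Data and URIs are illustrative only.
DESCRIPTIONS = {
    "http://example.org/resource/Bonn":
        '<http://example.org/resource/Bonn> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Bonn"@de .',
}

def dereference(uri: str) -> str:
    """Simulate an HTTP lookup of a URI, returning its RDF description."""
    # A real Linked Data server would answer 404 for unknown resources.
    return DESCRIPTIONS.get(uri, "")

print(dereference("http://example.org/resource/Bonn"))
```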
[1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
4. Linked (Open) Data: The RDF Data Model
RDF = Resource Description Framework
[Figure: RDF graph describing DHL – full name "DHL International GmbH", industry Logistics (labels "Logistik" / "物流"), headquarters Post Tower, which is located in Bonn and has a height of 162.5 m]
5. RDF Data Model (a bit more technical)
– Graph consists of:
• Resources (identified via URIs)
• Literals: data values with a datatype (URI) or a language tag (multilinguality built in)
• Attributes of resources are also URI-identified (from vocabularies)
– Various data sources and vocabularies can be arbitrarily mixed and meshed
– URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
Example triples (Turtle notation, reconstructed from the graph above):

  dbp:DHL_International_GmbH
      foaf:name "DHL International GmbH"^^xsd:string ;
      dbo:industry dbp:Logistics ;
      ex:headquarters dbp:Post_Tower .
  dbp:Post_Tower
      gn:locatedIn dbp:Bonn ;
      ex:height [ rdf:value "162.5"^^xsd:decimal ; ex:unit unit:Meter ] .
  dbp:Logistics
      rdfs:label "Logistik"@de , "物流"@zh .
6. RDF mediates between different Data Models &
bridges between Conceptual and Operational Layers
Tabular/Relational Data:

  Id   | Title   | Screen
  -----|---------|-------
  5624 | SmartTV | 104cm
  5627 | Tablet  | 21cm

  Prod:5624 rdf:type Electronics
  Prod:5624 rdfs:label "SmartTV"
  Prod:5624 hasScreenSize "104"^^unit:cm
  ...

Taxonomic/Tree Data:

  Electronics; Vehicle (with subclasses Car, Bus, Truck)

  Vehicle rdf:type owl:Thing
  Car rdfs:subClassOf Vehicle
  Bus rdfs:subClassOf Vehicle
  ...

Logical Axioms / Schema:

  Male rdfs:subClassOf Human
  Female rdfs:subClassOf Human
  Male owl:disjointWith Female
  ...
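The lifting from the relational table to the RDF triples above can be sketched as a small mapping function. The property names follow the slide (rdf:type, rdfs:label, hasScreenSize); the function and subject-URI scheme are illustrative, in the spirit of direct mapping rather than any specific tool.

```python
# Sketch of lifting relational rows into RDF-style triples, mirroring
# the products example. Property names follow the slide; everything
# else (function, URI scheme) is illustrative.
rows = [
    {"Id": 5624, "Title": "SmartTV", "Screen": "104cm"},
    {"Id": 5627, "Title": "Tablet",  "Screen": "21cm"},
]

def row_to_triples(row: dict) -> list:
    s = f"Prod:{row['Id']}"  # subject URI derived from the primary key
    return [
        (s, "rdf:type", "Electronics"),
        (s, "rdfs:label", f'"{row["Title"]}"'),
        (s, "hasScreenSize", f'"{row["Screen"][:-2]}"^^unit:cm'),  # strip "cm"
    ]

triples = [t for row in rows for t in row_to_triples(row)]
```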
Sören Auer 6
20. Either (1) the resulting RDF knowledge base is materialized in a triple store and subsequently queried using SPARQL, or (2) the materialization step is avoided by dynamically mapping an input SPARQL query into a corresponding SQL query, which yields exactly the same results as executing the SPARQL query against the materialized RDF dump.
SPARQLMap – Mapping RDB 2 RDF
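The second option, query rewriting, can be illustrated in miniature: a single SPARQL triple pattern is translated into one SQL query over a mapped table. The table, the property-to-column mapping, and the rewrite function below are all toy assumptions, not the actual SPARQLMap machinery.

```python
import sqlite3

# Toy sketch of SPARQL-to-SQL rewriting: the pattern { ?s <p> ?o } is
# mapped to a SQL projection over a relational table. Table, columns,
# and mapping are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (Id INTEGER, Title TEXT, Screen TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(5624, "SmartTV", "104cm"), (5627, "Tablet", "21cm")])

# mapping: RDF property -> (table, subject column, object column)
MAPPING = {"hasScreenSize": ("products", "Id", "Screen"),
           "rdfs:label":    ("products", "Id", "Title")}

def rewrite(predicate: str) -> str:
    """Rewrite the triple pattern { ?s <predicate> ?o } into SQL."""
    table, s_col, o_col = MAPPING[predicate]
    return f"SELECT {s_col}, {o_col} FROM {table}"

# Each result row corresponds to one virtual triple, without ever
# materializing the RDF dump.
for sid, screen in conn.execute(rewrite("hasScreenSize")):
    print(f'Prod:{sid} hasScreenSize "{screen}"')
```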
21. Example: Sparqlify
• Rationale: exploit existing formalisms (SQL, SPARQL Construct) as much as possible
• Flexible & versatile mapping language
• Translates one SPARQL query into exactly one efficiently executable SQL query
• Solid theoretical formalization based on SPARQL–relational algebra transformations
• Extremely scalable thanks to an elaborate view candidate selection mechanism
• Used to publish 20B triples for
LinkedGeoData
[1] Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases.
[2] Unbehauen, Stadler, Auer: Optimizing SPARQL-to-SQL Rewriting. iiWAS 2013
[3] Auer, et al.: Triplify: light-weight linked data publication from relational databases. WWW 2009
[Figure: Sparqlify view bridge between SPARQL Construct and SQL]
22. Semantified Big Data Architecture Blueprint
[1] Mami, Scerri, Auer, Vidal: Towards the Semantification of Big Data Technology. DEXA 2016
[Figure: architecture blueprint – data sources are ingested into storage, semantically lifted with mappings, and queried; semantic and semantified data are stored in Apache Parquet files on HDFS]
24. SEBIDA Evaluation Results
• Loads data faster
• Shows quite different query performance characteristics: faster in 5 out of 12 queries, similar in 2, slower in 5
48. Big Data is not Just Volume and Velocity
Variety (& Veracity) are key challenges
Linked Data helps to deal with both
• The Linked Data life-cycle requires integrating and adapting results from a number of disciplines
– NLP,
– Machine Learning,
– Knowledge Representation,
– Data Management,
– User Interaction
– …
• Applications in a number of domains
– cultural heritage,
– life sciences,
– industry 4.0 / cyber-physical systems,
– smart cities,
– mobility,
– …
Linked Data links not only data but also:
• Various disciplines
• Applications and use cases
49. Creating Knowledge
out of Interlinked Data
Thanks for your attention!
Sören Auer
http://www.iai.uni-bonn.de/~auer | http://eis.iai.uni-bonn.de
auer@cs.uni-bonn.de
https://www.eccenca.com
A Data Lake is a storage repository for raw data at big data scale, kept in its original formats.
Late binding approach to schema: "Let us decide when we need it."
Scale-out architecture on commodity infrastructure, mostly with HDFS/Hadoop/Spark, which gives a huge cost advantage – roughly a factor of 10 compared to data warehouses.
Semantic Data Lake = Data Lake + Knowledge Graph
Management of structure (vocabularies/schemas, KPI trees, metadata, …) on top of the Data Lake is performed in a knowledge graph – a complex data fabric representing all kinds of things and how they relate to each other.
A knowledge graph is unique regarding flexibility, multiple views and metadata capabilities.
Based on the Resource Description Framework (RDF) standard and Linked Data principles.
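The late-binding idea behind the Semantic Data Lake can be sketched concretely: the lake keeps raw records exactly as the sources deliver them, and a small knowledge-graph fragment supplies the shared vocabulary only when a query needs it. All names and records below are invented for the example.

```python
# Sketch of 'late binding' in a Semantic Data Lake: raw records stay in
# their original shape; a knowledge-graph-style mapping lifts them to a
# shared vocabulary at query time. All identifiers are illustrative.
lake = [
    {"prod_id": 5624, "name": "SmartTV"},   # source A's raw format
    {"id": 5627, "title": "Tablet"},        # source B's raw format
]

# knowledge graph fragment: source fields -> shared vocabulary terms
schema = {"prod_id": "ex:id", "id": "ex:id",
          "name": "rdfs:label", "title": "rdfs:label"}

def lift(record: dict) -> dict:
    """Apply the shared vocabulary to a raw record (late binding)."""
    return {schema[k]: v for k, v in record.items() if k in schema}

unified = [lift(r) for r in lake]  # heterogeneous sources, one view
```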
The platform offers a secure space for interconnection
Data remains with the enterprises and is linked only when needed
Market-oriented model without dependencies on individual vendors
Value creation and services remain with the enterprise
Financed through services, not through advertising or selling data
No central "data octopus" like Google – control over the data remains with the data owners
The customer (end user) is not the product but sovereign over their own data
The whole is more than the sum of its parts (end-to-end services based on data from multiple parties offer disproportionately higher added value)
No central data pot, but a network of healthy, secure data
Governance is federated, not monopolistic
The Linked Data approach can help establish data value chains
The Linked Data life-cycle requires integrating and adapting results from a number of disciplines (NLP, Machine Learning, Knowledge Representation, Data Management)
Applications in a number of domains (cultural heritage, life sciences, industry 4.0 / cyber-physical systems, smart cities, mobility,…)