Enterprise knowledge graphs use semantic technologies like RDF, RDF Schema, and OWL to represent knowledge as a graph consisting of concepts, classes, properties, relationships, and entity descriptions. They address the "variety" aspect of big data by facilitating integration of heterogeneous data sources using a common data model. Key benefits include providing background knowledge for various applications and enabling intra-organizational data sharing through semantic integration. Challenges include ensuring data quality, coherence, and managing updates across the knowledge graph.
2. The three Big Data "V"s – Variety is often neglected
Source: Gesellschaft für Informatik
Sören Auer 2
3. Linked Data Principles
Addressing the neglected third V (Variety)
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can
look them up on the web
3. When a URI is looked up, return a description of
the thing (in RDF format)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
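Principle 3 can be sketched in a few lines: dereferencing a URI should yield an RDF description of the thing it identifies. The URIs and the one-triple description below are invented for illustration; a real server would answer an HTTP request with content negotiation.

```python
# Minimal sketch of Linked Data principle 3: looking up a URI returns an
# RDF description (here in N-Triples). Data and URIs are illustrative only.
DESCRIPTIONS = {
    "http://example.org/resource/Bonn":
        '<http://example.org/resource/Bonn> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Bonn"@de .',
}

def dereference(uri: str) -> str:
    """Simulate an HTTP lookup of a URI, returning its RDF description."""
    # A real Linked Data server would answer 404 for unknown resources.
    return DESCRIPTIONS.get(uri, "")

print(dereference("http://example.org/resource/Bonn"))
```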
[1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
4. Linked (Open) Data: The RDF Data Model
RDF = Resource Description Framework
[Figure: RDF graph describing DHL – full name "DHL International GmbH", industry Logistics (labels "Logistik" / "物流"), headquarters Post Tower, which is located in Bonn and has a height of 162.5 m]
5. RDF Data Model (a bit more technical)
– Graph consists of:
• Resources (identified via URIs)
• Literals: data values with a datatype (URI) or a language tag (multilinguality built in)
• Attributes of resources are also URI-identified (from vocabularies)
– Various data sources and vocabularies can be arbitrarily mixed and meshed
– URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
Example triples (Turtle notation, reconstructed from the graph above):

  dbp:DHL_International_GmbH
      foaf:name "DHL International GmbH"^^xsd:string ;
      dbo:industry dbp:Logistics ;
      ex:headquarters dbp:Post_Tower .
  dbp:Post_Tower
      gn:locatedIn dbp:Bonn ;
      ex:height [ rdf:value "162.5"^^xsd:decimal ; ex:unit unit:Meter ] .
  dbp:Logistics
      rdfs:label "Logistik"@de , "物流"@zh .
6. RDF mediates between different Data Models &
bridges between Conceptual and Operational Layers
Tabular/Relational Data:

  Id   | Title   | Screen
  -----|---------|-------
  5624 | SmartTV | 104cm
  5627 | Tablet  | 21cm

  Prod:5624 rdf:type Electronics
  Prod:5624 rdfs:label "SmartTV"
  Prod:5624 hasScreenSize "104"^^unit:cm
  ...

Taxonomic/Tree Data:

  Electronics; Vehicle (with subclasses Car, Bus, Truck)

  Vehicle rdf:type owl:Thing
  Car rdfs:subClassOf Vehicle
  Bus rdfs:subClassOf Vehicle
  ...

Logical Axioms / Schema:

  Male rdfs:subClassOf Human
  Female rdfs:subClassOf Human
  Male owl:disjointWith Female
  ...
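The lifting from the relational table to the RDF triples above can be sketched as a small mapping function. The property names follow the slide (rdf:type, rdfs:label, hasScreenSize); the function and subject-URI scheme are illustrative, in the spirit of direct mapping rather than any specific tool.

```python
# Sketch of lifting relational rows into RDF-style triples, mirroring
# the products example. Property names follow the slide; everything
# else (function, URI scheme) is illustrative.
rows = [
    {"Id": 5624, "Title": "SmartTV", "Screen": "104cm"},
    {"Id": 5627, "Title": "Tablet",  "Screen": "21cm"},
]

def row_to_triples(row: dict) -> list:
    s = f"Prod:{row['Id']}"  # subject URI derived from the primary key
    return [
        (s, "rdf:type", "Electronics"),
        (s, "rdfs:label", f'"{row["Title"]}"'),
        (s, "hasScreenSize", f'"{row["Screen"][:-2]}"^^unit:cm'),  # strip "cm"
    ]

triples = [t for row in rows for t in row_to_triples(row)]
```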
Sören Auer 6
20. Either (1) the resulting RDF knowledge base is materialized in a triple store and subsequently queried using SPARQL, or (2) the materialization step is avoided by dynamically mapping an input SPARQL query into a corresponding SQL query, which yields exactly the same results as executing the SPARQL query against the materialized RDF dump.
SPARQLMap – Mapping RDB 2 RDF
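The second option, query rewriting, can be illustrated in miniature: a single SPARQL triple pattern is translated into one SQL query over a mapped table. The table, the property-to-column mapping, and the rewrite function below are all toy assumptions, not the actual SPARQLMap machinery.

```python
import sqlite3

# Toy sketch of SPARQL-to-SQL rewriting: the pattern { ?s <p> ?o } is
# mapped to a SQL projection over a relational table. Table, columns,
# and mapping are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (Id INTEGER, Title TEXT, Screen TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(5624, "SmartTV", "104cm"), (5627, "Tablet", "21cm")])

# mapping: RDF property -> (table, subject column, object column)
MAPPING = {"hasScreenSize": ("products", "Id", "Screen"),
           "rdfs:label":    ("products", "Id", "Title")}

def rewrite(predicate: str) -> str:
    """Rewrite the triple pattern { ?s <predicate> ?o } into SQL."""
    table, s_col, o_col = MAPPING[predicate]
    return f"SELECT {s_col}, {o_col} FROM {table}"

# Each result row corresponds to one virtual triple, without ever
# materializing the RDF dump.
for sid, screen in conn.execute(rewrite("hasScreenSize")):
    print(f'Prod:{sid} hasScreenSize "{screen}"')
```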
21. Example: Sparqlify
• Rationale: exploit existing formalisms (SQL, SPARQL Construct) as much as possible
• Flexible & versatile mapping language
• Translates one SPARQL query into exactly one efficiently executable SQL query
• Solid theoretical formalization based on SPARQL–relational algebra transformations
• Extremely scalable thanks to an elaborate view candidate selection mechanism
• Used to publish 20B triples for
LinkedGeoData
[1] Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases.
[2] Unbehauen, Stadler, Auer: Optimizing SPARQL-to-SQL Rewriting. iiWAS 2013
[3] Auer, et al.: Triplify: light-weight linked data publication from relational databases. WWW 2009
[Figure: Sparqlify view bridge between SPARQL Construct and SQL]
22. Semantified Big Data Architecture Blueprint
[1] Mami, Scerri, Auer, Vidal: Towards the Semantification of Big Data Technology. DEXA 2016
[Figure: architecture blueprint – data sources are ingested into storage, semantically lifted with mappings, and queried; semantic and semantified data are stored in Apache Parquet files on HDFS]
24. SEBIDA Evaluation Results
• Loads data faster
• Shows quite different query performance characteristics: faster in 5 out of 12 queries, similar in 2, slower in 5
48. Big Data is not Just Volume and Velocity
Variety (& Veracity) are key challenges
Linked Data helps to deal with both
• The Linked Data life-cycle requires integrating and adapting results from a number of disciplines
– NLP,
– Machine Learning,
– Knowledge Representation,
– Data Management,
– User Interaction
– …
• Applications in a number of domains
– cultural heritage,
– life sciences,
– industry 4.0 / cyber-physical systems,
– smart cities,
– mobility,
– …
Linked Data links not only data but also:
• Various disciplines
• Applications and use cases
49. Creating Knowledge
out of Interlinked Data
Thanks for your attention!
Sören Auer
http://www.iai.uni-bonn.de/~auer | http://eis.iai.uni-bonn.de
auer@cs.uni-bonn.de
https://www.eccenca.com
A Data Lake is a storage repository for raw data at big data scale, kept in its original formats.
Late binding approach to schema: "Let us decide when we need it."
Scale-out architecture on commodity infrastructure, mostly with HDFS/Hadoop/Spark, which gives a huge cost advantage – roughly a factor of 10 compared to data warehouses.
Semantic Data Lake = Data Lake + Knowledge Graph
Management of structure (vocabularies/schemas, KPI trees, metadata, …) on top of the Data Lake is performed in a knowledge graph – a complex data fabric representing all kinds of things and how they relate to each other.
A knowledge graph is unique regarding flexibility, multiple views and metadata capabilities.
Based on the Resource Description Framework (RDF) standard and Linked Data principles.
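The late-binding idea behind the Semantic Data Lake can be sketched concretely: the lake keeps raw records exactly as the sources deliver them, and a small knowledge-graph fragment supplies the shared vocabulary only when a query needs it. All names and records below are invented for the example.

```python
# Sketch of 'late binding' in a Semantic Data Lake: raw records stay in
# their original shape; a knowledge-graph-style mapping lifts them to a
# shared vocabulary at query time. All identifiers are illustrative.
lake = [
    {"prod_id": 5624, "name": "SmartTV"},   # source A's raw format
    {"id": 5627, "title": "Tablet"},        # source B's raw format
]

# knowledge graph fragment: source fields -> shared vocabulary terms
schema = {"prod_id": "ex:id", "id": "ex:id",
          "name": "rdfs:label", "title": "rdfs:label"}

def lift(record: dict) -> dict:
    """Apply the shared vocabulary to a raw record (late binding)."""
    return {schema[k]: v for k, v in record.items() if k in schema}

unified = [lift(r) for r in lake]  # heterogeneous sources, one view
```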
The platform offers a secure space for interconnection
Data remains with the enterprises and is linked only when needed
Market-oriented model without dependencies on individual vendors
Value creation and services remain with the enterprise
Financed through services, not through advertising or selling data
No central "data octopus" like Google – control over the data remains with the data owners
The customer (end user) is not the product but sovereign over their own data
The whole is more than the sum of its parts (end-to-end services based on data from multiple parties offer disproportionately higher added value)
No central data pot, but a network of healthy, secure data
Governance is federated, not monopolistic
The Linked Data approach can help establish data value chains
The Linked Data life-cycle requires integrating and adapting results from a number of disciplines (NLP, Machine Learning, Knowledge Representation, Data Management)
Applications in a number of domains (cultural heritage, life sciences, industry 4.0 / cyber-physical systems, smart cities, mobility,…)