Professor for Data Science & Digital Libraries at Leibniz Universität Hannover
6 Nov 2018 • 0 likes • 1,851 views
Cognitive data
Slides of my talk at OSLCfest in Stockholm, Nov 6, 2018
A video recording of the talk is available here:
https://www.facebook.com/oslcfest/videos/2261640397437958/
4. --- CONFIDENTIAL ---
We can make things
more intuitive
Picture: The Illustrated Recipes of Lucy Eldridge
http://thefoxisblack.com/2013/07/18/the-illustrated-recipes-of-lucy-eldridge/
10. Page 10
Machine Learning and Big Data
http://www.spacemachine.net/views/2016/3/datasets-over-algorithms
AI is not just the next hype after Big Data; Big Data is the
reason why we have AI!
12. Linked Data Principles
Addressing the neglected third V (Variety)
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can look them up on the web
3. When a URI is looked up, return a description of the thing in the W3C
Resource Description Framework (RDF)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
[1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
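Principles 2 and 3 can be illustrated with a minimal sketch that builds, but does not send, an HTTP lookup for a thing's URI. It assumes the server answers content negotiation for the `text/turtle` media type. The helper name `rdf_lookup_request` is hypothetical.

```python
from urllib.request import Request

# Principles 1 and 2: an http:// URI that identifies a "thing".
STOCKHOLM_URI = "http://dbpedia.org/resource/Stockholm"


def rdf_lookup_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) a GET request that asks for an RDF description.

    Assumption: the server honours the Accept header (principle 3).
    """
    return Request(uri, headers={"Accept": media_type})


request = rdf_lookup_request(STOCKHOLM_URI)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup. The sketch stops short of that so it runs without network access.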
13. Page 13
1. Graph based RDF data model consisting of S-P-O statements (facts)
RDF & Linked Data in a Nutshell
OSLCFest
dbpedia:Stockholm
05.11.2018
KTH
conf:organizes
conf:starts
conf:takesPlaceIn
2. Serialised as RDF Triples:
KTH conf:organizes OSLCFest .
OSLCFest conf:starts "2018-11-05"^^xsd:date .
OSLCFest conf:takesPlaceIn dbpedia:Stockholm .
3. Publication under URL in Web, Intranet, Extranet
Subject Predicate Object
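The S-P-O model above can be sketched in a few lines of plain Python, with no RDF library involved. The `ex:` prefix is a hypothetical stand-in for the resources the slide leaves unprefixed, and `Triple`, `to_turtle_line` and `objects_of` are illustrative names.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # a prefixed name or an already-quoted literal


# The three statements from the slide; "ex:" is an assumed prefix.
triples = [
    Triple("ex:KTH", "conf:organizes", "ex:OSLCFest"),
    Triple("ex:OSLCFest", "conf:starts", '"2018-11-05"^^xsd:date'),
    Triple("ex:OSLCFest", "conf:takesPlaceIn", "dbpedia:Stockholm"),
]


def to_turtle_line(t: Triple) -> str:
    """Serialise one statement as 'subject predicate object .'"""
    return f"{t.subject} {t.predicate} {t.obj} ."


def objects_of(subject: str, predicate: str) -> List[str]:
    """Follow one edge type out of a node of the graph."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


for t in triples:
    print(to_turtle_line(t))
```

Because every fact has the same three-part shape, querying the graph reduces to pattern matching on subject and predicate, as `objects_of` does.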
14. Page 14
Creating Knowledge Graphs with RDF
Linked Data
located in
label
industry
headquarters
full name
DHL
Post Tower
162.5 m
Bonn
Logistics Logistik
DHL International GmbH
height
物流
label
15. Page 15
Graph consists of:
Resources (identified via URIs)
Literals: data values with data type (URI) or language (multilinguality integrated)
Attributes of resources are also URI-identified (from vocabularies)
Various data sources and vocabularies can be arbitrarily mixed and meshed
URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
RDF Data Model (a bit more technical)
gn:locatedIn
rdfs:label
dbo:industry
ex:headquarters
foaf:name
dbp:DHL_International_GmbH
dbp:Post_Tower
"162.5"^^xsd:decimal
dbp:Bonn
dbp:Logistics
"Logistik"@de
"DHL International GmbH"^^xsd:string
ex:height
"物流"@zh
rdfs:label
rdf:value
unit:Meter
ex:unit
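A minimal sketch of two ideas from this slide: expanding namespace prefixes to full URIs, and literals that carry either a datatype or a language tag. Only the `dbp:` mapping is given on the slide. The `ex:` namespace and the `expand` and `Literal` names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",  # given on the slide
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/",  # hypothetical namespace
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'dbp:Bonn' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None  # prefixed datatype, e.g. "xsd:decimal"
    lang: Optional[str] = None      # language tag, e.g. "de"

    def n3(self) -> str:
        """Render the literal with its language tag or datatype URI."""
        if self.lang:
            return f'"{self.value}"@{self.lang}'
        if self.datatype:
            return f'"{self.value}"^^<{expand(self.datatype)}>'
        return f'"{self.value}"'


print(expand("dbp:Post_Tower"))
print(Literal("Logistik", lang="de").n3())
print(Literal("162.5", datatype="xsd:decimal").n3())
```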
16. Vocabularies – Breaking the mold!
• Semantic data virtualization allows for continuous expansion and enhancement of data and
metadata across data sources without losing the overall perspective
Relational
data models
1:1 Relation between
Data Model and Application
Graph based
data model
Subject
Predicate
Object / Subject
Predicate
Object / Subject
1:n Relation between
Data Model and Application
17. RDF mediates between different Data Models & bridges between
Conceptual and Operational Layers
Id Title Screen
5624 SmartTV 104cm
5627 Tablet 21cm
Prod:5624 rdf:type Electronics
Prod:5624 rdfs:label "SmartTV"
Prod:5624 hasScreenSize "104"^^unit:cm
...
Electronics
Vehicle
Car Bus Truck
Vehicle rdf:type owl:Thing
Car rdfs:subClassOf Vehicle
Bus rdfs:subClassOf Vehicle
...
Tabular/Relational Data
Taxonomic/Tree Data
Logical Axioms / Schema
Male rdfs:subClassOf Human
Female rdfs:subClassOf Human
Male owl:disjointWith Female
...
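The tabular case on this slide can be sketched as a small function that lifts each table row into triples. It follows the slide's example for product 5624. The function name `lift` and the assumption that every screen value ends in "cm" are illustrative.

```python
from typing import Dict, List, Tuple

# The two rows of the slide's product table.
rows = [
    {"Id": "5624", "Title": "SmartTV", "Screen": "104cm"},
    {"Id": "5627", "Title": "Tablet", "Screen": "21cm"},
]


def lift(row: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Map one relational row to subject-predicate-object statements."""
    subject = "Prod:" + row["Id"]
    title = row["Title"]
    screen = row["Screen"]
    # Assumption: sizes are given in centimetres with a trailing "cm".
    size = screen[:-2] if screen.endswith("cm") else screen
    return [
        (subject, "rdf:type", "Electronics"),
        (subject, "rdfs:label", f'"{title}"'),
        (subject, "hasScreenSize", f'"{size}"^^unit:cm'),
    ]


for row in rows:
    for s, p, o in lift(row):
        print(s, p, o, ".")
```

The row identifier becomes the subject, each column becomes a predicate, and the unit moves out of the cell value into the literal's datatype.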
Sören Auer 17
18. 18
Engineering, Manufacturing, Logistics, Marketing, ...
Parts of data are being curated, duplicated, annotated and simply
changed over time, making reconciliation and interpretation a challenge
Perspectives on data turn into silos
20. Page 20
The Trinity of Semantic Integration
Knowledge Graphs
• Complex fabric of concepts
& relationships
• Focus on heterogeneous,
multi-domain knowledge
representation
Data Spaces
• Community of
organizations agreeing on
standards for data access/
security/ semantics/
governance/ licenses
• Focus on data sharing &
exchange
Semantic Data Lakes
• Storage facility for
enterprise/research data
• Use Big Data (HDFS)
management
• Focus on scalable data
access
Use in a single organization Inter-organizational use
21. Page 21
• Fabric of concept, class, property, relationships, entity descriptions
• Uses a knowledge representation formalism
(typically RDF, RDF-Schema, OWL)
• Holistic knowledge (multi-domain, source, granularity):
• instance data (ground truth),
• open (e.g. DBpedia, WikiData), private (e.g. supply chain data),
closed data (product models),
• derived, aggregated data,
• schema data (vocabularies, ontologies)
• meta-data (e.g. provenance, versioning, documentation, licensing)
• comprehensive taxonomies to categorize entities
• links between internal and external data
• mappings to data stored in other systems and databases
Knowledge Graphs – A definition
Smart Data for
Machine Learning
23. Page 23
Search Engine Optimization & Web-Commerce
Schema.org used by >20% of Web sites
Major search engines exploit semantic descriptions
Pharma, Lifesciences
Mature, comprehensive vocabularies and ontologies
Billions of disease, drug, clinical trial descriptions
Digital Libraries
Many established vocabularies (DublinCore, FRBR,
EDM)
Millions of records aggregated from thousands of memory
institutions in Europeana, German Digital Library
Emerging Knowledge Graphs & Data Spaces
26. Management
Accounting
Risk Management
Regulatory Reporting
Treasury Marketing Accounting
Corporate
Memory
Inbound
Data Sources
Outbound and
Consumption
Inbound Raw Data Store
Knowledge Graph for Meta Data, KPI Definition and Data Models
Frontend to Access Relationship and KPI Definition /
Documentation
Frontend to Access (ad hoc) Reports
Outbound Data Delivery to Target
Systems
Big Data DWH-
Infrastructure
High Level Architecture
Corporate Memory
27. Integration via Knowledge Graph and
Semantic Data Models
Knowledge Graph
(RDF)
XML
EDI
CSV
iDoc
RDF
JSON
XML
EDI
CSV
iDoc
RDF
JSON
Supplier OnBoarding cost/time reduction due to rich and flexible pivot format
OEM Supplier
29. Ingestion / Cataloging
• Cataloging of datasets and
vocabularies
• Rich meta data model
• Automatic profiling of datasets
• DataLake (HDFS) integration
• Extraction of metadata
• Continuous monitoring for new
versions and structural changes
Ingestion
Cataloging
Mapping Discovery Linking Selection Analytics, Experiments
32. Mapping
• Sophisticated mapping management
• Mapping towards semantic vocabularies
(lifting)
• Self documentation of data (data
dictionary)
• Normalization of data
• Mapping suggestions
• Mapping reuse based on data profiling
• Advanced mapping suggestions
• machine learning
• data fingerprinting
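As a rough illustration of mapping suggestions, the sketch below proposes a vocabulary property for each source column by name similarity. This simple heuristic is a stand-in and is not the machine learning or fingerprinting approach named on the slide. The candidate list and `ex:hasScreenSize` are hypothetical. `dcterms:title` and `dcterms:identifier` are Dublin Core terms.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical candidate properties, keyed by a lower-cased local name.
VOCABULARY = {
    "title": "dcterms:title",
    "identifier": "dcterms:identifier",
    "screensize": "ex:hasScreenSize",
}


def suggest_mapping(column: str, cutoff: float = 0.6) -> Optional[str]:
    """Suggest a vocabulary property for a column name, or None."""
    matches = get_close_matches(column.lower(), list(VOCABULARY),
                                n=1, cutoff=cutoff)
    return VOCABULARY[matches[0]] if matches else None


for column in ["Title", "Screen", "Id"]:
    print(column, "->", suggest_mapping(column))
```

A column with no candidate above the cutoff gets no suggestion, leaving the decision to a human curator.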
33. Discovery
• Calculation of dataset
relatedness / similarity
• Visual exploration of
data neighborhood
• Similarity measure based
on profiling and mapping
• Similarity measure based
on data fingerprinting
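One simple way to picture profiling-based relatedness is to profile each dataset as a set of (column, inferred type) pairs and compare the profiles with Jaccard similarity. This is a sketch under that assumption. The slide does not say which measure is used, and the sample rows are illustrative.

```python
from typing import Dict, List, Set, Tuple

Profile = Set[Tuple[str, str]]


def profile(rows: List[Dict[str, str]]) -> Profile:
    """Profile a dataset as (column name, inferred type) pairs."""
    result: Profile = set()
    for column in rows[0]:
        values = [row[column] for row in rows]
        kind = "number" if all(v.isdigit() for v in values) else "text"
        result.add((column, kind))
    return result


def jaccard(a: Profile, b: Profile) -> float:
    """Shared profile entries divided by all profile entries."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


products = [{"Id": "5624", "Title": "SmartTV", "Screen": "104cm"}]
catalog = [{"Id": "17", "Title": "Phone", "Price": "199"}]

print(jaccard(profile(products), profile(catalog)))
```

The two sample datasets share two of four distinct profile entries, so their similarity is 0.5.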
34. Linking
• Linking based on expressive rule
trees
• Interactive machine learning of
linkage rules
• Continuous integration of gold
standard for quality assurance
• Data fusion support
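A linkage rule and its check against a gold standard can be sketched as follows. The rule here is a two-branch conjunction (same city AND similar name). The records, the 0.8 threshold and the gold standard are illustrative assumptions, and the string measure comes from Python's `difflib` rather than from any tool named in the talk.

```python
from difflib import SequenceMatcher

source_a = [
    {"id": "a1", "name": "DHL International GmbH", "city": "Bonn"},
    {"id": "a2", "name": "Post Tower", "city": "Bonn"},
]
source_b = [
    {"id": "b1", "name": "DHL International", "city": "Bonn"},
    {"id": "b2", "name": "Posttower", "city": "Bonn"},
]

# Hand-curated links used for quality assurance (assumed for this sketch).
gold_standard = {("a1", "b1"), ("a2", "b2")}


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def linkage_rule(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Rule tree: AND(city equality, name similarity >= threshold)."""
    return (a["city"] == b["city"]
            and name_similarity(a["name"], b["name"]) >= threshold)


predicted = {(a["id"], b["id"])
             for a in source_a for b in source_b if linkage_rule(a, b)}
true_positives = predicted & gold_standard
precision = len(true_positives) / len(predicted) if predicted else 0.0
recall = len(true_positives) / len(gold_standard)

print(sorted(predicted), precision, recall)
```

Re-running the evaluation whenever the rule or the data changes gives a continuous quality check against the gold standard.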
42. Integrating Millions of Metadata
Records from >2000 Memory
Institutions for the German Digital
Library
A Cultural Heritage Data Space
43. --- CONFIDENTIAL ---
Dataspace with
• 2000 memory institutions in Germany alone
• Common semantic data model: EDM
• Common data governance: CC0
• Common access scheme: OAI-PMH
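The common access scheme can be pictured as a harvesting request. The sketch below builds an OAI-PMH `ListRecords` URL. The base URL is a placeholder and not a real endpoint. `oai_dc` is the metadata format every OAI-PMH repository must support. Whether a given repository also offers an EDM prefix is an assumption to check against its `ListMetadataFormats` response.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint, not the address of any real repository.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url: str, metadata_prefix: str,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, "oai_dc"))
```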
47. Page 47
Hybrid AI – combination of smart data (knowledge graphs) and smart analytics
Distributed semantic technologies – knowledge representation using vocabularies,
ontologies
Question Answering
• Open Question Answering architecture – flexible, knowledge-based integration
architecture for QA components and pipelines
• Dialogue Systems - combination of language models and goal-driven question
answering
Integration with Crowdsourcing
Knowledge Graphs, Semantic Data Lakes
Robotics – usage of semantics for actuation
Agile Interoperability – leveraging community driven vocabulary development
Cognitive Data challenges where we can
make a difference
Systematic Enterprise
Linked Data Framework
(GDPR is a driver)
The Z3 was the world's first functional digital computer and was built in 1941 by Konrad Zuse in collaboration with Helmut Schreyer in Berlin. The Z3 was implemented in electromagnetic relay technology, with 600 relays for the arithmetic unit and 1,400 relays for the memory unit.
Longquan stoneware incense burner, China, 12th-13th century AD. Part of the Percival David Collection of Chinese Ceramics.
Breakthroughs in AI come after data is available, not after algorithmic discoveries
If you think about AI, think about the data, not algorithms
Fun fact: most major AI companies share their internal deep learning toolkits
Map the silos to their domain appropriate schemas
Link the nodes (Linked Data)
The schema can be virtual – multiple schemas/views may be appropriate
You could argue that MDM & BI hub-and-spoke systems have had the objective of the “Solution Tomorrow”, but were never able to fulfill this promise because their reliance on the relational paradigm prevents them from having the flexibility to truly provide an unlimited number of perspectives on the same data. MDM & BI hubs, on the contrary, have required all perspectives to be aligned with the one single truth that was physically incorporated in the backbone and paradigm of these respective approaches.
Black current features
Gray future / planned features
Plattform Industrie 4.0: a joint project of the industry associations BITKOM (ICT), VDMA (machinery and plant engineering), and ZVEI (electrical and electronics).
A platform of the same name also exists in Austria.