Direct Style Effect Systems -The Print[A] Example- A Comprehension Aid
Getting Started with Knowledge Graphs
1. Getting Started with Knowledge Graphs
Smart Data Conference
January 30, 2017, San Francisco Bay
Peter Haase
2. 2
Peter Haase
• Interest and experience in
ontologies, semantic
technologies and Linked Data
• PhD in KR and semantic
technologies
• 15 years in academic research
and software development
• Contributor to OWL 2
standard
metaphacts Company Facts
• Founded in Q4 2014
• Headquartered in Walldorf,
Germany
• Currently ~10 people
• Platform for knowledge graph
interaction & application
development
About the Speaker
3. 3
Introduction: What are Knowledge Graphs?
Examples and Applications
• Wikidata
• Cultural Heritage
• Industrial Applications
Standards and Principles
metaphactory Knowledge Graph Platform
Hands-on Exercises
Agenda
6. 6
• We need a structured and formal representation of
knowledge
• We are surrounded by entities, which are connected by
relations
• Graphs are a natural way to represent entities and their
relationships
• Graphs can be managed efficiently
Why (Knowledge) Graphs?
7. 7
A (very small) Knowledge Graph
http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/example-graph.jpg
8. 8
• Semantic descriptions of entities and their relationships
• Uses a knowledge representation formalism
(Focus here: RDF, RDF-Schema, OWL)
• Entities: real world objects (things, places, people) and
abstract concepts (genres, religions, professions)
• Relationships: graph-based data model where
relationships are first-class
• Semantic descriptions: types and properties with a well-
defined meaning (e.g. through an ontology)
• Possibly axiomatic knowledge (e.g. rules) to support
automated reasoning
What are Knowledge Graphs?
19. 19
Wikipedia page A query against Wikipedia
Query the Knowledge of Wikipedia like a Database
19
20. 20
• Collecting structured data. Unlike the
Wikipedias, which produce encyclopedic
articles, Wikidata collects data, in a
structured form.
• Collaborative. The data in Wikidata is
entered and maintained by Wikidata editors,
who decide on the rules of content creation
and management in Wikidata supporting the
notion of verifiability.
• Free. The data in Wikidata is published
under the Creative Commons
• Large.
• 25 million entities
• 130 million statements
• 130 million labels
• 350 languages
• >1500 million triples
Wikidata
21.
22. 22
• Build your applications using Wikidata
• Free corpus of structured knowledge
• Easily accessible and standards-based
• See http://query.wikidata.org/
• Contextualize your enterprise data
• Wikidata provides stable identifiers into the open data world
• Seamless integration of private data with open data
• Enrich Wikidata with your data
• Contribute your data to Wikidata
• Link to your own data, make it visible
• Examples:
• Open biomedical databases – Wikidata as a central hub
• Cultural heritage
Use Cases for the Wikidata Knowledge Graph
26. 26
• Challenge:
• Very context-rich data
• Multi-disciplinary data, e.g. archaeologists, historians, librarians
• Multi-institutional data
• Complex domain, relationships, e.g. temporal, spatial, historical, political
• Benefits of Knowledge Graphs
• Integration and interchange of heterogeneous cultural heritage information
• Rich ontologies for knowledge representation
• Deep semantics for true conceptual merging
• Multi-lingual knowledge representation
• Knowledge access across museums and organizations
• Enabling knowledge sharing and collaboration
Benefits of Knowledge Graphs for Cultural Heritage
27. 27
• Collaboration environment for
researchers in Cultural Heritage
• Expert users: researchers, curators
• Based on CIDOC-CRM: very rich,
expressive ontology
• Large, cross-museum data sets
• E.g. British Museum: 100s millions of
triples
• Advanced search capabilities
• Supporting query construction
• Sharing of searches, results,
visualizations
• Knowledge sharing
• Discussions around cultural heritage
annotations
• Argumentation support:
Representation of conflicting views and
opionions
ResearchSpace: Knowledge Graphs for Cultural Heritage
http://researchspace.org/
30. 30
Challenge:
• Much of the relevant
knowledge in external
databases
• Many disparate
databases / data silos
• Many different
data formats
• Complex domain, complex relationships
e.g. compounds, targets, pathways, diseases and tissues
Benefits of Knowledge Graphs in the Life Sciences
31. 31
• Integrated knowledge
representation
• Common format
• Stable, global identifiers
• Federated queries
• Integrated knowledge access:
One-stop portals
• Rich semantic search on a
conceptual level
• Entry points to further data,
in-house and external
• Crossing boundaries between
private and open data
Benefits of Knowledge Graphs in the Life Sciences
33. 34
Semantics on the Web
Semantic Web Stack
Berners-Lee (2006)
Syntactic basis
Basic data model
Simple vocabulary
(schema) language
Expressive vocabulary
(ontology) language
Query language
Application specific
declarative-knowledge
Digital signatures,
recommendations
Proof generation,
exchange, validation
34. 35
Knowledge Graphs Built on the Semantic Web Layer Cake
Unicode URIs
RDF (Resource Description Framework)
RDF-Schema
OWLSKOS
SPARQL
Query language
Entities
Relationships
Vocabularies
Ontologies
Expressive Ontology
Language
Thesauri,
classification
schemes
Graph data
model
Simple vocabulary
language
35. 36
Linked Data
• Set of standards, principles for
publishing, sharing
and interrelating structured
knowledge
• From data silos to interconnected
knowledge graphs
Linked Data Principles
• Use URIs as names for things.
• Use HTTP URIs so that people can
look up those names.
• When someone looks up a URI,
provide useful information, using
the standards: RDF, SPARQL.
• Include links to other URIs, so that
they can discover more things.
Knowledge Graphs Built on Linked Data Principles
37. 38
Graph consists of:
• Resources
(identified via
URIs)
• Literals: data
values with data
type (URI) or
language
(multilinguality
integrated)
• Attributes of
resources are
also URI-
identified (from
vocabularies)
Our Knowledge Graph again (a bit more technical)
• Various data sources and vocabularies can be arbitrarily mixed and meshed
• URIs can be shortened with namespace prefixes; e.g. schema: →
http://schema.org/
38. 39
Allows one to talk about anything
Uniform Resource Identifier (URI) can be used to identify entities
http://dbpedia.org/resource/Leonardo_da_Vinci
is a name for Leonardo da Vinci
http://www.wikidata.org/entity/Q12418
is a name for the Mona Lisa painting
Resource Description Framework (RDF)
dbpedia:
Leonardo_da_Vinci
wd:Q12418
39. 40
Allows one to express statements
An RDF statement consists of:
• Subject: resource identified by a URI
• Predicate: resource identified by a URI
• Object: resource or literal
Variety of RDF syntaxes,
e.g. Turtle (Terse RDF Triple Language):
Resource Description Framework (RDF)
dbpedia:
Leonardo_da_Vinci
wd:Q12418
dcterms:creator
wd:Q12418 dcterms:creator dbpedia:Leonardo_da_Vinci .
40. 41
• Language for two tasks w.r.t. the RDF data model:
• Definition of vocabulary – nominate:
• the ‘types’, i.e., classes, of things we might make assertions
about, and
• the properties we might apply, as predicates in these
assertions, to capture their relationships.
• Inference – given a set of assertions, using these classes
and properties, specify what should be inferred about
assertions that are implicitly made.
RDF-S – RDF Schema
41. 42
• rdfs:Class – Example:
foaf:Person – Represents the class of persons
• rdf:Property – Class of RDF properties. Example:
foaf:knows – Represents that a person “knows” another
• rdfs:domain – States that any resource that has a given property
is an instance of one or more classes
foaf:knows rdfs:domain foaf:Person
• rdfs:range – States that the values of a property are instances of
one or more classes
foaf:knows rdfs:range foaf:Person
RDF-S – RDF Schema
42. 43
RDF-S – RDF Schema
foaf:knows
rdfs:range
foaf:Person .
<http://example.org/bob#me>
foaf:knows
<http://example.org/alice#me>.
<http://example.org/alice#me>
rdf:type
foaf:Person.
Schema
Existing
fact
Inferred
fact
We expect to use this
vocabulary to make
assertions about persons.
Having made such an
assertion...
Inferences can be drawn that
we did not explicitly make
43. 44
• RDFS provides a simplified ontological language for
defining vocabularies about specific domains.
• OWL provides more ontological constructs for knowledge
representation.
• Semantics grounded in Description Logics.
• OWL 2 is divided into sub-languages denominated profiles:
• OWL 2 EL: Limited to basic classification,
but with polynomial-time reasoning
• OWL 2 QL: Designed to be translatable
to relational database querying
• OWL 2 RL: Designed to be efficiently
implementable in rule-based systems
• Most graph databases concentrate on the use of RDFS with
a subset of OWL features.
OWL – Web Ontology Language
More restrictive
than OWL DL
44. 45
OWL is made up of terms which provide for:
• Class construction: forming new classes from
membership of existing ones (e.g., unionOf,
intersectionOf, etc.).
• Property construction: distinction between OWL
ObjectProperties (resources as values) and OWL
DatatypeProperties (literals as values).
• Class axioms: sub-class, equivalence and disjointness
relationships.
• Property axioms: sub-property relationship, equivalence
and disjointness, and relationships between properties.
• Individual axioms: statements about individuals
(sameIndividual, differentIndividuals).
OWL – Web Ontology Language
45. 46
Example: CIDOC-CRM Ontology
Class: Person
SubClassOf: Actor
SubClassOf: Biological Object
SubClassOf: was_born exactly 1
SubClassOf: has_parent min 2
Class: Physical Thing
SubClassOf: Legal Object
SubClassOf: Spacetime Volume
DisjointWith: Conceptual Object
SubClassOf: consists_of some Material
46. 47
• Data model for knowledge organization systems (thesauri,
classification scheme, taxonomies)
• Conceptual resources (concepts) can be
• identified with URIs,
• labeled with lexical strings in natural language,
• documented with various types of note,
• semantically related to each other in informal hierarchies
and association networks and
• aggregated into concept schemes.
SKOS - Simple Knowledge Organization System
http://www.w3.org/TR/skos-reference/
48. 49
• Query language for RDF-based knowledge graphs.
• Designed to use a syntax similar to SQL for retrieving data
from relational databases.
• Different query forms:
• SELECT returns variables and their bindings directly.
• CONSTRUCT returns a single RDF graph specified by a graph
template.
• ASK test whether or not a query pattern has a solution. Returns
yes/no.
• DESCRIBE returns a single RDF graph containing RDF data about
resources.
SPARQL – * Protocol and RDF Query Language
49. 50
Main idea: Pattern matching
• Queries describe sub-graphs of the queried graph
• Graph patterns are RDF graphs specified in Turtle syntax, which contain variables (prefixed by
either “?” or “$”)
• Sub-graphs that match the graph patterns yield a result
• The syntax of a SELECT query is as follows:
• SELECT nominates which components of the matches made against the data should be returned.
• FROM (optional) indicates the sources for the data against which to find matches.
• WHERE defines patterns to match against the data.
• ORDER BY defines a means to order the selected matches.
SPARQL – * Protocol and RDF Query Language
dbpedia:
Leonardo_da_Vinci
?var
dcterms:creator
50. 51
Example:
Select the creator of the things
that Bob is interested in.
SPARQL – * Protocol and RDF Query Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?creator
WHERE {
<http://example.org/bob#me> foaf:topic_interest ?interest .
?interest dcterms:creator ?creator
}
dbpedia:Leonardo_da_VinciResults:
52. 53
metaphacts – Our Mission
The metaphacts team offers an unmatched
experience and know-how around enterprise
knowledge graphs for our clients in areas
such as business, finance, life science, and
cultural heritage.
The metaphactory is our end-to-end
platform to create and utilize enterprise
knowledge graphs - from semantic graph
data management to data-driven application
development.
Built entirely on open standards and
technologies, our platform covers the entire
lifecycle of dealing with knowledge graphs.
As a main benefit our platform enables
knowledge workers to create and gain
meaningful insight into their data with one
comprehensive software solution.
53. 54
metaphactory Features
KNOWLEDGE GRAPH
BACKEND
• Scalable data processing
• Easy-to-use interface
• High-performance
querying and analytics
• Built-in inferencing and
custom services
• Standard connectors for a
variety of data formats
• Single server, embedded
mode, high availability,
and scale out
KNOWLEDGE GRAPH
CREATION
• Semi-automatic creation
of knowledge graphs
• Curation and interlinking
of data from
heterogeneous sources
• Collaborative
management and
authoring
• Custom query and
templates catalogs
• Data annotation
• Capturing of provenance
information
KNOWLEDGE GRAPH
APPLICATIONS
• Rapid development of
end-user oriented
applications
• Web components for end-
user friendly presentation
and interaction
• Interactive visualization
• Rich semantic search with
visual query construction
and faceting
• Customizable semantic
clipboard
54. 55
metaphactory as an Open Platform
BUILT IN OPEN SOURCE
ü Dual licensing (LGPL & commercial license)
ü Open Platform API and SDK
ü Integration of external tools and application via APIs
ü Easy development of own web components and services
ü Full HTML5 compliance
ü Re-usable, declaratively configurable Web Components
= Easy modification, customization, and extensibility
BUILT ON OPEN STANDARDS
ü W3C Web Components
ü W3C Open Annotation Data Model
ü W3C Linked Data Platform Containers
ü Data processing based on W3C standards such as RDF, SPARQL
ü Expressive ontologies for schema modeling based on OWL 2, SKOS
ü Rules, constraints, and query specification based on SPIN and RDF Data
Shapes
= Sustainable Solution
56. 57
Users and their Benefits
EXPERT USERS
• Collaboratively construct
and manage knowledge
graphs
• Integrate data from
heterogeneous sources
• Use standard connectors
for a variety of data
formats
• Benefit from scalable data
processing for big graphs
• Conduct high-
performance querying and
analytics
DEVELOPERS
• Rapidly develop Web and
mobile end-user oriented
applications
• Benefit from various
deployment modes: stand-
alone, HA, scale-up, scale-
out
• Interact with an easy-to-
use interface
• Collaboratively manage,
annotate and author data
• Use large set of custom
query and templates
catalogs
• Capture of provenance
information
END USERS
• Benefit from user-friendly
interaction with data
• Gain insights into
complex relationships
• Enable transparency and
extract value
• Ask questions and obtain
precise results
• Reduce effort for data
analysis
• Reduce noise – obtain
targeted, high quality
results
• Enhance quality of
business decisions
57. 58
metaphacts Supports the Whole Data Lifecyle
Data
Extraction &
Integration
Data Linking
& Enrichment
Storage &
Repositories
Querying &
Inferencing
Search
Visualization
Authoring
end-to-end
platform
58. 59
Search
• Domain independent, fully
customizable search widget
• Satisfy complex information needs
without learning SPARQL
• Search functionalities
• Graphical query construction
• End user friendly search
interfaces for building and
sharing complex queries
• Semantic auto suggestion
• Interactive result visualization
• Faceted search and exploration
of item collections
• Ability to invoke external full text
search indices such as Solr including
the possibility to score, rank and limit
the results for responsive
autosuggestion
• Saving and sharing of queries and
search results
Search
59. 60
Table
Transform your queries into
durable, interactive tables
Many customization
possibilities, e.g. pagination,
filters and cell templates
Graph
Visualize and explore connections in a graph view
Custom styling of the graph
Variety of graph layouts
Carousel
Animated browsing through a list of
result items
Chart
Visualize trends and
relationships between
numbers, ratios, or
proportions
Visuali-
zation
Tree Table
Tree-based
visualization,
navigation and
browsing through sub-
tree structures
Map
Displaying spatial
data on a
geographic map
Visualization
60. 61
Autho-
ring
• Annotations
• Based on W3C Open
Annotation Data Model
• Automated semantic link
extraction
• Form based authoring
• Manually author and update
instance data, backed by
query templates, data
dependencies, and type
constraints
• Rich editing components for
special data types
• Customizable flexible forms
• Autosuggestion and
validation against the
knowledge graph
• Capturing of provenance
information
• User group management
Authoring
61. 62
Install & Go: Out-of-the-Box Functionality
Getting Started Tutorial
to guide you through your
first steps with metaphactory
Get
started
Management of Queries in
Catalog
for easy reuse and updating
Keyword Search Interface
with semantic autosuggestion,
driven by SPARQL
Search
Data Overview Pages
with Web components for
end-user friendly data
presentation and interaction
Template-based Data Browser
used to define generic views
which are automatically applied
to entire sets of instances
Explore
62. 63
Example: Simple Semantic Search
Keyword search with semantic
autosuggestion, driven by SPARQL
Set up in ~2 minutes!
Declarative Components
Developer embeds
‘semantic-simple-search’
into the page
<semantic-simple-search data-
config='{
"query":"
SELECT ?result ?label ?desc
?img WHERE {
?result rdfs:label ?label .
?result rdfs:comment ?desc .
?result foaf:thumbnail ?img .
FILTER(CONTAINS(?label,
?token))
}",
"searchTermVariable":"token", //
user input
"template":"
<span title="{{result}}">
<img src="{{img}}"
height="30"/>
{{label}} ({{desc}})</span>"
}'/>
1
Rendered component is
displayed to the user and
can be used right away
2
Autosuggestions are
dynamically computed
based on query and user
input
3
63. 64
• Associate a class in the knowledge graph with a template
• The template is applied to instances of the class
HTML5 Template Pages
Bob
foaf:Person
rdf:type
65. 66
Hands-on Exercises
DATA LOADING &
QUERYING
• Loading your
data
• Querying your
data
VISUALIZATION
• Visualizing
results in a table
• Visualizing
results in a graph
SEARCH
• Embedding a
simple search
interface
AUTHORING
• Creating a
template
• Inserting and
updating data
66. 67
• Download metaphactory (or copy from USB stick)
http://www.knowledgegraph.info/
• Follow README, start metaphactory
start.sh / start.bat
• Open start page
http://localhost:10214
• Follow “Getting started tutorial”
http://localhost:10214/resource/Help:Tutorial
• Have fun and ask questions ;-)
Hands-on Exercises
67. 68
Data Loading & Querying
Load data into the store via
the data import and export
administration page
1 2
Query the data via the
SPARQL endpoint.
E.g.: issue a query for all
statements made about Bob
as a subject
3
Visualize results in a table
… or as raw data
68. 69
Visualizing Results in a Table
3
Visualize results in a table
displaying thumbnails as
images, the labels of the
resources as captions, and
links to the individual
resource pages
1
Embed ‘semantic-table’
component
<semantic-table config='{
"query":"SELECT * WHERE {
<http://example.org/bob#me>
?predicate ?object }“
}'>
</semantic-table>
to visualize previous query
as a table in a page
2
Customize the query to
embed thumbnail images
in the result visualization
SELECT ?uri ?label
?thumbnail WHERE { ?uri
rdfs:label ?label;
<http://schema.org/thumbnail
> ?thumbnail }
Use tupleTemplate to
define a template for
displaying the new table
69. 70
Visualizing Results in a Graph
1
Embed ‘semantic-graph’
component
<semantic-graph
query="CONSTRUCT WHERE { ?s ?p ?o
}">
</semantic-graph>
2
Visualize results in a graph
70. 71
Embedding a Simple Search Interface
Embed ‘semantic-simple-
search’ into the page
<semantic-simple-search
config='{
"query":"
SELECT ?uri ?label
WHERE {
FILTER REGEX(?label,
"?token", "i")
?uri rdfs:label
?label
} LIMIT 10
",
"searchTermVariable":"token",
"resourceSelection":{
"resourceBindingName":"uri",
"template":"<span
style="color: blue;"
title="{{uri.value}}">{{label.
value}}</span>"
},
"inputPlaceholder":"Search for
something e.g. "Bob""
}'>
</semantic-simple-search>
1
Rendered component is
displayed and can be used
right away
2
Autosuggestions are
dynamically computed
based on query and user
input
3
71. 72
Creating a Template
1
Use the templating mechanism to
create a template for the resource
type ‘Person’, to display:
• the person's name
• an image, if available
• his interests
• his friendship relationship
2
Visualize the result on
Bob’s instance page
72. 73
Inserting and Updating Data
1
Use a SPARQL UPDATE operation
against the SPARQL endpoint to
create and add new instance data
• the person's name
• an image, if available
• his interests
• his friendship relationship
2
Visualize the result
73. 74
• Knowledge graphs as a flexible model for data integration and knowledge representation
• Standards for “semantic” knowledge graphs
• RDF as graph-based data model
• OWL as expressive ontology language
• SKOS for taxonomic knowledge
• SPARQL as query language
• Application areas
• Open knowledge graphs, e.g. Wikidata
• Cultural Heritage
• Life Sciences
• And many more
• Get started with the metaphactory Knowledge Graph platform today!
Summary