02/14/22 Heiko Paulheim 1
New Adventures in RDF2vec
Heiko Paulheim
University of Mannheim
Also includes the latest adventures
Alert: contains spoilers on future publications
Graphs vs. Vectors
• Data Science tools for prediction etc.
– Python, Weka, R, RapidMiner, …
– Algorithms that work on vectors, not graphs
• Bridges built over the past years:
– FeGeLOD (Weka, 2012), RapidMiner LOD Extension (2015), Python KG Extension (2021)
Graphs vs. Vectors
• Transformation strategies (aka propositionalization)
– e.g., types: type_horror_movie=true
– e.g., data values: year=2011
– e.g., aggregates: nominations=7
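A minimal sketch of the three strategies above, assuming a toy in-memory list of (subject, predicate, object) triples. The entity `Movie1`, the predicate names, and the function `propositionalize` are hypothetical; a real pipeline would typically pull these facts from the knowledge graph itself (e.g., via SPARQL).

```python
# Hypothetical triples; all names are illustrative, not taken from a real KG.
triples = [
    ("Movie1", "type", "horror_movie"),
    ("Movie1", "year", 2011),
] + [("Movie1", "nomination", f"Award_{i}") for i in range(7)]


def propositionalize(entity, triples):
    """Turn the triples of one entity into a flat feature dictionary."""
    features = {}
    for s, p, o in triples:
        if s != entity:
            continue
        if p == "type":
            # types -> boolean features, e.g. type_horror_movie=True
            features[f"type_{o}"] = True
        elif isinstance(o, (int, float)):
            # data values are copied as-is, e.g. year=2011
            features[p] = o
        else:
            # other relations -> aggregate (count), e.g. nominations=7
            key = f"{p}s"
            features[key] = features.get(key, 0) + 1
    return features


print(propositionalize("Movie1", triples))
# {'type_horror_movie': True, 'year': 2011, 'nominations': 7}
```

Each entity becomes one row of a feature table that vector-based tools can consume; note how every new type or relation adds another potential column.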
Graphs vs. Vectors
• Observations with simple propositionalization strategies
– Even simple features (e.g., add all numbers and types)
can help on many problems
– More sophisticated features often bring additional improvements
• Combinations of relations and individuals
– e.g., movies directed by Steven Spielberg
• Combinations of relations and types
– e.g., movies directed by Oscar-winning directors
• …
– But
• The search space is enormous!
• “Generate first, filter later” does not scale well
Towards RDF2vec
• Excursion: word embeddings
– word2vec proposed by Mikolov et al. (2013)
– predict a word from its context or vice versa
– usually trained on large text corpora
• Idea: similar words appear in similar contexts, like
– Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976
– Google was officially founded as a company in September 1998
• projection layer: embedding vectors
From Word Embeddings to Graph Embeddings
• Basic idea:
– extract random walks from an RDF graph, e.g.:
Mulholland Dr. → director → David Lynch → nationality → US
– feed the walks into the word2vec algorithm
• Order of magnitude (e.g., DBpedia)
– ~6M entities (“words”)
– start up to 500 random walks per entity, length up to 8
→ corpus of >20B tokens
• Result:
– entity embeddings
– in most cases outperform other propositionalization techniques
Ristoski and Paulheim (2016): RDF2vec: RDF graph embeddings for data mining
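The walk extraction step can be sketched as below. This is a minimal illustration, not the reference RDF2vec implementation: the graph is a hypothetical in-memory adjacency list containing only the example walk above, and `random_walks` and its parameters are names chosen for this sketch.

```python
import random

# Hypothetical toy graph: subject -> list of (predicate, object) edges
graph = {
    "Mulholland_Dr.": [("director", "David_Lynch")],
    "David_Lynch": [("nationality", "US")],
    "US": [],
}


def random_walks(graph, start, n_walks, depth, rng):
    """Extract n_walks random walks of at most `depth` hops starting at `start`.

    Each walk is a token sequence: entity, predicate, entity, predicate, ...
    """
    walks = []
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: stop this walk early
            predicate, obj = rng.choice(edges)
            walk += [predicate, obj]
            node = obj
        walks.append(walk)
    return walks


rng = random.Random(42)  # fixed seed for reproducibility
corpus = [w for entity in graph for w in random_walks(graph, entity, 2, 4, rng)]
print(" → ".join(corpus[0]))  # Mulholland_Dr. → director → David_Lynch → nationality → US
```

The resulting `corpus` is a list of token lists, which is the input format common word2vec implementations expect (for example, the `sentences` argument of gensim's `Word2Vec`).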
A First Glance at RDF2vec Embeddings
• Observation: similar entities are projected close to each other
– can be exploited by downstream ML algorithms (think: k-NN)
Ristoski and Paulheim (2016): RDF2vec: RDF graph embeddings for data mining
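To illustrate how a downstream k-NN algorithm can exploit this, here is a sketch over hypothetical, hand-picked three-dimensional vectors (real RDF2vec vectors are learned and have many more dimensions). The helpers `nearest` and `knn_predict` are names chosen for this example, and cosine similarity is one common, but not the only, choice of similarity.

```python
import math
from collections import Counter

# Hypothetical embedding vectors, hand-picked for illustration
vectors = {
    "Berlin": [0.9, 0.1, 0.0],
    "Hamburg": [0.85, 0.15, 0.05],
    "Munich": [0.8, 0.2, 0.1],
    "Apple_Inc.": [0.0, 0.9, 0.4],
    "Google": [0.1, 0.95, 0.3],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(entity, k):
    """Return the k entities whose vectors are most cosine-similar to `entity`."""
    scored = [(cosine(vectors[entity], v), e) for e, v in vectors.items() if e != entity]
    return [e for _, e in sorted(scored, reverse=True)[:k]]


def knn_predict(entity, labels, k=3):
    """Majority vote over the labels of the k nearest neighbors."""
    votes = Counter(labels[e] for e in nearest(entity, k) if e in labels)
    return votes.most_common(1)[0][0]


labels = {"Berlin": "city", "Munich": "city", "Apple_Inc.": "company", "Google": "company"}
print(nearest("Hamburg", 2))           # ['Berlin', 'Munich']
print(knn_predict("Hamburg", labels))  # city
```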
The End of Petar’s PhD Journey…
• ...and the beginning of the RDF2vec adventure
Embeddings for Link Prediction
• RDF2vec example
– similar instances form clusters, and the direction of a relation is roughly stable
– link prediction by analogy reasoning (Japan – Tokyo ≈ China – Beijing)
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
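Analogy reasoning of this kind can be sketched with vector arithmetic: if Japan − Tokyo ≈ China − Beijing, then the capital of China should lie near China + (Tokyo − Japan). The two-dimensional vectors below are hypothetical and chosen so that the relation direction is roughly stable; the sketch uses Euclidean distance for the lookup, while word2vec-style analogy tasks often use cosine similarity instead.

```python
import math

# Hypothetical 2-d vectors; the offset from country to capital is roughly constant
vectors = {
    "Japan": [1.0, 0.0], "Tokyo": [1.0, 1.0],
    "China": [2.0, 0.0], "Beijing": [2.0, 1.1],
    "Germany": [3.0, 0.0], "Berlin": [3.0, 0.9],
}


def analogy(a, b, c):
    """Solve a : b ≈ c : ? by searching for the entity closest to c + (b - a)."""
    target = [vc + vb - va for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [e for e in vectors if e not in (a, b, c)]
    return min(candidates, key=lambda e: math.dist(vectors[e], target))


print(analogy("Japan", "Tokyo", "China"))    # Beijing
print(analogy("Japan", "Tokyo", "Germany"))  # Berlin
```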
Embeddings for Link Prediction
• In RDF2vec, relation preservation is a by-product
• TransE (and its descendants): direct modeling
– Formulates RDF embedding as an optimization problem
– Find a mapping of entities and relations to $\mathbb{R}^n$ so that
• across all triples $\langle s,p,o \rangle$, $\sum \lVert s+p-o \rVert$ is minimized
• try to obtain a smaller error for existing triples than for non-existing ones
Bordes et al.: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
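A minimal sketch of the TransE scoring idea, assuming hypothetical two-dimensional embeddings and the L2 norm (Bordes et al. allow L1 or L2). Training, which is omitted here, uses a margin-based ranking loss that pushes the distance of an existing triple below that of a corrupted triple (one with its head or tail replaced) by at least a margin; `margin_loss` computes that loss for a single pair.

```python
import math


def transe_distance(s, p, o):
    """||s + p - o|| with the L2 norm; small values mean plausible triples."""
    return math.sqrt(sum((si + pi - oi) ** 2 for si, pi, oi in zip(s, p, o)))


def margin_loss(positive, corrupted, margin=1.0):
    """Hinge loss that is zero once the positive triple is closer by at least `margin`."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*corrupted))


# Hypothetical embeddings: japan + has_capital lands exactly on tokyo
japan, china, tokyo = [1.0, 0.0], [2.0, 0.0], [1.0, 1.0]
has_capital = [0.0, 1.0]

positive = (japan, has_capital, tokyo)
corrupted = (china, has_capital, tokyo)  # head replaced by another entity

print(transe_distance(*positive))                    # 0.0
print(transe_distance(*corrupted))                   # 1.0
print(margin_loss(positive, corrupted, margin=2.0))  # 1.0
```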
Link Prediction vs. Node Embedding
• Hypothesis:
– Embeddings for link prediction also cluster similar entities
– Node embeddings can also be used for link prediction
Portisch et al. (to appear): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
Using RDF2vec for Link Prediction
• Use embeddings for head and relation, predict tail
– Train a separate network for head prediction
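One way to set up the training data for this is sketched below, under stated assumptions: the embeddings are hypothetical (RDF2vec does learn vectors for predicates, since they occur as tokens in the walks), the input for tail prediction is the concatenation of head and relation vectors, and the input for head prediction is assumed to be the concatenation of relation and tail vectors. The networks themselves are not shown; any classifier over entity labels could consume these pairs.

```python
# Hypothetical embeddings for entities and predicates
emb = {
    "Mulholland_Dr.": [0.1, 0.2],
    "director": [0.5, 0.5],
    "David_Lynch": [0.3, 0.9],
}
triples = [("Mulholland_Dr.", "director", "David_Lynch")]


def training_pairs(triples, emb):
    """Build (input vector, target entity) pairs for two separate predictors."""
    tail_x, tail_y, head_x, head_y = [], [], [], []
    for h, r, t in triples:
        tail_x.append(emb[h] + emb[r])  # (head, relation) -> predict tail
        tail_y.append(t)
        head_x.append(emb[r] + emb[t])  # (relation, tail) -> predict head
        head_y.append(h)
    return (tail_x, tail_y), (head_x, head_y)


(tail_x, tail_y), (head_x, head_y) = training_pairs(triples, emb)
print(tail_x[0], tail_y[0])  # [0.1, 0.2, 0.5, 0.5] David_Lynch
```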
Local Embeddings: RDF2vec Light
• Recap: order of magnitude (e.g., DBpedia)
– ~6M entities (“words”)
– start up to 500 random walks per entity, length up to 8
→ corpus of >20B tokens
– “Train once, reuse often”
• In some cases, only a small subset of the ~6M entities is of interest
– RDF2vec light: “train when needed”
– Runtime: minutes instead of days
Portisch et al. (2020): RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings
Local Embeddings: RDF2vec Light
• Results:
– Many classification and regression tasks work fine with light
• As good as, or sometimes even better (!) than, classic RDF2vec
– ...but there is a huge performance drop in tasks like document similarity
• First takeaway: RDF2vec light works better for homogeneous sets of entities
Portisch et al. (2020): RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings
Random vs. non-random Walks
• Maybe random walks are not such a good idea
– They may give too much weight to less important entities and facts
• Strategies:
– Prefer edges with more frequent predicates
– Prefer nodes with higher indegree or PageRank
– …
– They may cover less important entities and facts too little
• Strategies:
– The opposite of all of the above strategies
• The results are mixed
• External signals (e.g., human notions of importance)
– generally work better than graph-internal signals
Cochez et al. (2017): Biased Graph Walks for RDF Graph Embeddings
Al Taweel and Paulheim (2020): Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings
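As an illustration of the first strategy (preferring edges with more frequent predicates) and its opposite, here is a sketch that samples the next edge with probability proportional to the global frequency of its predicate, or to the inverse of that frequency. The toy graph is hypothetical, and the weighting scheme is a simplification: Cochez et al. (2017) define several variants that this sketch does not reproduce exactly.

```python
import random
from collections import Counter

# Hypothetical toy graph: subject -> list of (predicate, object)
graph = {
    "Hamburg": [("country", "Germany"), ("leader", "Peter_Tschentscher")],
    "Germany": [("leader", "Angela_Merkel")],
    "Angela_Merkel": [("birthPlace", "Hamburg")],
    "Peter_Tschentscher": [("residence", "Hamburg")],
}
# Global predicate frequencies: leader occurs twice, all others once
pred_freq = Counter(p for edges in graph.values() for p, _ in edges)


def biased_walk(start, depth, rng, invert=False):
    """Random walk that favors frequent predicates (or rare ones if invert=True)."""
    walk, node = [start], start
    for _ in range(depth):
        edges = graph.get(node, [])
        if not edges:
            break
        weights = [1 / pred_freq[p] if invert else pred_freq[p] for p, _ in edges]
        predicate, obj = rng.choices(edges, weights=weights, k=1)[0]
        walk += [predicate, obj]
        node = obj
    return walk


rng = random.Random(0)
first_hops = [biased_walk("Hamburg", 1, rng)[1] for _ in range(3000)]
print(first_hops.count("leader") / len(first_hops))  # close to 2/3
```

With `invert=True`, the same walk from Hamburg takes the `leader` edge only about a third of the time, i.e., it favors the rarer predicate.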
Random vs. non-random Walks
• Other walking strategies include, but are not limited to…
– Walks with community hops (i.e., random jumps between similar nodes)
– Walklets (i.e., smaller subwalks fed into word2vec)
– Hierarchical walks (i.e., ignoring rarer hops, putting more emphasis on common connections)
– Walks with wildcards
• The results, again, are mixed
Steenwinckel et al. (2021): Walk Extraction Strategies for Node Embeddings with RDF2Vec in Knowledge Graphs. Database and Expert Systems Applications - DEXA 2021 Workshops
Similarity vs. Relatedness
• Closest 10 entities to Angela Merkel in different vector spaces
Portisch et al. (2022): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
Similarity vs. Relatedness
• Why bother?
– Use case: table interpretation (a special case of entity disambiguation)
[Figure labels: “related”, “similar”]
Similarity vs. Relatedness
• Recap word embeddings:
– Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976
– Google was officially founded as a company in September 1998
• Graph walks:
– Hamburg → country → Germany → leader → Angela_Merkel
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
– Hamburg → leader → Peter_Tschentscher → residence → Hamburg
[Figure: graph connecting Germany, Angela_Merkel, Hamburg, and Peter_Tschentscher via country, leader, birthPlace, and residence edges]
Similarity vs. Relatedness
• Surrounding entities indicate relatedness
– Hamburg → country → Germany → leader → Angela_Merkel
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
• Same entities in similar positions indicate similarity
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
– Hamburg → leader → Peter_Tschentscher → residence → Hamburg
• Someone is a leader vs. something has a leader
• Solution approach: use an embedding approach that respects positions
– CWINDOW / Structured Skip-ngram
Portisch and Paulheim (2021): Putting RDF2vec in Order.
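The difference between the two signals can be made concrete by comparing context sets computed from the three example walks above: once as a plain bag of tokens (what a position-agnostic word2vec sees) and once with relative positions attached (roughly what an order-aware variant can exploit). The window size of 2 and the helper `contexts` are assumptions of this sketch.

```python
from collections import defaultdict

walks = [
    ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"],
    ["Germany", "leader", "Angela_Merkel", "birthPlace", "Hamburg"],
    ["Hamburg", "leader", "Peter_Tschentscher", "residence", "Hamburg"],
]


def contexts(walks, window=2, positional=False):
    """Collect, for each token, the tokens seen within `window` positions.

    With positional=True, each context token is tagged with its relative offset.
    """
    ctx = defaultdict(set)
    for walk in walks:
        for i, token in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    ctx[token].add((j - i, walk[j]) if positional else walk[j])
    return ctx


bag = contexts(walks)
pos = contexts(walks, positional=True)
# Bag of tokens: Angela_Merkel and Germany share 'leader' and 'Hamburg'
print(sorted(bag["Angela_Merkel"] & bag["Germany"]))  # ['Hamburg', 'leader']
# With positions they share nothing: 'leader' comes after Germany but before Angela_Merkel
print(sorted(pos["Angela_Merkel"] & pos["Germany"]))  # []
# Both leaders share 'leader' at offset -1
print(sorted(pos["Angela_Merkel"] & pos["Peter_Tschentscher"]))  # [(-1, 'leader'), (2, 'Hamburg')]
```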
Order-Aware RDF2vec
• Using an order-aware variant of word2vec
• Experimental results:
– order-aware RDF2vec most often outperforms classic RDF2vec
– somewhat more computationally expensive, but still scales to DBpedia etc.
Ling et al. (2015): Two/Too Simple Adaptations of Word2Vec for Syntax Problems.
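The key change in the structured skip-gram model of Ling et al. (2015) is that context prediction depends on the relative position: instead of one shared output vector per context token, there is a separate output vector per token and offset (the CWINDOW variant instead concatenates the context vectors in order). The sketch below shows only training-pair generation and the scoring function under that assumption; the tiny vectors and the helpers `skipgram_pairs` and `score` are hypothetical, and no training loop is included.

```python
def skipgram_pairs(walk, window=2, structured=False):
    """Generate (center, context) pairs; structured=True also keeps the offset."""
    pairs = []
    for i, center in enumerate(walk):
        for offset in range(-window, window + 1):
            j = i + offset
            if offset == 0 or not 0 <= j < len(walk):
                continue
            pairs.append((center, walk[j], offset) if structured else (center, walk[j]))
    return pairs


def score(w_in, w_out, center, context, offset):
    """Dot product between the center vector and the offset-specific output vector.

    Classic skip-gram would look up one w_out[context] regardless of the offset.
    """
    return sum(a * b for a, b in zip(w_in[center], w_out[offset][context]))


walk = ["Germany", "leader", "Angela_Merkel"]
print(skipgram_pairs(walk, window=1, structured=True))

# Hypothetical parameters: 'leader' at offset -1 fits Angela_Merkel, at +1 fits Germany
w_in = {"Angela_Merkel": [1.0, 0.0], "Germany": [0.0, 1.0]}
w_out = {-1: {"leader": [1.0, 0.0]}, 1: {"leader": [0.0, 1.0]}}
print(score(w_in, w_out, "Angela_Merkel", "leader", -1))  # 1.0
print(score(w_in, w_out, "Angela_Merkel", "leader", 1))   # 0.0
```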
Similarity vs. Relatedness
• (s-)RDF2vec allows an explicit trade-off with different walk strategies
[Figure: s-RDF2vec pipeline]
Knowledge graph excerpt: Adler Mannheim, Mannheim, Baden-Württemberg, Germany, SAP Arena, Reiss-Engelhorn-Museum (edge labels: location, federal state, country, city, stadium)
Walk generation
“Classic” RDF2vec walks:
Adler_Mannheim → city → Mannheim → country → Germany
Adler_Mannheim → stadium → SAP_Arena → location → Mannheim
SAP_Arena → location → Mannheim → country → Germany
...
s-RDF2vec walks:
city → Mannheim → country
stadium → SAP_Arena → location
location → Mannheim → country
...
Embedding variants: RDF2vec “classic”, RDF2vec “edge”, RDF2vec “union walks” (classic + s-RDF2vec walks), and concatenated vectors (reduced by a global PCA, or weighted with w1 and w2 on a task-specific subset and reduced by a local PCA), evaluated on test cases
Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
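The walk examples in the figure suggest a simple transformation: each entity inside a classic walk, together with the predicates on both sides of it, yields one s-RDF2vec walk. The sketch below reproduces exactly the three examples shown; it is an illustration inferred from the figure, not the published s-RDF2vec walk extraction. The `weighted_concat` helper is likewise a hypothetical illustration of weighting two vector spaces before concatenation (the PCA step is omitted), and which weight applies to which space is an assumption.

```python
def edge_walks(classic_walk):
    """Turn [e0, p1, e1, p2, e2, ...] into predicate → entity → predicate windows."""
    windows = []
    # entities sit at even positions; inner entities have a predicate on both sides
    for i in range(2, len(classic_walk) - 1, 2):
        windows.append(classic_walk[i - 1:i + 2])
    return windows


def weighted_concat(v_classic, v_edge, w1, w2):
    """Concatenate two vectors for the same entity after scaling each by a weight."""
    return [w1 * x for x in v_classic] + [w2 * x for x in v_edge]


classic = [
    ["Adler_Mannheim", "city", "Mannheim", "country", "Germany"],
    ["Adler_Mannheim", "stadium", "SAP_Arena", "location", "Mannheim"],
    ["SAP_Arena", "location", "Mannheim", "country", "Germany"],
]
for walk in classic:
    for window in edge_walks(walk):
        print(" → ".join(window))
# city → Mannheim → country
# stadium → SAP_Arena → location
# location → Mannheim → country
```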
Similarity vs. Relatedness
• s-RDF2vec
– using different walk strategies
– combining different vector spaces (weighted combinations are possible)
• 10 closest neighbors to Mannheim:
Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
To Materialize or Not to Materialize?
May I ask you a question?
Sure, go ahead!
To Materialize or Not to Materialize?
Rumor has it that RDF2vec performs worse if you run a reasoner to add inferences to the graph first...
???
To Materialize or Not to Materialize?
I know it sounds counterintuitive...
Hmmm...
To Materialize or Not to Materialize?
Hmmm… sounds reasonable.
(Pun intended)
Okay, there might be an explanation...
To Materialize or Not to Materialize?
We need more beer
experiments
Back Home...
We need more beer
experiments
OK, let’s go!
Experimental Setup
• RDF2vec vs. RDF2vec + inferences
● Classification
● Regression
● Entity Similarity
● Entity Relatedness
● Document Similarity
[Figure panels (a) and (b)]
Iana and Paulheim (2020): More is not always better: The negative impact of a-box materialization on RDF2vec knowledge graph embeddings
Experimental Results
• Classification: unmaterialized is better in 60/80 cases
• Regression: unmaterialized is better in 39/60 cases
• Entity similarity: unmaterialized is better in 16/20 cases
• Entity relatedness: unmaterialized is better in 13/20 cases
• But for document similarity, materialized is always better
– task has a very different nature
– more heterogeneity
Iana and Paulheim (2020): More is not always better: The negative impact of a-box materialization on RDF2vec knowledge graph embeddings
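The counts above can be turned into rates with a few lines of arithmetic. The pooled figure at the end simply adds up the cases across the four tasks that come with counts; it is a back-of-the-envelope summary for illustration, not a number reported by Iana and Paulheim.

```python
# (cases where unmaterialized is better, total cases), as reported on the slide
results = {
    "classification": (60, 80),
    "regression": (39, 60),
    "entity similarity": (16, 20),
    "entity relatedness": (13, 20),
}

for task, (better, cases) in results.items():
    print(f"{task}: {better / cases:.0%}")
# classification: 75%, regression: 65%, entity similarity: 80%, entity relatedness: 65%

pooled_better = sum(b for b, _ in results.values())
pooled_cases = sum(c for _, c in results.values())
print(f"pooled: {pooled_better}/{pooled_cases} = {pooled_better / pooled_cases:.1%}")
# pooled: 128/180 = 71.1%
```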
To Materialize or not to Materialize?
• Explanation 1: materialization skews property distributions
Iana and Paulheim (2020): More is not always better: The negative impact of a-box materialization on RDF2vec knowledge graph embeddings
To Materialize or not to Materialize?
• Explanation 2 is a bit more complex...
• Thought experiment:
– DBpedia mostly does not include persons’ gender
– learn a classifier for gender
• Spouse is a symmetric property, but…
– distribution is highly uneven
– 80% of all subjects of spouse are women
– e.g., Ayda_Field spouse Robbie_Williams .
Graells-Garrido et al. (2012): First Women, Second Sex: Gender Bias in Wikipedia
To Materialize or not to Materialize?
• Thought experiment: learn a classifier for gender
• Spouse is a symmetric property, but…
– 80% of all subjects of spouse are women
• Assume that an embedding captures that information
– e.g., order-aware RDF2vec
→ a downstream classifier can reach >80% accuracy
• On the other hand
– Materialization completely erases that information
• Bottom line: missing information can be a signal
– Machine learning terminology: MAR (missing at random) vs. MNAR (missing not at random)
Iana and Paulheim (2021): More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings
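The thought experiment can be reproduced on hypothetical toy data: five spouse triples, four of which have a woman as subject, mirroring the 80% figure above. Materializing the symmetric property adds the inverse of every triple, after which the share of women among subjects drops to exactly one half, and a simple "subject of spouse → female" rule loses its signal. The names and the gender mapping are invented for illustration.

```python
# Hypothetical people: w* are women, m* are men
gender = {**{f"w{i}": "female" for i in range(1, 6)},
          **{f"m{i}": "male" for i in range(1, 6)}}
spouse = [("w1", "m1"), ("w2", "m2"), ("w3", "m3"), ("w4", "m4"), ("m5", "w5")]


def female_subject_share(triples):
    """Fraction of spouse triples whose subject is a woman."""
    return sum(gender[s] == "female" for s, _ in triples) / len(triples)


def materialize_symmetric(triples):
    """Add (o, s) for every (s, o), as a reasoner would for a symmetric property."""
    return sorted(set(triples) | {(o, s) for s, o in triples})


def subject_rule_accuracy(triples):
    """Accuracy of the rule 'subject of spouse → female, otherwise male'."""
    subjects = {s for s, _ in triples}
    people = subjects | {o for _, o in triples}
    correct = sum((gender[x] == "female") == (x in subjects) for x in people)
    return correct / len(people)


print(female_subject_share(spouse))                          # 0.8
print(female_subject_share(materialize_symmetric(spouse)))   # 0.5
print(subject_rule_accuracy(spouse))                         # 0.8
print(subject_rule_accuracy(materialize_symmetric(spouse)))  # 0.5
```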
Dynamic Knowledge Graphs
• In theory, RDF2vec can also produce embeddings for dynamic knowledge graphs to a certain extent
– given that the neighbors are all known
– Experiments are still under way
Understanding the RDF2vec Model Zoo
• Variations
– Walk extraction (e.g., classic, s-RDF2vec, e-RDF2vec)
– Ordered vs. non-ordered
– Skip-gram vs. CBOW
• This alone gives us 12 (3 × 2 × 2) ways to train an RDF2vec model
• We assume that not all of them are equally good
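The count follows from multiplying the options listed above; enumerating them explicitly (for example, to schedule one training run per configuration) is a one-liner with `itertools.product`. The option strings are just the labels from the slide.

```python
from itertools import product

walk_extraction = ["classic", "s-RDF2vec", "e-RDF2vec"]
ordering = ["ordered", "non-ordered"]
algorithm = ["skip-gram", "CBOW"]

configurations = list(product(walk_extraction, ordering, algorithm))
print(len(configurations))  # 12
print(configurations[0])    # ('classic', 'ordered', 'skip-gram')
```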
Understanding the RDF2vec Model Zoo
Understanding the RDF2vec Model Zoo
• Variations
– Walk extraction (e.g., classic, s-RDF2vec, e-RDF2vec)
– Ordered vs. non-ordered
– Skip-gram vs. CBOW
• Build a systematic collection of basic classification problems
• For example, r.{e} vs. ¬r.{e}
– e.g., person born in NYC vs. person not born in NYC
– here, s-RDF2vec should not be able to solve this
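A minimal sketch of how one such ∃r.{e} problem could be generated from triples, assuming a toy list of hypothetical people: positives are all candidates with an r-edge to e, and every other candidate is a negative. Dave is a deliberately hard negative, connected to NYC by a different relation. The function name `exists_r_e` is made up for this example.

```python
triples = [
    ("Alice", "birthPlace", "NYC"),
    ("Bob", "birthPlace", "Boston"),
    ("Carol", "birthPlace", "NYC"),
    ("Dave", "residence", "NYC"),  # linked to NYC, but not via birthPlace
]
people = ["Alice", "Bob", "Carol", "Dave"]


def exists_r_e(candidates, triples, r, e):
    """Label each candidate True iff it has an r-edge to e (∃r.{e}), else False."""
    positives = {s for s, p, o in triples if p == r and o == e}
    return {x: x in positives for x in candidates}


print(exists_r_e(people, triples, "birthPlace", "NYC"))
# {'Alice': True, 'Bob': False, 'Carol': True, 'Dave': False}
```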
Embeddings and Interpretability
• Hot topic: Explainable AI
– Knowledge Graphs are a favorable ingredient
– Human/machine interpretable knowledge → explainable systems
• However:
– Embeddings replace interpretable axioms
with numeric vectors over non-interpretable dimensions
– Where did the semantics go?
Paulheim (2018): Make Embeddings Semantic Again!
The 2009 Semantic Web Layer Cake
The 2018 Semantic Web Layer Cake
Embeddings
Towards Semantic Vector Space Embeddings
[Figure: vector space with the labels “cartoon” and “superhero”]
Paulheim (2018): Make Embeddings Semantic Again!
Towards Semantic Vector Space Embeddings
• Approach 1: learn an interpretation function
• Each dimension of the embedding model is a target for a separate learning problem
• Learn a function to explain the dimension
• E.g., $y \approx -|\exists \mathit{character}.\mathit{Superhero}|$
• Just an approximation used for explanations and justifications
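Approach 1 can be sketched as an ordinary regression problem per dimension. In the toy example below, `x` is a hypothetical symbolic feature for five entities (read here as the number of `character` links to a Superhero, which is only one possible reading of the notation above) and `y` holds invented values of one embedding dimension; an ordinary least-squares line then serves as the approximate explanation. A real setup would search over many candidate features rather than fit a single, pre-chosen one.

```python
# Hypothetical data for five entities
x = [0, 1, 2, 3, 4]                # symbolic feature, e.g. count of superhero characters
y = [0.1, -0.9, -2.1, -2.9, -4.1]  # value of one embedding dimension


def fit_line(x, y):
    """Ordinary least squares fit y ≈ slope * x + intercept."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(x, y)
print(f"y ≈ {intercept:.2f} {slope:+.2f} * x")  # y ≈ 0.10 -1.04 * x
```

The negative slope mirrors the sign in the example explanation: the more superhero characters, the lower the value along this (invented) dimension.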
Towards Semantic Vector Space Embeddings
• Approach 2: learn inherently interpretable embeddings
• Step 1: learn typical patterns that exist in a knowledge graph
– e.g., graph pattern learning
– e.g., Horn clauses
• Step 2a: use those patterns as embedding dimensions
– probably not low dimensional
• Step 2b: compact the space
– e.g., use dimensions for mutually exclusive patterns
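Steps 2a and 2b can be illustrated on a toy scale. The sketch assumes step 1 has already produced a few patterns (here hand-written and hypothetical), turns each pattern into one binary dimension, and then compacts a group of patterns assumed to be mutually exclusive (Film vs. Person) into a single categorical dimension.

```python
# Hypothetical patterns from step 1, each a test over an entity's (predicate, object) edges
patterns = {
    "has_director": lambda edges: any(p == "director" for p, _ in edges),
    "type_Film": lambda edges: ("type", "Film") in edges,
    "type_Person": lambda edges: ("type", "Person") in edges,
}
names = list(patterns)  # fixed dimension order

entities = {
    "Mulholland_Dr.": {("type", "Film"), ("director", "David_Lynch")},
    "David_Lynch": {("type", "Person"), ("nationality", "US")},
}


def embed(edges):
    """Step 2a: one binary dimension per pattern."""
    return [int(patterns[n](edges)) for n in names]


def compact(vector, exclusive):
    """Step 2b: merge mutually exclusive pattern dimensions into one categorical dimension.

    The merged value is 1 + the position of the pattern that holds, or 0 if none holds.
    """
    idx = [names.index(n) for n in exclusive]
    merged = next((k + 1 for k, i in enumerate(idx) if vector[i]), 0)
    rest = [v for i, v in enumerate(vector) if i not in idx]
    return [merged] + rest


for entity, edges in entities.items():
    print(entity, embed(edges), compact(embed(edges), ["type_Film", "type_Person"]))
# Mulholland_Dr. [1, 1, 0] [1, 1]
# David_Lynch [0, 0, 1] [2, 0]
```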
Towards Semantic Vector Space Embeddings
• Different angle: learn an interpretation for the similarity function
[Figure labels: ~similar type, ~same country, ~connected to same entity]
Explaining Predictions with RDF2vec
• Recap: we can, in principle, create vectors for new entities
• Some explanation models, like LIME, do this:
– Create new artificial entities by perturbation
• In our KG context: add/remove connections
• Predict for new entities
• Learn explanation for predictions
• With that approach, LIME should be applicable to predictions made with RDF2vec
Ribeiro et al. (2016): "Why Should I Trust You?": Explaining the Predictions of Any Classifier
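The perturbation idea can be sketched as follows, with two simplifications that depart from LIME proper: all 2^n subsets of the entity's edges are enumerated instead of sampled, and each edge's importance is the difference between the mean prediction with and without it, instead of the coefficient of a proximity-weighted linear surrogate. The edges and the `predict` function are hypothetical stand-ins for "embed the perturbed entity with RDF2vec, then apply the trained classifier".

```python
from itertools import product

# Hypothetical edges of the entity whose prediction we want to explain
edges = [("director", "David_Lynch"), ("genre", "Mystery"), ("country", "US")]


def predict(kept_edges):
    """Hypothetical stand-in for: embed the perturbed entity, then classify it."""
    score = 0.2
    if ("director", "David_Lynch") in kept_edges:
        score += 0.5
    if ("genre", "Mystery") in kept_edges:
        score += 0.2
    return score


def edge_importance(edges, predict):
    """Mean prediction with an edge minus mean prediction without it."""
    masks = list(product([0, 1], repeat=len(edges)))
    preds = [predict([e for e, keep in zip(edges, mask) if keep]) for mask in masks]
    importance = {}
    for i, edge in enumerate(edges):
        with_edge = [p for mask, p in zip(masks, preds) if mask[i]]
        without_edge = [p for mask, p in zip(masks, preds) if not mask[i]]
        importance[edge] = sum(with_edge) / len(with_edge) - sum(without_edge) / len(without_edge)
    return importance


for edge, weight in edge_importance(edges, predict).items():
    print(edge, round(weight, 3))
```

For an additive predictor like this one, the difference of means recovers each edge's contribution (0.5, 0.2, and 0.0); for a real, non-additive classifier it is only a rough attribution.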
Summary
• Knowledge Graph Embeddings with RDF2vec
– Effective processing of large-scale knowledge sources
• Light variant possible for scalability
– Variations visited: walk extraction, order-awareness, materialization, ...
– Encoding of similarity and/or relatedness
• RDF2vec: explicit trade-off is possible!
– Additional insights that are not explicit in the graph
• aka latent semantics
More on RDF2vec
• Collection of
– Implementations
– Pre-trained models
– >45 use cases in various domains
Thank you!
http://www.heikopaulheim.com
@heikopaulheim
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 

Dernier (20)

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 

New Adventures in RDF2vec

  • 1. 02/14/22 Heiko Paulheim
      New Adventures in RDF2vec
      Heiko Paulheim, University of Mannheim
      Also includes the latest adventures
      Alert: contains spoilers on future publications
  • 2. Graphs vs. Vectors
      • Data Science tools for prediction etc.
          – Python, Weka, R, RapidMiner, …
          – Algorithms that work on vectors, not graphs
      • Bridges built over the past years:
          – FeGeLOD (Weka, 2012), RapidMiner LOD Extension (2015), Python KG Extension (2021)
  • 3. Graphs vs. Vectors
      • Transformation strategies (aka propositionalization)
          – e.g., types: type_horror_movie=true
          – e.g., data values: year=2011
          – e.g., aggregates: nominations=7
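The three strategies on this slide can be sketched as one small function that flattens an entity's triples into a feature dictionary. This is a minimal illustration under stated assumptions, not the FeGeLOD or RapidMiner LOD Extension code: the toy triples, the `propositionalize` name, and the rule "numeric object becomes a data value, any other object is counted per predicate" are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object) mirroring the slide's
# feature examples; they are not taken from a real knowledge graph.
TRIPLES = [
    ("Movie_1", "rdf:type", "horror_movie"),
    ("Movie_1", "year", 2011),
    ("Movie_1", "nomination", "Award_A"),
    ("Movie_1", "nomination", "Award_B"),
    ("Movie_2", "year", 1999),
]


def propositionalize(entity, triples):
    """Flatten one entity's outgoing triples into a flat feature dictionary."""
    features = {}
    counts = defaultdict(int)
    for s, p, o in triples:
        if s != entity:
            continue
        if p == "rdf:type":
            features[f"type_{o}"] = True      # type feature, e.g. type_horror_movie=true
        elif isinstance(o, (int, float)):
            features[p] = o                   # data value, e.g. year=2011
        else:
            counts[p] += 1                    # aggregate: number of objects per predicate
    for p, n in counts.items():
        features[f"{p}s"] = n                 # naive plural name, e.g. nominations=2
    return features


print(propositionalize("Movie_1", TRIPLES))
```

Aligning the dictionaries of all entities into one table (missing keys treated as false or 0) yields the kind of vector-shaped dataset the tools on the previous slide expect.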
  • 4. Graphs vs. Vectors
      • Observations with simple propositionalization strategies
          – Even simple features (e.g., adding all numbers and types) can help on many problems
          – More sophisticated features often bring additional improvements
              • Combinations of relations and individuals, e.g., movies directed by Steven Spielberg
              • Combinations of relations and types, e.g., movies directed by Oscar-winning directors
              • …
          – But:
              • The search space is enormous!
              • "Generate first, filter later" does not scale well
  • 5. Towards RDF2vec
      • Excursion: word embeddings
          – word2vec, proposed by Mikolov et al. (2013)
          – predicts a word from its context or vice versa
          – usually trained on large text corpora
      • Idea: similar words appear in similar contexts, like
          – Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976
          – Google was officially founded as a company in January 2006
      • Projection layer: embedding vectors
  • 6. From Word Embeddings to Graph Embeddings
      • Basic idea:
          – extract random walks from an RDF graph, e.g.: Mulholland Dr. → director → David Lynch → nationality → US
          – feed the walks into the word2vec algorithm
      • Order of magnitude (e.g., DBpedia):
          – ~6M entities (“words”)
          – up to 500 random walks started per entity, each of length up to 8
          → corpus of >20B tokens
      • Result:
          – entity embeddings
          – most often outperform other propositionalization techniques
      Ristoski and Paulheim (2016): RDF2vec: RDF graph embeddings for data mining
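The walk-extraction step can be sketched as follows: each walk alternates entity and predicate tokens, so it reads like a sentence for word2vec. This is a minimal sketch, assuming a graph stored as a dictionary from subject to (predicate, object) edges; the toy graph, the `random_walks` name, and the choice to deduplicate walks are assumptions, not the reference RDF2vec implementation.

```python
import random

# Hypothetical toy graph: subject -> list of (predicate, object) edges.
GRAPH = {
    "Mulholland_Dr": [("director", "David_Lynch")],
    "David_Lynch": [("nationality", "US")],
    "US": [],
}


def random_walks(graph, start, n_walks, max_hops, rng):
    """Extract up to n_walks distinct random walks starting at `start`.

    Each walk is a token list alternating entities and predicates."""
    walks = set()
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(max_hops):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: stop this walk early
            pred, node = rng.choice(edges)
            walk += [pred, node]
        walks.add(tuple(walk))  # duplicates collapse in the set
    return [list(w) for w in sorted(walks)]


corpus = random_walks(GRAPH, "Mulholland_Dr", n_walks=5, max_hops=8, rng=random.Random(0))
print(corpus)
```

The resulting token lists can then serve as the sentences of a word2vec implementation, for example gensim's `Word2Vec(sentences=corpus, sg=1)`; that library call is not part of this sketch.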
  • 7. A First Glance at RDF2vec Embeddings
      • Observation: similar entities are projected close to each other
          – this can be exploited by downstream ML algorithms (think: k-NN)
      Ristoski and Paulheim (2016): RDF2vec: RDF graph embeddings for data mining
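Because similar entities land close together, even a plain k-nearest-neighbor classifier can use the vectors directly. A minimal sketch, assuming hand-picked 2-D vectors as stand-ins for learned RDF2vec embeddings (real ones are learned and higher-dimensional); `knn_predict` and the labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical 2-D entity vectors with class labels (not real RDF2vec output).
LABELED = [((0.0, 0.1), "city"), ((0.2, 0.0), "city"), ((0.1, 0.2), "city"),
           ((5.0, 5.1), "person"), ((5.2, 4.9), "person")]


def knn_predict(query, labeled, k=3):
    """Return the majority label among the k labeled vectors closest to query."""
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


print(knn_predict((0.1, 0.1), LABELED))  # city
```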
  • 8. The End of Petar’s PhD Journey…
      • …and the beginning of the RDF2vec adventure
  • 9. Embeddings for Link Prediction
      • RDF2vec example
          – similar instances form clusters; the direction of a relation is approximately stable
          – link prediction by analogy reasoning (Japan – Tokyo ≈ China – Beijing)
      Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
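Analogy reasoning rearranges Japan – Tokyo ≈ China – Beijing into Beijing ≈ Tokyo − Japan + China and looks up the nearest entity to that point. A minimal sketch, assuming hand-chosen 2-D vectors in which "capital of" is roughly a constant offset; the vectors and the `analogy` helper are hypothetical.

```python
import math

# Hypothetical 2-D vectors chosen so that "capital of" is a roughly constant offset.
VEC = {
    "Japan": (1.0, 0.0),
    "Tokyo": (1.0, 1.0),
    "China": (3.0, 0.2),
    "Beijing": (3.1, 1.1),
    "Germany": (5.0, 0.1),
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c, vectors):
    """Solve a : b = c : ? via ? ≈ b - a + c, excluding the three query entities."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("Japan", "Tokyo", "China", VEC))  # Beijing
```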
  • 10. Embeddings for Link Prediction
      • In RDF2vec, relation preservation is a by-product
      • TransE (and its descendants): direct modeling
          – Formulates RDF embedding as an optimization problem
          – Find a mapping of entities and relations to $\mathbb{R}^n$ so that
              • across all triples ⟨s,p,o⟩, $\sum_{\langle s,p,o \rangle} \lVert \mathbf{s} + \mathbf{p} - \mathbf{o} \rVert$ is minimized
              • existing triples obtain a smaller error than non-existing ones
      Bordes et al.: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
      Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
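The TransE objective can be made concrete with its per-triple distance ‖s + p − o‖ and the margin ranking loss Bordes et al. use to push true triples below corrupted ones. A minimal sketch: the 2-D vectors, the margin of 1.0, and the function names are assumptions, and the gradient-based training loop is omitted.

```python
import math


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def transe_distance(s, p, o):
    """TransE-style dissimilarity ||s + p - o||: small for plausible triples."""
    return l2([si + pi - oi for si, pi, oi in zip(s, p, o)])


def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss for one (true, corrupted) triple pair; it is zero
    once the true triple is at least `margin` closer than the corrupted one."""
    return max(0.0, margin + transe_distance(*pos) - transe_distance(*neg))


# Hypothetical 2-D embeddings.
japan, tokyo, beijing = (1.0, 0.0), (1.0, 1.0), (3.0, 1.0)
capital = (0.0, 1.0)

print(transe_distance(japan, capital, tokyo))    # 0.0 (true triple)
print(transe_distance(japan, capital, beijing))  # 2.0 (corrupted tail)
print(margin_loss((japan, capital, tokyo), (japan, capital, beijing)))  # 0.0
```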
  • 11. Link Prediction vs. Node Embedding
      • Hypothesis:
          – Embeddings for link prediction also cluster similar entities
          – Node embeddings can also be used for link prediction
      Portisch et al. (to appear): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
  • 12. Using RDF2vec for Link Prediction
      • Use the embeddings of head and relation to predict the tail
          – Train a separate network for head prediction
  • 13. Local Embeddings: RDF2vec Light
      • Recap: order of magnitude (e.g., DBpedia)
          – ~6M entities (“words”)
          – up to 500 random walks started per entity, each of length up to 8
          → corpus of >20B tokens
          – “Train once, reuse often”
      • In some cases, only a small subset (of the 6M) is of interest
          – RDF2vec Light: “train when needed”
          – Runtime: minutes instead of days
      Portisch et al. (2020): RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings
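A back-of-the-envelope count with the slide's figures shows why restricting walk generation pays off. Here $k$ is a hypothetical number of entities of interest, and the light bound assumes, as a simplification, that walks are generated only for those $k$ entities (RDF2vec Light's actual walk scheme may differ):

```latex
N_{\text{walks}}^{\text{full}} \le 6 \times 10^{6} \cdot 500 = 3 \times 10^{9},
\qquad
N_{\text{walks}}^{\text{light}} \le k \cdot 500 \overset{k = 10^{3}}{=} 5 \times 10^{5},
\qquad
\frac{3 \times 10^{9}}{5 \times 10^{5}} = 6 \times 10^{3}
```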
  • 14. Local Embeddings: RDF2vec Light
      • Results:
          – Many classification and regression tasks work fine with RDF2vec Light
              • As good as, or sometimes even better (!) than, standard RDF2vec
          – ...but there is a huge performance drop in tasks like document similarity
      • First takeaway: RDF2vec Light works better for homogeneous sets of entities
      Portisch et al. (2020): RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings
  • 15. Random vs. non-random Walks
      • Maybe random walks are not such a good idea
          – They may give too much weight to less important entities and facts
              • Strategies:
                  – Prefer edges with more frequent predicates
                  – Prefer nodes with higher indegree or PageRank
                  – …
          – They may cover less important entities and facts too little
              • Strategies: the opposite of all of the above
      • The results are mixed
      • External signals (e.g., human notions of importance) generally work better than graph-internal signals
      Cochez et al. (2017): Biased Graph Walks for RDF Graph Embeddings
      Al Taweel and Paulheim (2020): Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings
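One of the strategies above, preferring edges with more frequent predicates (or, inverted, rarer ones), can be sketched as a weighted next-hop choice. This is a minimal sketch, assuming the weight of an edge is simply its predicate's global frequency; Cochez et al. study several weighting schemes, and the toy graph and `biased_walk` name here are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy graph: subject -> list of (predicate, object) edges.
GRAPH = {
    "A": [("type", "Person"), ("knows", "B"), ("birthPlace", "C")],
    "B": [("type", "Person"), ("knows", "A")],
    "C": [("type", "City")],
}

# Global predicate frequencies, one graph-internal signal of "importance".
PRED_FREQ = Counter(p for edges in GRAPH.values() for p, _ in edges)


def biased_walk(graph, start, max_hops, pred_freq, rng, invert=False):
    """Walk that picks the next edge with probability proportional to its
    predicate's frequency, or to the inverse frequency when invert=True."""
    walk, node = [start], start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break  # node has no outgoing edges (e.g., a class node)
        weights = [1 / pred_freq[p] if invert else pred_freq[p] for p, _ in edges]
        pred, node = rng.choices(edges, weights=weights, k=1)[0]
        walk += [pred, node]
    return walk


rng = random.Random(7)
print(biased_walk(GRAPH, "A", 4, PRED_FREQ, rng))
print(biased_walk(GRAPH, "A", 4, PRED_FREQ, rng, invert=True))
```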
  • 16. Random vs. non-random Walks
      • Other walking strategies include, but are not limited to…
          – Walks with community hops (i.e., random jumps between similar nodes)
          – Walklets (i.e., smaller subwalks fed into word2vec)
          – Hierarchical walks (i.e., ignoring rarer hops, putting more emphasis on common connections)
          – Walks with wildcards
      • The results, again, are mixed
      Steenwinckel et al. (2021): Walk Extraction Strategies for Node Embeddings with RDF2Vec in Knowledge Graphs. Database and Expert Systems Applications - DEXA 2021 Workshops
  • 17. Similarity vs. Relatedness
      • The 10 closest entities to Angela Merkel in different vector spaces
      Portisch et al. (2022): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
  • 18. Similarity vs. Relatedness
      • Why bother?
          – Use case: table interpretation (a special case of entity disambiguation)
      [Figure: candidate entities labeled “related” vs. “similar”]
  • 19. Similarity vs. Relatedness
      • Recap word embeddings:
          – Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976
          – Google was officially founded as a company in January 2006
      • Graph walks:
          – Hamburg → country → Germany → leader → Angela_Merkel
          – Germany → leader → Angela_Merkel → birthPlace → Hamburg
          – Hamburg → leader → Peter_Tschentscher → residence → Hamburg
      [Figure: example graph over Germany, Angela_Merkel, Hamburg, and Peter_Tschentscher with edges country, leader, birthPlace, and residence]
  • 20. Similarity vs. Relatedness
      • Surrounding entities indicate relatedness
          – Hamburg → country → Germany → leader → Angela_Merkel
          – Germany → leader → Angela_Merkel → birthPlace → Hamburg
      • Same entities in similar positions indicate similarity
          – Germany → leader → Angela_Merkel → birthPlace → Hamburg
          – Hamburg → leader → Peter_Tschentscher → residence → Hamburg
      • Someone is a leader vs. something has a leader
      • Solution approach: use an embedding approach that respects positions
          – CWindow / Structured Skip-gram
      Portisch and Paulheim (2021): Putting RDF2vec in Order.
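The difference between "someone is a leader" and "something has a leader" shows up in the training pairs a skip-gram model sees. A minimal sketch, assuming position-awareness can be illustrated by tagging each context token with its relative offset; Structured Skip-gram realizes this with position-specific output parameters, so this is a simplification, and `context_pairs` is a hypothetical helper.

```python
def context_pairs(walk, window=2, positional=False):
    """Skip-gram (center, context) pairs from one walk. With positional=True,
    each context token is tagged with its offset relative to the center."""
    pairs = []
    for i, center in enumerate(walk):
        for off in range(-window, window + 1):
            j = i + off
            if off == 0 or not 0 <= j < len(walk):
                continue
            ctx = (walk[j], off) if positional else walk[j]
            pairs.append((center, ctx))
    return pairs


w1 = ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"]
w2 = ["Germany", "leader", "Angela_Merkel", "birthPlace", "Hamburg"]
w3 = ["Hamburg", "leader", "Peter_Tschentscher", "residence", "Hamburg"]

# Classic: Germany and Angela_Merkel both just see "leader" as context.
print(("Germany", "leader") in context_pairs(w1))        # True
print(("Angela_Merkel", "leader") in context_pairs(w2))  # True
# Order-aware: "leader" follows Germany (+1) but precedes both leaders (-1).
print(("Germany", ("leader", 1)) in context_pairs(w1, positional=True))                # True
print(("Peter_Tschentscher", ("leader", -1)) in context_pairs(w3, positional=True))    # True
```

With positions kept, Angela_Merkel and Peter_Tschentscher share the context ("leader", −1), which pushes them together (similarity), while Germany gets a different context.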
  • 21. Order-Aware RDF2vec
      • Using an order-aware variant of word2vec
      • Experimental results:
          – order-aware RDF2vec most often outperforms classic RDF2vec
          – somewhat more computationally expensive, but still scales to DBpedia etc.
      Ling et al. (2015): Two/Too Simple Adaptations of Word2Vec for Syntax Problems.
  • 22. Similarity vs. Relatedness
      • (s-)RDF2vec allows an explicit trade-off between similarity and relatedness using different walk strategies
      [Figure: s-RDF2vec pipeline]
          – Knowledge graph: Mannheim, Baden-Württemberg, Germany, Adler Mannheim, SAP Arena, Reiss-Engelhorn-Museum; edges labeled location, federal state, country, city, stadium
          – Walk generation, “classic” RDF2vec walks:
              Adler_Mannheim → city → Mannheim → country → Germany
              Adler_Mannheim → stadium → SAP_Arena → location → Mannheim
              SAP_Arena → location → Mannheim → country → Germany
              ...
          – Walk generation, s-RDF2vec walks:
              city → Mannheim → country
              stadium → SAP_Arena → location
              location → Mannheim → country
              ...
          – Models: RDF2vec “union walks”, RDF2vec “classic”, RDF2vec “edge”
          – Concatenated vector → global PCA
          – Test cases: concatenated vector (task-specific subset), weighted by w1 and w2 → (weighted) local PCA
      Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
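The example walks in the figure suggest how structure walks relate to classic ones, and how the two vector spaces are combined. A minimal sketch under two assumptions: that an s-RDF2vec walk keeps the middle entity of a classic walk plus its two adjacent edge labels (one reading of the figure, not necessarily the paper's exact procedure), and that w1 scales the classic vector and w2 the edge vector (hypothetical). The PCA step is omitted.

```python
def structure_walk(classic_walk):
    """Keep the middle entity of a classic walk with an even number of hops
    (e.g., 5 tokens) plus its two adjacent edge labels."""
    mid = len(classic_walk) // 2
    return classic_walk[mid - 1:mid + 2]


def weighted_concat(v_classic, v_edge, w1=0.5, w2=0.5):
    """Concatenate two vectors of the same entity, scaled by w1 and w2."""
    return [w1 * x for x in v_classic] + [w2 * x for x in v_edge]


walks = [
    ["Adler_Mannheim", "city", "Mannheim", "country", "Germany"],
    ["Adler_Mannheim", "stadium", "SAP_Arena", "location", "Mannheim"],
    ["SAP_Arena", "location", "Mannheim", "country", "Germany"],
]
for w in walks:
    print(" → ".join(structure_walk(w)))
```

Raising w2 relative to w1 emphasizes structural (similarity-like) information over neighborhood (relatedness-like) information in the combined vector.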
  • 23. Similarity vs. Relatedness
      • s-RDF2vec
          – using different walk strategies
          – combining different vector spaces (weighted combinations are possible)
      • The 10 closest neighbors to Mannheim:
      Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
  • 24. To Materialize or Not to Materialize?
      “May I ask you a question?”
      “Sure, go ahead!”
  • 25. To Materialize or Not to Materialize?
      “Rumor has it that RDF2vec performs worse if you run a reasoner to add inferences to the graph first...”
      “???”
  • 26. To Materialize or Not to Materialize?
      “I know it sounds counterintuitive...”
      “Hmmm...”
  • 27. To Materialize or Not to Materialize?
      “Okay, there might be an explanation...”
      “Hmmm… sounds reasonable. (Pun intended)”
  • 28. To Materialize or Not to Materialize?
      “We need more beer experiments”
  • 29. Back Home...
      “We need more beer experiments”
      “OK, let’s go!”
  • 30. Experimental Setup
      • RDF2vec + inferences, evaluated on:
          ● Classification
          ● Regression
          ● Entity Similarity
          ● Entity Relatedness
          ● Document Similarity
      Iana and Paulheim (2020): More is not always better: The negative impact of a-box materialization on RDF2vec knowledge graph embeddings
  • 31. Experimental Results
      • Classification: unmaterialized is better in 60/80 cases
      • Regression: unmaterialized is better in 39/60 cases
      • Entity similarity: unmaterialized is better in 16/20 cases
      • Entity relatedness: unmaterialized is better in 13/20 cases
      • But: for document similarity, materialized is always better
          – the task has a very different nature
          – more heterogeneity
      Iana and Paulheim (2020): More is not always better: The negative impact of a-box materialization on RDF2vec knowledge graph embeddings
  • 32. To Materialize or not to Materialize?
      • Explanation 1: materialization skews property distributions
      Iana and Paulheim (2020): More is not always better: The negative impact of a-box materialization on RDF2vec knowledge graph embeddings
  • 33. To Materialize or not to Materialize?
      • Explanation 2 is a bit more complex...
      • Thought experiment:
          – DBpedia mostly does not include persons’ gender
          – learn a classifier for gender
      • spouse is a symmetric property, but…
          – its distribution is highly uneven
          – 80% of all subjects of spouse are women
          – e.g., Ayda_Field spouse Robbie_Williams .
      Graells-Garrido et al. (2012): First Women, Second Sex: Gender Bias in Wikipedia
  • 34. To Materialize or not to Materialize?
      • Thought experiment: learn a classifier for gender
      • spouse is a symmetric property, but…
          – 80% of all subjects of spouse are women
      • Assume that an embedding captures that information
          – e.g., order-aware RDF2vec
          → a downstream classifier can reach >80% accuracy
      • On the other hand
          – Materialization completely erases that information: once the inverse triples are added, every person with a spouse appears as a subject
      • Bottom line: missing information can be a signal
          – Machine learning terminology: MAR (missing at random) vs. MNAR (missing not at random)
      Iana and Paulheim (2021): More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings
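The thought experiment can be replayed on toy data: a rule that only looks at who is the subject of spouse works on the raw graph, and adding the symmetric inverses wipes out that signal. A minimal sketch; the ten people, their labels, and the 4-of-5 skew are hypothetical data chosen to mirror the slide, and the rule stands in for what an order-aware embedding might capture.

```python
from collections import Counter

# Hypothetical toy data: spouse asserted once per couple, with 4 of 5 subjects women.
GENDER = {"W1": "f", "W2": "f", "W3": "f", "W4": "f", "W5": "f",
          "M1": "m", "M2": "m", "M3": "m", "M4": "m", "M5": "m"}
SPOUSE = [("W1", "M1"), ("W2", "M2"), ("W3", "M3"), ("W4", "M4"), ("M5", "W5")]


def materialize_symmetric(pairs):
    """Add the inverse of every pair, as a reasoner would for a symmetric property."""
    return sorted(set(pairs) | {(o, s) for s, o in pairs})


def share_female_subjects(pairs):
    subjects = Counter(GENDER[s] for s, _ in pairs)
    return subjects["f"] / sum(subjects.values())


def subject_rule_accuracy(pairs):
    """Accuracy of the rule 'female iff the person is a subject of spouse'."""
    subjects = {s for s, _ in pairs}
    correct = sum((person in subjects) == (g == "f") for person, g in GENDER.items())
    return correct / len(GENDER)


full = materialize_symmetric(SPOUSE)
print(share_female_subjects(SPOUSE), subject_rule_accuracy(SPOUSE))  # 0.8 0.8
print(share_female_subjects(full), subject_rule_accuracy(full))      # 0.5 0.5
```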
  • 35. Dynamic Knowledge Graphs
      • In theory, RDF2vec can also produce embeddings for dynamic knowledge graphs to a certain extent
          – given that the neighbors are all known
          – Experiments are still under way
  • 36. Understanding the RDF2vec Model Zoo
      • Variations
          – Walk extraction (e.g., classic, s-RDF2vec, e-RDF2vec)
          – Ordered vs. non-ordered
          – Skip-gram vs. CBOW
      • This alone gives us 12 (3 × 2 × 2) combinations of how to train an RDF2vec model
      • We assume that not all of them are equally good
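The 12 configurations are simply the Cartesian product of the three variation axes. A minimal sketch; the string labels are illustrative names, not configuration keys of any RDF2vec implementation.

```python
from itertools import product

# Illustrative labels for the three variation axes on the slide.
WALKS = ["classic", "s-RDF2vec", "e-RDF2vec"]
ORDER = ["ordered", "non-ordered"]
ARCH = ["skip-gram", "CBOW"]

configs = list(product(WALKS, ORDER, ARCH))
print(len(configs))  # 12
for walk, order, arch in configs:
    print(f"{walk:10} {order:12} {arch}")
```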
  • 37. Understanding the RDF2vec Model Zoo
  • 38. Understanding the RDF2vec Model Zoo
      • Variations
          – Walk extraction (e.g., classic, s-RDF2vec, e-RDF2vec)
          – Ordered vs. non-ordered
          – Skip-gram vs. CBOW
      • Build a systematic collection of basic classification problems
      • For example, ∃r.{e} vs. ¬∃r.{e}
          – e.g., person born in NYC vs. person not born in NYC
          – here, s-RDF2vec should not be able to solve this, presumably because its walks keep the edge labels around an entity but not the neighboring entity e
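A problem of the form "has an r-edge to e" vs. "has no r-edge to e" can be generated directly from the triples. A minimal sketch with hypothetical toy data; `r_e_labels` is an illustrative helper, and the real benchmark collection may be built differently.

```python
# Hypothetical toy triples; relation r = birthPlace, fixed entity e = NYC.
TRIPLES = [
    ("Person_1", "birthPlace", "NYC"),
    ("Person_2", "birthPlace", "Boston"),
    ("Person_3", "birthPlace", "NYC"),
    ("Person_4", "residence", "NYC"),
]


def r_e_labels(triples, relation, entity, candidates):
    """Label 1 if (candidate, relation, entity) is in the graph, else 0."""
    facts = set(triples)
    return {c: int((c, relation, entity) in facts) for c in candidates}


labels = r_e_labels(TRIPLES, "birthPlace", "NYC",
                    ["Person_1", "Person_2", "Person_3", "Person_4"])
print(labels)  # {'Person_1': 1, 'Person_2': 0, 'Person_3': 1, 'Person_4': 0}
```

Person_2 (right relation, wrong entity) and Person_4 (right entity, wrong relation) are both negatives, so a model must capture the relation and the entity together.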
  • 39. Embeddings and Interpretability
      • Hot topic: Explainable AI
          – Knowledge Graphs are a favorable ingredient
          – Human/machine interpretable knowledge → explainable systems
      • However:
          – Embeddings replace interpretable axioms with numeric vectors over non-interpretable dimensions
          – Where did the semantics go?
      Paulheim (2018): Make Embeddings Semantic Again!
  • 40. The 2009 Semantic Web Layer Cake
  • 41. The 2018 Semantic Web Layer Cake
      [Figure: layer cake including “Embeddings”]
  • 42. Towards Semantic Vector Space Embeddings
      [Figure: embedding space with dimensions labeled “cartoon” and “superhero”]
      Paulheim (2018): Make Embeddings Semantic Again!
  • 43. Towards Semantic Vector Space Embeddings
      [Figure: embedding space with dimensions labeled “cartoon” and “superhero”]
      • Approach 1: learn an interpretation function
          • Each dimension of the embedding model is a target for a separate learning problem
          • Learn a function to explain the dimension
          • E.g.: $y \approx -|\exists \mathit{character}.\mathit{Superhero}|$
          • Just an approximation, used for explanations and justifications
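Approach 1 treats one embedding dimension as a regression target and checks whether a symbolic feature explains it. A minimal sketch, assuming |∃character.Superhero| is read as "number of character objects typed Superhero" and that a least-squares line on that single hand-picked feature suffices; the films, types, and dimension values are hypothetical, and the actual approach may search over many candidate features.

```python
# Hypothetical toy data: films, their character edges, and character types.
TRIPLES = [
    ("Film_1", "character", "Villain_1"),
    ("Film_2", "character", "Hero_1"),
    ("Film_3", "character", "Hero_1"),
    ("Film_3", "character", "Hero_2"),
    ("Film_4", "character", "Hero_1"),
    ("Film_4", "character", "Hero_2"),
    ("Film_4", "character", "Hero_3"),
]
TYPES = {"Hero_1": {"Superhero"}, "Hero_2": {"Superhero"},
         "Hero_3": {"Superhero"}, "Villain_1": {"Villain"}}
# Hypothetical values of one embedding dimension per film.
DIMENSION = {"Film_1": 0.0, "Film_2": -1.0, "Film_3": -2.0, "Film_4": -3.0}


def superhero_count(entity):
    """Number of `character` objects of `entity` that are typed Superhero."""
    return sum(1 for s, p, o in TRIPLES
               if s == entity and p == "character" and "Superhero" in TYPES.get(o, set()))


def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a * x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


films = sorted(DIMENSION)
a, b = fit_line([superhero_count(f) for f in films], [DIMENSION[f] for f in films])
print(a, b)  # -1.0 0.0
```

A slope of −1 with zero intercept recovers the slide's approximation y ≈ −|∃character.Superhero| for this toy dimension.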
  • 44. Towards Semantic Vector Space Embeddings
      [Figure: embedding space with dimensions labeled “cartoon” and “superhero”]
      • Approach 2: learn inherently interpretable embeddings
          • Step 1: learn typical patterns that exist in a knowledge graph
              – e.g., graph pattern learning
              – e.g., Horn clauses
          • Step 2a: use those patterns as embedding dimensions
              – probably not low-dimensional
          • Step 2b: compact the space
              – e.g., use dimensions for mutually exclusive patterns
  • 45. Towards Semantic Vector Space Embeddings
      • Different angle: learn an interpretation for the similarity function
      [Figure labels: ~similar type, ~same country, ~connected to same entity]
  • 46. Explaining Predictions with RDF2vec
      • Recap: we can, in principle, create vectors for new entities
      • Some explanation models, like LIME, do this:
          – Create new artificial entities by perturbation
              • In our KG context: add/remove connections
          – Predict for the new entities
          – Learn an explanation for the predictions
      • With that approach, LIME should be applicable to predictions made with RDF2vec
      Ribeiro et al. (2016): "Why Should I Trust You?": Explaining the Predictions of Any Classifier
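The perturbation step, translated to a knowledge graph, amounts to sampling variants of an entity with some of its connections removed. A minimal sketch, assuming removal only (no additions) and a keep probability of 0.5; the edge list and `perturb` name are hypothetical. Embedding each variant, predicting, and fitting LIME's locally weighted linear model on the masks are left as comments.

```python
import random


def perturb(edges, n_samples, rng, keep_prob=0.5):
    """LIME-style neighborhood in a KG: each sample keeps a random subset of
    the entity's edges. Returns (mask, kept_edges); mask[i] == 1 keeps edge i."""
    samples = []
    for _ in range(n_samples):
        mask = [int(rng.random() < keep_prob) for _ in edges]
        kept = [edge for edge, m in zip(edges, mask) if m]
        samples.append((mask, kept))
    return samples


# Hypothetical outgoing edges of the entity whose prediction we want to explain.
EDGES = [("director", "Director_1"), ("genre", "Genre_1"), ("year", "2001")]
for mask, kept in perturb(EDGES, 3, random.Random(0)):
    # Next steps (not shown): embed each perturbed entity with RDF2vec, run the
    # classifier, then fit a weighted linear model from masks to predictions.
    print(mask, kept)
```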
  • 47. Summary
      • Knowledge Graph Embeddings with RDF2vec
          – Effective processing of large-scale knowledge sources
              • Light variant possible for scalability
          – Variations visited: walk extraction, order-awareness, materialization, ...
          – Encoding of similarity and/or relatedness
              • RDF2vec: explicit trade-off is possible!
          – Additional insights that are not explicit in the graph
              • aka latent semantics
  • 48. More on RDF2vec
      • Collection of
          – Implementations
          – Pre-trained models
          – >45 use cases in various domains
  • 49. Thank you!
      http://www.heikopaulheim.com
      @heikopaulheim