In recent years, sophisticated methods for extracting knowledge graphs from Wikipedia, such as DBpedia, YAGO, and CaLiGraph, have been developed. In this talk, I revisit some of these methods and examine whether and how they can be replaced by prompting a large language model like ChatGPT.
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge Extraction or Knowledge Hallucination?
1. 05/29/23 Heiko Paulheim 1
Knowledge Graph Generation
from Wikipedia in the Age of ChatGPT:
Knowledge Extraction
or Knowledge Hallucination?
Heiko Paulheim
5. 05/29/23 Heiko Paulheim 5
Wikipedia as a Knowledge Graph
• Wikipedia-based Knowledge Graphs
– DBpedia: launched 2007
– YAGO: launched 2008
– Extraction from Wikipedia using mappings & heuristics
• Present
– Two of the most used knowledge graphs
– ...with Wikidata catching up
8. 05/29/23 Heiko Paulheim 8
Wikipedia as a Knowledge Graph
• Mapping to a central schema/ontology
[Figure: ontology excerpt with the classes University, Organisation, Person, Agent, and Place, the properties chancellor and campus (each with a domain and a range), and three "subclass of" edges]
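A minimal sketch of what such a central schema enables: once a property has a domain and a range, the types of subject and object follow from a single statement. The edge assignment below (chancellor and campus both have domain University; University is a subclass of Organisation, and Organisation and Person are subclasses of Agent) is an assumed reading of the labels on this slide, not loaded from the actual DBpedia ontology, and the entity names are hypothetical.

```python
# Assumed reading of the slide's diagram; not the real DBpedia ontology files.
SUBCLASS_OF = {
    "University": "Organisation",
    "Organisation": "Agent",
    "Person": "Agent",
}
DOMAIN = {"chancellor": "University", "campus": "University"}
RANGE = {"chancellor": "Person", "campus": "Place"}


def superclasses(cls):
    """Return cls followed by all of its ancestors in the class hierarchy."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def infer_types(triple):
    """Derive subject and object types from the property's domain and range."""
    subject, prop, obj = triple
    return {
        subject: superclasses(DOMAIN[prop]),
        obj: superclasses(RANGE[prop]),
    }


# Hypothetical entity names, for illustration only.
types = infer_types(("Some_University", "chancellor", "Some_Person"))
```

Here `types` maps the subject to University and its ancestors, and the object to Person and its ancestors.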
13. 05/29/23 Heiko Paulheim 13
DBpedia Extraction, ChatGPT Style
• Looks nice, but there are some glitches…
– Handling datatypes:
– Handling coordinates:
• But maybe we can resolve this with better prompt engineering...
20. 05/29/23 Heiko Paulheim 20
Knowledge Graph Hallucination, ChatGPT Style
• My first reaction:
• My second reaction:
21. 05/29/23 Heiko Paulheim 21
Knowledge Graph Hallucination, ChatGPT Style
Mannheim is a city in the southwestern part of
Germany, the third-largest in the German state of
Baden-Württemberg after Stuttgart and Karlsruhe with a
2019 population of approximately 309,000 inhabitants.
24. 05/29/23 Heiko Paulheim 24
Flashback to 2018
• Much of the missing information is in the Wikipedia text
• ...and already in the abstracts
• Abstracts follow a structure
[Figure: example abstract with links labeled municipality, state, and country, plus + and - markers]
25. 05/29/23 Heiko Paulheim 25
Flashback to 2018
• The first three populated places linked in an abstract about a town are that town’s municipality, state, and country
• All genres linked in an abstract about a writer are that writer’s genres
• The first place linked in an abstract about a person is that person’s birthplace
• The types are already in DBpedia
• Automatically finding those patterns: we can use existing relations as training data
– Using a local closed world assumption for creating negative examples
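The labeling idea on this slide can be sketched as follows. This is a simplified illustration, not the original implementation: the function name, the feature dictionary, the type labels, and the toy knowledge graph are all hypothetical, and the position is counted over all links rather than per type. Under the local closed world assumption, a linked entity becomes a negative example only if the knowledge graph already knows at least one object for that subject and relation.

```python
def label_candidates(subject, relation, linked_entities, kg):
    """Turn the entities linked in a subject's abstract into training examples.

    kg maps (subject, relation) to the set of objects already known.
    Returns a list of (features, label) pairs.
    """
    known = kg.get((subject, relation), set())
    if not known:
        # Nothing is known for this subject and relation, so the local
        # closed world assumption does not justify any negative example.
        return []
    examples = []
    for position, (entity, entity_type) in enumerate(linked_entities, start=1):
        features = {"position": position, "type": entity_type}
        examples.append((features, entity in known))
    return examples


# Toy data (hypothetical): links in the Mannheim abstract, in order.
links = [("Germany", "Country"), ("Baden-Württemberg", "State"), ("Stuttgart", "City")]
kg = {("Mannheim", "federalState"): {"Baden-Württemberg"}}

examples = label_candidates("Mannheim", "federalState", links, kg)
labels = [label for _, label in examples]
```

Because the features are only position and type, nothing in them depends on the language of the abstract.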
26. 05/29/23 Heiko Paulheim 26
Flashback to 2018
• Target: use only models that have >95% precision
– We want extra knowledge, but not much extra noise
• Outcome
– Models could be learned for 99 relations
– Almost 1M additional statements
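The precision target can be read as a filter over per-relation models. The sketch below assumes each model has been evaluated on held-out statements; the function names and the toy evaluation data are hypothetical and do not reproduce the original experiments.

```python
def precision(predicted, correct):
    """Share of predicted statements that are also in the set of correct ones."""
    if not predicted:
        return 0.0
    return len(predicted & correct) / len(predicted)


def select_models(evaluations, threshold=0.95):
    """Keep only the relations whose model exceeds the precision threshold."""
    return sorted(
        relation
        for relation, (predicted, correct) in evaluations.items()
        if precision(predicted, correct) > threshold
    )


# Toy evaluation data (hypothetical): relation -> (predicted, correct) statements.
evaluations = {
    "relation_a": ({("s1", "o1"), ("s2", "o2")}, {("s1", "o1"), ("s2", "o2")}),
    "relation_b": ({("s1", "o1"), ("s2", "o9")}, {("s1", "o1"), ("s2", "o2")}),
}
kept = select_models(evaluations)
```

A strict threshold trades coverage for cleanliness: relations whose models are too noisy contribute no statements at all.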
29. 05/29/23 Heiko Paulheim 29
Relation Extraction from Wikipedia Abstracts:
ChatGPT Style
Only the first three facts are extracted from the abstract
30. 05/29/23 Heiko Paulheim 30
Relation Extraction from Wikipedia Abstracts:
ChatGPT Style
DBpedia uses dbo:federalState here
31. 05/29/23 Heiko Paulheim 31
Relation Extraction from Wikipedia Abstracts:
ChatGPT Style
• In the original paper, we trained general ML models...
32. 05/29/23 Heiko Paulheim 32
Flashback to 2018
• We used solely position and type features
– Nothing language specific
– i.e., we can apply this to any language
• Extension to the 12 largest language editions of DBpedia
– Exploiting inter-language links
– 187 relations (was: 99), 1.6M axioms (was: 1M), at precision >0.95
– #statements per language correlates with #language links to English!
38. 05/29/23 Heiko Paulheim 38
Relation Extraction from Wikipedia Abstracts:
ChatGPT Style
Mostly hallucination… this is not the population value from the abstract!
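One simple guard against this kind of error, in a human-in-the-loop setup, is to check whether an "extracted" numeric literal occurs in the source text at all. The sketch below is an assumption about how such a check could look, not something shown in the talk; the function names are hypothetical, and the ungrounded value is a made-up number, not ChatGPT's actual output.

```python
import re


def numbers_in(text):
    """Collect the numbers mentioned in text, ignoring thousands separators."""
    return {m.replace(",", "") for m in re.findall(r"\d[\d,]*(?:\.\d+)?", text)}


def is_grounded(value, source_text):
    """True if the extracted numeric value literally occurs in the source text."""
    return str(value).replace(",", "") in numbers_in(source_text)


abstract = (
    "Mannheim is a city in the southwestern part of Germany, the third-largest "
    "in the German state of Baden-Württemberg after Stuttgart and Karlsruhe "
    "with a 2019 population of approximately 309,000 inhabitants."
)

grounded = is_grounded(309000, abstract)    # value taken from the abstract
ungrounded = is_grounded(123456, abstract)  # hypothetical hallucinated value
```

A check like this only flags values that are absent from the text; it does not verify that a grounded number was attached to the right property.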
41. 05/29/23 Heiko Paulheim 41
Knowledge Graph Hallucination, ChatGPT Style
• ChatGPT seemed eager to “extract” coordinates from infoboxes and abstracts
42. 05/29/23 Heiko Paulheim 42
Knowledge Graph Hallucination, ChatGPT Style
• At least, all are different coordinates in Mannheim
43. 05/29/23 Heiko Paulheim 43
Funny Footnote –
Even more Knowledge Hallucination
• Trying to create the input file for Google Maps on the previous slide:
Even more hallucination… many of these values are not from the responses
45. 05/29/23 Heiko Paulheim 45
Cat2Ax: Axiomatizing Wikipedia Categories
dbo:Album
dbo:artist.{dbr:Nine_Inch_Nails}
dbo:genre.{dbr:Rock_Music}
See: ISWC 2019 Paper on Uncovering the Semantics of Wikipedia Categories
46. 05/29/23 Heiko Paulheim 46
Cat2Ax: Axiomatizing Wikipedia Categories
– Frequency: how often does the pattern occur in a category?
• e.g., what share of instances have dbo:genre.{dbr:Rock_Music}?
– Lexical score: likelihood of term as a surface form of object
• e.g., how often is Rock used to refer to dbr:Rock_Music?
– Sibling score: how likely are sibling categories to share similar patterns?
• i.e., are there sibling categories with a high score for dbo:genre?
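The first two scores can be sketched directly from their descriptions above. This is an illustrative reading, not the Cat2Ax implementation: the function names, the data structures, and all toy counts are hypothetical, and the sibling score as well as the way the scores are combined are left out because the slide does not specify them.

```python
def frequency(category_members, facts, prop, value):
    """Share of category members that have the statement (member, prop, value)."""
    if not category_members:
        return 0.0
    hits = sum((member, prop, value) in facts for member in category_members)
    return hits / len(category_members)


def lexical_score(term, entity, surface_form_counts):
    """Share of uses of `term` as a surface form that refer to `entity`."""
    counts = surface_form_counts.get(term, {})
    total = sum(counts.values())
    return counts.get(entity, 0) / total if total else 0.0


# Toy data (hypothetical): four albums in a category, three tagged as rock.
members = ["Album_A", "Album_B", "Album_C", "Album_D"]
facts = {
    ("Album_A", "dbo:genre", "dbr:Rock_Music"),
    ("Album_B", "dbo:genre", "dbr:Rock_Music"),
    ("Album_C", "dbo:genre", "dbr:Rock_Music"),
}
# Hypothetical counts of what the anchor text "Rock" links to.
surface_forms = {"Rock": {"dbr:Rock_Music": 8, "dbr:Rock_(geology)": 2}}

freq = frequency(members, facts, "dbo:genre", "dbr:Rock_Music")
lex = lexical_score("Rock", "dbr:Rock_Music", surface_forms)
```

With this toy data, three of four members carry the pattern and eight of ten uses of "Rock" refer to the genre.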
51. 05/29/23 Heiko Paulheim 51
CaLiGraph Example
Category: Musical Groups established in 1987
List of symphonic metal bands
Category: Swedish death metal bands
List of Swedes in Music
56. 05/29/23 Heiko Paulheim 56
Improving Entity Coverage:
Lists in Wikipedia
• Only existing pages have categories
– Lists may also link to non-existing pages
57. 05/29/23 Heiko Paulheim 57
Pushing Entity Coverage Further
• Beyond red links (2020)
• Beyond explicit lists (2021)
71. 05/29/23 Heiko Paulheim 71
Takeaways
• Basic KG creation with ChatGPT can work
– At least in a human-in-the-loop setup
• Reinforcement signals might help here
– Main challenge: hallucinations
• On the other hand: consider them “extraction of additional facts”
• Isn’t that just like heuristic KG completion?
• Disclaimer:
– No PhD students were harmed or replaced by ChatGPT.
• Full ChatGPT protocol available here.
72. 05/29/23 Heiko Paulheim 72
Knowledge Graph Generation
from Wikipedia in the Age of ChatGPT:
Knowledge Extraction
or Knowledge Hallucination?
Heiko Paulheim