This document discusses different types of "bloopers" or errors that exist in DBpedia data, including redundant properties, incorrect values, confusing properties, and differences from the corresponding Wikipedia pages. Redundant properties express the same information using different properties. Incorrect values assign wrong objects to subjects and properties. Confusing properties have unclear or undefined meanings. Some DBpedia pages also differ significantly from the Wikipedia pages they were extracted from. Natural language processing may help improve data quality in DBpedia.
1. Fariz Darari
fadirra _a.t_ _d.o.t_ com
@mrlogix
Link to these slides: http://www.slideshare.net/fadirra/dbpedia-bloopers
CC BY-NC-ND
2. Bloopers Types
• Redundant Properties
• Incorrect Values
• Confusing Properties
• Different from Corresponding Wiki Pages
DBpedia Bloopers 2
3. Redundant Properties
• Properties that have the same intended meaning
but are expressed differently
• Example:
dbpprop:placeOfBurial vs dbpprop:placeofburial
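One way to spot this kind of redundancy is to group property names that differ only in letter case. The following is a minimal sketch, not a complete detector: the function name and the sample list are illustrative assumptions, and only the two placeOfBurial spellings come from the example above.

```python
from collections import defaultdict

def find_case_variants(properties):
    """Group property names that are identical once letter case is ignored."""
    groups = defaultdict(set)
    for prop in properties:
        groups[prop.lower()].add(prop)
    # Keep only the groups that contain more than one distinct spelling.
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]

# Illustrative input; the third property is added only to show a non-match.
sample = ["dbpprop:placeOfBurial", "dbpprop:placeofburial", "dbpprop:birthPlace"]
case_variants = find_case_variants(sample)
print(case_variants)
```

This only catches spelling variants. Properties with the same meaning but unrelated names would need a different approach.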
7. Incorrect Values
• Good Subject, Good Predicate, Bad Object
• Example (http://dbpedia.org/page/Johannesburg -
24 Sep 13):
8. Incorrect Values (cont.)
• http://dbpedia.org/page/Indonesia (24 Sep 13):
“Flag of Indonesia.svg” should be an image URI
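A simple sanity check for values that are expected to be URIs could look like the sketch below. The helper name and the example.org URI are hypothetical; only the file-name string is taken from the slide.

```python
from urllib.parse import urlparse

def looks_like_uri(value):
    """Return True if the value parses as an absolute http(s) URI."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

# The bare file name from the slide is a plain string, not a URI.
print(looks_like_uri("Flag of Indonesia.svg"))
# A hypothetical image URI, used here only for contrast.
print(looks_like_uri("http://example.org/Flag_of_Indonesia.svg"))
```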
9. Incorrect Values (cont.)
• http://dbpedia.org/page/Indonesia (24 Sep 13):
Not clear why “National ideology: Pancasila”
is used as a value for dbpprop:nationalMotto
10. Incorrect Values (cont.)
• http://dbpedia.org/page/Coca-Cola_Amatil (24 Sep 13):
Chief_executive_officer is too general
to be a value for keyPerson
11. Incorrect Values (cont.)
• http://dbpedia.org/page/Komodo_Airport (24 Sep 13):
Indonesia should not be a city
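Errors like this one could in principle be caught by comparing the type of a value with the type the property expects. The sketch below assumes a small hand-written type table; both dictionaries and the function name are hypothetical, and a real check would look the types up in DBpedia itself.

```python
# Hypothetical type table; a real check would read rdf:type from DBpedia.
ENTITY_TYPES = {
    "Indonesia": "Country",
    "Labuan Bajo": "City",
}

# Hypothetical expected type for the values of each property.
EXPECTED_TYPE = {"city": "City"}

def has_expected_type(prop, value):
    """Return True if the value's known type matches what the property expects."""
    expected = EXPECTED_TYPE.get(prop)
    actual = ENTITY_TYPES.get(value)
    return expected is not None and actual == expected

print(has_expected_type("city", "Indonesia"))    # False: a country, not a city
print(has_expected_type("city", "Labuan Bajo"))  # True
```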
13. Incorrect Values (cont.)
• http://dbpedia.org/page/Monty_Python (24 Sep 13):
The plain strings “John Cleese” and “Terry Gilliam”
are also listed as its members
14. Incorrect Values (cont.)
• http://dbpedia.org/page/Air_France_Flight_1611 (24 Sep 13):
Whereas on its corresponding Wiki page
(http://en.wikipedia.org/wiki/Air_France_Flight_1611):
15. Incorrect Values (cont.)
• http://dbpedia.org/page/Tom_Cat (24 Sep 13):
Should be clear why it is incorrect :)
16. Incorrect Values (cont.)
• http://dbpedia.org/page/Farid_Kamil (24 Sep 13):
Male is an occupation? Nice, can I resign? :)
17. Incorrect Values (cont.)
• Example (at http://dbpedia.org/page/Apa_Kata_Hati):
*it’s a movie
It was released on the 29th of [guess the month + year]
19. Confusing Properties
• Some properties are unclear:
their intended meaning is not documented.
• Example at http://dbpedia.org/page/Italy -
24 Sep 13:
• What is dbpprop:titlebar? Does anybody know?
22. Different from Wiki Pages
Wiki pages and their corresponding DBpedia data
are sometimes quite different.
23. Different from Wiki Pages (cont.)
• http://dbpedia.org/page/Aqua (24 Sep 13):
compared to ...
24. Different from Wiki Pages (cont.)
• Wiki page (http://en.wikipedia.org/wiki/Aqua -
24 Sep 13):
25. Different from Wiki Pages (cont.)
• http://dbpedia.org/page/Nasi_kebuli (24 Sep 13):
Here the (main) ingredient is only rice
compared to ...
26. Different from Wiki Pages (cont.)
• Wiki page (http://en.wikipedia.org/wiki/Nasi_kebuli -
24 Sep 13):
Here the (main) ingredients are of at least three kinds:
rice, minyak samin (ghee), and goat meat
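The gap between the two pages can be made explicit with a set difference. This is only a sketch using the ingredient lists quoted on these slides; the variable names are illustrative.

```python
# Ingredient values as reported on the slides for Nasi kebuli (24 Sep 13).
dbpedia_ingredients = {"rice"}
wikipedia_ingredients = {"rice", "minyak samin (ghee)", "goat meat"}

# Ingredients listed on the Wiki page but absent from the DBpedia page.
missing_in_dbpedia = sorted(wikipedia_ingredients - dbpedia_ingredients)
print(missing_in_dbpedia)  # ['goat meat', 'minyak samin (ghee)']
```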
27. Notes
• DBpedia data is derived from Wikipedia.
Consequently, errors on Wikipedia
might still exist on DBpedia.
• Perhaps natural language processing techniques
can help to improve DBpedia.
• After all, DBpedia is one of the coolest SemWeb
applications!
*I know ccTLD means country code top-level domain. But for some (non-IT) people, it is unclear. The property page of dbpprop:cctld has no explanation of its intended meaning.