DBpedia Spotlight is a tool employed in the Extraction stage of the LOD Life Cycle, performing Entity Recognition and Linking. Although the tool currently specializes in English, support for other languages is being tested, and demos for German, Dutch and others are available or underway. The tool can be used to enable faceted browsing and semantic search, among other applications. In this webinar we will describe what DBpedia Spotlight is, how it works and how you can benefit from it in your application.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
http://lod2.eu/BlogPost/webinar-series
LOD2 Webinar Series: DBpedia Spotlight
1. Creating Knowledge out of Interlinked Data
LOD2 Webinar . 26.02.2013 . Page 1 http://lod2.eu
LOD2 is a large-scale integrating project co-funded by the European
Commission within the FP7 Information and Communication Technologies
Work Programme. This 4-year project comprises leading Linked Open
Data technology researchers, companies, and service providers. Coming
from across 12 countries the partners are coordinated by the Agile
Knowledge Engineering and Semantic Web Research Group at the
University of Leipzig, Germany.
LOD2 will integrate and syndicate Linked Data with existing large-scale
applications. The project shows the benefits in the scenarios of Media and
Publishing, Corporate Data intranets and eGovernment.
http://lod2.eu
Once per month the LOD2 webinar series offers a free webinar about
tools and services along the Linked Open Data Life Cycle.
Stay with us and learn more about acquisition, editing, composing,
connected applications – and finally publishing Linked Open Data.
Agenda
Profiles: Pablo N Mendes and the DBpedia Spotlight team
Linked Data life cycle and role of DBpedia Spotlight within LOD2
What is DBpedia Spotlight
Demonstration
Lessons Learned and Next steps
Q&A
Pablo N. Mendes and the DBpedia Spotlight team
Pablo N. Mendes
Research Associate at the Open Knowledge Foundation, Germany
http://okfn.de
Interests:
- Information Extraction, Integration, Retrieval and Exploration
More info:
http://pablomendes.com
Funding
LOD2, DICODE, Google Summer of Code 2012, IKS
Hosting
U.Mannheim, MTA SZTAKI, Globo.com, RNP.br
Co-maintainers
Max Jakob (Neofonie GmbH)
Joachim Daiber (MS student at the Rijksuniversiteit Groningen)
Contributors
Sandro Coelho (BS student at UFJF, Brazil)
Chris Hokamp (PhD student at University of North Texas, USA)
Dirk Weissenborn (MS student at University of Dresden, Germany)
Liu Zhengzhong (now PhD student at Carnegie Mellon University, USA)
Marcus Nitschke (student at U. Leipzig)
...
Full list on GitHub.
Linked Data Life Cycle
[Figure: the Linked Data life cycle. Stages: Extraction; Storage/Querying; Manual revision/authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration]
Shedding Light on the Web of Documents
Named Entity Recognition/Disambiguation
• Automatically add Wikipedia links to (plain) text.
• 1. Recognition: find "interesting" strings
• surface forms
• 2. Disambiguation: choose appropriate Wikipedia page
• Each Wikipedia page represents an entity
• Every surface form can have multiple candidate entities for linking
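The two steps above can be sketched with a toy lookup. This is only an illustration, not how DBpedia Spotlight itself spots phrases: the lexicon, its entries and the function names are hypothetical, and recognition is reduced to exact substring matching.

```python
# Hypothetical lexicon: surface form -> candidate entities (toy data).
CANDIDATES = {
    "Michael Jackson": ["Michael Jackson (singer)", "Michael Jackson (journalist)"],
    "Paris": ["Paris (city)"],
}


def spot(text, lexicon):
    """Recognition: return sorted (offset, surface form) pairs found in text."""
    found = []
    for surface_form in lexicon:
        start = text.find(surface_form)
        while start != -1:
            found.append((start, surface_form))
            start = text.find(surface_form, start + 1)
    return sorted(found)


def candidates_for(text, lexicon):
    """Map every spotted surface form to its candidate entities."""
    return {sf: lexicon[sf] for _, sf in spot(text, lexicon)}
```

Disambiguation then has to pick one entity from each candidate list, which is where probabilities and context come in.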
Michael Jackson died in 2007.
• Recognition: Find surface forms
[Michael Jackson] died in 2007.
• Disambiguation: Choose correct entity
• Candidates for [Michael Jackson]
• Context: "died in 2007."
[Michael Jackson] came to Paris.
• Less distinctive context: "came to Paris."
• Disambiguation: Choose correct entity
• Candidates for [Michael Jackson]: Singer, Journalist
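A rough sketch of why some contexts are more distinctive than others. It scores each candidate by the context words it shares with the sentence, weighted by an inverse candidate frequency (one reading of TF*ICF, assumed here: a word seen with fewer candidates counts for more). The context word counts are hypothetical toy data, and the function name is an assumption.

```python
import math
from collections import Counter

# Hypothetical counts of words seen around mentions of each candidate (toy data).
ENTITY_CONTEXT = {
    "Michael Jackson (singer)": Counter({"pop": 5, "album": 4, "died": 1}),
    "Michael Jackson (journalist)": Counter({"beer": 5, "died": 1, "2007": 2}),
}


def tf_icf_scores(context_words, entity_context):
    """Score each candidate by TF*ICF-weighted overlap with the context words."""
    n_candidates = len(entity_context)
    scores = {}
    for entity, term_freq in entity_context.items():
        score = 0.0
        for word in context_words:
            # Number of candidates whose context contains this word.
            n_with_word = sum(1 for tf in entity_context.values() if word in tf)
            if n_with_word == 0:
                continue  # word unknown to every candidate: no evidence
            icf = math.log(n_candidates / n_with_word)
            score += term_freq[word] * icf
        scores[entity] = score
    return scores
```

With this toy data, the words "died in 2007" separate the two candidates, while "came to Paris" gives every candidate the same score, so the decision would have to fall back on other evidence such as P(entity | surface form).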
Probabilities
• P(entity | surface form)
• Who is typically meant by a name?
• For example, given [Michael Jackson] (and ignoring the context), what
are the probabilities of the candidates?
• Michael Jackson (singer) 0.98
• Michael Jackson (journalist) 0.02
• Other useful probabilities:
• P(surface form | entity), P(entity), P(surface form)
• Maximum likelihood estimates computed from Wikipedia page links
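The four probabilities listed above are tied together by Bayes' rule. Writing s for a surface form and e for an entity (symbols chosen here for brevity, not taken from the slides):

```latex
P(e \mid s) = \frac{P(s \mid e)\, P(e)}{P(s)}
```

So once any three of them are estimated, the fourth follows.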
Data Processing
• Two pipelines
− Single machine with Scala
− MapReduce-style with Apache Pig
• Apache Pig for analyzing large datasets on top of Hadoop
− Data-flow language
− Think in tuples, bags and maps
− load, filter, join, group by, store, …
− from which Pig derives a MapReduce plan
− We build on pignlproc, started by Olivier Grisel (Stanbol)
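The load, filter, group by and store steps can be mimicked on a single machine in a few lines. This is plain Python, not Pig and not the project's actual scripts; the link pairs and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical (anchor text, link target) pairs, as might be extracted
# from Wikipedia page links (toy data standing in for the "load" step).
LINKS = [
    ("Michael Jackson", "Michael Jackson (singer)"),
    ("Michael Jackson", "Michael Jackson (singer)"),
    ("Michael Jackson", "Michael Jackson (journalist)"),
    ("", "Paris (city)"),  # malformed pair with an empty anchor text
]


def count_pairs(links):
    """Filter bad pairs, then group and count, as a data-flow script would."""
    # filter: keep only pairs with both a surface form and an entity
    kept = [(sf, entity) for sf, entity in links if sf and entity]
    # group by (surface form, entity) and count
    pair_counts = Counter(kept)
    # group by surface form and count
    surface_form_counts = Counter(sf for sf, _ in kept)
    return pair_counts, surface_form_counts
```

In a data-flow language each of these steps is one statement over bags of tuples, and the engine decides how to distribute them.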
Probability estimation
• P( entity | surface form ) = count( surface form, entity ) / count( surface form )
• P( Michael Jackson (singer) | Michael Jackson ) = 0.98
• P( Michael Jackson (journalist) | Michael Jackson ) = 0.02
• Check the project website for the estimation of other scores
– Other probabilities...
– TF*ICF (modification of TF*IDF) and others...
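A minimal sketch of the count-based estimate of P( entity | surface form ). The counts are hypothetical, chosen only to reproduce the 0.98 / 0.02 example, and count( surface form ) is assumed to be the sum of the pair counts for that surface form.

```python
# Hypothetical link counts: (surface form, entity) -> number of links (toy data).
PAIR_COUNTS = {
    ("Michael Jackson", "Michael Jackson (singer)"): 98,
    ("Michael Jackson", "Michael Jackson (journalist)"): 2,
}


def p_entity_given_surface_form(pair_counts, surface_form):
    """Estimate P(entity | surface form) as count(sf, entity) / count(sf)."""
    matching = {
        entity: count
        for (sf, entity), count in pair_counts.items()
        if sf == surface_form
    }
    total = sum(matching.values())  # assumed to equal count(surface form)
    if total == 0:
        return {}  # surface form never seen: no estimate
    return {entity: count / total for entity, count in matching.items()}
```

Calling `p_entity_given_surface_form(PAIR_COUNTS, "Michael Jackson")` yields one probability per candidate, and the values sum to 1.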
Annotate
http://dbpedia.org/resource/LSU_Tigers
http://dbpedia.org/resource/No._4_(album)
Top K Candidates
LSU_Tigers
Louisiana State University
Demo:
– http://spotlight.dbpedia.org/demo/
Web Service:
– http://spotlight.dbpedia.org/rest/{API}
– APIs:
• Phrase Recognition (/spot), Disambiguation (/disambiguate)
• Top K disambiguations (/candidates)
• Annotation (/annotate)
Source code:
– https://github.com/dbpedia-spotlight/dbpedia-spotlight/
Apache V2 License!
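A hedged sketch of how a client might prepare a call to the Annotation API. The base URL is the one given above; the path segment `annotate`, the `text` and `confidence` query parameters and the JSON `Accept` header are assumptions not stated on the slide, and the function name is hypothetical. The request is only built, not sent, since the service location may have changed.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://spotlight.dbpedia.org/rest"  # base URL as given on the slide


def build_annotate_request(text, confidence=0.5):
    """Build (but do not send) a GET request for the annotation API.

    The 'annotate' path and the parameter names are assumptions.
    """
    query = urlencode({"text": text, "confidence": confidence})
    return Request(
        f"{BASE_URL}/annotate?{query}",
        headers={"Accept": "application/json"},
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen` and read the response body.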
Lessons learned
A generic solution to the problem is tough
– Most of the research focuses on solving very specialized cases
– Some entity types are harder than others
– Some types of text are harder than others
Yet, users expect it to “just work”.
We are focusing on a generic core that can be easily customized.
Next steps
More experiments with DBpedia Spotlight in the context of LOD2
Use Case packages: Wolters Kluwer (legal domain, German
language), Emergency Response
Automating build process and release to LOD2 Stack
Expanding to other languages
Easier adaptation to other knowledge bases beyond DBpedia
New algorithms, collective disambiguation, etc.
Credits
Jingle: R.E.M., Martin Kaltenböck, Florian Kondert
Coordination: Thomas Thurner, Martin Kaltenböck
Moderation: Martin Kaltenböck
Presented by: Pablo N. Mendes
Slides from: Pablo N. Mendes, Max Jakob, Joachim Daiber
Hope you enjoyed staying with us – if you need more detailed
information, visit us at www.lod2.eu and let us know how we can
improve to meet your expectations!
Don’t forget to register for our next webinar
27.03.2013 – CKAN and PublicData.eu (OKFN)
April – Virtuoso 7 (OpenLink Software)
Have a great day and don’t forget ...
http://lod2.eu