Creating Knowledge out of Interlinked Data




LOD2 Webinar . 26.02.2013 . Page 1                     http://lod2.eu




        LOD2 is a large-scale integrating project co-funded by the European
        Commission within the FP7 Information and Communication Technologies
        Work Programme. This 4-year project comprises leading Linked Open
        Data technology researchers, companies, and service providers. The
        partners come from 12 countries and are coordinated by the Agile
        Knowledge Engineering and Semantic Web Research Group at the
        University of Leipzig, Germany.

        LOD2 will integrate and syndicate Linked Data with existing large-scale
        applications. The project demonstrates the benefits in three scenarios:
        media and publishing, corporate data intranets, and eGovernment.








        Once a month, the LOD2 webinar series offers a free webinar about
        tools and services along the Linked Open Data life cycle.

        Stay with us and learn more about acquisition, editing, composition,
        connected applications, and finally publication of Linked Open Data.







Agenda


    Profiles: Pablo N. Mendes and the DBpedia Spotlight team

    Linked Data life cycle and role of DBpedia Spotlight within LOD2

    What is DBpedia Spotlight

    Demonstration

    Lessons Learned and Next steps

    Q&A







Pablo N. Mendes and the DBpedia Spotlight team

Pablo N. Mendes
    Research Associate at the Open Knowledge Foundation, Germany
    http://okfn.de
    Interests: Information Extraction, Integration, Retrieval and Exploration
    More info: http://pablomendes.com

Co-maintainers
    Max Jakob (Neofonie GmbH)
    Joachim Daiber (MS student at the Rijksuniversiteit Groningen)

Contributors
    Sandro Coelho (BS student at UFJF, Brazil)
    Chris Hokamp (PhD student at University of North Texas, USA)
    Dirk Weissenborn (MS student at University of Dresden, Germany)
    Liu Zhengzhong (now PhD student at Carnegie Mellon University, USA)
    Marcus Nitschke (student at U. Leipzig)
    ...
    Full list on GitHub.

Funding
    LOD2, DICODE, Google Summer of Code 2012, IKS

Hosting
    U. Mannheim, MTA SZTAKI, Globo.com, RNP.br



Linked Data Life Cycle

[Figure: the Linked Data life cycle, with the stages Extraction;
Storage / Querying; Manual revision / Authoring; Interlinking / Fusing;
Classification / Enrichment; Quality Analysis; Evolution / Repair;
Search / Browsing / Exploration]





                    Shedding Light on the Web of Documents




Named Entity Recognition/Disambiguation
• Automatically add Wikipedia links to (plain) text.

• 1. Recognition: find "interesting" strings
    •    surface forms
• 2. Disambiguation: choose the appropriate Wikipedia page
    •    Each Wikipedia page represents an entity
    •    Every surface form can have multiple candidate entities for linking
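
The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not DBpedia Spotlight's implementation: the lexicon, the entity labels, and the function names spot and candidates are hypothetical, and recognition is reduced to a longest-match dictionary lookup.

```python
# Hypothetical lexicon: lower-cased surface form -> candidate entities (illustrative data only).
LEXICON = {
    "michael jackson": ["Michael_Jackson_(singer)", "Michael_Jackson_(journalist)"],
    "paris": ["Paris"],
}


def spot(text, lexicon, max_len=3):
    """Step 1 (recognition): return (start, end, surface form) token spans,
    preferring the longest phrase found in the lexicon."""
    tokens = text.split()
    spots, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).strip(".,;:!?")
            if phrase.lower() in lexicon:
                spots.append((i, i + n, phrase))
                i += n
                break
        else:
            # No phrase starting at token i is a known surface form.
            i += 1
    return spots


def candidates(surface_form, lexicon):
    """Input to step 2 (disambiguation): the candidate entities of a surface form."""
    return lexicon.get(surface_form.lower(), [])


found = spot("Michael Jackson came to Paris.", LEXICON)
per_spot = {sf: candidates(sf, LEXICON) for _, _, sf in found}
```

Disambiguation then has to pick one entry from each candidate list, which is where the context and the probabilities discussed in the following slides come in.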
[Michael Jackson] died in 2007.
• Recognition: Find surface forms




[Michael Jackson] died in 2007.   ← context
• Disambiguation: Choose correct entity
   •     Candidates for [Michael Jackson]




[Michael Jackson] came to Paris.   ← less distinctive context
• Disambiguation: Choose correct entity
   •     Candidates for [Michael Jackson]: Singer, Journalist








Probabilities
• P(entity | surface form)
   •     Who is typically meant by a name?
   •     For example, given [Michael Jackson] (and ignoring the context), what
         are the probabilities of the candidates?
   •     Michael Jackson (singer) 0.98
   •     Michael Jackson (journalist) 0.02

• Other useful probabilities:
   •     P(surface form | entity), P(entity), P(surface form)


• Estimated by maximum likelihood from Wikipedia page links
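
As a rough sketch of how such estimates could be derived, the snippet below treats every Wikipedia link as a (surface form, entity) pair and takes relative frequencies. The function name link_statistics and the toy link list are assumptions made for illustration. In particular, P(surface form) is simplified here to a share of all links, which may differ from how the project defines it.

```python
from collections import Counter


def link_statistics(links):
    """Relative-frequency estimates from (surface form, entity) link pairs."""
    links = list(links)
    total = len(links)
    pair_count = Counter(links)
    sf_count = Counter(sf for sf, _ in links)
    entity_count = Counter(entity for _, entity in links)
    return {
        "p_entity_given_sf": {(sf, e): n / sf_count[sf] for (sf, e), n in pair_count.items()},
        "p_sf_given_entity": {(sf, e): n / entity_count[e] for (sf, e), n in pair_count.items()},
        "p_entity": {e: n / total for e, n in entity_count.items()},
        "p_sf": {sf: n / total for sf, n in sf_count.items()},
    }


# Toy data (invented counts, chosen to reproduce the 0.98 / 0.02 split on the slide).
toy_links = (
    [("Michael Jackson", "singer")] * 49
    + [("Michael Jackson", "journalist")] * 1
    + [("MJ", "singer")] * 10
)
stats = link_statistics(toy_links)
```

With these toy counts, stats["p_entity_given_sf"] assigns 49/50 to the singer and 1/50 to the journalist for the surface form "Michael Jackson".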





  Data Processing
• Two pipelines
      −    Single machine with Scala
      −    MapReduce-style with Apache Pig

• Apache Pig for analyzing large datasets on top of Hadoop
      −    Data-flow language
      −    Think in tuples, bags and maps
      −    load, filter, join, group by, store, …
      −    from which Pig derives a MapReduce plan
      −    We build on pignlproc, started by Olivier Grisel (Stanbol)
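
To give a feel for the data-flow style, here is a small single-machine imitation of the load, filter, group by and store steps in plain Python. It is not Pig Latin and not the pignlproc code. The tab-separated input format and all function names are assumptions made for this sketch.

```python
from collections import defaultdict


def load(lines):
    """LOAD: parse tab-separated records into tuples."""
    return [tuple(line.rstrip("\n").split("\t")) for line in lines]


def filter_valid(records):
    """FILTER: keep (surface form, entity) tuples with a non-empty surface form."""
    return [r for r in records if len(r) == 2 and r[0].strip()]


def group_by_surface_form(records):
    """GROUP BY: collect the bag of entities linked from each surface form."""
    bags = defaultdict(list)
    for surface_form, entity in records:
        bags[surface_form].append(entity)
    return dict(bags)


def store(bags):
    """STORE: emit one 'surface form<TAB>entity<TAB>count' line per pair."""
    out = []
    for surface_form in sorted(bags):
        for entity in sorted(set(bags[surface_form])):
            out.append(f"{surface_form}\t{entity}\t{bags[surface_form].count(entity)}")
    return out


raw = [
    "Michael Jackson\tsinger\n",
    "Michael Jackson\tsinger\n",
    "Michael Jackson\tjournalist\n",
    "\torphan\n",       # empty surface form, dropped by the filter
    "broken line\n",    # missing entity column, dropped by the filter
]
result = store(group_by_surface_form(filter_valid(load(raw))))
```

In a Pig script each step would be a relation in the data flow, and Pig would turn the chain into MapReduce jobs instead of running it in one process.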






 Probability estimation
• P( entity | surface form ) = count( surface form, entity ) / count( surface form )

    •    P( Michael Jackson (singer) | Michael Jackson ) = 0.98
    •    P( Michael Jackson (journalist) | Michael Jackson ) = 0.02




• See the project website for how the other scores are estimated
    – Other probabilities...
    – TF*ICF (a modification of TF*IDF) and others...
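
The slide does not spell out TF*ICF. The sketch below assumes that ICF stands for inverse candidate frequency: the IDF idea applied to the candidate entities of one surface form instead of to all documents, so that words shared by every candidate get weight zero. The function name tf_icf and the toy context words are hypothetical.

```python
import math
from collections import Counter


def tf_icf(candidate_contexts):
    """candidate_contexts: {entity: list of context words} for the candidates
    of ONE surface form. Returns {entity: {word: tf * icf}}, where
    icf(word) = log(number of candidates / number of candidates containing word)."""
    n_candidates = len(candidate_contexts)
    containing = Counter()
    for words in candidate_contexts.values():
        containing.update(set(words))
    weights = {}
    for entity, words in candidate_contexts.items():
        tf = Counter(words)
        weights[entity] = {
            word: tf[word] * math.log(n_candidates / containing[word]) for word in tf
        }
    return weights


# Toy context words (invented for illustration).
weights = tf_icf({
    "singer": ["pop", "album", "died", "pop"],
    "journalist": ["beer", "whisky", "died"],
})
```

Here "died" occurs in the context of both candidates and therefore scores 0, while "pop" and "beer" separate the two.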









Annotate
    http://dbpedia.org/resource/LSU_Tigers
    http://dbpedia.org/resource/No._4_(album)








Top K Candidates
    LSU_Tigers
    Louisiana State University








Demo:
      – http://spotlight.dbpedia.org/demo/
Web Service:
      – http://spotlight.dbpedia.org/rest/{API}
      – APIs:
             • Phrase Recognition (/spot), Disambiguation (/disambiguate)
             • Top K disambiguations (/candidates)
             • Annotation (/annotate)
Source code:
      – https://github.com/dbpedia-spotlight/dbpedia-spotlight/
Apache V2 License!
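
A request to the web service could be assembled roughly as below. The snippet only builds the URL and sends nothing, since the endpoint from the slide may no longer be live. The helper name build_request_url is hypothetical, and the query parameters text and confidence are assumptions that should be checked against the service documentation.

```python
from urllib.parse import urlencode

BASE_URL = "http://spotlight.dbpedia.org/rest"  # base address as given on the slide


def build_request_url(api, text, confidence=None):
    """Build a GET URL for one of the REST APIs, e.g. api="candidates"."""
    params = {"text": text}
    if confidence is not None:
        params["confidence"] = confidence  # assumed parameter name
    return f"{BASE_URL}/{api}?{urlencode(params)}"


url = build_request_url("candidates", "Michael Jackson died in 2007.", confidence=0.2)
```

The resulting string can be passed to any HTTP client, typically together with an Accept header for the desired response format.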




Lessons learned

    A generic solution to the problem is tough
      – Most of the research focuses on solving very specialized cases
      – Some entity types are harder than others
      – Some types of text are harder than others

      Yet, users expect it to “just work”.

We are focusing on a generic core that can be easily customized.








Next steps

    More experiments with DBpedia Spotlight in the context of LOD2
     Use Case packages: Wolters Kluwer (legal domain, German
     language), Emergency Response

    Automating build process and release to LOD2 Stack

    Expanding to other languages

    Easier adaptation to other knowledge bases beyond DBpedia

    New algorithms, collective disambiguation, etc.








Credits

Jingle       R.E.M., Martin Kaltenböck, Florian Kondert
Coordination Thomas Thurner
             Martin Kaltenböck
Moderation Martin Kaltenböck
Presented by Pablo N. Mendes
Slides from Pablo N. Mendes, Max Jakob, Joachim Daiber








        We hope you enjoyed the webinar. If you need more detailed
        information, visit us at www.lod2.eu and let us know how we can
        improve to meet your expectations!

        Don’t forget to register for our next webinars:

           27.03.2013 – CKAN and PublicData.eu (OKFN)
           April – Virtuoso 7 (OpenLink Software)

        Have a great day and don’t forget ...








                                                       http://lod2.eu
LOD2 Webinar Series: D2R and Sparqlify
 
LOD2 Plenary Vienna 2012: WP12 - Project Management
LOD2 Plenary Vienna 2012: WP12 - Project ManagementLOD2 Plenary Vienna 2012: WP12 - Project Management
LOD2 Plenary Vienna 2012: WP12 - Project Management
 
LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building,...
LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building,...LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building,...
LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building,...
 
LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public...
LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public...LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public...
LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public...
 
LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Informa...
LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Informa...LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Informa...
LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Informa...
 
LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web
LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data WebLOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web
LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web
 
LOD2 Plenary Vienna 2012: WP7 - Linked Open Data for Media and Publishing
LOD2 Plenary Vienna 2012: WP7 - Linked Open Data for Media and Publishing LOD2 Plenary Vienna 2012: WP7 - Linked Open Data for Media and Publishing
LOD2 Plenary Vienna 2012: WP7 - Linked Open Data for Media and Publishing
 
LOD2 Plenary Vienna 2012: WP6 - Interfaces, Integration & LOD2 Stack
LOD2 Plenary Vienna 2012: WP6 - Interfaces, Integration & LOD2 StackLOD2 Plenary Vienna 2012: WP6 - Interfaces, Integration & LOD2 Stack
LOD2 Plenary Vienna 2012: WP6 - Interfaces, Integration & LOD2 Stack
 
LOD2 Plenary Vienna 2012: WP5 - Linked Data Browsing, Visualization and Autho...
LOD2 Plenary Vienna 2012: WP5 - Linked Data Browsing, Visualization and Autho...LOD2 Plenary Vienna 2012: WP5 - Linked Data Browsing, Visualization and Autho...
LOD2 Plenary Vienna 2012: WP5 - Linked Data Browsing, Visualization and Autho...
 
LOD2 Plenary Vienna 2012: WP4 - Reuse, Interlinking and Knowledge Fusion
LOD2 Plenary Vienna 2012: WP4 - Reuse, Interlinking and Knowledge FusionLOD2 Plenary Vienna 2012: WP4 - Reuse, Interlinking and Knowledge Fusion
LOD2 Plenary Vienna 2012: WP4 - Reuse, Interlinking and Knowledge Fusion
 
LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases
LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge BasesLOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases
LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases
 
LOD2 webinar series: Virtuoso by OpenLink Software
LOD2 webinar series: Virtuoso by OpenLink SoftwareLOD2 webinar series: Virtuoso by OpenLink Software
LOD2 webinar series: Virtuoso by OpenLink Software
 
LOD2 Plenary Meeting 2011: Institute Mihajlo Pupin – Partner Introduction
LOD2 Plenary Meeting 2011: Institute Mihajlo Pupin – Partner IntroductionLOD2 Plenary Meeting 2011: Institute Mihajlo Pupin – Partner Introduction
LOD2 Plenary Meeting 2011: Institute Mihajlo Pupin – Partner Introduction
 

LOD2 Webinar Series: DBpedia Spotlight

  • 1. Creating Knowledge out of Interlinked Data LOD2 Webinar . 26.02.2013 . Page 1 http://lod2.eu
  • 2. LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers. Coming from 12 countries, the partners are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany. LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project shows the benefits in the scenarios of Media and Publishing, Corporate Data intranets and eGovernment.
  • 3. Once a month, the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle. Stay with us and learn more about acquisition, editing, composing, connected applications, and finally publishing Linked Open Data.
  • 4. Agenda
      - Profiles: Pablo N. Mendes and the DBpedia Spotlight team
      - Linked Data life cycle and role of DBpedia Spotlight within LOD2
      - What is DBpedia Spotlight
      - Demonstration
      - Lessons learned and next steps
      - Q&A
  • 5. Pablo N. Mendes and the DBpedia Spotlight team
      - Pablo N. Mendes: Research Associate at the Open Knowledge Foundation, Germany (http://okfn.de)
      - Interests: Information Extraction, Integration, Retrieval and Exploration
      - More info: http://pablomendes.com
      - Co-maintainers: Max Jakob (Neofonie GmbH); Joachim Daiber (MS student at the Rijksuniversiteit Groningen)
      - Contributors: Sandro Coelho (BS student at UFJF, Brazil); Chris Hokamp (PhD student at University of North Texas, USA); Dirk Weissenborn (MS student at University of Dresden, Germany); Liu Zhengzhong (now PhD student at Carnegie Mellon University, USA); Marcus Nitschke (student at U. Leipzig); ... Full list on GitHub.
      - Funding: LOD2, DICODE, Google Summer of Code 2012, IKS
      - Hosting: U. Mannheim, MTA SZTAKI, Globo.com, RNP.br
  • 6. Linked Data Life Cycle (diagram of stages): Extraction; Storage/Querying; Manual revision/authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration
  • 8. Shedding Light on the Web of Documents
  • 9. Named Entity Recognition/Disambiguation • Automatically add Wikipedia links to (plain) text.
  • 10. Named Entity Recognition/Disambiguation • Automatically add Wikipedia links to (plain) text. • 1. Recognition: find "interesting" strings (surface forms)
  • 12. Named Entity Recognition/Disambiguation
      - Automatically add Wikipedia links to (plain) text.
      - 1. Recognition: find "interesting" strings (surface forms).
      - 2. Disambiguation: choose the appropriate Wikipedia page. Each Wikipedia page represents an entity, and every surface form can have multiple candidate entities for linking.
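
The two steps on this slide can be illustrated with a minimal Python sketch. It is not the DBpedia Spotlight implementation: the lexicon, the candidate labels, and the function names `recognize` and `candidates_for` are hypothetical, and recognition is reduced to exact string lookup.

```python
# Hypothetical toy lexicon mapping surface forms to candidate entities.
# The labels are illustrative, not real DBpedia identifiers.
CANDIDATES = {
    "Michael Jackson": ["Michael Jackson (singer)", "Michael Jackson (journalist)"],
    "Paris": ["Paris (city)"],
}


def recognize(text, lexicon):
    """Step 1 (recognition): return sorted (start, end, surface_form) spans
    for every lexicon entry that occurs verbatim in the text."""
    spans = []
    for surface_form in lexicon:
        start = text.find(surface_form)
        while start != -1:
            spans.append((start, start + len(surface_form), surface_form))
            start = text.find(surface_form, start + 1)
    return sorted(spans)


def candidates_for(spans, lexicon):
    """Input to step 2 (disambiguation): the candidate entities per surface form."""
    return {surface_form: lexicon[surface_form] for _, _, surface_form in spans}


spans = recognize("Michael Jackson came to Paris.", CANDIDATES)
print(spans)
print(candidates_for(spans, CANDIDATES))
```

Disambiguation then has to pick one candidate per surface form, which is what the probabilities on the later slides are used for.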
  • 13. Michael Jackson died in 2007.
  • 14. Michael Jackson died in 2007. • Recognition: Find surface forms
  • 15. [Michael Jackson] died in 2007. • Recognition: Find surface forms
  • 16. [Michael Jackson] died in 2007. • Disambiguation: Choose correct entity
  • 17. [Michael Jackson] died in 2007. • Disambiguation: Choose correct entity • Candidates for [Michael Jackson]
  • 19. Context: [Michael Jackson] died in 2007. • Disambiguation: Choose correct entity • Candidates for [Michael Jackson]
  • 20. Less distinctive context: [Michael Jackson] came to Paris. • Disambiguation: Choose correct entity • Candidates for [Michael Jackson]: Singer, Journalist
  • 22. Probabilities
      - P(entity | surface form): who is typically meant by a name?
      - For example, given [Michael Jackson] (and ignoring the context), what are the probabilities of the candidates?
          Michael Jackson (singer) 0.98
          Michael Jackson (journalist) 0.02
      - Other useful probabilities: P(surface form | entity), P(entity), P(surface form)
      - Maximum likelihood estimates are computed from Wikipedia page links.
  • 23. Data Processing
      - Two pipelines: single machine with Scala; MapReduce-style with Apache Pig
      - Apache Pig for analyzing large datasets on top of Hadoop: a data-flow language; think in tuples, bags and maps; operations such as load, filter, join, group by, store, ..., from which Pig derives a MapReduce plan
      - We build on pignlproc, started by Olivier Grisel (Stanbol)
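
To make the "load, filter, group by" data flow concrete, here is a small single-process Python sketch of a MapReduce-style count over (surface form, entity) link records. It is an illustration only: it is neither Pig Latin nor code from pignlproc, and the record layout and function names are assumptions.

```python
from collections import defaultdict


def map_phase(records):
    """Filter out malformed records and emit ((surface_form, entity), 1) pairs."""
    for surface_form, entity in records:
        if surface_form and entity:  # the "filter" step
            yield (surface_form, entity), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each group to get one count per key."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input: one record per wiki link (anchor text, link target).
records = [
    ("Michael Jackson", "Singer"),
    ("Michael Jackson", "Singer"),
    ("Michael Jackson", "Journalist"),
    ("", "Broken record"),
]
pair_counts = reduce_phase(shuffle(map_phase(records)))
print(pair_counts)
```

In Pig the same flow would be written declaratively, and the split into map and reduce stages would be derived by Pig rather than written by hand.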
  • 24. Probability estimation
      - P(entity | surface form) = count(surface form, entity) / count(surface form)
      - P(Michael Jackson (singer) | Michael Jackson) = 0.98
      - P(Michael Jackson (journalist) | Michael Jackson) = 0.02
      - Check the project website for the estimation of other scores: other probabilities, TF*ICF (a modification of TF*IDF), and others
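
The ratio on this slide can be computed directly from link counts. The sketch below assumes the input is a list of (surface form, entity) pairs, one per Wikipedia page link, and that count(surface form) means the number of links using that anchor text. The function name and the sample data are hypothetical, with the data chosen to reproduce the 0.98 / 0.02 split shown above.

```python
from collections import Counter


def estimate_p_entity_given_surface_form(links):
    """Maximum likelihood estimate:
    P(entity | surface form) = count(surface form, entity) / count(surface form)."""
    pair_counts = Counter(links)
    surface_form_counts = Counter(surface_form for surface_form, _ in links)
    return {
        (surface_form, entity): count / surface_form_counts[surface_form]
        for (surface_form, entity), count in pair_counts.items()
    }


# Hypothetical link data: 49 links point to the singer, 1 to the journalist.
links = [("Michael Jackson", "Michael Jackson (singer)")] * 49
links += [("Michael Jackson", "Michael Jackson (journalist)")]

for (surface_form, entity), p in estimate_p_entity_given_surface_form(links).items():
    print(f"P({entity} | {surface_form}) = {p:.2f}")
```

The estimates for one surface form always sum to 1, so this score only ranks candidates by how often a name refers to them and ignores the surrounding context.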
  • 26. Annotate http://dbpedia.org/resource/LSU_Tigers
  • 27. Annotate http://dbpedia.org/resource/LSU_Tigers http://dbpedia.org/resource/No. 4 (album)
  • 28. Top K Candidates: LSU_Tigers, Louisiana State University
  • 29. Demo, web service and source code
      - Demo: http://spotlight.dbpedia.org/demo/
      - Web Service: http://spotlight.dbpedia.org/rest/{API}
      - APIs: Phrase Recognition (/spot), Disambiguation (/disambiguation), Top K disambiguations (/candidates), Annotation (/annotation)
      - Source code: https://github.com/dbpedia-spotlight/dbpedia-spotlight/ (Apache V2 License)
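
As a rough illustration of the `{API}` URL template on this slide, the following Python sketch only builds a request URL and does not contact the service. The query parameter names `text` and `confidence` are assumptions here, and the host shown on the slide may no longer be reachable, so check the current project documentation before relying on either.

```python
from urllib.parse import urlencode

# Base URL as given on the slide; it may have moved since the webinar.
BASE_URL = "http://spotlight.dbpedia.org/rest"


def build_request_url(api, text, confidence):
    """Fill the {API} slot and append the (assumed) query parameters."""
    query = urlencode({"text": text, "confidence": confidence})
    return f"{BASE_URL}/{api}?{query}"


print(build_request_url("candidates", "Michael Jackson died in 2007.", 0.5))
```

Keeping URL construction separate from the HTTP call makes it easy to swap in a different host or endpoint name without touching the rest of a client.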
  • 30. Lessons learned
      - A generic solution to the problem is tough: most of the research focuses on solving very specialized cases; some entity types are harder than others; some types of text are harder than others.
      - Yet, users expect it to "just work".
      - We are focusing on a generic core that can be easily customized.
  • 31. Next steps
      - More experiments with DBpedia Spotlight in the context of LOD2 Use Case packages: Wolters Kluwer (legal domain, German language), Emergency Response
      - Automating build process and release to LOD2 Stack
      - Expanding to other languages
      - Easier adaptation to other knowledge bases beyond DBpedia
      - New algorithms, collective disambiguation, etc.
  • 32. Credits
      - Jingle: R.E.M., Martin Kaltenböck, Florian Kondert
      - Coordination: Thomas Thurner, Martin Kaltenböck
      - Moderation: Martin Kaltenböck
      - Presented by: Pablo N. Mendes
      - Slides from: Pablo N. Mendes, Max Jakob, Joachim Daiber
  • 33. Hope you enjoyed staying with us. If you need more detailed information, visit us at www.lod2.eu and let us know how we can improve to meet your expectations! Don't forget to register for our next webinars: 27.03.2013, CKAN and PublicData.eu (OKFN); April, Virtuoso 7 (OpenLink Software). Have a great day and don't forget ... http://lod2.eu
  • 34. http://lod2.eu