Stat rosa pristina nomine,
  nomine nuda tenemus




   César de Pablo Sanchez
Overview of previous work



TAC-KBP 2010 - Combining Similarities and Regression
          Classifiers for Entity Linking


               1. Task definition: KBP and EL
                    2. System description
                         3. Results
                       4. Conclusions
Drug Drug Interactions




Relation extraction
Anaphora resolution
OPINATOR - Opinion Mining
               Sentiment loaded dictionaries
               Sentiment classification
               Opinion summarization
               Search/Navigation
Knowledge acquisition
List candidates for the Greek elections in June.

What party does Tsipras represent?
How old is he?
What does Syriza mean?

How old is Samaras?
TAC-KBP 2010 - Combining Similarities and Regression
          Classifiers for Entity Linking




             Knowledge Base Population
    César de Pablo, Juan Perea, Paloma Martínez
Knowledge Base Population (KBP)
Knowledge Base: from Wikipedia dump (2008)
    ●   Title, name, type, id
    ●   Wiki text
    ●   Several facts as [name, value]
Document collection:
●   1.3 million English newswire documents
    ●   Published between 1994 and 2008
●   488,240 webpages
IE = KBP?
●   Accurate extraction of facts – not annotation
●   Learn facts from corpus – repetition is not important but helps confidence
●   Asserting wrong information is bad
●   Scalability
●   Provenance

QA = KBP?
●   Slots are fixed but targets change
●   Leverage knowledge from the KB
●   Global resolution – ground information to the KB
●   Avoid contradiction
●   Detect novel info
Task at TAC - KBP
●   Entity Linking – grounding entity mentions in
    document to KB entries
●   Slot Filling – learning attributes about target
    entities

      Task 1: Slot Filling
      Task 2: Entity Linking
Entity Linking: Example
For a name string and a document, determine which entity in a KB,
if any, is being referred to by the name string.

  <query id="EL006455">
  <name>Reserve Bank</name>
  <docid>eng-NG-31-100316-11150589</docid>
  <entity>E0700143</entity>
  </query>

  <query id="EL06472">
  <name>Reserve Bank</name>
  <docid>eng-NG-31-142262-10040510</docid>
  <entity>E0421510</entity>
  </query>

KB entries:
  …
  E0421510: Reserve Bank of Australia
  …
  E0700143: Reserve Bank of India
  …
  NIL
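
The query format in this example can be read with a few lines of Python. This is a minimal sketch, assuming the queries are wrapped in a single root element (the `<queries>` wrapper here is hypothetical) and that `<entity>` holds the gold KB id; `parse_queries` is an illustrative name, not part of any TAC tooling.

```python
import xml.etree.ElementTree as ET

# The two queries from the slide, wrapped in an assumed <queries> root element.
SAMPLE = """
<queries>
  <query id="EL006455">
    <name>Reserve Bank</name>
    <docid>eng-NG-31-100316-11150589</docid>
    <entity>E0700143</entity>
  </query>
  <query id="EL06472">
    <name>Reserve Bank</name>
    <docid>eng-NG-31-142262-10040510</docid>
    <entity>E0421510</entity>
  </query>
</queries>
"""


def parse_queries(xml_text):
    """Return one dict per <query>: id, name string, document id, gold entity."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": q.get("id"),
            "name": q.findtext("name"),
            "docid": q.findtext("docid"),
            "entity": q.findtext("entity"),  # gold KB id as shown on the slide
        }
        for q in root.iter("query")
    ]


queries = parse_queries(SAMPLE)
for q in queries:
    print(q["id"], q["name"], "->", q["entity"])
```

Both queries share the name string "Reserve Bank" but resolve to different KB ids, which is why the document id is part of the query.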
Entity Linking: Challenges
Focus on confusable entities
   ●   Ambiguous names: Reserve Bank, Alan Jackson, Fonda
   ●   Multiple name variants: Saddam Hussain, Saddam Hussein
   ●   Acronym expansion: CDC, AZ
   ●   Variety of cases: Centre for Disease Control, European Centre
       for Disease Control, AZ, Arizona, Astra Zeneca
●   Pilot task – entity linking without text support
●   Identify missing entities – then cluster (2011)
Entity Linking: Evaluation
Name mention – document pairs
●   Accuracy micro = num correct / num queries
●   Accuracy macro = group by entities (2009)

            queries   NIL    set          genre        NIL fraction
            3904      2229   eval 2009    news         0.571
            1500      426    train 2010   web          0.284
            2250      1230   eval 2010    news + web   0.547
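
The two accuracy variants can be sketched as follows. This is an illustrative sketch only: the toy gold and predicted labels are made up, and treating NIL as one more group in the macro average is an assumption, not something the slide states.

```python
from collections import defaultdict


def micro_accuracy(gold, predicted):
    """Fraction of queries whose predicted entity equals the gold entity."""
    correct = sum(1 for qid in gold if predicted.get(qid) == gold[qid])
    return correct / len(gold)


def macro_accuracy(gold, predicted):
    """Average of per-entity accuracies: queries are grouped by gold entity."""
    per_entity = defaultdict(list)
    for qid, entity in gold.items():
        per_entity[entity].append(predicted.get(qid) == entity)
    return sum(sum(hits) / len(hits) for hits in per_entity.values()) / len(per_entity)


# Toy data (hypothetical): three queries for entity E1, one NIL query.
gold = {"q1": "E1", "q2": "E1", "q3": "E1", "q4": "NIL"}
predicted = {"q1": "E1", "q2": "E1", "q3": "E2", "q4": "NIL"}

print(micro_accuracy(gold, predicted))
print(macro_accuracy(gold, predicted))
```

Micro accuracy weights every query equally, so frequent entities dominate; the macro variant gives each entity the same weight regardless of how many queries mention it.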
uc3m EL system
●   Supervised architecture
●   Use similarities between objects or parts of them – avoid a
    wide feature vector

                                 1) Candidate Entity Retrieval
                                 2) Candidate Filtering
                                 3) Validation (NIL classification)
uc3m EL system
1) Candidate Retrieval
●   Each KB article is indexed using Lucene, using several
    indexes and fields
    ●   ALIAS – includes names plus aliases extracted from wiki slots:
        alias, abbreviation, website, etc.
    ●   NER – named entities extracted from text: <id, ne, text>
    ●   KB – entity slots <id, [(slot_name, slot_value)]>
    ●   WIKIPEDIA – anchorList, category, redirect, outlinks, inlinks
●   Each EL query transforms into several Lucene queries –
    result [KB name, score] list
1) Candidate Retrieval
●   EL Query: [Michael Jordan, eng-NG-31-100316-11150589]
●   Lucene queries:
    ●   name=Michael AND name=Jordan
    ●   alias=Michael AND alias=Jordan
    ●   abbr=Michael AND abbr=Jordan
●   For each query:
    ●   [EL0989789, Michael Jordan, 25.00]
    ●   [EL6565356, Michael B. Jordan, 25.00]
    ●   [EL6565356, Michael I. Jordan, 25.00]
    ●   [EL6565356, Michael-Hakim Jordan, 25.00]
    ●   [EL6565356, Jordan, 20.00]
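
The expansion of one EL query into several field queries can be sketched as below. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the output uses the slide's `field=term` notation (Lucene's classic query parser writes this as `field:term`), and merging result lists by keeping the best score per KB id is an assumed strategy that the slides do not specify.

```python
def build_field_queries(name, fields=("name", "alias", "abbr")):
    """Build one conjunctive query per index field from the name string tokens."""
    tokens = name.split()
    return [" AND ".join(f"{field}={tok}" for tok in tokens) for field in fields]


def merge_results(result_lists):
    """Merge [kb_id, kb_name, score] lists, keeping the best score per KB id (assumption)."""
    best = {}
    for results in result_lists:
        for kb_id, kb_name, score in results:
            if kb_id not in best or score > best[kb_id][1]:
                best[kb_id] = (kb_name, score)
    merged = [(kb_id, kb_name, score) for kb_id, (kb_name, score) in best.items()]
    return sorted(merged, key=lambda row: row[2], reverse=True)


for query in build_field_queries("Michael Jordan"):
    print(query)
```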
2) Candidate Filtering
●   Classification problem
    ●   decide whether (EL query + text, KB name + wiki text) is a good
        match
    ●   In fact, rank by prediction confidence
●   Use similarity scores as features – normalized and unnormalized
●   Use a cost-sensitive classifier.
●   Best results: model trees with linear regression leaves
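
Ranking candidates by prediction confidence can be sketched as follows. This is not the system's model: a plain linear scorer stands in for the model tree, and the weights, feature names and feature values are all illustrative assumptions.

```python
# Illustrative weights only; the real system learned its scorer from training data.
ILLUSTRATIVE_WEIGHTS = {"index_score": 0.02, "context_sim": 0.5, "name_sim": 0.5}


def linear_score(features):
    """Stand-in scorer: weighted sum of similarity features."""
    return sum(ILLUSTRATIVE_WEIGHTS[key] * value for key, value in features.items())


def rank_candidates(candidates, score_fn):
    """Sort candidates by descending score, i.e. by prediction confidence."""
    return sorted(candidates, key=lambda c: score_fn(c["features"]), reverse=True)


# Hypothetical feature values for the two "Reserve Bank" candidates.
candidates = [
    {"kb_id": "E0421510", "name": "Reserve Bank of Australia",
     "features": {"index_score": 25.0, "context_sim": 0.10, "name_sim": 0.60}},
    {"kb_id": "E0700143", "name": "Reserve Bank of India",
     "features": {"index_score": 25.0, "context_sim": 0.40, "name_sim": 0.60}},
]

ranked = rank_candidates(candidates, linear_score)
print([c["kb_id"] for c in ranked])
```

With identical retrieval and name scores, the context similarity between the query document and the KB text is what separates the two candidates.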
Features
●   Index-based scores:
    ●   sim(EL queries, KB entries) directly from initial retrieval
●   Context-similarity scores:
    ●   sim(document, wikitext) or sim(document, slots)
●   Name similarity score:
    ●   sim(EL queries, KB entries) – more expensive: equal,
        QcontainsE, EcontainsQ, Jaro, Jaro-Winkler, SLIM (based on
        SecondString)
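
Some of the name similarity features can be sketched in plain Python. This is a reimplementation for illustration, not the SecondString API; lowercasing both names before comparison is an assumption, and Jaro-Winkler and SLIM are left out.

```python
def jaro(s, t):
    """Jaro similarity between two strings, in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in order; half the out-of-place pairs are transpositions.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def name_features(query_name, entry_name):
    """Name similarity features between an EL query name (Q) and a KB entry name (E)."""
    q, e = query_name.lower(), entry_name.lower()
    return {
        "equal": float(q == e),
        "QcontainsE": float(e in q),
        "EcontainsQ": float(q in e),
        "jaro": jaro(q, e),
    }


print(name_features("Reserve Bank", "Reserve Bank of India"))
```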
3) Validation
●   Classification – selected candidate is good enough or NIL
●   Positive examples – correct candidate example
●   Negative examples – top ranked entities for those queries
    that do not have a link in the KB
●   Balanced dataset
●   Best classifier: Logistic Regression
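
The validation step amounts to a binary decision on the top-ranked candidate. The sketch below shows the shape of that decision with a logistic model; the single feature, the weights, the bias and the 0.5 threshold are all hypothetical values chosen for illustration, not the trained classifier.

```python
import math


def link_probability(features, weights, bias):
    """Logistic regression: probability that the top candidate is the right entity."""
    z = bias + sum(weights[key] * features[key] for key in weights)
    return 1.0 / (1.0 + math.exp(-z))


def validate(candidate_id, features, weights, bias, threshold=0.5):
    """Return the candidate's KB id if it is good enough, otherwise NIL."""
    if link_probability(features, weights, bias) >= threshold:
        return candidate_id
    return "NIL"


# Hypothetical parameters: one feature, the score of the top-ranked candidate.
weights = {"rank_score": 4.0}
bias = -2.0

print(validate("E0700143", {"rank_score": 0.9}, weights, bias))
print(validate("E0700143", {"rank_score": 0.2}, weights, bias))
```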
EL results - main

           queries  type    news   web    news+web   Highest   Median
           750      ORG     0.69   0.67   0.67       0.85      0.68
           749      GPE     0.52   0.53   0.51       0.80      0.60
           751      PER     0.82   0.76   0.85       0.96      0.85
           2250     ALL     0.67   0.65   0.68       0.87      0.69

●   Influence of domain?
●   GPE are particularly difficult

           queries  subset  news   web    news+web   Highest   Median
           2250     ALL     0.67   0.65   0.68       0.87      0.69
           1020     noNIL   0.51   0.59   0.49
           1230     NIL     0.81   0.70   0.82
EL results – pilot w/o text

           queries  subset  news(main)   news   +n-sim NIL   +n-sim all
           2250     ALL     0.67         0.58   0.66         0.70
           1020     noNIL   0.51         0.35   0.40         0.47
           1230     NIL     0.81         0.77   0.88         0.88

●   Including name similarity scores helped
EL systems comparison
●   Prior on link probability/popularity (Stanford-UBC 2009, LCC 2010,
    Microsoft 2011)
●   Learning to rank algorithms: ListNet (CUNY 2011)
●   Expand queries: acronym expansion/coreference (NUS 2011)
●   Unsupervised system – entity co-occurrence + PageRank
    (WebTLab 2010)
●   Inductive EL – first cluster, then link (LCC 2011)
●   Collective entity linking (Microsoft 2011)
Conclusion
●   Supervised EL system
    ●   Influence of training size
    ●   Beware of training data distribution
●   Consider name-similarities even for reranking
●   Improve initial candidate retrieval
●   Perform collective Entity Linking
●   Efficiency?
Related tasks
●   Cluster documents mentioning entities
●   Entity coreference – document and cross-document
●   Add missing links between Wikipedia pages
●   Link entities to matching Wikipedia articles

Greek Election Candidates

  • 1. Stat rosa pristina nomine, nomina nuda tenemus César de Pablo Sanchez
  • 2. Overview of previous work TAC-KBP 2010 - Combining Similarities and Regression Classifiers for Entity Linking 1. Task definition: KBP and EL 2. System description 3. Results 4. Conclusions
  • 4. Drug Drug Interactions Relation extraction Anaphora resolution
  • 5. OPINATOR - Opinion Mining Sentiment loaded dictionaries Sentiment classification Opinion summarization Search/Navigation
  • 6. Knowledge acquisition List candidates for the Greek elections in June.
  • 8. Knowledge acquisition List candidates for the Greek elections in June. What party does Tsipras represent? How old is he? What does Syriza mean?
  • 10. Knowledge acquisition List candidates for the Greek elections in June. What party does Tsipras represent? How old is he? What does Syriza mean? How old is Samaras?
  • 12. TAC-KBP 2010 - Combining Similarities and Regression Classifiers for Entity Linking 1. Task definition: KBP and EL 2. System description 3. Results 4. Conclusions
  • 13. TAC-KBP 2010 - Combining Similarities and Regression Classifiers for Entity Linking Knowledge Base Population César de Pablo, Juan Perea, Paloma Martínez
  • 14. Knowledge Base Population Knowledge Base KBP
  • 15. Knowledge Base Population ● Knowledge Base from Wikipedia dump (2008): title, name, type, id; wiki text; several facts as [name, value] ● 1.3 million English newswire documents, published between 1994 and 2008 ● 488,240 web pages
  • 16. IE = KBP? QA = KBP?
  • 17. IE = KBP? ● Accurate extraction of facts – not annotation ● Learn facts from corpus - repetition is not important but helps confidence ● Asserting wrong information is bad ● Scalability ● Provenance QA = KBP?
  • 18. IE = KBP? ● Accurate extraction of facts – not annotation ● Learn facts from corpus - repetition is not important but helps confidence ● Asserting wrong information is bad ● Scalability ● Provenance ● Slots are fixed but targets change ● Leverage knowledge from the KB ● Global resolution - ground information to the KB ● Avoid contradiction ● Detect novel info QA = KBP?
  • 19. Task at TAC - KBP ● Entity Linking – grounding entity mentions in document to KB entries ● Slot Filling – Learning attributes about target entities Task 1: Slot Filling Task 2: Entity Linking
  • 20. Task at TAC - KBP ● Entity Linking – grounding entity mentions in document to KB entries ● Slot Filling – Learning attributes about target entities Task 1: Slot Filling
  • 21. Task at TAC - KBP ● Entity Linking – grounding entity mentions in document to KB entries ● Slot Filling – Learning attributes about target entities Task 2: Entity Linking
  • 22. Entity Linking: Example For a name string and a document, determine which entity in a KB, if any, is being referred to by the name string ● <query id="EL006455"> <name>Reserve Bank</name> <docid>eng-NG-31-100316-11150589</docid> <entity>E0700143</entity> </query> ● <query id="EL06472"> <name>Reserve Bank</name> <docid>eng-NG-31-142262-10040510</docid> <entity>E0421510</entity> </query>
  • 23. Entity Linking: Example For a name string and a document, determine which entity in a KB, if any, is being referred to by the name string ● <query id="EL006455"> <name>Reserve Bank</name> <docid>eng-NG-31-100316-11150589</docid> <entity>E0700143</entity> </query> ● <query id="EL06472"> <name>Reserve Bank</name> <docid>eng-NG-31-142262-10040510</docid> <entity>E0421510</entity> </query> ● KB entries: … E0421510: Reserve Bank of Australia … E0700143: Reserve Bank of India .... NIL
  • 24. Entity Linking: Challenges Focus on confusable entities ● Ambiguous names: Reserve Bank, Alan Jackson, Fonda
  • 25. Entity Linking: Challenges Focus on confusable entities ● Ambiguous names ● Multiple name variants: Saddam Hussain, Saddam Hussein
  • 26. Entity Linking: Challenges Focus on confusable entities ● Ambiguous names ● Multiple name variants ● Acronym expansion: CDC, AZ
  • 27. Entity Linking: Challenges Focus on confusable entities ● Ambiguous names ● Multiple name variants ● Acronym expansion ● Variety of cases: Centre for Disease Control, European Centre for Disease Control, AZ, Arizona, Astra Zeneca
  • 28. Entity Linking: Challenges Focus on confusable entities ● Ambiguous names ● Multiple name variants ● Acronym expansion ● Variety of cases ● Pilot task – entity linking without text support ● Identify missing entities – then cluster (2011)
  • 29. Entity Linking: Evaluation Name mention – document pairs ● Accuracy micro = num correct / num queries ● Accuracy macro = group by entities (2009)
        queries  NIL   set         genre       % NIL
        3904     2229  eval 2009   news        0.571
        1500     426   train 2010  web         0.284
        2250     1230  eval 2010   news + web  0.547
  • 30. uc3m EL system ● Supervised architecture ● Use similarities between objects or parts of them – avoid a wide feature vector ● 1) Candidate Entity Retrieval 2) Candidate Filtering 3) Validation (NIL classification)
  • 32. 1) Candidate Retrieval ● Each KB article is indexed using Lucene, using several indexes and fields ● ALIAS - include names plus aliases extracted from wiki slots: alias, abbreviation, website, etc. ● NER – Named entities extracted from text: <id, ne, text> ● KB - entity slots <id, [(slot_name,slot_value)]> ● WIKIPEDIA – anchorList, category, redirect, outlinks, inlinks ● Each EL query transforms into several Lucene queries – result [KB name, score] list
  • 33. 1) Candidate Retrieval ● EL Query: [Michael Jordan, eng-NG-31-100316-11150589] ● Lucene queries: ● name=Michael AND name=Jordan ● alias=Michael AND alias=Jordan ● abbr=Michael AND abbr=Jordan ● For each query: ● [EL0989789, Michael Jordan, 25.00] ● [EL6565356, Michael B. Jordan, 25.00] ● [EL6565356, Michael I. Jordan, 25.00] ● [EL6565356, Michael-Hakim Jordan, 25.00] ● [EL6565356, Jordan, 20.00]
  • 34. 2) Candidate Filtering ● Classification problem ● decide whether (EL query + text, KB name + wiki text) is a good match ● In fact, rank by prediction confidence ● Use similarity scores as features – normalized and unnormalized ● Use a cost-sensitive classifier ● Best results: Model trees with linear regression leaves
  • 35. Features ● Index-based scores: ● sim(EL queries, KB entries) directly from initial retrieval ● Context-similarity scores: ● sim(document, wikitext) or sim(document, slots) ● Name similarity score: ● sim(EL queries, KB entries) – more expensive: equal, QcontainsE, EcontainsQ, Jaro, Jaro-Winkler, SLIM (based on SecondString)
  • 36. 3) Validation ● Classification – selected candidate is good enough or NIL ● Positive examples – correct candidate example ● Negative examples – top ranked entities for those queries that do not have a link in the KB ● Balanced dataset ● Best classifier: Logistic Regression
  • 37. EL results - main
        queries  type  news  web   news+web  Highest  Median
        750      ORG   0.69  0.67  0.67      0.85     0.68
        749      GPE   0.52  0.53  0.51      0.80     0.60
        751      PER   0.82  0.76  0.85      0.96     0.85
        2250     ALL   0.67  0.65  0.68      0.87     0.69
        ● Influence of domain?
  • 38. EL results - main
        queries  type  news  web   news+web  Highest  Median
        750      ORG   0.69  0.67  0.67      0.85     0.68
        749      GPE   0.52  0.53  0.51      0.80     0.60
        751      PER   0.82  0.76  0.85      0.96     0.85
        2250     ALL   0.67  0.65  0.68      0.87     0.69
  • 39. EL results - main
        queries  type  news  web   news+web  Highest  Median
        750      ORG   0.69  0.67  0.67      0.85     0.68
        749      GPE   0.52  0.53  0.51      0.80     0.60
        751      PER   0.82  0.76  0.85      0.96     0.85
        2250     ALL   0.67  0.65  0.68      0.87     0.69
        ● GPE are particularly difficult
  • 40. EL results - main
        queries  type  news  web   news+web  Highest  Median
        750      ORG   0.69  0.67  0.67      0.85     0.68
        749      GPE   0.52  0.53  0.51      0.80     0.60
        751      PER   0.82  0.76  0.85      0.96     0.85
        2250     ALL   0.67  0.65  0.68      0.87     0.69

        queries  subset  news  web   news+web  Highest  Median
        2250     ALL     0.67  0.65  0.68      0.87     0.69
        1020     noNIL   0.51  0.59  0.49
        1230     NIL     0.81  0.70  0.82
  • 42. EL results – pilot w/o text
        queries  subset  news(main)  news  +n-sim NIL  +n-sim all
        2250     ALL     0.67        0.58  0.66        0.70
        1020     noNIL   0.51        0.35  0.40        0.47
        1230     NIL     0.81        0.77  0.88        0.88
        ● Including name similarity scores helped
  • 44. EL systems comparison ● Prior on link probability/popularity (Stanford-UBC 2009, LCC 2010, Microsoft 2011) ● Learning to rank algorithms: ListNet (CUNY 2011) ● Expand queries: acronym expansion/coreference (NUS 2011) ● Unsupervised system – entity co-occurrence + PageRank (WebTLab 2010) ● Inductive EL – first cluster, then link (LCC 2011) ● Collective entity linking (Microsoft 2011)
  • 45. Conclusion ● Supervised EL system ● Influence of training size ● Beware of training data distribution ● Consider name similarities even for reranking ● Improve initial candidate retrieval ● Perform collective Entity Linking ● Efficiency?
  • 46. Related tasks ● Cluster Documents Mentioning Entities ● Entity coreference – document and cross-document ● Add missing links between Wikipedia pages ● Link entities to matching Wikipedia articles
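
The two accuracy measures on slide 29 can be illustrated with a small sketch. The query ids (q1 to q4) and the predictions are made up for the example; the entity ids are the ones shown on the Entity Linking example slide. Treating NIL as one more group in the macro average is an assumption, since the slide only says "group by entities".

```python
from collections import defaultdict

def micro_accuracy(gold, predicted):
    """Fraction of queries whose predicted KB id (or "NIL") matches the gold one."""
    correct = sum(1 for qid, entity in gold.items() if predicted.get(qid) == entity)
    return correct / len(gold)

def macro_accuracy(gold, predicted):
    """Mean of per-entity accuracies: queries are grouped by gold entity first."""
    hits_by_entity = defaultdict(list)
    for qid, entity in gold.items():
        hits_by_entity[entity].append(predicted.get(qid) == entity)
    per_entity = [sum(hits) / len(hits) for hits in hits_by_entity.values()]
    return sum(per_entity) / len(per_entity)

# Hypothetical gold links and system output (query id -> KB id or "NIL").
gold = {"q1": "E0700143", "q2": "E0700143", "q3": "E0421510", "q4": "NIL"}
predicted = {"q1": "E0700143", "q2": "E0421510", "q3": "E0421510", "q4": "NIL"}

micro = micro_accuracy(gold, predicted)  # 3 of 4 queries correct
macro = macro_accuracy(gold, predicted)  # mean of 1/2, 1/1 and 1/1
```

Micro accuracy weights every query equally, so frequent entities dominate; the macro variant gives each entity the same weight regardless of how many queries mention it.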
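
Slide 33 shows one EL query turning into several fielded Lucene queries. A minimal sketch of that expansion follows, assuming whitespace tokenization and the three field names from the slide (name, alias, abbr). The function name is hypothetical, the output uses the `field:term` form of Lucene's classic query syntax instead of the slide's `=` notation, and special characters are not escaped.

```python
def lucene_queries(name, fields=("name", "alias", "abbr")):
    """Build one conjunctive query string per index field for a name mention."""
    tokens = name.split()
    return [
        " AND ".join(f"{field}:{token}" for token in tokens)
        for field in fields
    ]

queries = lucene_queries("Michael Jordan")
for query in queries:
    print(query)
```

Each string would be run against the corresponding index, and the scored hits form the candidate list passed on to the filtering step.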
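
The name similarity features listed on slide 35 (equal, QcontainsE, EcontainsQ, Jaro, Jaro-Winkler) can be sketched in plain Python. This is an illustrative re-implementation, not the SecondString code the slides mention; lowercasing both strings is an assumption, and SLIM is left out.

```python
def jaro(s, t):
    """Jaro similarity in [0, 1] between two strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        # Look for an unused equal character within the matching window.
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s, t, scaling=0.1):
    """Jaro score boosted by the length of the common prefix (up to 4 chars)."""
    score = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * scaling * (1 - score)

def name_features(query, entry):
    """Name similarity features between an EL query name and a KB entry name."""
    q, e = query.lower(), entry.lower()
    return {
        "equal": float(q == e),
        "QcontainsE": float(e in q),
        "EcontainsQ": float(q in e),
        "jaro": jaro(q, e),
        "jaro_winkler": jaro_winkler(q, e),
    }

features = name_features("Reserve Bank", "Reserve Bank of India")
```

Each feature is a single number, which matches the deck's stated preference for a handful of similarity scores over a wide feature vector.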