ROSeAnn: Reconciling Opinions of Semantic Annotators
[Panel titles: The Need for Integration; Conflicting Opinions; Supervised Aggregation (MEMM)]
Another brilliant goal by England midfielder David 
Beckham earned Manchester United a point from a 1-1 
draw at Stamford Bridge on Saturday after Gianfranco Zola 
had put Chelsea ahead. 
United stayed top of the English league standings as 
Liverpool could only draw 0-0 at home to Blackburn 
despite dominating throughout. Newcastle moved into third 
as a Les Ferdinand goal was enough to win 1-0 at bottom 
club Middlesbrough. 
Ian Marshall scored a hat-trick in the first 27 minutes to 
help Leicester beat Derby 4-2 while Sheffield Wednesday 
came from two down to win 3-2 at Southampton. 
Leeds won 1-0 at Sunderland and Coventry against Everton 
and Nottingham Forest against Aston Villa ended goalless. 
Semantic annotators label text snippets 
as referring to certain entities, e.g., 
Barack Obama, London, or as instances 
of particular entity types, e.g., actors, 
governmental organisations, countries. 
Semantic Annotators 
A growing number of freely available 
online services can enrich documents 
with semantic annotations. 
Unfortunately, their opinions about an entity 
often disagree as they might be based on very 
diverse background knowledge such as training 
corpora, knowledge bases, contextual 
information, POS tags, and crowds. 
AlchemyAPI:Person 
DBPediaSpotlight:SoccerClub 
Lupedia:Settlement 
Wikimeta:Organisation 
StanfordNER:Organisation 
IllinoisNER:Organisation 
Extractiv:City 
AlchemyAPI:Facility 
AlchemyAPI:GeographicFeature 
DBPediaSpotlight:SportsTeam 
Zemanta:SportsTeam 
OpenCalais:GeographicFeature 
Luying Chen, Stefano Ortona, Giorgio Orsi, and Michael Benedikt 
University of Oxford - Department of Computer Science 
(name.surname@cs.ox.ac.uk) 
Unsupervised Weighted Repair (WR) 
http://diadem.cs.ox.ac.uk/roseann 
Each annotator comes with a vocabulary 
of semantically-related entity types that 
often overlap on common-sense entities, 
such as places, persons, and companies. 
However, each annotator covers only a 
fraction of a much larger universe of 
concepts. By relating such vocabularies 
to each other via mappings we can 
achieve much better coverage. 
Accuracy of individual annotators
varies greatly; redundancy and
logical relationships can be used to
gain confidence about an entity.
Each annotator contributes some 
original types. None of them can 
be dropped without losing recall. 
Empirical Evaluation 
[Figure: two bar charts of Precision, Recall, and F-score (axis 0 to 1), one for the types Person, Date, and Movie, the other for Location, Sport, and Movie.]
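[Figure: fragment of the merged ontology with the types Thing, Organisation, Facility, Place, Person, Point of Interest, Club, Soccer Club, Location, Settlement, City, Natural Feature, and Geographic Feature.]
[Figure: constraints over the following pairs of types; the connecting symbols were lost in extraction.]
Organisation(X), Person(X)
Organisation(X), Location(X)
Settlement(X), GeographicFeature(X)
Location(X), PointOfInterest(X)
Place(X), Person(X)
Person(X), Facility(X)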
Annotation vocabularies are semantically
related to each other via existing knowledge
bases (DBPedia, Freebase) or via common sense.
These relations can be used to map 
them to a common ontology to check for 
logical conflicts or compatibility. 
IS-A constraints enable type inference, 
thus increasing recall at the expense of 
precision. Disjointness constraints induce 
logical inconsistencies used to locate 
potentially erroneous annotations, thus 
increasing precision. 
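The two kinds of constraint can be illustrated with a minimal sketch. The type names are taken from the poster, but the specific IS-A edges and disjointness pairs below are assumptions chosen for illustration, not the ontology actually used by ROSeAnn.

```python
# Hypothetical ontology fragment: type names appear on the poster,
# but these IS-A edges and disjointness pairs are assumed.
IS_A = {
    "SoccerClub": "Organisation",
    "City": "Settlement",
    "Settlement": "Place",
}
DISJOINT = {
    frozenset({"Organisation", "Place"}),
    frozenset({"Person", "Place"}),
}


def ancestors(entity_type):
    """Return the type together with every supertype reachable via IS-A."""
    closure = {entity_type}
    while entity_type in IS_A:
        entity_type = IS_A[entity_type]
        closure.add(entity_type)
    return closure


def infer_types(asserted):
    """Close a set of asserted types under IS-A (this is what raises recall)."""
    inferred = set()
    for entity_type in asserted:
        inferred |= ancestors(entity_type)
    return inferred


def conflicts(asserted):
    """Return the disjoint pairs entailed together (candidate errors)."""
    inferred = infer_types(asserted)
    return {pair for pair in DISJOINT if pair <= inferred}


opinions = {"SoccerClub", "City"}
print(sorted(infer_types(opinions)))
# ['City', 'Organisation', 'Place', 'Settlement', 'SoccerClub']
print([sorted(pair) for pair in conflicts(opinions)])
# [['Organisation', 'Place']]
```

Here the two opinions never name the same type, yet the inferred supertypes reveal that they cannot both hold for one entity.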
WR computes an ontology-aware repair of the 
set of annotations that is logically consistent 
and “fair” to the annotators involved. 
WR is unsupervised and does not assume any 
prior knowledge about the annotators. 
If the global ontology only states IS-A and 
disjointness constraints, a solution can be 
computed efficiently (repair → 2-SAT). 
WR is designed for scenarios where training 
data is unavailable or sparse. 
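One way to see why these two kinds of constraint fit 2-SAT, offered as an assumption rather than as the authors' exact encoding: introduce a Boolean variable $x_c$ for each entity type $c$, true when $c$ is kept in the solution. Each constraint then becomes a clause with exactly two literals.

```latex
\begin{align*}
c \text{ IS-A } c' \quad &\text{becomes} \quad (\lnot x_c \lor x_{c'}) \\
c \text{ disjoint from } c' \quad &\text{becomes} \quad (\lnot x_c \lor \lnot x_{c'})
\end{align*}
```

The first clause says that keeping a type forces keeping its supertype; the second says that two disjoint types cannot both be kept.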
Conflicts also occur at span-level, i.e., 
annotators agreeing on the entity type but 
not on the extension of the span. 
The notion of span is adapted to support
composite annotations consisting of tokens
carrying logically incompatible types, e.g.,
“[[Subic] [Naval Base]]”.
[Figure: bar charts comparing each of Extractiv, Zemanta, Fox, and NERD with WeightedRepair and MEMM on μP, μR, μF1, MP, MR, and MF1 (micro- and macro-averaged precision, recall, and F1).]
[Figure: WR scoring example with the opinions Politician, Politician, and Place under the constraint Place ≠ Person: Score(Politician) -> +1, Score(Person) -> +1, Score(Place) -> -1.]
WR and MEMM have been tested on 
~670 documents from 4 corpora: MUC7, 
Reuters, NETagger, and Fox. 
Comparisons have been carried out against
individual annotators and competitor
aggregators such as Fox and NERD.
Semantic (micro/macro) precision and 
(micro/macro) recall metrics have been 
adopted for the comparison. 
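The difference between micro and macro averaging can be sketched as follows. This is a plain exact-match version: it does not implement the ontology-aware "semantic" variant mentioned above, and the function name and the (span, type) data layout are assumptions for illustration.

```python
def micro_macro(gold, predicted):
    """Micro- and macro-averaged precision and recall.

    gold and predicted are sets of (span, entity_type) pairs; a prediction
    counts as correct only if the identical pair occurs in gold.
    """
    types = sorted({t for _, t in gold | predicted})
    if not types:
        return {"micro_p": 0.0, "micro_r": 0.0, "macro_p": 0.0, "macro_r": 0.0}

    def ratio(num, den):
        return num / den if den else 0.0

    tp, fp, fn = {}, {}, {}
    for t in types:
        g = {a for a in gold if a[1] == t}
        p = {a for a in predicted if a[1] == t}
        tp[t], fp[t], fn[t] = len(g & p), len(p - g), len(g - p)

    total_tp = sum(tp.values())
    return {
        # Micro: pool the counts over all types, then divide once.
        "micro_p": ratio(total_tp, total_tp + sum(fp.values())),
        "micro_r": ratio(total_tp, total_tp + sum(fn.values())),
        # Macro: compute the ratio per type, then average the ratios.
        "macro_p": sum(ratio(tp[t], tp[t] + fp[t]) for t in types) / len(types),
        "macro_r": sum(ratio(tp[t], tp[t] + fn[t]) for t in types) / len(types),
    }


gold = {((0, 5), "Person"), ((10, 17), "Place"), ((20, 27), "Place")}
predicted = {((0, 5), "Person"), ((10, 17), "Place"), ((30, 35), "Place")}
print(micro_macro(gold, predicted))
```

Micro averaging weights every annotation equally, so frequent types dominate; macro averaging weights every type equally, so rare types count as much as common ones.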
When training data is available, 
supervision can be used to 
learn the most probable 
sequences of annotations given 
those available from gold 
standard annotated documents. 
MEMM can learn unorthodox 
relationships among annotations that do 
not necessarily follow standard inference 
rules, e.g., it can learn to predict a 
subclass C from (a set of) annotations 
mentioning a superclass C’ of C. 
[Figure: WR example over the types Person, Politician, and Place.]
WR objective: $\max \sum_{c} \mathrm{Score}(c) \cdot x_c$
WR and MEMM perform on average better than all individual
annotators and aggregators with the exception of OpenCalais.
However, the OpenCalais vocabulary represents some 18% of all types.
MEMM is more accurate than WR, but on sparse datasets WR
shows better performance than MEMM. WR delivers higher
recall than MEMM, which in turn is more precise than WR.
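[Figure: WR solution computation time in msec (axis 16 to 512) against the number of annotators (2, 5, 8, 11) on Reuters, MUC7, NETTagger, and FOX.]
[Figure: MEMM prediction time in msec (axis 1 to 10000) against the number of annotators (2, 5, 8, 11) on Reuters, MUC7, NETTagger, and FOX.]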
Online aggregation of annotations is feasible in practice
(~300ms for WR and ~1s for MEMM). The aggregation time is
orders of magnitude less than the time required to invoke the
online services and collect their answers.
Apart from the entity type and the 
source annotator, the feature set for 
MEMM includes ontological features 
such as IS-A and disjointness. All 
features are token based. 
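A token-level feature extractor along these lines might look as follows. The feature naming scheme, the function name, and the small ontology fragment are all hypothetical; the poster does not specify how the features are encoded, and the classifier itself is not shown.

```python
# Assumed ontology fragment, for illustration only.
IS_A = {"SoccerClub": "Organisation", "Politician": "Person"}
DISJOINT = {frozenset({"Person", "Organisation"})}


def token_features(opinions):
    """Build a feature set for one token.

    opinions is a list of (annotator, entity_type) pairs covering the token.
    """
    features = set()
    entailed = set()
    for annotator, entity_type in opinions:
        # Direct feature: which annotator asserted which type.
        features.add(f"opinion:{annotator}:{entity_type}")
        entailed.add(entity_type)
        # Ontological feature: supertypes implied through IS-A.
        parent = IS_A.get(entity_type)
        while parent is not None:
            features.add(f"isa:{annotator}:{parent}")
            entailed.add(parent)
            parent = IS_A.get(parent)
    # Ontological feature: disjoint types that are entailed together.
    for pair in DISJOINT:
        if pair <= entailed:
            first, second = sorted(pair)
            features.add(f"disjoint:{first}|{second}")
    return features


token_opinions = [("AlchemyAPI", "Person"), ("DBPediaSpotlight", "SoccerClub")]
for feature in sorted(token_features(token_opinions)):
    print(feature)
```

In a maximum-entropy Markov model, a feature set like this would be combined with the label predicted for the previous token to estimate the distribution over the current token's label.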
Online annotators are often black boxes 
characterised by a continuously evolving 
vocabulary, where entity types are 
added, merged, or removed. 
[Figure labels: Region, Person, Country, Scientist, Planet, Brand, Product, Planet, Ocean, Company, Mansour]
WR receives as input an annotated span and produces as output a logically consistent set of annotations.
An atomic score is computed for each opinion, based on (inferred) support for or opposition to it by other opinions.
An initial solution consists of the possibly inconsistent union of all entity types.
Annotations are inserted into or deleted from the initial solution to obtain a consistent set of annotations that maximizes the objective function.
Only the most specific annotations are retained in the final solution.
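The steps can be sketched on the poster's Politician / Place example. This is a minimal illustration, not the authors' algorithm: it scores a type as supporting opinions minus opposing opinions (which reproduces the +1, +1, -1 scores shown on the poster), and it finds the best consistent set by brute-force enumeration instead of the 2-SAT-based computation. The IS-A edge Politician → Person and all function names are assumptions.

```python
from itertools import combinations

IS_A = {"Politician": "Person"}               # assumed IS-A edge
DISJOINT = {frozenset({"Place", "Person"})}   # Place ≠ Person


def ancestors(entity_type):
    """The type plus all supertypes reachable through IS-A."""
    closure = {entity_type}
    while entity_type in IS_A:
        entity_type = IS_A[entity_type]
        closure.add(entity_type)
    return closure


def in_conflict(a, b):
    """True if some supertype of a is disjoint from some supertype of b."""
    return any(
        frozenset({x, y}) in DISJOINT for x in ancestors(a) for y in ancestors(b)
    )


def score(candidate, opinions):
    """Opinions entailing the candidate minus opinions conflicting with it."""
    support = sum(candidate in ancestors(o) for o in opinions)
    opposition = sum(in_conflict(o, candidate) for o in opinions)
    return support - opposition


def is_consistent(solution):
    """Closed under IS-A and free of disjoint pairs."""
    closed = all(ancestors(t) <= solution for t in solution)
    return closed and not any(pair <= solution for pair in DISJOINT)


def weighted_repair(opinions):
    """Return (scores, most specific types of the best consistent set)."""
    candidates = sorted(set().union(*(ancestors(o) for o in opinions)))
    scores = {c: score(c, opinions) for c in candidates}
    best, best_value = set(), 0
    # Brute force over all subsets: exponential, fine only for tiny examples.
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            solution = set(subset)
            value = sum(scores[c] for c in solution)
            if is_consistent(solution) and value > best_value:
                best, best_value = solution, value
    # Keep only types that are not a proper supertype of another kept type.
    specific = {t for t in best if not any(t in ancestors(o) - {o} for o in best)}
    return scores, specific


scores, repaired = weighted_repair(["Politician", "Politician", "Place"])
print(scores)    # {'Person': 1, 'Place': -1, 'Politician': 1}
print(repaired)  # {'Politician'}
```

The minority opinion Place is dropped because keeping it would contradict the two Politician opinions, and Person is omitted from the final answer because Politician already implies it.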
