What is the relevant information in a text? 
Silvia Giannini 
Visiting PhD student 
Politecnico di Bari 
Web & Media Group meeting | 27.10.2014
The scenario 
•Entertainment domain: BBC TV-programs (TV-series, movies, documentaries, …) 
•Aim: Enrich the content description with links to the Web of Data 
•Applications: Linked Data patterns for recommendations; multi-domain dataset creation, … 
Following the grandeur of Baroque, Rococo art is often dismissed as frivolous and unserious, but Waldemar Januszczak disagrees. The first episode is about travel in the 18th century and how it impacted greatly on some of the finest art ever made. The world was getting smaller and took on new influences shown in the glorious Bavarian pilgrimage architecture, Canaletto's romantic Venice and the blossoming of exotic designs and tastes all over Europe. The Rococo was art expressing itself in new, exciting ways.
How? 
•Yet another semantic annotation tool? 
•Peculiarities: 
- Different formats 
- Broad coverage of topics 
Multiple annotators integration 
text enrichment 
“Canaletto” -> ontology:Location 
“Rococo” -> dbpedia:Rococo_(band) 
•Type mis-classification 
•URI mis-annotation 
•Not relevant labels 
The NERD framework
Multiple annotators integration 
•Feature-based solution for entity relevance definition and entity classification 
•Majority vote and disagreement metrics 
•Extractors can disagree on: 
–The existence of a label, e.g. some extractors identify a label and others don’t 
–The span of the label, e.g. ‘Myra’ VS ‘Myra Gail’ 
–The type of the label, e.g. ‘Building’ VS ‘Organization’ 
–The URI of the label 
Proposal
Multiple annotators integration 
DEFINE THE RELEVANCE OF A LABEL 
INFLUENCE THE CONFIDENCE OF AN ANNOTATION 
Workflow for relevance assessment 
Following the grandeur of Baroque, Rococo art is often dismissed as frivolous and unserious, but Waldemar Januszczak disagrees. […] The first episode is about travel in the 18th century and how it impacted greatly on some of the finest art ever made. The world was getting smaller and took on new influences shown in the glorious Bavarian pilgrimage architecture, Canaletto's romantic Venice and the blossoming of exotic designs and tastes all over Europe. The Rococo was art expressing itself in new, exciting ways. 
Relevant labels 
crowd 
NLP tools 
metrics 
Features matrix extraction 
Classifier
•Disagreement between TextRazor annotations in NERD and standalone TextRazor in terms of missing labels, missing types, granularity of types. 
PID: b0074t2b 
Title: Great plains 
Synopsis 
“The great plains are the vast open spaces of our planet. […] Close on their heels come an array of plains predators including eagles, wolves and lions. […]” 
Label: eagles#439#445 
Extractors | Types | URI 
textrazor(nerd) | nerd:Thing | http://en.wikipedia.org/wiki/Eagle 
textrazor | dbpedia-owl:Bird | http://en.wikipedia.org/wiki/Eagle 
Workflow for relevance assessment
Pre-processing 
•Alignment of extractors’ results: 
-Label: each label has a list of alternative labels contained in or overlapping with the given one 
-Type: same vocabulary for all extraction methods (529 classes of the DBpedia ontology, extended with owl:Thing and an Amount type) 
-URI: DBpedia resources 
•Label; NERD ontology class; sameAs link 
•Label; DBpedia ontology class; Wikipedia page 
•Label; DBpedia category; Wikipedia page 
•Label; DBpedia ontology class; DBpedia URI
Majority-vote for relevance: longest-span strategy* 
extractor row | label | startOffset | endOffset | Aligned label 
1 | Rococo | 35 | 41 | Rococo art#35#45 
2 | Rococo art | 35 | 45 | Rococo art#35#45 
3 | Rococo | 35 | 41 | Rococo art#35#45 
3 | Art | 42 | 45 | Rococo art#35#45 
4 | Rococo | 35 | 41 | Rococo art#35#45 
4 | Art | 42 | 45 | Rococo art#35#45 
Label & span alignment 
The LONGEST-SPAN strategy 
*Analogously, the shortest-span strategy can be applied
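The longest-span alignment shown in the table can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes each annotation is a (label, startOffset, endOffset) triple, pools the annotations of all extractors, and aligns every label to the longest pooled annotation that contains it. The function name `longest_span_align` is hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, startOffset, endOffset)


def longest_span_align(spans: List[Span]) -> Dict[Span, str]:
    """Map each annotation to the longest pooled annotation containing it."""
    aligned = {}
    for label, start, end in spans:
        # Every annotation contains itself, so this list is never empty.
        containing = [s for s in spans if s[1] <= start and end <= s[2]]
        best = max(containing, key=lambda s: s[2] - s[1])
        aligned[(label, start, end)] = f"{best[0]}#{best[1]}#{best[2]}"
    return aligned


# Pooled annotations from the Rococo example.
pooled = [("Rococo", 35, 41), ("Rococo art", 35, 45), ("Art", 42, 45)]
aligned = longest_span_align(pooled)
for span, target in aligned.items():
    print(span, "->", target)
```

Swapping `max` for `min` over the annotations contained in a label would give the shortest-span variant mentioned in the footnote. Partially overlapping annotations are not merged in this sketch.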
Issues 
•In the previous example, Rococo and Art relate to the same category (Arts), so the longest-span strategy for label alignment yields a consistent conceptual category for the new label (Rococo Art). 
•Consider this program description: 
A journey back to the 1950s for a look at the wildest pop music of all time in a film that tells the stories of Bill Haley, Elvis Presley, Little Richard, Chuck Berry, Jerry Lee Lewis and Buddy Holly, giants from an era when pop music really was mad, bad and dangerous to know. The programme features the artists themselves, alongside people like Bill Haley's original Comets, the Crickets, Buddy Holly's widow Maria Elena, Jerry Lee Lewis's former wife Myra Gail and his sister, Chuck Berry's son and many more, including June Juanico, Elvis' first serious girlfriend. Other contributors include Tom Jones, Jamie Callum, Paul McCartney, Cliff Richard, Joe Brown, Marty Wilde, Green Day, Minnie Driver, Jack White, the Mavericks, Jools Holland, Hank Marvin, Fontella Bass, John Waters and more. Elvis's pelvis was just the start. Who had to change the lyrics to their biggest hit because the originals were too obscene? Who married their 13-year-old cousin? Who used lard to get their hair just right? And what happened on the day the music died? 
BBC Program: Kings of Rock and Roll (Pid: b007c95q)
extractor row | label | startOffset | endOffset | Type | Aligned label 
1 | Myra Gail | 453 | 462 | Person | Myra Gail#453#462 
2 | Myra | 453 | 457 | Settlement | Myra Gail#453#462 
2 | Myra Gail | 453 | 462 | Person | Myra Gail#453#462 
3 | Myra | 453 | 457 | Band,Artist | Myra Gail#453#462 
3 | Gail | 458 | 462 | Person | Myra Gail#453#462 
4 | Myra Gail | 453 | 462 | Thing | Myra Gail#453#462
The HYBRID-SPAN strategy [1] 
Given two labels l1 and l2 and an upper ontology O, l1 and l2 belong to the same annotation span if: 
1. l1 is contained in l2 or l2 is contained in l1, and type(l1) and type(l2) are in a superclass/subclass relationship (e.g. Royal Academy[Organization] in Royal Academy of Music[University]) 
OR 
2. l1 and l2 overlap but neither contains the other (e.g., Royal Academy[Organization] and Academy of Music[Building]) 
OR 
3. l1 coincides with l2 (e.g., Royal Academy[Organization] and Royal Academy[Museum]) 
What about the Thing type? 
[1] Chen, L., Ortona, S., Orsi, G., & Benedikt, M. (2013). Aggregating Semantic Annotators. Proceedings of the VLDB Endowment, Vol. 6, No. 13, pp. 1486-1497. Riva del Garda, Trento, Italy.
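The three hybrid-span conditions can be checked with a few comparisons on offsets and types. The sketch below is only an illustration under stated assumptions: annotations are (start, end, type) triples, `TOY_PARENT` is a made-up fragment of a class hierarchy (not the DBpedia ontology), and all function names are hypothetical. The hierarchy deliberately has no common root, because how the Thing type should be handled is left open on the slide.

```python
from typing import Dict, Optional, Set, Tuple

Annotation = Tuple[int, int, str]  # (start, end, type)

# Assumed toy hierarchy: child -> parent (None marks a top-level class).
TOY_PARENT: Dict[str, Optional[str]] = {
    "University": "Organization",
    "Organization": None,
    "Settlement": "Place",
    "Place": None,
    "Person": None,
}


def ancestors(t: Optional[str]) -> Set[str]:
    """Return t together with all of its superclasses in TOY_PARENT."""
    seen: Set[str] = set()
    while t is not None and t not in seen:
        seen.add(t)
        t = TOY_PARENT.get(t)
    return seen


def related(t1: str, t2: str) -> bool:
    """True if one type is a (transitive) superclass of the other."""
    return t1 in ancestors(t2) or t2 in ancestors(t1)


def contains(a: Annotation, b: Annotation) -> bool:
    return a[0] <= b[0] and b[1] <= a[1]


def same_span(a: Annotation, b: Annotation) -> bool:
    if a[0] == b[0] and a[1] == b[1]:
        return True                      # rule 3: the labels coincide
    if contains(a, b) or contains(b, a):
        return related(a[2], b[2])       # rule 1: containment + related types
    return a[0] < b[1] and b[0] < a[1]   # rule 2: partial overlap


myra = (453, 457, "Settlement")
myra_gail = (453, 462, "Person")
print(same_span(myra, myra_gail))
```

With these inputs "Myra" [Settlement] and "Myra Gail" [Person] are not placed in the same span, because Settlement and Person are unrelated in the toy hierarchy.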
Label & span alignment 
The HYBRID-SPAN strategy* 
extractor row | label | startOffset | endOffset | Type | Aligned label 
1 | Myra Gail | 453 | 462 | Person | Myra Gail#453#462 
2 | Myra | 453 | 457 | Settlement | Myra#453#457 
2 | Myra Gail | 453 | 462 | Person | Myra Gail#453#462 
3 | Myra | 453 | 457 | Band,Artist | Myra#453#457 
3 | Gail | 458 | 462 | Person | Myra Gail#453#462 
4 | Myra Gail | 453 | 462 | Thing | Myra Gail#453#462 
*Vocabulary alignment is required as a previous step 
Majority-vote for relevance: hybrid-span strategy
Features for Relevance 
•F1: nerd(l) -> 1 if label l is extracted by NERD; 0 otherwise 
label#offset | Alternative labels | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 
wildlife#912#920 | - | 1 | 1 | 0 | 1 | 0.75 | 1 | 0.75 | 1 | 0.75 | 0 | 0 
east africa#361#372 | africa#366#372 | 0 | 1 | 1 | 1 | 0.75 | 1 | 0.75 | 0.5 | 0.375 | 0 | 0 
africa#366#372 | east africa#361#372 | 1 | 1 | 0 | 0 | 0.5 | 0.5 | 0.25 | 1 | 0.5 | 0 | 0 
earth: two#227#237 | two million#234#245; earth#227#232; two million gazelles#234#254 | 0 | 0 | 1 | 0 | 0.25 | 1 | 0.25 | 0.5 | 0.125 | 0.29 | 0.07 
…
•F2: textrazor(l) -> 1 if label l is extracted by TextRazor; 0 otherwise 
•F3: tagme(l) -> 1 if label l is extracted by TAGME; 0 otherwise 
•F4: nltk(l) -> 1 if label l is extracted by the NLTK-based method; 0 otherwise 
•F5: abs(l) = $\frac{nerd(l) + textrazor(l) + tagme(l) + nltk(l)}{|EM|}$ 
Absolute score for l over the set EM of all Extraction Methods (four in this setting) 
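As a quick check of F5 against the table, the absolute score is simply the mean of the four binary extractor features. The snippet is a minimal sketch; the dictionary layout and the name `abs_score` are assumptions made for illustration.

```python
from typing import Mapping

EXTRACTORS = ("nerd", "textrazor", "tagme", "nltk")  # the set EM


def abs_score(votes: Mapping[str, int]) -> float:
    """F5: fraction of extraction methods that recognised the label."""
    return sum(votes.get(name, 0) for name in EXTRACTORS) / len(EXTRACTORS)


# F1-F4 values of two rows of the feature table.
wildlife = {"nerd": 1, "textrazor": 1, "tagme": 0, "nltk": 1}
earth_two = {"nerd": 0, "textrazor": 0, "tagme": 1, "nltk": 0}

print(abs_score(wildlife))   # 0.75
print(abs_score(earth_two))  # 0.25
```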
•F6: lss(l) = $\frac{wc(l)}{wc(l_{LS})}$, where wc is the word count function and $l_{LS}$ is the longest span containing l in the union of all labels recognized by the extraction methods 
Expresses the span overlap between l and the longest span containing l, i.e. the portion of the longest span $l_{LS}$ covered by l
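A small sketch of the longest-span score, assuming labels are (text, start, end) triples and that word count means whitespace-separated tokens; both choices, and the names `wc` and `lss`, are assumptions for illustration. It reproduces the F6 values of the "africa" and "east africa" rows.

```python
from typing import List, Tuple

Label = Tuple[str, int, int]  # (text, startOffset, endOffset)


def wc(text: str) -> int:
    """Word count as whitespace-separated tokens (an assumption)."""
    return len(text.split())


def lss(label: Label, all_labels: List[Label]) -> float:
    """F6: wc(l) divided by wc of the longest span containing l."""
    _, start, end = label
    # The label always contains itself, so the candidate list is never empty.
    containing = [label] + [c for c in all_labels if c[1] <= start and end <= c[2]]
    longest = max(containing, key=lambda c: c[2] - c[1])
    return wc(label[0]) / wc(longest[0])


labels = [("east africa", 361, 372), ("africa", 366, 372)]
east_africa, africa = labels

print(lss(africa, labels))        # 0.5
print(lss(east_africa, labels))   # 1.0
print(0.5 * lss(africa, labels))  # abs(l) * lss(l) with abs = 0.5 -> 0.25
```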
•F7: wlss(l) = $abs(l) \cdot lss(l)$ 
Longest-span score for l, weighted by the absolute score for l 
•F8: sss(l) = $\frac{wc(l_{SS})}{wc(l)}$, where $l_{SS}$ is the shortest span contained in l in the union of all labels recognized by the extraction methods 
Expresses the span overlap between l and the shortest span contained in l, i.e. the portion of l covered by the shortest span $l_{SS}$ 
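The shortest-span score mirrors the longest-span one. The sketch below again assumes (text, start, end) triples and whitespace word counts; `sss` and `wc` are hypothetical names. It reproduces the F8 value of the "earth: two" row and its weighted variant.

```python
from typing import List, Tuple

Label = Tuple[str, int, int]  # (text, startOffset, endOffset)


def wc(text: str) -> int:
    """Word count as whitespace-separated tokens (an assumption)."""
    return len(text.split())


def sss(label: Label, all_labels: List[Label]) -> float:
    """F8: wc of the shortest span contained in l, divided by wc(l)."""
    _, start, end = label
    # The label is contained in itself, so the candidate list is never empty.
    contained = [label] + [c for c in all_labels if start <= c[1] and c[2] <= end]
    shortest = min(contained, key=lambda c: c[2] - c[1])
    return wc(shortest[0]) / wc(label[0])


earth_two = ("earth: two", 227, 237)
alternatives = [
    ("two million", 234, 245),
    ("earth", 227, 232),
    ("two million gazelles", 234, 254),
]

score = sss(earth_two, alternatives)
print(score)         # 0.5
print(0.25 * score)  # abs(l) * sss(l) with abs = 0.25 -> 0.125
```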
•F9: wsss(l) = $abs(l) \cdot sss(l)$ 
Shortest-span score for l, weighted by the absolute score for l 
•F10: oss(l) = $\frac{1}{|OL|} \sum_{j \in OL} \frac{|j \cap l|}{|j \cup l|}$, where |OL| is the number of overlapping labels among the alternative ones. 
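The overlap score can be read as an average Jaccard similarity between l and its overlapping alternatives. The sketch below rests on two assumptions that the slide does not state: the intersection and union are taken over word tokens, and only alternatives that partially overlap l (neither containing nor contained in it) count as overlapping. Under these assumptions the code reproduces the 0.29 shown for "earth: two" and the 0 shown for "east africa"; all names are hypothetical.

```python
import re
from typing import List, Set, Tuple

Label = Tuple[str, int, int]  # (text, startOffset, endOffset)


def words(text: str) -> Set[str]:
    """Lower-cased word tokens, punctuation dropped (an assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def partially_overlaps(a: Label, b: Label) -> bool:
    intersects = a[1] < b[2] and b[1] < a[2]
    a_in_b = b[1] <= a[1] and a[2] <= b[2]
    b_in_a = a[1] <= b[1] and b[2] <= a[2]
    return intersects and not a_in_b and not b_in_a


def oss(label: Label, alternatives: List[Label]) -> float:
    """F10: mean word-level Jaccard similarity over overlapping labels."""
    overlapping = [j for j in alternatives if partially_overlaps(label, j)]
    if not overlapping:
        return 0.0
    l_words = words(label[0])
    total = sum(
        len(words(j[0]) & l_words) / len(words(j[0]) | l_words)
        for j in overlapping
    )
    return total / len(overlapping)


earth_two = ("earth: two", 227, 237)
alternatives = [
    ("two million", 234, 245),
    ("earth", 227, 232),
    ("two million gazelles", 234, 254),
]
print(round(oss(earth_two, alternatives), 2))         # 0.29
print(round(oss(earth_two, alternatives) * 0.25, 2))  # weighted by abs(l) -> 0.07
```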
•F11: woss(l) = $oss(l) \cdot abs(l)$ 
Features for Relevance with type 
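Label#offset | type | Alternative label | F1 | … 
wildlife#912#920 | Thing | - | 1 | … 
east africa#361#372 | Thing | africa#366#372 [Place,Continent] | 0 | … 
east africa#361#372 | Country | africa#366#372 [Place,Continent] | 0 | … 
africa#366#372 | Place | east africa#361#372 [Thing,Country] | 1 | … 
africa#366#372 | Continent | east africa#361#372 [Thing,Country] | 1 | … 
…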
•F1: nerd(l,t) -> 1 if label l with type t is extracted by NERD; 0 otherwise 
•F2: textrazor(l,t) -> 1 if label l with type t is extracted by TextRazor; 0 otherwise 
•F3: tagme(l,t) -> 1 if label l with type t is extracted by TAGME; 0 otherwise 
•F4: nltk(l,t) -> 1 if label l with type t is extracted by the NLTK-based method; 0 otherwise
Label#offset | type | Alternative label | F1 | F2 | F3 | F4 | F5a | F5b | F6 | … 
east africa#361#372 | Thing | africa#366#372 [Place,Continent] | 0 | 1 | 0 | 0 | 0.25 | 0.33 | 0.5 | … 
east africa#361#372 | Country | africa#366#372 [Place,Continent] | 0 | 0 | 1 | 1 | 0.5 | 0.67 | 0.5 | … 
•F5a: abs(l,t) = $\frac{nerd(l,t) + textrazor(l,t) + tagme(l,t) + nltk(l,t)}{|EM|}$ 
Absolute score for l with type t over the set EM of all Extraction Methods (four in this setting) 
•F5b: rel(l,t) = $\frac{abs(l,t)}{abs(l)}$ 
Relative score for label l with type t over the total number of extraction methods recognizing l
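The typed scores F5a and F5b can be sketched from a mapping of each type to the extractors that assigned it. The data layout and function names below are assumptions for illustration; abs(l) is taken as the share of extractors that recognised the label with any type. The example reproduces the "east africa" rows.

```python
from typing import Dict, Set

N_EXTRACTORS = 4  # |EM| in this setting


def abs_lt(by_type: Dict[str, Set[str]], t: str) -> float:
    """F5a: share of extractors that assigned type t to the label."""
    return len(by_type.get(t, set())) / N_EXTRACTORS


def abs_l(by_type: Dict[str, Set[str]]) -> float:
    """abs(l): share of extractors that recognised the label at all."""
    return len(set().union(*by_type.values())) / N_EXTRACTORS


def rel_lt(by_type: Dict[str, Set[str]], t: str) -> float:
    """F5b: abs(l,t) relative to abs(l); 0 if no extractor found the label."""
    denominator = abs_l(by_type)
    return abs_lt(by_type, t) / denominator if denominator else 0.0


# "east africa": TextRazor says Thing; TAGME and the NLTK-based method say Country.
east_africa = {"Thing": {"textrazor"}, "Country": {"tagme", "nltk"}}

for t in east_africa:
    print(t, abs_lt(east_africa, t), round(rel_lt(east_africa, t), 2))
```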
•F6: lss(l,t) = $\frac{lss(l)}{n\_cat(l)}$, where n_cat(l) is the number of different types associated with l. 
Expresses the span overlap between l and the longest span containing l, divided by the number of different types associated with the same label l. 
•F7a: wlss(l,t) = $abs(l,t) \cdot lss(l,t)$ 
Longest-span score for l with type t, weighted by the absolute score for label l and type t 
•F7b: wrlss(l,t) = $rel(l,t) \cdot lss(l,t)$ 
Longest-span score for l with type t, weighted by the relative score for label l and type t 
•F8: sss(l,t) = $\frac{sss(l)}{n\_cat(l)}$ 
•F9a: wsss(l,t) = $abs(l,t) \cdot sss(l,t)$ 
•F9b: wrsss(l,t) = $rel(l,t) \cdot sss(l,t)$ 
•F10: oss(l,t) = $\frac{oss(l)}{n\_cat(l)}$ 
•F11a: woss(l,t) = $abs(l,t) \cdot oss(l,t)$ 
•F11b: wross(l,t) = $rel(l,t) \cdot oss(l,t)$ 
Label#offset | type | Alternative label | F1 | F2 | F3 | F4 | … | F12 | F13 | … 
east africa#361#372 | Thing | africa#366#372 [Place,Continent] | 0 | 1 | 0 | 0 | … | 1 | 0.375 | … 
east africa#361#372 | Country | africa#366#372 [Place,Continent] | 0 | 0 | 1 | 1 | … | 0.5 | 0.17 | … 
•F12: hss(l,t) = $\frac{|inTreeAL(l,t)|}{|AL|}$, where |inTreeAL(l,t)| is the number of Alternative Labels in the set AL with type in a sub(super)-sumption relation with t 
•F13: whss(l,t) = $\frac{1}{|AL|} \sum_{j \in inTreeAL(l,t)} \frac{1}{d(t_l, t_j) + 1}$, where $d(t_l, t_j)$ is the distance between class $t_l$ and $t_j$ in the ontology
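The hierarchy-based scores F12 and F13 can be sketched with a small class tree. Everything below is an illustration under assumptions: `PARENT` is an assumed fragment of a class hierarchy in which Country and Continent sit under PopulatedPlace, Place and Thing; AL is taken as the list of types attached to the alternative label; and the distance is the number of edges between two classes on the same branch. With these assumptions the code reproduces the F12 and F13 values of the "east africa" rows.

```python
from typing import Dict, List, Optional

# Assumed hierarchy fragment: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Place": "Thing",
    "PopulatedPlace": "Place",
    "Continent": "PopulatedPlace",
    "Country": "PopulatedPlace",
}


def path_to_root(t: Optional[str]) -> List[str]:
    path = []
    while t is not None:
        path.append(t)
        t = PARENT.get(t)
    return path


def tree_distance(t1: str, t2: str) -> Optional[int]:
    """Edges between t1 and t2 if one subsumes the other, else None."""
    p1, p2 = path_to_root(t1), path_to_root(t2)
    if t2 in p1:
        return p1.index(t2)
    if t1 in p2:
        return p2.index(t1)
    return None


def hss(t: str, alt_types: List[str]) -> float:
    """F12: share of alternative types in a sub/super-class relation with t."""
    in_tree = [a for a in alt_types if tree_distance(t, a) is not None]
    return len(in_tree) / len(alt_types)


def whss(t: str, alt_types: List[str]) -> float:
    """F13: like F12, but each related type contributes 1 / (distance + 1)."""
    distances = (tree_distance(t, a) for a in alt_types)
    return sum(1 / (d + 1) for d in distances if d is not None) / len(alt_types)


alt_types = ["Place", "Continent"]  # types of the alternative label "africa"
for t in ("Thing", "Country"):
    print(t, hss(t, alt_types), round(whss(t, alt_types), 3))
```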
•Disagreement on the extractors corner (i.e., tools that more systematically disagree with every other tool) could reveal: 
- bad-quality tools (in recognizing a specific set of labels/types) 
- specialized tools able to recognize particular entities better than all the other tools 
Disagreement metrics evaluation on the extractors corner [2] 
Disagreement for relevance: Humans VS Machine Annotation 
[2] G. Soberon, L. Aroyo, C. Welty, O. Inel, H. Lin, M. Overmeen, Measuring Crowd Truth: Disagreement Metrics Combined with Worker Behavior Filters, Proc. of CrowdSem2013 Workshop, ISWC2013.
•DISTRIBUTED AGREEMENT 
•UNIQUE INFORMATION
•ela(e_i, e_j, l) = $\frac{e_i(l) \cdot e_j(l)}{|L(e_i, p)|}$, where $i \neq j$, $e_i(l)$ is the corresponding extractor score (F1-4) and $|L(e_i, p)|$ is the number of labels recognized by extractor i in program p (the extractor-label agreement operator is not commutative)
•avg_ela(e_i, l) = $\frac{\sum_{j \neq i} ela(e_i, e_j, l)}{|EM|}$ 
Average extractor-label agreement over the set of extraction methods
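The extractor-label agreement and its average can be sketched on top of the binary features F1-F4. The snippet assumes a mapping from label to extractor scores and treats the four visible rows of the feature table as if they were the whole program p, which they are not (the table is truncated), so the numbers are purely illustrative. Function names are hypothetical.

```python
from typing import Dict, Sequence

Scores = Dict[str, Dict[str, int]]  # label -> extractor -> 0/1 (F1-F4)

EXTRACTORS = ("nerd", "textrazor", "tagme", "nltk")


def labels_found(scores: Scores, extractor: str) -> int:
    """|L(e_i, p)|: number of labels the extractor recognised in the program."""
    return sum(row.get(extractor, 0) for row in scores.values())


def ela(scores: Scores, ei: str, ej: str, label: str) -> float:
    """Extractor-label agreement of ei with ej on one label (i != j)."""
    n = labels_found(scores, ei)
    if ei == ej or n == 0:
        return 0.0
    row = scores[label]
    return row.get(ei, 0) * row.get(ej, 0) / n


def avg_ela(scores: Scores, ei: str, label: str,
            extractors: Sequence[str] = EXTRACTORS) -> float:
    """Average agreement of ei with all other extractors, divided by |EM|."""
    total = sum(ela(scores, ei, ej, label) for ej in extractors if ej != ei)
    return total / len(extractors)


scores = {
    "wildlife#912#920": {"nerd": 1, "textrazor": 1, "tagme": 0, "nltk": 1},
    "east africa#361#372": {"nerd": 0, "textrazor": 1, "tagme": 1, "nltk": 1},
    "africa#366#372": {"nerd": 1, "textrazor": 1, "tagme": 0, "nltk": 0},
    "earth: two#227#237": {"nerd": 0, "textrazor": 0, "tagme": 1, "nltk": 0},
}

label = "wildlife#912#920"
print(ela(scores, "nerd", "textrazor", label))  # 0.5
print(ela(scores, "textrazor", "nerd", label))  # 1/3: the operator is not commutative
print(avg_ela(scores, "nerd", label))           # 0.25
```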
•Both the extractor-label agreement and its average are also evaluated for (label, type) pairs
Other possible relevance features 
•TF-IDF (with type) 
Should the corpus for idf contain multiple episodes of the same TV series? 
Labels referring to characters mentioned in many episodes of the same TV series will gain a higher tf but a lower idf score -> consider metadata 
Animated adventures of Pingu, the clumsy young penguin. Pingu helps his neighbour and is rewarded. Pingu's friend tries to get a reward too, but the neighbour refuses. They decide to play a trick on the neighbour, but it all ends with an innocent passer-by becoming the victim of their prank. 
BBC Program: Pingu's Trick (Pid: b0077x84)
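The effect of the corpus choice on idf can be seen in a tiny, self-contained sketch. The token lists are invented toy data (not BBC synopses), and the plain tf and log-idf definitions are one common variant chosen as an assumption; the slides do not fix a formula.

```python
import math
from collections import Counter
from typing import List


def tf(term: str, doc: List[str]) -> float:
    """Term frequency: occurrences of the term divided by document length."""
    return Counter(doc)[term] / len(doc)


def idf(term: str, corpus: List[List[str]]) -> float:
    """Inverse document frequency: log(N / df), or 0 if the term is unseen."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


# Hypothetical tokenised synopses.
pingu_ep1 = ["pingu", "neighbour", "reward", "pingu"]
pingu_ep2 = ["pingu", "fish", "igloo"]
plains = ["gazelles", "plains", "wildlife"]
aliens = ["aliens", "earth", "reward"]

same_series_corpus = [pingu_ep1, pingu_ep2, plains]  # two episodes of one series
mixed_corpus = [pingu_ep1, plains, aliens]           # one episode per programme

print(idf("pingu", same_series_corpus))  # lower: the name recurs across episodes
print(idf("pingu", mixed_corpus))        # higher: the name is unique to one document
print(tf("pingu", pingu_ep1) * idf("pingu", mixed_corpus))
```

Including several episodes of the same series lowers the idf of recurring character names, which is the trade-off raised in the question above.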
•Enhance metadata (words in title and subject) 
Label lemmatization (WordNetLemmatizer) 
Dani is understudying the part of a witch in Macbeth: The Musical, which means Jack and Sam get the job of ensuring little brother Max does not cause chaos. Dani's most loyal viewers, the aliens, have got bored of never getting to meet their heroine and her pals, and have decided to teleport down to Earth, where they soon find themselves embroiled in Max's scheme to win the 10,000 pound reward from the UFO Society. 
BBC Program: Alien Invasion (Pid: b00ph91v) 
State of work 
•Dataset: 52 BBC programs 
•Realized: 
- Span and Type Alignment 
- Relevance scores for labels 
•To do: 
–Computation of relevance score for pairs (label,type) 
–Crowdsourcing tasks 
–Connecting relevance/relevance-with-type outputs 
–Evaluation of results (precision, recall, complementarity, …)
Does the method deal with complementarity? 
http://dbpedia.org/resource/Gazelle 
PID: b0074t2b 
Title: Great plains 
Synopsis 
“The great plains are the vast open spaces of our planet. These immense wilderness areas are seemingly empty. But any feeling of emptiness is an illusion - the plains of our planet support the greatest gatherings of wildlife on earth: two million gazelles on the Mongolian steppes, three million caribou in North America and one and a half million wildebeest in East Africa. […]” 
Label: two million gazelles#234#254 
Types: Amount;Mammal;Single 
Extractors: wikimeta(nerd);textrazor;tagme 
http://dbpedia.org/resource/Two_in_a_Million/You're_My_Number_One 
COMPLEMENTARITY!! 
(Amount of Mammal)

2024 Q1 Tableau User Group Leader Quarterly Call
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 

What is the relevant information in a text?

  • 1. What is the relevant information in a text? Silvia Giannini Visiting PhD student Politecnico di Bari Web & Media Group meeting | 27.10.2014
  • 2. The scenario •Entertainment domain: BBC TV programs (TV series, movies, documentaries, …) •Aim: Enrich the content description with links to the Web of Data •Applications: Linked Data patterns for recommendations; multi-domain dataset creation, … Following the grandeur of Baroque, Rococo art is often dismissed as frivolous and unserious, but Waldemar Januszczak disagrees. The first episode is about travel in the 18th century and how it impacted greatly on some of the finest art ever made. The world was getting smaller and took on new influences shown in the glorious Bavarian pilgrimage architecture, Canaletto's romantic Venice and the blossoming of exotic designs and tastes all over Europe. The Rococo was art expressing itself in new, exciting ways.
  • 3. How? •Yet another semantic annotation tool? •Peculiarities: - Different formats - Broad coverage of topics
  • 4. Multiple annotators integration: text enrichment with the NERD framework. Example annotations on the synopsis above: “Canaletto” → ontology:Location; “Rococo” → dbpedia:Rococo_(band). Observed problems: •Type mis-classification •URI mis-annotation •Not relevant labels
  • 5. Multiple annotators integration: Proposal •Feature-based solution for entity relevance definition and entity classification •Majority vote and disagreement metrics •Extractors can disagree on: –The existence of a label, e.g. some identify a label and others do not –The span of the label, e.g. ‘Myra’ vs. ‘Myra Gail’ –The type of the label, e.g. ‘Building’ vs. ‘Organization’ –The URI of the label
  • 6. Multiple annotators integration: Proposal (same bullets as the previous slide), highlighted: DEFINE THE RELEVANCE OF A LABEL
  • 7. Multiple annotators integration: Proposal (same bullets as the previous slide), highlighted: INFLUENCE THE CONFIDENCE OF AN ANNOTATION
  • 8. Workflow for relevance assessment [Workflow diagram over the example synopsis; elements: NLP tools, metrics, features matrix extraction, crowd, classifier, relevant labels]
  • 9. Workflow for relevance assessment •Disagreement between the TextRazor annotations returned through NERD and those of standalone TextRazor, in terms of missing labels, missing types and granularity of types. Example: PID: b0074t2b; Title: Great plains; Synopsis: “The great plains are the vast open spaces of our planet. […] Close on their heels come an array of plains predators including eagles, wolves and lions. […]” Label: eagles#439#445. Extractor textrazor(nerd): type nerd:Thing, URI http://en.wikipedia.org/wiki/Eagle. Extractor textrazor: type dbpedia-owl:Bird, URI http://en.wikipedia.org/wiki/Eagle.
  • 10. Pre-processing •Alignment of extractors’ results: -Label: each label has a list of alternative labels contained in or overlapping with the given one -Type: same vocabulary for all extraction methods (529 classes of the DBpedia ontology, extended with owl:Thing and an Amount type) -URI: DBpedia resources •Raw output fields, one group per extractor: (Label, NERD ontology class, sameAs link); (Label, DBpedia ontology class, Wikipedia page); (Label, DBpedia category, Wikipedia page); (Label, DBpedia ontology class, DBpedia URI)
  • 11. Majority-vote for relevance: longest-span strategy* Label & span alignment with the LONGEST-SPAN strategy. Columns: label, startOffset, endOffset → aligned label; rows grouped by extractor (extractor names missing from the transcript). Extractor 1: Rococo, 35, 41 → Rococo art#35#45. Extractor 2: Rococo art, 35, 45 → Rococo art#35#45. Extractor 3: Rococo, 35, 41 → Rococo art#35#45; Art, 42, 45 → Rococo art#35#45. Extractor 4: Rococo, 35, 41 → Rococo art#35#45; Art, 42, 45 → Rococo art#35#45. *Analogously, the shortest-span strategy can be applied
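The longest-span alignment on the slide above can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the function name `longest_span_align` is hypothetical, offsets are taken as 0-based with an exclusive end, and spans are merged whenever they overlap or contain one another.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label text, start offset, end offset)


def longest_span_align(text: str, spans: List[Span]) -> List[Tuple[Span, str]]:
    """Map every span to the key 'label#start#end' of the longest span it falls into."""
    ordered = sorted(spans, key=lambda s: (s[1], s[2]))
    groups = []  # each entry: [group start, group end, member spans]
    for span in ordered:
        _, start, end = span
        # Assumption: a span joins the current group if it overlaps the group's extent.
        if groups and start < groups[-1][1]:
            groups[-1][1] = max(groups[-1][1], end)
            groups[-1][2].append(span)
        else:
            groups.append([start, end, [span]])
    aligned = []
    for start, end, members in groups:
        key = f"{text[start:end]}#{start}#{end}"
        aligned.extend((member, key) for member in members)
    return aligned


synopsis = "Following the grandeur of Baroque, Rococo art is often dismissed"
extracted = [("Rococo", 35, 41), ("Rococo art", 35, 45), ("Art", 42, 45)]
for span, key in longest_span_align(synopsis, extracted):
    print(span, "->", key)
```

With the offsets from the slide, all three labels end up aligned to `Rococo art#35#45`.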
  • 12. Issues •In the previous example, Rococo and Art are related to the same category (Arts). Thus, the longest-span strategy for labels alignment will lead to a consistent conceptual category for the new label (Rococo Art). •Consider this program description: A journey back to the 1950s for a look at the wildest pop music of all time in a film that tells the stories of Bill Haley, Elvis Presley, Little Richard, Chuck Berry, Jerry Lee Lewis and Buddy Holly, giants from an era when pop music really was mad, bad and dangerous to know.The programme features the artists themselves, alongside people like Bill Haley's original Comets, the Crickets, Buddy Holly's widow Maria Elena, Jerry Lee Lewis's former wife Myra Gail and his sister, Chuck Berry's son and many more, including June Juanico, Elvis' first serious girlfriend.Other contributors include Tom Jones, Jamie Callum, Paul McCartney, Cliff Richard, Joe Brown, Marty Wilde, Green Day, Minnie Driver, Jack White, the Mavericks, Jools Holland, Hank Marvin, Fontella Bass, John Waters and more.Elvis's pelvis was just the start. Who had to change the lyrics to their biggest hit because the originals were too obscene? Who married their 13-year-old cousin? Who used lard to get their hair just right? And what happened on the day the music died? BBC Program: Kings of Rock and Roll (Pid: b007c95q)
  • 13. Issues •Longest-span alignment for BBC Program: Kings of Rock and Roll (Pid: b007c95q). Columns: label, startOffset, endOffset, type → aligned label; rows grouped by extractor (extractor names missing from the transcript). Extractor 1: Myra Gail, 453, 462, Person → Myra Gail#453#462. Extractor 2: Myra, 453, 457, Settlement → Myra Gail#453#462; Myra Gail, 453, 462, Person → Myra Gail#453#462. Extractor 3: Myra, 453, 457, [Band, Artist] → Myra Gail#453#462; Gail, 458, 462, Person → Myra Gail#453#462. Extractor 4: Myra Gail, 453, 462, Thing → Myra Gail#453#462. Unlike in the Rococo example, the merged labels here carry types from different conceptual categories (Settlement, Band, Person).
  • 14. The HYBRID-SPAN strategy [1] Given two labels l1 and l2 and an upper ontology O, l1 and l2 belong to the same annotation span if: 1. l1 is contained in l2 or l2 is contained in l1, and type(l1) and type(l2) are in a super(sub)class relationship (e.g. Royal Academy[Organization] in Royal Academy of Music[University]) OR 2. l1 and l2 overlap but neither contains the other (e.g., Royal Academy[Organization] and Academy of Music[Building]) OR 3. l1 coincides with l2 (e.g., Royal Academy[Organization] and Royal Academy[Museum]). What about the Thing type? [1] Chen, L., Ortona, S., Orsi, G., & Benedikt, M. (2013). Aggregating Semantic Annotators. Proceedings of the VLDB Endowment, Vol. 6, No. 13, pp. 1486-1497. Riva del Garda, Trento, Italy.
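The three hybrid-span rules can be read as a single predicate over two typed labels. The sketch below is one possible reading, not the implementation from Chen et al. or from this talk. The names `Label`, `PARENT` and `same_annotation_span` are hypothetical, the one-entry class hierarchy and the offsets are invented for illustration, and treating equal types as "related" is an assumption. The root class Thing is deliberately left out of the toy hierarchy, because how to treat it is exactly the open question raised on the slide.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Label:
    text: str
    start: int
    end: int
    type: str


# Toy child -> parent fragment, for illustration only (not the real ontology).
PARENT = {"University": "Organization"}


def ancestors(t: str) -> Set[str]:
    seen = set()
    while t in PARENT:
        t = PARENT[t]
        seen.add(t)
    return seen


def related(t1: str, t2: str) -> bool:
    """True if the types are equal or one is an ancestor of the other."""
    return t1 == t2 or t1 in ancestors(t2) or t2 in ancestors(t1)


def same_annotation_span(a: Label, b: Label) -> bool:
    if (a.start, a.end) == (b.start, b.end):  # rule 3: the labels coincide
        return True
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    if a_in_b or b_in_a:  # rule 1: containment requires related types
        return related(a.type, b.type)
    return a.start < b.end and b.start < a.end  # rule 2: partial overlap


royal = Label("Royal Academy", 0, 13, "Organization")
ram = Label("Royal Academy of Music", 0, 22, "University")
print(same_annotation_span(royal, ram))
```

Under this reading, ‘Myra’ [Settlement] and ‘Myra Gail’ [Person] stay in separate spans, because containment alone is not enough when the types are unrelated.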
  • 15. Majority-vote for relevance: hybrid-span strategy* Label & span alignment with the HYBRID-SPAN strategy. Columns: label, startOffset, endOffset, type → aligned label; rows grouped by extractor (extractor names missing from the transcript). Extractor 1: Myra Gail, 453, 462, Person → Myra Gail#453#462. Extractor 2: Myra, 453, 457, Settlement → Myra#453#457; Myra Gail, 453, 462, Person → Myra Gail#453#462. Extractor 3: Myra, 453, 457, [Band, Artist] → Myra#453#457; Gail, 458, 462, Person → Myra Gail#453#462. Extractor 4: Myra Gail, 453, 462, Thing → Myra Gail#453#462. *The vocabulary alignment is required as a previous step
  • 16. Features for Relevance •F1: nerd(l) -> 1 if label l is extracted by NERD; 0 otherwise. Example feature table (label#offset | alternative labels | F1, F2, …, F11): wildlife#912#920 | none | 1, 1, 0, 1, 0.75, 1, 0.75, 1, 0.75, 0, 0. east africa#361#372 | africa#366#372 | 0, 1, 1, 1, 0.75, 1, 0.75, 0.5, 0.375, 0, 0. africa#366#372 | east africa#361#372 | 1, 1, 0, 0, 0.5, 0.5, 0.25, 1, 0.5, 0, 0. earth: two#227#237 | two million#234#245; earth#227#232; two million gazelles#234#254 | 0, 0, 1, 0, 0.25, 1, 0.25, 0.5, 0.125, 0.29, 0.07. …
  • 17. Features for Relevance •F2: textrazor(l) -> 1 if label l is extracted by TextRazor; 0 otherwise
  • 18. Features for Relevance •F3: tagme(l) -> 1 if label l is extracted by TAGME; 0 otherwise
  • 19. Features for Relevance •F4: nltk(l) -> 1 if label l is extracted by the NLTK-based method; 0 otherwise
  • 20. Features for Relevance •F5: $\mathrm{abs}(l) = \frac{\mathrm{nerd}(l) + \mathrm{textrazor}(l) + \mathrm{tagme}(l) + \mathrm{nltk}(l)}{|EM|}$ Absolute score for l over the set EM of all Extraction Methods (four in this setting)
  • 21. Features for Relevance •F6: $\mathrm{lss}(l) = \frac{wc(l)}{wc(l_{LS})}$, where wc is the word count function and $l_{LS}$ is the longest span containing l in the union set of all labels recognized by the extraction methods. Expresses the span overlap between l and the longest span containing l, i.e. the portion of the longest span $l_{LS}$ covered by l
  • 22. Features for Relevance •F7: $\mathrm{wlss}(l) = \mathrm{abs}(l) \cdot \mathrm{lss}(l)$ Longest-span score for l, weighted by the absolute score for l
  • 23. Features for Relevance •F8: $\mathrm{sss}(l) = \frac{wc(l_{SS})}{wc(l)}$, where $l_{SS}$ is the shortest span contained in l in the union set of all labels recognized by the extraction methods. Expresses the span overlap between l and the shortest span contained in l, i.e. the portion of l covered by the shortest span $l_{SS}$
  • 24. Features for Relevance •F9: $\mathrm{wsss}(l) = \mathrm{abs}(l) \cdot \mathrm{sss}(l)$ Shortest-span score for l, weighted by the absolute score for l
  • 25. Features for Relevance •F10: $\mathrm{oss}(l) = \frac{1}{|OL|} \sum_{j \in OL} \frac{|j \cap l|}{|j \cup l|}$, where OL is the set of overlapping labels among the alternative ones and |OL| is its size.
  • 26. Features for Relevance •F11: $\mathrm{woss}(l) = \mathrm{oss}(l) \cdot \mathrm{abs}(l)$
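As a worked check of features F5 to F11, the following sketch recomputes the row for the label ‘earth: two’ from the example feature table. It is a minimal illustration, not the author's code. The function names are hypothetical, word counts use whitespace splitting, and two readings are assumptions inferred from the table values: the overlap ratio in F10 is taken over word sets, and labels that merely contain or are contained in l are not counted as "overlapping".

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label text, start offset, end offset)


def wc(label: str) -> int:
    """Word count of a label (whitespace tokenisation is an assumption)."""
    return len(label.split())


def contains(outer: Span, inner: Span) -> bool:
    return outer[1] <= inner[1] and inner[2] <= outer[2]


def partially_overlaps(a: Span, b: Span) -> bool:
    intersect = a[1] < b[2] and b[1] < a[2]
    return intersect and not contains(a, b) and not contains(b, a)


def jaccard(a: str, b: str) -> float:
    """Word-level |a ∩ b| / |a ∪ b| (assumed granularity)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)


def relevance_features(label: Span, votes: List[int],
                       alternatives: List[Span]) -> Dict[str, float]:
    abs_l = sum(votes) / len(votes)  # F5, votes are F1..F4
    longest = max((a for a in alternatives if contains(a, label)),
                  key=lambda a: wc(a[0]), default=label)
    shortest = min((a for a in alternatives if contains(label, a)),
                   key=lambda a: wc(a[0]), default=label)
    lss = wc(label[0]) / wc(longest[0])   # F6
    sss = wc(shortest[0]) / wc(label[0])  # F8
    overlapping = [a for a in alternatives if partially_overlaps(a, label)]
    oss = (sum(jaccard(label[0], a[0]) for a in overlapping) / len(overlapping)
           if overlapping else 0.0)       # F10
    return {"F5_abs": abs_l, "F6_lss": lss, "F7_wlss": abs_l * lss,
            "F8_sss": sss, "F9_wsss": abs_l * sss,
            "F10_oss": oss, "F11_woss": oss * abs_l}


row = relevance_features(
    ("earth: two", 227, 237), [0, 0, 1, 0],
    [("two million", 234, 245), ("earth", 227, 232), ("two million gazelles", 234, 254)],
)
print({name: round(value, 3) for name, value in row.items()})
```

Rounded to two decimals, F10 and F11 come out as 0.29 and 0.07, which agrees with the table; the same function also reproduces the ‘africa’ and ‘east africa’ rows.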
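  • 27. Features for Relevance with type. Example (label#offset | type | alternative label [types] | F1 | …): wildlife#912#920 | Thing | none | 1. east africa#361#372 | Thing | africa#366#372 [Place, Continent] | 0. east africa#361#372 | Country | (same alternative) | 0. africa#366#372 | Place | east africa#361#372 [Thing, Country] | 1. africa#366#372 | Continent | (same alternative) | 1. …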
  • 28. Features for Relevance with type •F1: nerd(l,t) -> 1 if label l with type t is extracted by NERD; 0 otherwise •F2: textrazor(l,t) -> 1 if label l with type t is extracted by TextRazor; 0 otherwise •F3: tagme(l,t) -> 1 if label l with type t is extracted by TAGME; 0 otherwise •F4: nltk(l,t) -> 1 if label l with type t is extracted by the NLTK-based method; 0 otherwise
  • 29. Features for Relevance with type •F5a: $\mathrm{abs}(l,t) = \frac{\mathrm{nerd}(l,t) + \mathrm{textrazor}(l,t) + \mathrm{tagme}(l,t) + \mathrm{nltk}(l,t)}{|EM|}$ Absolute score for l with type t over the set EM of all Extraction Methods (four in this setting) •F5b: $\mathrm{rel}(l,t) = \frac{\mathrm{abs}(l,t)}{\mathrm{abs}(l)}$ Relative score for label l with type t over the total number of extraction methods recognizing l. Example (label#offset | type | alternative label [types] | F1, F2, F3, F4, F5a, F5b, F6, …): east africa#361#372 | Thing | africa#366#372 [Place, Continent] | 0, 1, 0, 0, 0.25, 0.33, 0.5. east africa#361#372 | Country | (same alternative) | 0, 0, 1, 1, 0.5, 0.67, 0.5.
  • 30. Features for Relevance with type •F6: $\mathrm{lss}(l,t) = \frac{\mathrm{lss}(l)}{n\_cat(l)}$, where n_cat(l) is the number of different types associated with l. Expresses the span overlap between l and the longest span containing l, weighted by the number of different types associated with the same label l.
  • 31. Features for Relevance with type •F7a: $\mathrm{wlss}(l,t) = \mathrm{abs}(l,t) \cdot \mathrm{lss}(l,t)$ Longest-span score for l with type t, weighted by the absolute score for label l and type t •F7b: $\mathrm{wrlss}(l,t) = \mathrm{rel}(l,t) \cdot \mathrm{lss}(l,t)$ Longest-span score for l with type t, weighted by the relative score for label l and type t
  • 32. Features for Relevance with type •F8: $\mathrm{sss}(l,t) = \frac{\mathrm{sss}(l)}{n\_cat(l)}$ •F9a: $\mathrm{wsss}(l,t) = \mathrm{abs}(l,t) \cdot \mathrm{sss}(l,t)$ •F9b: $\mathrm{wrsss}(l,t) = \mathrm{rel}(l,t) \cdot \mathrm{sss}(l,t)$
  • 33. Features for Relevance with type •F10: $\mathrm{oss}(l,t) = \frac{\mathrm{oss}(l)}{n\_cat(l)}$ •F11a: $\mathrm{woss}(l,t) = \mathrm{abs}(l,t) \cdot \mathrm{oss}(l,t)$ •F11b: $\mathrm{wross}(l,t) = \mathrm{rel}(l,t) \cdot \mathrm{oss}(l,t)$
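The typed scores F5a, F5b, F6 and F7a/F7b can be checked against the ‘east africa’ example with a short sketch. The function name `typed_features` and the dictionary layout are hypothetical. One step is an assumption: abs(l) is derived here by counting an extractor once if it returned the label with any type, which matches the untyped F5 value of 0.75 for this label. F8 to F11 follow the same pattern and are omitted.

```python
from typing import Dict, List


def typed_features(type_votes: Dict[str, List[int]], lss_l: float) -> Dict[str, Dict[str, float]]:
    """type_votes maps each type of a label to its F1..F4 votes; lss_l is the untyped F6."""
    n_em = len(next(iter(type_votes.values())))
    # Assumption: an extractor recognises the label if it returned it with any type.
    recognised = [max(column) for column in zip(*type_votes.values())]
    abs_l = sum(recognised) / n_em
    n_cat = len(type_votes)  # number of different types associated with the label
    rows = {}
    for t, votes in type_votes.items():
        abs_lt = sum(votes) / n_em  # F5a
        rel_lt = abs_lt / abs_l     # F5b
        lss_lt = lss_l / n_cat      # F6
        rows[t] = {"F5a_abs": abs_lt, "F5b_rel": rel_lt, "F6_lss": lss_lt,
                   "F7a_wlss": abs_lt * lss_lt, "F7b_wrlss": rel_lt * lss_lt}
    return rows


east_africa = typed_features({"Thing": [0, 1, 0, 0], "Country": [0, 0, 1, 1]}, lss_l=1.0)
for t, feats in east_africa.items():
    print(t, {name: round(value, 2) for name, value in feats.items()})
```

For the two types the sketch yields F5a = 0.25 and 0.5, F5b ≈ 0.33 and 0.67, and F6 = 0.5 for both, in line with the values on the slides.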
  • 34. Features for Relevance with type •F12: $\mathrm{hss}(l,t) = \frac{|\mathrm{inTreeAL}(l,t)|}{|AL|}$, where |inTreeAL(l,t)| is the number of Alternative Labels in the set AL whose type is in a sub(super)-sumption relation with t •F13: $\mathrm{whss}(l,t) = \frac{1}{|AL|} \sum_{j \in \mathrm{inTreeAL}(l,t)} \frac{1}{d(t_l, t_j) + 1}$, where $d(t_l, t_j)$ is the distance between classes $t_l$ and $t_j$ in the ontology. Example (label#offset | type | alternative label [types] | F1, F2, F3, F4, …, F12, F13): east africa#361#372 | Thing | africa#366#372 [Place, Continent] | 0, 1, 0, 0, …, 1, 0.375. east africa#361#372 | Country | (same alternative) | 0, 0, 1, 1, …, 0.5, 0.17.
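The hierarchy-based scores F12 and F13 can be reproduced on the same example with a small class tree. This is a sketch under assumptions: the child-to-parent fragment below (Thing > Place > PopulatedPlace > Country, Continent) is written out by hand for illustration, each (alternative label, type) pair is counted as one element of AL, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hand-written hierarchy fragment (child -> parent), assumed for this example.
PARENT: Dict[str, str] = {
    "Place": "Thing",
    "PopulatedPlace": "Place",
    "Country": "PopulatedPlace",
    "Continent": "PopulatedPlace",
}


def path_to_root(t: str) -> List[str]:
    path = [t]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def subsumption_distance(t1: str, t2: str) -> Optional[int]:
    """Edges between t1 and t2 if one subsumes the other, else None."""
    p1, p2 = path_to_root(t1), path_to_root(t2)
    if t2 in p1:
        return p1.index(t2)
    if t1 in p2:
        return p2.index(t1)
    return None


def hss_whss(t: str, alt_types: List[str]) -> Tuple[float, float]:
    distances = [subsumption_distance(t, a) for a in alt_types]
    in_tree = [d for d in distances if d is not None]
    hss = len(in_tree) / len(alt_types)                          # F12
    whss = sum(1 / (d + 1) for d in in_tree) / len(alt_types)    # F13
    return hss, whss


# 'east africa' has the alternative label 'africa' typed [Place, Continent].
print(hss_whss("Thing", ["Place", "Continent"]))
print(hss_whss("Country", ["Place", "Continent"]))
```

With this fragment, Thing gives (1.0, 0.375) and Country gives 0.5 and roughly 0.17, matching the F12 and F13 columns of the example.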
  • 35. Disagreement for relevance: Humans VS Machine Annotation. Disagreement metrics evaluation on the extractors corner [2] •Disagreement on the extractors corner (i.e., tools that more systematically disagree with every other tool) could reveal: - bad-quality tools (in recognizing a specific set of labels/types) - specialized tools able to recognize particular entities better than all the other tools [2] G. Soberon, L. Aroyo, C. Welty, O. Inel, H. Lin, M. Overmeen, Measuring Crowd Truth: Disagreement Metrics Combined with Worker Behavior Filters, Proc. of CrowdSem2013 Workshop, ISWC2013.
  • 36. Features for Relevance •DISTRIBUTED AGREEMENT •UNIQUE INFORMATION
  • 37. Features for Relevance •$\mathrm{ela}(e_i, e_j, l) = \frac{e_i(l) \cdot e_j(l)}{|L(e_i, p)|}$, where $i \neq j$, $e_i(l)$ is the corresponding extractor score (F1-4) and $|L(e_i, p)|$ is the number of labels recognized by extractor i in program p (the extractor-label agreement operator is not commutative)
  • 38. Features for Relevance •$\mathrm{avg\_ela}(e_i, l) = \frac{\sum_{j \neq i} \mathrm{ela}(e_i, e_j, l)}{|EM|}$ Average extractor-label agreement over the set of extraction methods
  • 39. Features for Relevance •Both the extractor-label agreement and the resulting average are also evaluated with reference to the pairs (label,type)
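The extractor-label agreement and its average can be sketched as below. The representation is an assumption: each extractor is mapped to the set of labels it recognised in one program, so that e_i(l) is 1 when l is in that set and |L(e_i, p)| is the set size. The toy label sets are built from the F1 to F4 columns of the four example rows only, so the denominators are smaller than they would be for a full program.

```python
from typing import Dict, Set


def ela(extracted: Dict[str, Set[str]], e_i: str, e_j: str, label: str) -> float:
    """Extractor-label agreement of e_i with e_j on one label (defined for i != j)."""
    if e_i == e_j:
        raise ValueError("ela is only defined for two different extractors")
    labels_i = extracted[e_i]
    if not labels_i:
        return 0.0
    both = label in labels_i and label in extracted[e_j]
    return (1.0 if both else 0.0) / len(labels_i)


def avg_ela(extracted: Dict[str, Set[str]], e_i: str, label: str) -> float:
    """Sum of ela over all other extractors, divided by the number of extractors |EM|."""
    total = sum(ela(extracted, e_i, e_j, label) for e_j in extracted if e_j != e_i)
    return total / len(extracted)


# Toy program: only the four labels of the example table.
extracted = {
    "nerd": {"wildlife", "africa"},
    "textrazor": {"wildlife", "east africa", "africa"},
    "tagme": {"east africa", "earth: two"},
    "nltk": {"wildlife", "east africa"},
}
print(ela(extracted, "nerd", "textrazor", "wildlife"))
print(ela(extracted, "textrazor", "nerd", "wildlife"))
print(avg_ela(extracted, "nerd", "wildlife"))
```

The first two calls return different values (1/2 and 1/3) because the denominator depends on the first extractor, which illustrates why the operator is not commutative.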
  • 40. Other possible relevance features •TF-IDF (with type). Should the corpus for idf contain several episodes of the same TV series? Labels referring to characters mentioned in many episodes of the same TV series will gain a higher tf but a lower idf score -> consider metadata. Example, BBC Program: Pingu's Trick (Pid: b0077x84): Animated adventures of Pingu, the clumsy young penguin. Pingu helps his neighbour and is rewarded. Pingu's friend tries to get a reward too, but the neighbour refuses. They decide to play a trick on the neighbour, but it all ends with an innocent passer-by becoming the victim of their prank.
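The corpus question raised on the slide can be made concrete with a tiny idf computation. This is a sketch under assumptions: it uses the plain log(N/df) variant of idf, represents each program as a set of lowercased labels, and the two toy corpora are invented for illustration.

```python
import math
from typing import List, Set


def idf(term: str, corpus: List[Set[str]]) -> float:
    """Inverse document frequency, plain log(N / df) variant (an assumption)."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


# Invented corpora: one episode per series vs. several episodes of the same series.
mixed_corpus = [{"pingu", "penguin"}, {"rococo", "canaletto"}, {"gazelles"}, {"elvis"}]
series_heavy_corpus = [{"pingu", "penguin"}, {"pingu", "neighbour"}, {"pingu"}, {"rococo"}]

print(round(idf("pingu", mixed_corpus), 3))
print(round(idf("pingu", series_heavy_corpus), 3))
```

The second value is lower: once many episodes of the same series are in the corpus, a recurring character such as Pingu looks less distinctive, which is the effect the slide warns about.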
  • 41. Other possible relevance features •Enhance metadata (words in title and subject) •Label lemmatization (WordNetLemmatizer). Example, BBC Program: Alien Invasion (Pid: b00ph91v): Dani is understudying the part of a witch in Macbeth: The Musical, which means Jack and Sam get the job of ensuring little brother Max does not cause chaos. Dani's most loyal viewers, the aliens, have got bored of never getting to meet their heroine and her pals, and have decided to teleport down to Earth, where they soon find themselves embroiled in Max's scheme to win the 10,000 pound reward from the UFO Society.
  • 42. State of work •Dataset: 52 BBC programs •Realized: - Span and Type Alignment - Relevance scores for labels •To do: –Computation of relevance score for pairs (label,type) –Crowdsourcing tasks –Connecting relevance/relevance-with-type outputs –Evaluation of results (precision, recall, complementarity, …)
  • 43. Does the method deal with complementarity? PID: b0074t2b; Title: Great plains; Synopsis: “The great plains are the vast open spaces of our planet. These immense wilderness areas are seemingly empty. But any feeling of emptiness is an illusion - the plains of our planet support the greatest gatherings of wildlife on earth: two million gazelles on the Mongolian steppes, three million caribou in North America and one and a half million wildebeest in East Africa. […]” Label: two million gazelles#234#254. Types: Amount; Mammal; Single. Extractors: wikimeta(nerd); textrazor; tagme. URIs: http://dbpedia.org/resource/Gazelle; http://dbpedia.org/resource/Two_in_a_Million/You're_My_Number_One. COMPLEMENTARITY!! (Amount of Mammal)