Shared task summary for SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications
Paper: https://arxiv.org/abs/1704.02853
Abstract:
We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.
1. SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein*#, Mrinal Das$, Sebastian Riedel*,
Lakshmi Vikraman$, Andrew McCallum$
*University College London, #University of Copenhagen,
$University of Massachusetts Amherst
4 August 2017
4. Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum
Subtasks:
A) Mention-level keyphrase identification
B) Mention-level keyphrase classification:
• PROCESS (e.g. methods, equipment)
• TASK
• MATERIAL (e.g. corpora, physical materials)
C) Mention-level semantic relation extraction:
• HYPONYM-OF
• SYNONYM-OF
Example paragraph: "… addresses the task of named entity recognition (NER), a subtask of information extraction, using conditional random fields (CRF). Our method is evaluated on the CoNLL-2003 NER corpus."
Which papers present which processes/tasks/materials?
How do they relate to one another?
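To make the subtasks concrete, here is a hypothetical brat-style .ann annotation of the example paragraph above; the IDs and character offsets are illustrative (computed against the snippet without its leading ellipsis), not taken from the released data, and fields are tab-separated in the actual files:

    T1    Task 22 46    named entity recognition
    T2    Task 67 89    information extraction
    T3    Process 97 122    conditional random fields
    T4    Material 161 182    CoNLL-2003 NER corpus
    R1    Hyponym-of Arg1:T1 Arg2:T2

The HYPONYM-OF relation holds because the text states that NER is a subtask of information extraction; SYNONYM-OF links (e.g. an abbreviation and its expansion) are encoded with brat's '*' equivalence lines instead.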
5. Annotation & Dataset
- brat (Stenetorp, Pyysalo, Topić, Ohta, Ananiadou, Tsujii, 2012)
- *.ann stand-off format (a reader sketch follows after this list)
- Hosted on AWS S3
- Annotators work remotely
- 500 paragraphs from Computer Science, Physics, Material Science: 350 train, 50 dev, 100 test
- Paragraphs sampled semi-automatically, favouring keyphrase- and relation-rich passages
- Full article text also given to participants for context
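A minimal Python sketch of a reader for these *.ann stand-off files, assuming only the three brat line types that occur in this dataset (T = keyphrase, R = binary relation, * = equivalence set, used for Synonym-of); the function and class names are my own, not part of the released tooling:

    from dataclasses import dataclass

    @dataclass
    class Keyphrase:
        id: str
        type: str    # Material, Process or Task
        start: int   # character offsets into the paired *.txt file
        end: int
        text: str

    def read_ann(path):
        """Parse one brat *.ann file into keyphrases, relations and synonym sets."""
        keyphrases, relations, synonyms = [], [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if line.startswith("T"):
                    # e.g. "T1<TAB>Material 17 33<TAB>Zirconium alloys"
                    # (discontinuous spans, "start end;start end", are truncated
                    # to their first fragment in this sketch)
                    type_, start, end = fields[1].split(";")[0].split(" ")
                    keyphrases.append(
                        Keyphrase(fields[0], type_, int(start), int(end), fields[2]))
                elif line.startswith("R"):
                    # e.g. "R1<TAB>Hyponym-of Arg1:T2 Arg2:T1"
                    rel, arg1, arg2 = fields[1].split(" ")
                    relations.append((rel, arg1.split(":")[1], arg2.split(":")[1]))
                elif line.startswith("*"):
                    # e.g. "*<TAB>Synonym-of T3 T4"
                    synonyms.append(tuple(fields[1].split(" ")[1:]))
        return keyphrases, relations, synonyms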
6. Annotation & Dataset
- 13 paid student annotators recruited; 8 completed the annotation exercise
- Data double-annotated: an expert annotator re-annotated each paragraph, given the student annotations
- Up to 38 instances per annotator
Student Annotator   IAA (Cohen's kappa)
1                   0.85
2                   0.66
3                   0.63
4                   0.60
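For reference, a self-contained Python sketch of how Cohen's kappa is computed over two annotators' label sequences; the toy token-level labels below are made up for illustration:

    from collections import Counter

    def cohen_kappa(a, b):
        """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
        assert len(a) == len(b)
        n = len(a)
        p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
        ca, cb = Counter(a), Counter(b)
        p_e = sum(ca[l] * cb[l] for l in ca) / (n * n)     # chance agreement
        return (p_o - p_e) / (1 - p_e)

    ann1 = ["O", "Material", "Material", "O", "Process", "O"]
    ann2 = ["O", "Material", "O",        "O", "Process", "O"]
    print(round(cohen_kappa(ann1, ann2), 2))   # 0.71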
7. Dataset Statistics
Characteristic                     Value
Labels                             Material, Process, Task
Topics                             Computer Science, Physics, Material Science
Number of keyphrases (all)         5730
Number of unique keyphrases        1697
% singleton keyphrases             31%
% single-word mentions             18%
% mentions with word length >= 3   51%
% mentions with word length >= 5   22%
% mentions that are noun phrases   93%
Most common keyphrases             'Isogeometric analysis', 'samples', 'calibration process', 'Zirconium alloys'
8. Subtasks and Evaluation Scenarios
Subtasks
A) Mention-level keyphrase identification
B) Mention-level keyphrase classification (PROCESS, TASK, MATERIAL)
C) Mention-level semantic relation extraction between keyphrases of the same type (HYPONYM-OF, SYNONYM-OF)
Evaluation Scenarios
1) Only plain text is given (Subtasks A, B, C)
2) Plain text with manually annotated keyphrase boundaries is given (Subtasks B, C)
3) Plain text with manually annotated keyphrases and their types is given (Subtask C)
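A hedged sketch of the exact-match micro F1 behind these scenarios: predictions and gold annotations are compared as sets of tuples, e.g. (start, end) spans for Subtask A and (start, end, type) for Subtask B. This mirrors the evaluation idea described in the paper, not the official scoring script:

    def micro_f1(gold, pred):
        """Exact-match F1 over sets of annotation tuples."""
        gold, pred = set(gold), set(pred)
        tp = len(gold & pred)
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold) if gold else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    gold = {(17, 33, "Material"), (40, 64, "Process")}
    pred = {(17, 33, "Material"), (40, 60, "Process")}   # second span boundary is off
    print(micro_f1(gold, pred))   # 0.5

Note that under exact matching, a keyphrase with even one boundary token wrong counts as both a false positive and a false negative, which is part of why Subtask A scores stay low.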
9. Overall Participation
- 54 systems were submitted in the development phase
- 26 of those participated in the test phase
- Wide variety of approaches:
- Neural networks
- CRFs (a minimal sketch follows below)
- Supervised approaches with careful feature engineering
- Rule-based systems
- Ensembles
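To make one of those approaches concrete, here is a minimal Python sketch of Subtask A as BIO sequence labelling with a linear-chain CRF via sklearn-crfsuite; the features and toy data are illustrative, and this is not any team's actual system:

    import sklearn_crfsuite   # pip install sklearn-crfsuite

    def token_features(tokens, i):
        """Simple surface features for one token, with left/right context."""
        tok = tokens[i]
        return {
            "lower": tok.lower(),
            "is_title": tok.istitle(),
            "suffix3": tok[-3:],
            "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
            "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        }

    # Toy training data: B/I mark keyphrase spans, O marks everything else.
    sentences = [["We", "evaluate", "on", "the", "CoNLL-2003", "NER", "corpus", "."]]
    labels = [["O", "O", "O", "O", "B", "I", "I", "O"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))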
10. Results Scenario 1
Team                                Overall F1   A      B      C
s2 end2end (Ammar et al., 2017)     0.43         0.55   0.44   0.28
TIAL UW                             0.42         0.56   0.44   –
TTI COIN (Tsujimura et al., 2017)   0.38         0.50   0.39   0.21
upper bound                         0.84         0.85   0.85   0.77
random                              0.00         0.03   0.01   0.00

17 participating systems
11. Results Scenario 2
Team                                 Overall F1   B      C
MayoNLP (Liu et al., 2017)           0.64         0.67   0.23
UKP/EELECTION (Eger et al., 2017)    0.63         0.66   –
LABDA (Segura-Bedmar et al., 2017)   0.48         0.51   –
upper bound                          0.84         0.85   0.77
random                               0.15         0.23   0.01

4 participating systems
12. Results Scenario 3
Team                             Overall F1 (= C)
MIT (Lee et al., 2017a)          0.64
s2_rel (Ammar et al., 2017)      0.54
NTNU-2 (Barik and Marsi, 2017)   0.50
upper bound                      0.84
random                           0.04

5 participating systems
13. Summary
- Most successful systems use RNNs (+ CRFs)
- However, the best system for Scenario 1 used an SVM with well-engineered features
- Identifying keyphrases is the most challenging subtask
- The dataset contains many long and infrequent keyphrases
- Systems that rely on memorising lists of keyphrases do not perform well
- Finding high-quality annotators for this task is hard – many student annotators dropped out
- Lessons: better recruitment, pilot annotation, pick only top annotators
- Combining subtasks into evaluation scenarios caused confusion
- Many teams' systems did not tackle the relation extraction subtask, even though skipping it hurt their overall F1
14. Relevant Papers at ACL
Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman and Andrew McCallum. SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications. SemEval 2017. https://arxiv.org/abs/1704.02853

Isabelle Augenstein, Anders Søgaard. Multi-Task Learning of Keyphrase Boundary Classification. ACL 2017 (short). https://arxiv.org/abs/1704.00514

Ed Collins, Isabelle Augenstein, Sebastian Riedel. A Supervised Approach to Extractive Summarisation of Scientific Papers. CoNLL 2017. https://arxiv.org/abs/1706.03946