Shared task summary for SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications
Paper: https://arxiv.org/abs/1704.02853
Abstract:
We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.
1. SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein*#, Mrinal Das$, Sebastian Riedel*,
Lakshmi Vikraman$, Andrew McCallum$
*University College London, #University of Copenhagen,
$University of Massachusetts Amherst
4 August 2017
4. Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum
Subtasks:
A) Mention-level keyphrase identification
B) Mention-level keyphrase classification:
• PROCESS (e.g. methods, equipment)
• TASK
• MATERIAL (e.g. corpora, physical materials)
C) Mention-level semantic relation extraction:
• HYPONYM-OF
• SYNONYM-OF
Example paragraph: "… addresses the task of named entity recognition (NER), a subtask of information extraction, using conditional random fields (CRF). Our method is evaluated on the CoNLL-2003 NER corpus."
Which papers present which processes/tasks/materials?
How do they relate to one another?
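To make the subtasks concrete, here is a hypothetical brat-style .ann annotation of the example paragraph above; the IDs and character offsets are illustrative (computed against the snippet without its leading ellipsis), not taken from the released data, and fields are tab-separated in the actual files:

    T1    Task 22 46    named entity recognition
    T2    Task 67 89    information extraction
    T3    Process 97 122    conditional random fields
    T4    Material 161 182    CoNLL-2003 NER corpus
    R1    Hyponym-of Arg1:T1 Arg2:T2

The HYPONYM-OF relation holds because the text states that NER is a subtask of information extraction; SYNONYM-OF links (e.g. an abbreviation and its expansion) are encoded with brat's '*' equivalence lines instead.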
5. Annotation & Dataset
- brat (Stenetorp, Pyysalo, Topić, Ohta, Ananiadou, Tsujii, 2012)
- *.ann stand-off format (a reader sketch follows after this list)
- Hosted on AWS S3
- Annotators work remotely
- 500 paragraphs from Computer Science, Physics, Material Science: 350 train, 50 dev, 100 test
- Paragraphs sampled semi-automatically, favouring keyphrase- and relation-rich passages
- Full article text also given to participants for context
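A minimal Python sketch of a reader for these *.ann stand-off files, assuming only the three brat line types that occur in this dataset (T = keyphrase, R = binary relation, * = equivalence set, used for Synonym-of); the function and class names are my own, not part of the released tooling:

    from dataclasses import dataclass

    @dataclass
    class Keyphrase:
        id: str
        type: str    # Material, Process or Task
        start: int   # character offsets into the paired *.txt file
        end: int
        text: str

    def read_ann(path):
        """Parse one brat *.ann file into keyphrases, relations and synonym sets."""
        keyphrases, relations, synonyms = [], [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if line.startswith("T"):
                    # e.g. "T1<TAB>Material 17 33<TAB>Zirconium alloys"
                    # (discontinuous spans, "start end;start end", are truncated
                    # to their first fragment in this sketch)
                    type_, start, end = fields[1].split(";")[0].split(" ")
                    keyphrases.append(
                        Keyphrase(fields[0], type_, int(start), int(end), fields[2]))
                elif line.startswith("R"):
                    # e.g. "R1<TAB>Hyponym-of Arg1:T2 Arg2:T1"
                    rel, arg1, arg2 = fields[1].split(" ")
                    relations.append((rel, arg1.split(":")[1], arg2.split(":")[1]))
                elif line.startswith("*"):
                    # e.g. "*<TAB>Synonym-of T3 T4"
                    synonyms.append(tuple(fields[1].split(" ")[1:]))
        return keyphrases, relations, synonyms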
6. Annotation & Dataset
- 13 paid student annotators recruited; 8 completed the annotation exercise
- Data double-annotated: an expert annotator re-annotated each paragraph, given the student annotations
- Up to 38 instances per annotator
Student Annotator   IAA (Cohen's kappa)
1                   0.85
2                   0.66
3                   0.63
4                   0.60
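For reference, a self-contained Python sketch of how Cohen's kappa is computed over two annotators' label sequences; the toy token-level labels below are made up for illustration:

    from collections import Counter

    def cohen_kappa(a, b):
        """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
        assert len(a) == len(b)
        n = len(a)
        p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
        ca, cb = Counter(a), Counter(b)
        p_e = sum(ca[l] * cb[l] for l in ca) / (n * n)     # chance agreement
        return (p_o - p_e) / (1 - p_e)

    ann1 = ["O", "Material", "Material", "O", "Process", "O"]
    ann2 = ["O", "Material", "O",        "O", "Process", "O"]
    print(round(cohen_kappa(ann1, ann2), 2))   # 0.71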
7. Dataset Statistics
Characteristic                     Value
Labels                             Material, Process, Task
Topics                             Computer Science, Physics, Material Science
Number of keyphrases (all)         5730
Number of unique keyphrases        1697
% singleton keyphrases             31%
% single-word mentions             18%
% mentions with word length >= 3   51%
% mentions with word length >= 5   22%
% mentions that are noun phrases   93%
Most common keyphrases             'Isogeometric analysis', 'samples', 'calibration process', 'Zirconium alloys'
8. Subtasks and Evaluation Scenarios
Subtasks
A) Mention-level keyphrase identification
B) Mention-level keyphrase classification (PROCESS, TASK, MATERIAL)
C) Mention-level semantic relation extraction between keyphrases of the same type (HYPONYM-OF, SYNONYM-OF)
Evaluation Scenarios
1) Only plain text is given (Subtasks A, B, C)
2) Plain text with manually annotated keyphrase boundaries is given (Subtasks B, C)
3) Plain text with manually annotated keyphrases and their types is given (Subtask C)
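A hedged sketch of the exact-match micro F1 behind these scenarios: predictions and gold annotations are compared as sets of tuples, e.g. (start, end) spans for Subtask A and (start, end, type) for Subtask B. This mirrors the evaluation idea described in the paper, not the official scoring script:

    def micro_f1(gold, pred):
        """Exact-match F1 over sets of annotation tuples."""
        gold, pred = set(gold), set(pred)
        tp = len(gold & pred)
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold) if gold else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    gold = {(17, 33, "Material"), (40, 64, "Process")}
    pred = {(17, 33, "Material"), (40, 60, "Process")}   # second span boundary is off
    print(micro_f1(gold, pred))   # 0.5

Note that under exact matching, a keyphrase with even one boundary token wrong counts as both a false positive and a false negative, which is part of why Subtask A scores stay low.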
9. Overall Participation
- 54 systems were submitted in the development phase
- 26 of those participated in the test phase
- Wide variety of approaches:
- Neural networks
- CRFs (a minimal sketch follows below)
- Supervised approaches with careful feature engineering
- Rule-based systems
- Ensembles
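To make one of those approaches concrete, here is a minimal Python sketch of Subtask A as BIO sequence labelling with a linear-chain CRF via sklearn-crfsuite; the features and toy data are illustrative, and this is not any team's actual system:

    import sklearn_crfsuite   # pip install sklearn-crfsuite

    def token_features(tokens, i):
        """Simple surface features for one token, with left/right context."""
        tok = tokens[i]
        return {
            "lower": tok.lower(),
            "is_title": tok.istitle(),
            "suffix3": tok[-3:],
            "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
            "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        }

    # Toy training data: B/I mark keyphrase spans, O marks everything else.
    sentences = [["We", "evaluate", "on", "the", "CoNLL-2003", "NER", "corpus", "."]]
    labels = [["O", "O", "O", "O", "B", "I", "I", "O"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))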
10. Results Scenario 1
Team                                Overall F1   A      B      C
s2 end2end (Ammar et al., 2017)     0.43         0.55   0.44   0.28
TIAL UW                             0.42         0.56   0.44   –
TTI COIN (Tsujimura et al., 2017)   0.38         0.50   0.39   0.21
upper bound                         0.84         0.85   0.85   0.77
random                              0.00         0.03   0.01   0.00

17 participating systems
11. Results Scenario 2
Team                                 Overall F1   B      C
MayoNLP (Liu et al., 2017)           0.64         0.67   0.23
UKP/EELECTION (Eger et al., 2017)    0.63         0.66   –
LABDA (Segura-Bedmar et al., 2017)   0.48         0.51   –
upper bound                          0.84         0.85   0.77
random                               0.15         0.23   0.01

4 participating systems
12. Results Scenario 3
Team                             Overall F1 (= C)
MIT (Lee et al., 2017a)          0.64
s2_rel (Ammar et al., 2017)      0.54
NTNU-2 (Barik and Marsi, 2017)   0.50
upper bound                      0.84
random                           0.04

5 participating systems
13. Summary
- Most successful systems use RNNs (+ CRFs)
- However, the best system for Scenario 1 used an SVM with well-engineered features
- Identifying keyphrases is the most challenging subtask
- The dataset contains many long and infrequent keyphrases
- Systems that rely on memorising lists of keyphrases do not perform well
- Finding high-quality annotators for this task is hard – many student annotators dropped out
- Lessons: better recruitment, pilot annotation, pick only top annotators
- Combining subtasks into evaluation scenarios caused confusion
- Many teams' systems did not tackle the relation extraction subtask, even though skipping it hurt their overall F1
14. Relevant Papers at ACL
Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman and Andrew McCallum. SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications. SemEval 2017. https://arxiv.org/abs/1704.02853

Isabelle Augenstein, Anders Søgaard. Multi-Task Learning of Keyphrase Boundary Classification. ACL 2017 (short). https://arxiv.org/abs/1704.00514

Ed Collins, Isabelle Augenstein, Sebastian Riedel. A Supervised Approach to Extractive Summarisation of Scientific Papers. CoNLL 2017. https://arxiv.org/abs/1706.03946