This paper describes the retrieval of correct semantic boundaries for predicate-argument structures annotated in dependency structure. Unlike phrase structure, in which arguments are annotated at the phrase level, dependency structure has no phrases, so argument labels are associated with head words instead: the subtree of each head word is assumed to include the same set of words as the annotated phrase does in phrase structure. However, at least in English, retrieving such subtrees does not always guarantee retrieval of the correct phrase boundaries. In this paper, we present heuristics that retrieve correct phrase boundaries for semantic arguments, called semantic boundaries, from dependency trees. By applying these heuristics, we achieved an F1-score of 99.54% for correct representation of semantic boundaries. Furthermore, error analysis showed that some of the errors could also be considered correct, depending on the interpretation of the annotation.
Retrieving Correct Semantic Boundaries in Dependency Structure
1. Retrieving Correct Semantic Boundaries in Dependency Structure
The 4th Linguistic Annotation Workshop at ACL’10
July 15th, 2010
Jinho D. Choi (University of Colorado at Boulder)
Martha Palmer (University of Colorado at Boulder)
2. Dependency Structure for SRL
• What is dependency?
- Syntactic or semantic relation between a pair of words.
[Dependency diagram: 'events', 'places in this city', and 'year' connected by TMP, LOC, NMOD, and PMOD relations.]
• Why dependency structure for semantic role labeling?
- Dependency relations often correlate with semantic roles.
- Simpler structure → faster annotation → more gold-standard data.
- Faster parsing → more applications.
- Parsing time, dependency (Choi) vs. phrase structure (Charniak): 0.0025 vs. 0.5 sec per sentence.
3. Phrase vs. Dependency Structure
• Constituent vs. Dependency
[Side-by-side trees for 'The results appear in today 's news': the phrase structure marks -SBJ and -LOC on phrases, while the dependency tree links words directly with SBJ, LOC, NMOD, and PMOD relations.]
10/15 (66.67%) of the parsing papers at ACL'10 are on Dependency Parsing.
4. PropBank in Phrase Structure
• A corpus annotated with verbal propositions and arguments.
• Arguments are annotated on phrases.
[Phrase-structure tree with ARG0 and ARGM-LOC annotated on phrases.]
But there is no phrase in dependency structure.
5. PropBank in Dependency Structure
• Arguments are annotated on head words instead.
Phrase = Subtree of head-word
[Dependency tree for 'root The results appear in today 's news' (relations ROOT, SBJ, LOC, NMOD, PMOD), with ARG0 on the head word 'results' and ARGM-LOC on 'in'.]
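Under the "Phrase = Subtree of head-word" assumption, retrieving an argument's words is a plain tree traversal. A minimal sketch, assuming the tree is encoded as a parent-index array (the encoding and names are illustrative, not from the slides):

```python
# Hypothetical sketch: recover the phrase covered by a head word as the
# word set of its subtree in a dependency tree.

def subtree_span(heads, root_idx):
    """Return sorted indices of root_idx and all of its descendants.

    heads[i] is the index of token i's syntactic head (-1 for the root).
    """
    children = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)
    span, stack = [], [root_idx]
    while stack:
        i = stack.pop()
        span.append(i)
        stack.extend(children.get(i, []))
    return sorted(span)

# "The results appear in today 's news" (indices 0-6); "appear" (2) is root.
heads = [1, 2, -1, 2, 5, 6, 3]
words = ["The", "results", "appear", "in", "today", "'s", "news"]
# The ARGM-LOC head word is "in" (3); its subtree covers "in today 's news".
print([words[i] for i in subtree_span(heads, 3)])
```

As the paper argues, this subtree-based retrieval recovers the annotated phrase in most cases but not all, which is what the heuristics below address.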
6. PropBank in Dependency Structure
• Phrase ≠ Subtree of head word.
[Dependency tree for 'The plant owned by Mark' (relations NMOD, NMOD, LGS, PMOD): the subtree of the ARG1 head word 'plant' includes the predicate 'owned'.]
7. Tasks
• Tasks
- Convert phrase structure (PS) to dependency structure (DS).
- Find correct head words in DS.
- Retrieve correct semantic boundaries from DS.
• Conversion
- Pennconverter, by Richard Johansson
• Used for CoNLL 2007 - 2009.
- Penn Treebank (Wall Street Journal)
• 49,208 trees were converted.
• 292,073 Propbank arguments exist.
8. System Overview
- Penn Treebank → Pennconverter → dependency trees.
- PropBank → Heuristics → head words.
- Automatic SRL system → set of head words → Heuristics → set of chunks (phrases).
9. Finding correct head words
• Get the word-set Sp of each argument in PS.
• For each word in Sp, find the word wmax with the maximum subtree in DS.
• Add the word to the head-list Sd.
• Remove the subtree of wmax from Sp.
• Repeat the search until Sp becomes empty.
[Dependency tree for 'root Yields on mutual funds continued to slide'; relations ROOT, SBJ, NMOD, PMOD, OPRD, IM.]
Sp = { Yields, on, mutual, funds, to, slide }
Sd = [ Yields, to ]
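The search loop above can be sketched as follows, assuming a parent-index encoding of the tree and that a candidate's coverage is its subtree intersected with Sp (an assumption; the slide leaves this detail implicit):

```python
# A minimal sketch of the head-word search, not the authors' implementation.

def find_head_words(heads, arg_words):
    """Return the minimal head list Sd whose subtrees cover Sp (arg_words)."""
    children = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)

    def subtree(i):
        span, stack = set(), [i]
        while stack:
            j = stack.pop()
            span.add(j)
            stack.extend(children.get(j, []))
        return span

    sp, sd = set(arg_words), []
    while sp:
        # w_max: the word in Sp whose subtree covers the most of Sp.
        w_max = max(sp, key=lambda i: len(subtree(i) & sp))
        sd.append(w_max)
        sp -= subtree(w_max)  # remove the covered words and repeat
    return sorted(sd)

# "Yields on mutual funds continued to slide" (indices 0-6, root "continued").
heads = [4, 0, 3, 1, -1, 4, 5]
print(find_head_words(heads, [0, 1, 2, 3, 5, 6]))  # → [0, 5]
```

Mapping the indices back to word forms gives Sd = [Yields, to], matching the slide: "Yields" covers "Yields on mutual funds", and "to" covers "to slide".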
10. Retrieving correct semantic boundaries
• Retrieving the subtrees of head-words
- 100% recall, 87.62% precision, 96.11% F1-score.
- What does this mean?
• The state-of-the-art SRL system using DS performs at about 86%.
• If your application requires actual argument phrases instead of head words, the performance becomes lower than 86%.
• Improve the precision by applying heuristics on:
- Modals, negations
- Verb chain, relative clauses
- Gerunds, past-participles
11. Verb Predicates whose Semantic Arguments are their Syntactic Heads
• Semantic arguments of verb predicates can be the syntactic heads of the verbs.
• General solution
- For each head word, retrieve the subtree of the head word excluding the subtree of the verb predicate.
[Dependency tree for 'The plant owned by Mark'; relations NMOD, NMOD, LGS, PMOD.]
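The general solution above amounts to a set difference of two subtrees. A sketch under the same illustrative parent-index encoding as before:

```python
# Sketch: when a semantic argument is the syntactic head of its verb
# predicate, retrieve the head word's subtree minus the predicate's subtree.

def argument_span(heads, arg_head, predicate):
    """Indices covered by arg_head's subtree, excluding predicate's subtree."""
    children = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)

    def subtree(i):
        span, stack = set(), [i]
        while stack:
            j = stack.pop()
            span.add(j)
            stack.extend(children.get(j, []))
        return span

    return sorted(subtree(arg_head) - subtree(predicate))

# "The plant owned by Mark" (indices 0-4); "plant" (1) heads "owned" (2).
heads = [1, -1, 1, 2, 3]
words = ["The", "plant", "owned", "by", "Mark"]
# ARG1 of 'owned' is headed by "plant"; exclude the predicate's subtree.
print([words[i] for i in argument_span(heads, 1, 2)])  # → ['The', 'plant']
```

This yields "The plant" rather than the whole subtree "The plant owned by Mark", which is the correct ARG1 boundary for 'owned'.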
12. Examples
• Modals are the heads of the main verbs in DS.
[Dependency tree: 'root He may or may not read the book'; relations ROOT, SBJ, COORD, CONJ, ADV, OBJ, NMOD.]
• Conjunctions
[Dependency tree: 'people who meet or exceed the expectation'; relations NMOD, DEP, COORD, CONJ, OBJ, NMOD.]
• Past-participles
[Dependency tree: 'correspondence mailed about incomplete 8300s'; relations NMOD, NMOD, NMOD, PMOD.]
13. Evaluations
• Models
- Model I: retrieving all words in the subtrees (baseline).
- Model II: using all heuristics.
- Model III: II + excluding punctuation.
• Measurements
- Accuracy: exact match
- Precision
- Recall
- F1-score
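For concreteness, the measurements might be computed along these lines. This is a hypothetical sketch assuming precision and recall count overlapping word indices between retrieved and gold boundaries, which the slides do not spell out:

```python
# Hypothetical scoring sketch for semantic-boundary retrieval.

def boundary_scores(gold_spans, pred_spans):
    """gold_spans/pred_spans: lists of word-index sets, one per argument."""
    exact = correct = n_pred = n_gold = 0
    for g, p in zip(gold_spans, pred_spans):
        exact += g == p          # accuracy counts exact matches only
        correct += len(g & p)    # words shared by gold and predicted span
        n_pred += len(p)
        n_gold += len(g)
    acc = exact / len(gold_spans)
    prec = correct / n_pred
    rec = correct / n_gold
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

gold = [{3, 4, 5, 6}, {0, 1}]
pred = [{3, 4, 5, 6}, {0, 1, 2}]   # second span over-extends by one word
print(tuple(round(x, 4) for x in boundary_scores(gold, pred)))
```

Note how an over-extended subtree hurts precision and exact-match accuracy while recall stays at 100%, mirroring the baseline pattern on the next slide.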
14. Evaluations
• Results
- Baseline: 88.00% accuracy, 92.51% precision, 100% recall, 96.11% F1.
- Final model: 98.20% accuracy, 99.14% precision, 99.95% recall, 99.54% F1.
• Statistically significant (t = 149, p < .0001)
[Chart: accuracy, precision, recall, and F1 (88-100%) across Models I, II, III.]
15. Error Analysis
• Overlapping arguments
[Two dependency analyses of 'share burdens in the region' (relations OBJ, LOC, NMOD, PMOD): ARG1 on 'burdens' alone vs. ARG1 overlapping the ARGM-LOC 'in the region'.]
16. Error Analysis
• PP attachment
[Two dependency analyses of 'the enthusiasm investors showed for stocks': 'for stocks' attached to 'showed' (ADV) vs. to 'enthusiasm' (NMOD), changing the ARG1 boundary.]
17. Conclusion
• Conclusion
- Find correct head words (min-set with max-coverage).
- Find correct semantic boundaries (99.54% F1-score).
- Suggest ways of reconstructing dependency structure so that it can fit better with semantic roles.
- Can be used to fix some of the inconsistencies in both Treebank and PropBank annotations.
• Future work
- Apply to different corpora.
- Find ways of automatically adding empty categories.
18. Acknowledgements
• Special thanks are due to Professor Joakim Nivre of
Uppsala University (Sweden) for his helpful insights.
• National Science Foundation, CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation, and CISE-CRI 0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu.
• Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.
Editor's notes
• Many SRL systems use phrase structure, but for 4M sentences: 2.7 hours vs. 23 days.
• Visualize the difference between phrase and dependency: -SBJ still doesn't show the relation between 'The results' and 'appear'.