1. Exploring Higher Order Dependency Parsers
Pranava Swaroop Madhyastha
Supervised by: Prof. Michael Rosner & RNDr. Daniel Zeman
September 6, 2011
2. Introduction
◮ Dependency Grammar.
◮ Binary asymmetric relations between a head and a modifier; highly lexicalized relationships.
◮ A quick example: see the sketch after this list.
◮ Projective Constraint
◮ Graph-Based Dependency Parsing
◮ Arc-Factored Parsing
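As a quick example of the above, here is a hedged Python sketch (the sentence, scores and function names are illustrative, not from the thesis): a dependency tree over n tokens is stored as a list of head indices, with 0 for the artificial root, and under the arc-factored model the tree's score is simply the sum of independent per-arc scores.

# Hedged sketch: arc-factored scoring of a toy dependency tree.
# heads[m-1] gives the head of token m; 0 denotes the artificial root.

def tree_score(heads, arc_score):
    """Sum the score of every (head, modifier) arc in the tree."""
    return sum(arc_score(h, m) for m, h in enumerate(heads, start=1))

# "John saw Mary": 'saw' (token 2) heads 'John' (1) and 'Mary' (3),
# and is itself attached to the root (0).
sentence = ["John", "saw", "Mary"]
heads = [2, 0, 2]

# Toy arc scores standing in for a learned model.
toy_scores = {(0, 2): 3.0, (2, 1): 2.0, (2, 3): 2.5}
print(tree_score(heads, lambda h, m: toy_scores.get((h, m), -1.0)))  # 7.5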
3. Problem Description
◮ Augmentation of Features
◮ Semantic features
◮ Morpho-syntactic features
◮ Higher order parsing
◮ Context availability
◮ Horizontal and vertical context availability
◮ Motivation
◮ Semi-supervised dependency parsing and improvements.
◮ Using well-defined linguistic components.
4. What is Higher Order Dependency Parsing?
◮ First-order model: decomposition of the tree into individual head-modifier dependencies (arcs).
◮ Second-order models: include the sibling of the modifier along with the head and modifier, or include the head, the modifier and a child of the modifier (grandchild).
◮ Third-order models: extend the context one level further, e.g. grand-sibling parts that combine a grandparent, a head, a modifier and a sibling.
◮ An illustration: see the sketch below.
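As a hedged illustration (function names are hypothetical, and the adjacency requirements of real second-order models are simplified), the Python sketch below decomposes one toy tree into first-order, sibling and grandchild parts:

# Hedged sketch: part decompositions of increasing order for one toy tree.

def first_order_parts(heads):
    # (head, modifier) arcs
    return [(h, m) for m, h in enumerate(heads, start=1)]

def sibling_parts(heads):
    # (head, modifier, sibling): adjacent modifiers sharing a head,
    # on the same side of it (real models are defined this way too)
    parts = []
    for h in set(heads):
        mods = [m for m, hh in enumerate(heads, start=1) if hh == h]
        for i in range(len(mods) - 1):
            m, s = mods[i], mods[i + 1]
            if (m - h) * (s - h) > 0:           # same side of the head
                parts.append((h, m, s))
    return parts

def grandchild_parts(heads):
    # (grandparent, head, modifier): the head's own head is included
    return [(heads[h - 1], h, m)
            for m, h in enumerate(heads, start=1) if h != 0]

heads = [2, 0, 2, 5, 2]   # a 5-token toy tree rooted at token 2
print(first_order_parts(heads))
print(sibling_parts(heads))
print(grandchild_parts(heads))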
6. Features
◮ For a feature vector φ and a weight vector w, each part p of a sentence x is scored as

score(x, p) = w · φ(x, p)    (1)
◮ The feature vector for a part is assembled from templates such as:
◮ dir.pos(h).pos(m)
◮ dir.form(h).pos(m)
◮ and so on ...
◮ The most basic feature patterns consider the surface form, part-of-speech, lemma and other morphosyntactic attributes of the head or the modifier of a dependency; a toy scoring sketch follows.
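A toy Python sketch of this template-based scoring (the tokens, templates and weights below are hypothetical): since φ(x, p) is sparse, the dot product w · φ(x, p) reduces to summing the weights of the features that fire for the part.

# Hedged sketch: sparse template-based scoring of one dependency arc.

def features(direction, head, mod):
    """Instantiate a few templates like dir.pos(h).pos(m)."""
    return [
        f"{direction}.pos={head['pos']}.pos={mod['pos']}",     # dir.pos(h).pos(m)
        f"{direction}.form={head['form']}.pos={mod['pos']}",   # dir.form(h).pos(m)
    ]

def score(w, direction, head, mod):
    """w . phi reduces to summing weights of the firing features."""
    return sum(w.get(f, 0.0) for f in features(direction, head, mod))

w = {"R.pos=VBD.pos=NNP": 1.5, "R.form=saw.pos=NNP": 0.7}  # toy weights
head = {"form": "saw", "pos": "VBD"}
mod = {"form": "Mary", "pos": "NNP"}
print(score(w, "R", head, mod))  # 2.2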
7. Experimental Data
◮ English - Penn Treebank
◮ Sections 2 to 10 as the training set: 15,000 sentences.
◮ Random sets of sentences from Sections 15, 17, 19 and 25 of the Penn Treebank as development data: 1,000 sentences.
◮ Test set chosen from Sections 0, 1, 21 and 23 of the Penn Treebank: 2,000 sentences.
◮ Czech - Prague Dependency Treebank
◮ The sentences were chosen from the pdt2-full-automorph dataset.
◮ The training set consisted of the train1-train5 splits: 15,000 sentences.
◮ The development set consisted of the train6 and train7 splits: 1,000 sentences.
◮ The test set was made up of the dtest and etest parts: 2,000 sentences.
8. Experimentation
◮ Fine- and Coarse-Grained Word Senses
◮ Approximation
◮ For English:
◮ Both fine- and coarse-grained word-sense extraction use the WordNet::SenseRelate package (a hedged NLTK approximation follows this list).
◮ Fine-grained word sense restricts a word to one particular sense, e.g. a noun pinned to its first WordNet sense.
◮ Coarse-grained word sense is a more generic description: the WordNet semantic file to which the word belongs.
◮ For Czech:
◮ Only (approximate) fine-grained word-sense extraction.
◮ Extracted using the sempos attribute already annotated in the Prague Dependency Treebank.
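A hedged approximation of the two granularities for English, using NLTK's WordNet interface as a stand-in for the Perl package WordNet::SenseRelate (taking the first sense here merely illustrates the idea; it is not the package's actual disambiguation):

# Hedged stand-in for WordNet::SenseRelate using NLTK
# (requires: nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def fine_sense(word, pos=wn.NOUN):
    """Fine-grained: pin the word to one synset, here simply the first."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0].name() if synsets else None       # e.g. 'dog.n.01'

def coarse_sense(word, pos=wn.NOUN):
    """Coarse-grained: the WordNet lexicographer ('semantic') file."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0].lexname() if synsets else None    # e.g. 'noun.animal'

print(fine_sense("dog"), coarse_sense("dog"))   # dog.n.01 noun.animal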
9. Results for the Word-Sense Augmentation Experiment
◮ Sibling-based parsers show a statistically significant improvement.
◮ For English with fine-grained word-sense augmentation, the third-order grand-sibling parser gives an improvement of +0.81 percent (unlabeled accuracy score). A closer statistical examination showed that sibling interactions close to each other have better precision.
◮ For English with coarse-grained word-sense augmentation, the second-order sibling parser gives an improvement of approximately +1.09 percent.
◮ For Czech with fine-grained word-sense augmentation, the third-order sibling parser gives an improvement of approximately +1.20 percent.
10. Results for the Morphosyntactic Augmentation Experiment
◮ Morphosyntactic features were used directly, by extracting tags from the corpus.
◮ For Czech, instead of the full 15-character positional tagset, we tried a subset (Person, Number, PossGender, Tense, Voice and Case); a hedged extraction sketch follows this list.
◮ For English, we integrated the fine-grained part-of-speech tags.
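A hedged sketch of that subset extraction from a 15-character PDT positional tag (the position indices follow the standard PDT positional tagset; the example tag is only illustrative):

# Hedged sketch: keep six attributes of a 15-character PDT positional tag.
# 0-based indices of the standard PDT positions.
PDT_POSITIONS = {
    "Number": 3,       # position 4
    "Case": 4,         # position 5
    "PossGender": 5,   # position 6
    "Person": 7,       # position 8
    "Tense": 8,        # position 9
    "Voice": 11,       # position 12
}

def reduced_tag(tag):
    """Map a full positional tag to the reduced attribute subset."""
    return {name: tag[i] for name, i in PDT_POSITIONS.items()}

# A finite verb form: singular, 3rd person, present tense, active voice.
print(reduced_tag("VB-S---3P-AA---"))
# {'Number': 'S', 'Case': '-', 'PossGender': '-', 'Person': '3',
#  'Tense': 'P', 'Voice': 'A'}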
11. Results
◮ For both English and Czech, there is a significant improvement in parsing accuracy when parsing with the grandchild-based algorithms.
◮ For Czech, the third-order grand-sibling algorithm shows an improvement of +1.72 percent.
◮ For English, the third-order grand-sibling algorithm shows an improvement of +1.21 percent.
12. Conclusion
◮ Semantic features work better with sibling-based parsers (larger horizontal contexts).
◮ Morpho-syntactic features work better with grandchild-based parsers (larger vertical contexts).
◮ These features can also be instrumental in several related tasks, including accurate labeling of semantic roles.
◮ Linguistic information can be better exploited by higher-order parsing algorithms.
13. Future Work
◮ Higher-order parsers with labels (labeled accuracy scores have not yet been tested).
◮ Joint extraction of word senses and semantic roles.
◮ Experimentation with lexical clusters.
◮ Thorough experimentation with several feature combinations.
◮ Maximum and minimum order requirements.