Chinese Grammar vs English Grammar in Universal Dependency
Hang Jiang, Jinho D. Choi, PhD
Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA 30322
The aims of UD
• Provide a concise, generic set of features that are important for analyzing different
languages.
• Annotate different corpora consistently across languages, with extensions for
specific languages.
• Ultimately make parsing easier and more accurate.
Status of UD
• 47 languages and dialects already have their own treebanks.
• Chinese, the most widely spoken language in the world, is not among them.
Project Goal
In this project, we compare Chinese and English under UD in order to establish the
basic differences that matter when building a Chinese treebank in UD.
Motivation for UD
Contrary to common intuition, English and Chinese have similar basic grammar
that can be explained by Universal Grammar. English shares similar structures with
Chinese in:
• Core dependents of clausal predicates (objects, subjects and complements, but not
clausal complements)
• Root, coordination and loose joining
These and other similar dependents suggest that UD can be applied to Chinese.
A good example is shown below.
Fig.2 The graph representation shows that the Chinese and English sentences have closely corresponding dependency relations
in many cases.
• The dependency relations are strikingly similar at the word-to-word level for both
Chinese and English sentences.
English Structures fit Chinese
English has many distinctive features that make us wonder whether English has
brought some extra relations into UD that other languages may not need.
English grammar shows dramatic differences from that of Chinese mainly in (but not
limited to):
• noun dependents (acl, det)
• non-core dependents of clausal predicates (nmod, advmod, neg)
• special clausal dependents (vocative, aux, mark, discourse, auxpass, expl)
• case markers (case).
The of-expression is a good example.
Fig.3 An alternative way of saying ‘the weather office won’
• The structure corresponding to the English of-expression does not exist in Chinese.
However, regular noun modifiers are often followed by de (的), which is also assigned a
case relation, so the case relation still exists in Chinese.
However, the following example is an exception found in our project.
Fig.4 The expletive it in English doesn’t exist in Chinese.
• The expletive it does not exist in Chinese at all. Instead, Chinese is pro-drop
and assumes from context that the subject is the weather. Nevertheless, expl is
still clearly needed across languages.
• Overall, after comparing Chinese and English grammar, we consider the UD relations
concise and generic.
English UD Examples unfit Chinese
Chinese has many structural features that differ from English. These features are
concentrated mainly in (but not limited to):
• noun classifiers
• prepositions, postpositions
• adjectives, comparatives
• aspect markers
• auxiliaries
Below are two Chinese examples with clear dependency relations.
1. The first example concerns the consecutive use of verbs in Chinese.
Fig.5 The corresponding English for this example is “He walks up (to somewhere).”
• The consecutive use of verbs in Chinese can be treated as asyndetic conjunction,
that is, coordination in which the coordinating conjunction is omitted.
Chinese Structures Missing in UD examples
2. The use of prepositions and postpositions in Chinese
Fig.6 The sentence means “At school, I am always criticized.”
• 在 (at) and 里 (inside) are a preposition and a postposition, respectively, in Chinese.
The ba sentence, however, is an exception, and we have to assign the generic dep
relation to it. See the example in Fig.7.
Fig.7 The English translation is “It was I that let John finish and check homework for one time.”
• In this SOV ba sentence, it is not possible to treat ba as a preposition and assign a
case relation between ba and John (约翰), because every word can have only one head in
a dependency structure. As a result, the isolated ba has to be attached by dep to the
verb that follows it.
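The single-head constraint invoked above can be illustrated with a small check. The sketch below is a hypothetical helper, not part of any parser, and the arcs are invented for illustration; they do not reproduce the analysis in Fig.7. It simply flags any word that receives more than one head, which is what would happen if ba were attached both by case and by dep.

```python
from collections import defaultdict

def words_with_multiple_heads(arcs):
    """Return {dependent: [(head, relation), ...]} for words with more than one head.

    arcs is an iterable of (head, relation, dependent) triples.
    """
    heads = defaultdict(list)
    for head, relation, dependent in arcs:
        heads[dependent].append((head, relation))
    return {dep: hs for dep, hs in heads.items() if len(hs) > 1}

# Hypothetical arcs (assumption): "verb" stands in for the verb following ba.
single_headed = [
    ("verb", "nsubj", "我"),
    ("verb", "dep", "把"),
]

# Adding a case arc from 约翰 to 把 gives 把 a second head.
double_headed = single_headed + [("约翰", "case", "把")]

print(words_with_multiple_heads(single_headed))
print(words_with_multiple_heads(double_headed))
```

An empty result means every word has at most one head; a non-empty result lists the conflicting attachments that would have to be resolved, for example by keeping only the dep arc.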
Contributions
• Show that UD is robust and largely compatible with Chinese.
• Identify the ba sentence as a counterexample in which Chinese does not fit the UD
relations.
• Provide clear relations, instead of dep, for distinctively Chinese structures, in order
to adapt UD to Chinese better than the Stanford parser does.
Future Work
• Explore in more detail how UD can be adapted to Chinese by adapting
universal features and POS tags to Chinese morphology.
• Build a comprehensive guideline for Chinese UD and then construct a Chinese
UD treebank.
References
• Choi, Jinho D., and Martha Palmer. "Guidelines for the Clear Style Constituent to
Dependency Conversion." Technical Report 01-12, University of Colorado at
Boulder, 2012.
• De Marneffe, Marie-Catherine, and Christopher D. Manning. "The Stanford typed
dependencies representation." Coling 2008: Proceedings of the workshop on
Cross-Framework and Cross-Domain Parser Evaluation. Association for Computational
Linguistics, 2008.
• McDonald, Ryan T., et al. "Universal Dependency Annotation for Multilingual
Parsing." ACL (2). 2013.
Acknowledgement
• This research was supported by Emory NLP, which provided assistance through the
Emory NLP demo. See http://nlp.mathcs.emory.edu/.
Introduction
             English  Spanish  French  Hindi  Arabic
Tokens #     254K     423K     389K    351K   282K
Sentences #  16K      16K      16K     16K    7K
Fig.1 The size of UD structures for some languages
UD (Universal Dependency) is an annotation scheme for multilingual dependency
structures that provides a universal grammar.
• A dependency relation is a linguistic relation between words, covering notions such as
subject, object, clausal complement, noun modifier and noun determiner.
• UD therefore has a set of syntactic rules for labeling the relations between words with
dependency relations.
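As a minimal sketch of how such labeled relations can be stored, the snippet below represents one sentence as a list of tokens, each pointing to its head with a relation label. The sentence, the `Token` class and the `dependents_of` helper are illustrative assumptions and are not taken from the poster's figures; the labels (det, nsubj, dobj, root) are assumed to follow the Stanford-style naming that matches relations such as auxpass and neg mentioned in this poster.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    index: int   # 1-based position in the sentence
    form: str    # surface word
    head: int    # index of the head token; 0 marks the root
    deprel: str  # dependency relation label

# Illustrative sentence (assumption): "The dog chased a cat"
sentence = [
    Token(1, "The", 2, "det"),
    Token(2, "dog", 3, "nsubj"),
    Token(3, "chased", 0, "root"),
    Token(4, "a", 5, "det"),
    Token(5, "cat", 3, "dobj"),
]

def dependents_of(tokens, head_index):
    """Return (relation, form) pairs for every token attached to head_index."""
    return [(t.deprel, t.form) for t in tokens if t.head == head_index]

print(dependents_of(sentence, 3))  # [('nsubj', 'dog'), ('dobj', 'cat')]
```

Storing one head per token keeps the structure a tree by construction, which is the same property the dependency relations in the figures rely on.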