HU, Yuxiu (Harbin Institute of Technology Shenzhen Graduate School, China)
BODOMO, Adams (The University of Hong Kong)
http://citers2013.cite.hku.hk/en/paper_603.htm
3. Introduction
• In the age of IT, marked by an increasing use of
computers in language learning, an effective
grammar check system is much needed.
• Grammar checking is one of the most common
natural language processing technologies used
by the general public, yet little research has
been done on it (Clement, Gerdes & Marlet,
2011).
4. Introduction
• Optimality Theory (OT) has been applied in
computational phonology, and it has been
shown that the basic assumptions and
methods of OT can be formalized and
implemented to the benefit of both OT research
and computational linguistics (Kager, 1999;
Kuhn, 2003; Ma, 2001; Ma & Wang, 2008).
5. OT-LFG
• OT-LFG is the combination of Optimality
Theory (OT) and Lexical Functional
Grammar (LFG).
• LFG is used as the representational basis of
OT-LFG. The basic idea of OT is that variation
across languages is a result of competition
among a set of universal and violable
constraints.
8. Mini candidate sets
• “Finding the right candidates to study may be
the hardest but also useful practical skill in
doing OT” (McCarthy, 2002:34)
• “OT is inherently comparative; the
grammaticality of a structural description is
determined not in isolation, but with respect
to competing candidates.” (Tesar &
Smolensky, 1998: 238)
9. Error-driven Constraint Demotion
• [Diagram: Input I; learner’s own parse p’; Output: target parse p]
• Therefore, it is natural to conclude that an informative winner/loser pair
would be composed of a target parse p and a learner’s wrong parse p’ of
the same input I.
11. Features in OT-LFG inputs
• The input in OT-LFG is a meaning, and candidates are
forms/syntactic structures (Fikkert & Hoop, 2009). The features in
the input that will be used in this study are listed and explained
below:
- “REF-NUM” is used for the semantic referent’s number
specification.
- “REF-SR” refers to the specificity status of the semantic referent.
- “REF-HK” specifies whether the speaker assumes that the listener
has knowledge of the semantic referent.
• There are two senses of Number in this study. In the input,
REF-NUM refers to semantic number, while the number annotation in
the output is a syntactic feature.
12. Illustration
1. [-SR, +HK]
i. Omission
a. The horse has four legs.
b. *Horse has four legs.
• This is a mini candidate set.
13. Constraints
• *FunctN: Avoid functional structure in the nominal
domain (de Swart & Zwarts, 2008). Each functional
projection in the noun phrase incurs a violation of
this constraint. If this constraint is ranked higher than
the faithfulness constraints that require expression of
meanings in functional layers above NP, then there
will be no determiners. Therefore, in Mandarin this
constraint should outrank the faithfulness constraints
that are involved. In English, by contrast, the
faithfulness constraints should be ranked high.
14.
• FDR: Parse a discourse referent by means of a
functional layer above NP (de Swart &
Zwarts, 2008). This faithfulness constraint
requires that a discourse referent be
parsed by means of a functional layer above
an NP, so there must be functional layers above
NP. This faithfulness constraint and the
constraint *FunctN conflict over whether
functional layers are present above NPs.
17. Constraints ranking in English
• (↑Num)=SG – FunctN >> *Def/[-Fam] >> FDR
>> *FunctN, FPL >> *PLMorph
18. Conclusion
• OT-LFG can provide a successful and clear
account of linguistic issues that involve
interface analyses, such as the interface
of syntax and semantics in the acquisition of
English articles.
• OT-LFG could provide a theoretical basis for
investigating language learning through
different rankings of the same universal
constraints.
19.
• Therefore, it is possible to apply the EVAL
process of OT-LFG as the theoretical basis
and practical guidance for building an online
grammar check system.
21. References
• Clement, L., Gerdes, K. & Marlet, R. 2011. A Grammar Correction Algorithm:
Deep Parsing and Minimal Corrections for a Grammar Checker. LNAI 5591,
pp. 47-63. Springer-Verlag Berlin Heidelberg.
• de Swart, H. & Zwarts, J. 2008. Article Use Across Languages: An OT Typology.
Proceedings of SuB12, Oslo: ILOS 2008, 628-644.
• Fikkert, P. & Hoop, H. D. 2009. Language Acquisition in Optimality Theory.
Linguistics, 47(2), 311-357.
• Kager, R. 1999. Optimality Theory. Cambridge, New York: Cambridge University
Press.
• Kuhn, J. 2003. Optimality-Theoretic Syntax: A Declarative Approach. Stanford:
CSLI Publications, Centre for the Study of Language and Information.
• Lam, O. 2004. Aspects of the Cantonese Verb Phrase: Order and Rank.
Unpublished MPhil Dissertation, The University of Hong Kong, Hong Kong.
• Ma, Q. W. 2008. Optimality Theory. Shanghai Education Press.
• Ma, Q. W. & Wang, H. M. 2008. The Future and Development of Optimality
Theory. Contemporary Linguistics, 3, 237-245.
• McCarthy, J. J. 2002. A Thematic Guide to Optimality Theory. Cambridge:
Cambridge University Press.
• Tesar, B. & Smolensky, P. 1998. Learnability in Optimality Theory. Linguistic
Inquiry, 29(2), 229-268.
Editor's notes
As stated in the abstract, this study presents and explores the properties of OT-LFG that may be used for building an online grammar check system. We focus on linguistic analysis with OT-LFG for possible application in an online grammar check system.
How OT-LFG works
Clement, Gerdes & Marlet (2011) think that one reason for this lack of research is that linguistics, the indispensable theoretical basis of a grammar check system, has been focusing on its own development as a science of language, though some scholars in sociolinguistics, psycholinguistics, and foreign language teaching have shown indirect interest in the development of error grammars.
OT was first proposed by Prince & Smolensky (1993) for phonological studies. The two research areas that benefit are OT research and research on computational linguistics.
So OT explains language variability with language universality.
Languages are universal in that a single constraint set is assumed to be shared by all the languages in the world, but they vary because different languages rank these universal constraints differently. This universal constraint set is abbreviated as CON in the schematic representation above. Different rankings of these constraints result in different languages, such as languages X and Y. Learning a language therefore means checking whether the observed pairing of form and meaning is predicted as optimal by the learner’s own system (Kuhn, 2003). Given an input, an infinite set of output candidates is generated and then evaluated by a set of hierarchically ranked constraints, and finally the optimal candidate is selected (Kager, 1999). The ranking of constraints is language specific, but the constraints are universal, although universals do not play the same role in every language (Archangeli, 1997).
An output is “optimal” when it incurs the least serious violations of the constraint hierarchy of a language. For a given input (meaning), the grammar generates and then evaluates an infinite set of output candidates and selects the optimal candidate. The higher a violated constraint is ranked, the greater the cost to harmony. Candidate b violates C1, which is ranked highest, so candidate b is eliminated in the first round, while the other candidates all satisfy C1. Even if candidate b violates only C1 and satisfies the other two constraints, it is still out, because C1 is ranked highest. In the second round, only candidates a and d satisfy C2, which is the highest-ranked constraint in this round. Finally, candidate d wins with respect to the active constraints and their hierarchical ranking. This is the basic working process assumed in OT.
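The round-by-round elimination described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate set is finite (a, b, c, d) rather than infinite, and the violation profile is hypothetical, chosen to match the walk-through (b violates C1, c violates C2, a violates C3). The name `eval_optimal` is ours and does not come from the OT-LFG literature.

```python
from typing import Dict, List

# Ranking, highest-ranked constraint first.
RANKING: List[str] = ["C1", "C2", "C3"]

# Hypothetical violation counts (assumption made for this illustration only).
VIOLATIONS: Dict[str, Dict[str, int]] = {
    "a": {"C3": 1},
    "b": {"C1": 1},
    "c": {"C2": 1},
    "d": {},
}


def eval_optimal(ranking: List[str],
                 violations: Dict[str, Dict[str, int]]) -> List[str]:
    """Filter candidates one constraint at a time, highest-ranked first."""
    survivors = list(violations)
    for constraint in ranking:
        # Only candidates with the fewest violations of this constraint survive.
        fewest = min(violations[c].get(constraint, 0) for c in survivors)
        survivors = [c for c in survivors
                     if violations[c].get(constraint, 0) == fewest]
    return survivors


print(eval_optimal(RANKING, VIOLATIONS))  # ['d']
```

Because the filtering runs from the highest-ranked constraint downwards, a single violation of C1 removes candidate b no matter how well it does on C2 and C3, which is the strict domination described above.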
“The diversity and infinity of candidates is a source of worry to many people when they first encounter OT.” (McCarthy, 2008:17) Tesar & Smolensky (1998a) suggest the use of winner/loser pairs: the correct ranking must make the grammatical structure more harmonic than the ungrammatical competitor, so given a suitable set of such pairs, we can find a ranking that makes the winner more harmonic than its corresponding loser in each pair. How, then, should informative winner/loser pairs be selected for investigating learning problems? The current study uses the selection idea of Error-Driven Constraint Demotion to choose informative winner/loser pairs. Here I will briefly introduce the basic idea of Error-Driven Constraint Demotion for winner/loser pair selection.
Error-Driven Constraint Demotion (Tesar, 1998) incorporates the basic principle of Constraint Demotion into a procedure for learning a grammar from informative winner/loser pairs. The procedure for choosing informative winner/loser pairs is described very clearly in Tesar and Smolensky (1998a). When learning a grammar, a learner given an input I computes his/her own parse p’ for I, the parse he/she thinks is optimal with respect to his/her current constraint ranking. If this parse p’ differs from the target parse p, learning occurs; otherwise it does not. If the target parse p is the same as the learner’s parse p’, the current constraint ranking also chooses p as the optimal candidate, so there is no need to rerank the constraints: no error occurs and no learning is involved. If, on the contrary, the learner’s parse p’ differs from the target parse p, the target parse p is not optimal according to the learner’s current constraint ranking, and after receiving the positive target parse p the learner has to rerank his/her constraints to make sure that p is optimal. Therefore, it is natural to conclude that an informative winner/loser pair is composed of a target parse p and a learner’s wrong parse p’ of the same input I. In this study, the errors made by the subjects are losers and the correct forms of the corresponding English usage are winners. Different losers can have the same winner. They are all listed and analyzed as different winner/loser pairs, and each pair is a mini candidate set. Under Huebner (1983)’s taxonomy and the types of error identified in the subjects’ compositions,
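The error-driven loop described in this note can be sketched as follows. This is a minimal illustration, not the procedure as implemented in the study: the hierarchy is a list of strata, violations of constraints sharing a stratum are pooled (an assumption), and the function names (`learner_parse`, `demote`, `edcd`) are hypothetical. The example data use the constraints *FunctN and FDR and the candidate pair the horse / horse from the slides, with a Mandarin-like starting ranking.

```python
from typing import Dict, List, Set, Tuple

Violations = Dict[str, int]   # constraint name -> number of violations
Hierarchy = List[Set[str]]    # strata, highest-ranked first


def profile(hierarchy: Hierarchy, v: Violations) -> List[int]:
    # Assumption: violations of constraints in the same stratum are pooled.
    return [sum(v.get(c, 0) for c in stratum) for stratum in hierarchy]


def learner_parse(hierarchy: Hierarchy,
                  candidates: Dict[str, Violations]) -> str:
    # The learner's own parse p' is optimal under the current ranking.
    return min(candidates, key=lambda n: profile(hierarchy, candidates[n]))


def demote(hierarchy: Hierarchy, winner: Violations,
           loser: Violations) -> Hierarchy:
    constraints = set().union(*hierarchy)
    prefers_winner = {c for c in constraints
                      if loser.get(c, 0) > winner.get(c, 0)}
    prefers_loser = {c for c in constraints
                     if winner.get(c, 0) > loser.get(c, 0)}
    if not prefers_winner:
        raise ValueError("no constraint prefers the winner")
    # Highest stratum that contains a winner-preferring constraint.
    pivot = min(i for i, s in enumerate(hierarchy) if s & prefers_winner)
    new = [set(s) for s in hierarchy] + [set()]
    # Move undominated loser-preferring constraints just below the pivot.
    for i in range(pivot + 1):
        moving = new[i] & prefers_loser
        new[i] -= moving
        new[pivot + 1] |= moving
    return [s for s in new if s]


def edcd(hierarchy: Hierarchy,
         data: List[Tuple[Dict[str, Violations], str]]) -> Hierarchy:
    changed = True
    while changed:
        changed = False
        for candidates, target in data:
            guess = learner_parse(hierarchy, candidates)
            if guess != target:  # error: p' differs from the target parse p
                hierarchy = demote(hierarchy, candidates[target],
                                   candidates[guess])
                changed = True
    return hierarchy


candidates = {"the horse": {"*FunctN": 1}, "horse": {"FDR": 1}}
start = [{"*FunctN"}, {"FDR"}]  # Mandarin-like ranking: *FunctN >> FDR
print(edcd(start, [(candidates, "the horse")]))  # [{'FDR'}, {'*FunctN'}]
```

With the Mandarin-like starting ranking the learner's own parse is horse, which differs from the target the horse, so *FunctN is demoted below FDR; on the next pass no error occurs and the loop stops.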
In this study we examined the use of English articles to illustrate and explore the EVAL function of OT-LFG, not only because OT has strong explanatory power for the interface problem of English article use, but also because the multifaceted analysis that English article use requires is highly representative of natural language processing. Data were collected through an article diagnostic test and a Chinese-to-English translation task. The collected data were classified according to Huebner’s (1983) semantic contexts of noun phrase reference. Based on this classification, the data are analyzed within the framework of OT-LFG with the help of Error-Driven Constraint Demotion, a principle of OT learnability.
[+HK] is defined as the state of knowledge known to the hearer, as assumed by the speaker, while [+SR] is defined as the state of a particular referent in the speaker’s mind; the hearer’s state of the referent is not involved.
[+definite, +specific]
A: Where’s your mother?
B: She is meeting the principal of my brother’s elementary school. He is a very nice man. He is talking to my mother about my brother’s grade.
[+definite, -specific]
A: It’s already 4pm. Why isn’t your little brother home from school?
B: He just called and told me that he got in trouble! He is talking to the principal of his school! I don’t know who that is. I hope my brother comes home soon.
The semantic context is generics. Omission is one type of English article error detected in the data. In this case, a conflict between the constraints FDR and *FunctN is involved. In English there must be a functional projection before a singular referent in the semantic context [-SR, +HK], so the horse is more harmonic than horse, and the conflict is resolved by ranking FDR higher than *FunctN in the target language, English. In the native language, Mandarin, no functional projection is needed even before a bare singular referent in this context, so the conflict is resolved by ranking *FunctN higher than FDR. This ranking difference is illustrated with two tableaux in which the inputs are the same (Tableau 1 and Tableau 2).
Languages differ in the ranking of universal constraints
F-structure displays the predicate and its grammatical functions, such as SUBJECT and OBJECT, and features such as NUMBER and PREDICATE, in attribute-value matrices. C-structure shows constituency relations by organizing constituents according to X-bar theory.
As illustrated in Tableau 1 and Tableau 2, the inputs are identical, but the same meaning is represented by two different optimal forms in the two languages, depending on the ranking of the two constraints concerned. The target language, English, requires that semantic interpretations in the input be represented syntactically, while Mandarin requires no syntactic representation in this case. To make the correct target form the winner, FDR must be ranked higher than *FunctN, so we obtain FDR >> *FunctN from this set of candidates for correct English expressions of singular generics.
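The contrast between the two rankings can be rendered as plain-text tableaux with a short Python sketch. It is illustrative only: the violation marks are inferred from the description in these notes (the horse violates *FunctN once, bare horse violates FDR once), the candidate labels are English glosses used for both languages, and the layout is not a reproduction of the original Tableau 1 and Tableau 2. The function names `winner` and `tableau` are hypothetical.

```python
from typing import Dict, List

RANKINGS: Dict[str, List[str]] = {
    "English": ["FDR", "*FunctN"],    # FDR >> *FunctN
    "Mandarin": ["*FunctN", "FDR"],   # *FunctN >> FDR
}

# Same input (a singular generic referent) and same candidates for both.
CANDIDATES: Dict[str, Dict[str, int]] = {
    "the horse": {"*FunctN": 1},
    "horse": {"FDR": 1},
}


def winner(ranking: List[str], candidates: Dict[str, Dict[str, int]]) -> str:
    # Compare violation vectors lexicographically, highest-ranked first.
    return min(candidates,
               key=lambda c: [candidates[c].get(k, 0) for k in ranking])


def tableau(ranking: List[str], candidates: Dict[str, Dict[str, int]]) -> str:
    best = winner(ranking, candidates)
    rows = ["candidate".ljust(14) + "".join(k.ljust(9) for k in ranking)]
    for name, marks in candidates.items():
        pointer = "-> " if name == best else "   "  # "->" marks the optimum
        cells = "".join(("*" * marks.get(k, 0)).ljust(9) for k in ranking)
        rows.append((pointer + name).ljust(14) + cells)
    return "\n".join(rows)


for language, ranking in RANKINGS.items():
    print(language)
    print(tableau(ranking, CANDIDATES))
    print()
```

Only the order of the two constraint columns changes between the two printouts, and that alone moves the winner from the horse to bare horse.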
After analyzing the candidate sets in all semantic contexts, we can sum up the constraint ranking results for English. In order to map correct article forms in English to meanings, and to make sure that nouns carry the correct morphological marking for number, the constraints must be ranked as shown in the ranking. However, because of the influence of the constraint ranking in Mandarin, Mandarin-speaking learners may rank *FunctN higher than FDR and *PLMorph higher than FPL, as these constraints are ranked in their native language, which results in omission errors.
Therefore, it is possible to apply the EVAL process of OT-LFG as the theoretical basis and practical guidance for building an online grammar check system.