SlideShare une entreprise Scribd logo
1  sur  4
Télécharger pour lire hors ligne
METHOD FOR AN AUTOMATIC GENERATION OF A
            SEMANTIC-LEVEL CONTEXTUAL TRANSLATIONAL
                           DICTIONARY

                                                    Dmitry Kan
Department of Technology of Programming, Saint-Petersburg State University, Universitetsky prosp. 35, Peterhof, Russia
                                             dmitry.kan@gmail.com




Keywords:     Translational Dictionary, Semantic Analyzer, Computer Semantics, Machine Translation, Disambiguation.

Abstract:     In this paper we demonstrate the semantic feature machine translation (MT) system as a combination of two
              fundamental approaches, where the rule-based side is supported by the functional model of the Russian
              language and the statistical side utilizes statistical word alignment. The MT system relies on a semantic-
              level contextual translational dictionary as its key component. We will present the method for an automatic
              generation of the dictionary where disambiguation is done on a semantic level.



1    INTRODUCTION                                             semantic-level contextual translational dictionary in
                                                              Section 4. Finally, we present the experimental MT
There are two fundamental approaches to Machine               system in Section 5. We list the main features of the
Translation (MT): rule-based approach and                     presented MT system in Section 6.
statistical approach, which is based on streams of                Classic MT triangle (cf. Fig. 1) separates
input data. Each of these two approaches has their            semantic and syntactic transfers.
advantages: the rule-based approach deeply studies
and formalizes the linguistic rules of a natural
language; statistical approach gives an opportunity
to rapidly prototype new algorithms with application
to several different natural languages using data
streams. Along with it, there are as well
disadvantages attributed to each of the two
fundamental approaches to MT: rule-based approach
commonly lacks automation of language
formalization process and is usually bound to one
natural language (or very few similar languages);
statistical approach generally avoids deep view into
the properties of a natural language giving away the               Figure 1: MT triangle adopted from Klueva, 2007.
task of language formalization to a numerical
algorithm. In this paper we would like to present an              The functional theory of the Russian language
ongoing project of an MT system, that merges rule-            Tuzov, 2004 however shows that these two levels
based and statistic approaches in its components              are interconnected. A morphological surface may
where it is possible.                                         point to two or more parts of speech. All of them can
    Since the main component of the system is                 be equally considered as candidates on the syntactic
translational dictionary, we prepare the grounds of           level. However a word’s semantics is required for a
its automatic generation in Section 2. The rule-based         successful final resolution of a sentence meaning
side of the system is supported by the functional             (see Section 3 for more detail).
model of Russian language in Tuzov, 2004. It is                   Bennett, 1990 questions the need of a full-blown
described in brief in Section 3. We utilize the results       semantic analysis for MT and instead suggests
of Sections 2 and 3 for automatic creation of a               advancing the „semantic feature system‟ to achieve




                                                                                                                      415
ICSOFT 2011 - 6th International Conference on Software and Data Technologies




cost effectiveness. In this paper we present the MT         стремлении ({2}) удержать ({}) власть
system, which has semantics coded on the level of           ({5}) ,
dictionary entries and uses semantic analyzer to            ({6}) Первез ({7}) Мушарраф ({8})
                                                            отверг ({9 10})
represent the meaning of an input sentence.                 конституционную ({14 15})
    The related work Homola, 2009 shows how to              систему ({})
build a translational dictionary between Chech and          Пакистана ({11 12 13}) и ({16})
English with the statistical word alignment.                объявил ({17}) о ({18})
However Homola, 2009 does not provide a method              введении ({})
of resolving the word ambiguities. The main                 чрезвычайного ({19 21})
challenge during generation of a translation                положения ({}) . ({22})
dictionary is resolving ambiguity of numerous word          The above structure represents a mapping of words
sequence pairs. In this work we suggest an approach         in Russian sentence into sequences of words in its
which allowed us to solve the task by semantic              English translation. Table 1 contains the mapping
interpretation of each dictionary entry.                    for the above example.

                                                            Table 1: Word alignment for English and Russian
                                                            sentences.
2     TASK OF WORD ALIGNMENT
                                                                      Russian                       English
                                                                       NULL                            of
To build an MT system which uses statistic
                                                                     отчаянном                Desperate to hold
modelling of a natural language one needs a parallel
                                                                    стремлении                         to
corpus. Based on the corpus a model of translation is                  власть                        power
built and it contains phrase translational dictionary.                    ,                             ,
In the process of constructing the dictionary, phrases                 Первез                        Pervez
of the parallel corpus get mapped together through                  Мушарраф                       Musharraf
maximizing the probability of their co-occurrence in                   отверг                    has discarded
the parallel corpus. Maximization is conducted over              конституционную           constitutional framework
all possible word sequences of two aligned sentences                 Пакистана                    Pakistan ´ s
in two languages (cf. Fig. 2).                                           и                            and
                                                                      объявил                       declared
                                                                         о                              a
                                                                  чрезвычайного                state emergency
                                                                          .                             .

Figure 2: Example of one word-level alignment of English
and French sentences adopted from Och, 1999.                3     COMPUTER SEMANTICS
The algorithm of word alignment is described in Al-         According to the theory in Tuzov, 2004 any natural
Onaizan et al, 1999 and Och, 1999, while the full           language is functional in strict mathematical sense.
mathematical formulation of statistical MT is given         Each sentence can be represented in a form of
in Brown, Della Pietra, Della Pietra and Mercer,            superposition of its functions-words:
1993. The toolset Moses by Koehn, 2007 allows
building a statistical MT system and includes
GIZA++ as one of its component. GIZA++
                                                             S = F ( f1 ( w11 ,..., w1k ),..., f n ( wn1 ,..., wnl )),
                                                                                                                         (1)
implements the above algorithms for word                     wij ≠ whm , ∀i ≠ h, j ≠ m
alignment and outputs the following structure for
each pair of sentences in the parallel corpus:
                                                            Definitional domain and values domain of F,f1,..,fn
Desperate to hold onto power , Pervez
                                                            belong to reality. In (1) word inequalities mean, that
Musharraf has                                               we count each word only once. This holds even if
discarded Pakistan ' s constitutional                       there are several repetitions of a word with the same
framework and                                               or different semantics in the input sentence. In the
declared a state of emergency .                             process of a sentence analysis, semantic analyzer
NULL ({20}) В ({})                                          implemented by Tuzov, 2004 operates with the word
отчаянном ({1 3 4})                                         senses extracted both from the hierarchical ontology




416
METHOD FOR AN AUTOMATIC GENERATION OF A SEMANTIC-LEVEL CONTEXTUAL TRANSLATIONAL
                                                                         DICTIONARY



with more than 2000 classes and from the syntactic-     English analogue. One important property of the
semantic dictionary. Since a dictionary word is         dictionary is that its entries are context dependent.
generally represented as n-ary function with            This is provided by two circumstances: 1) each
semantic constraints, its final semantics in the        sentence in Russian had its expert translation into
sentence depends on the exact words in its context      English and 2) each Russian word has been
within the sentence. The Tuzov’s syntactic-semantic     attributed with a semantic formula that was a result
dictionary contains more than 150, 000 semantic         of semantic assembling of the corresponding
formulas.                                               Russian sentence.
                                                            Consider one entry of the above extract in more
                                                        detail:
4   TRANSLATIONAL                                           В Y1>HabU(Y1:,ПРЕД:Z1) 
    DICTIONARY                                              <149>--->Within
                                                            The semantic formula on the left of ---> sign has
For creation of the MT system that is based on the      several components: the word “В” (the Russian
functional theory in Tuzov, 2004 we need to             preposition with a lot of meanings, roughly
translate the semantic dictionary onto the target       corresponding to the English prepositions in, at,
language. For the process automation we have            within, into, to, of etc); the basis function HabU(x,y)
chosen the method described in the Section 2. For       with its arguments (the function defines that x
the parallel corpus we have used list of parallel       possesses y), which in the case of HabU are Y1 and
sentences in Russian and English from the package       Z1 (prepositional case);  sign followed by the order
UMC Klyueva and Bojar, 2008. GIZA++ has                 number of the semantic alternative in the semantic-
generated 1,3 million of phrase pairs, including        syntactic dictionary.
duplicates. Applying the semantic analyzer on each          Another example:
Russian sentence we have obtained the semantic
                                                           Y1>Loc(Y1:,ВНУТРИ$12/313/05(ПРЕД:Z1))
alternatives of each of the words in the built pairs,    <146>--->at
which correspond to their local context. As a result,
each word in the original translational phrase             The second argument of Loc(x,y) basis function
dictionary was substituted with its semantic formula.   (defines, that x is located in / at y) is itself a
This solves the disambiguation on semantic level.       function-preposition that takes one argument in
After removing the duplicates from the modified         prepositional case. The second argument has the
dictionary, the final version of semantic-level         name “ВНУТРИ” which is appended with
contextual translational dictionary has been built      ontological class number $12/313/05. In this class
with about 18, 000 word pairs. The dictionary is        number, 12 refers to physical objects, 313 refers to
subject to further clean up procedures and              inhabited locality, 05 refers to physical position of
enrichment. Here is an extract from the final           an object in prepositional case which is expressed as
dictionary:                                             adverb in Russian (“ВНУТРИ” is both “where?”
В Y1>HabU(Y1:,ПРЕД:Z1)  <149>--->Within               and “how?”).
В Y1>Loc(Y1:,ВНУТРИ$12/313/05(ПРЕД:Z1)) 
<146>--->at
В Y1>Loc(Y1:,Oper01(#,ПРЕД:Z1))  <208>---
>In                                                     5    EXPERIMENTAL MT SYSTEM
В Y1>Loc(Y1:,ПРЕД:Z1)  <224>--->Throughout
...
НА Y1>Direkt(Y1:,ВЕРХ$12/141/05(ВИН:Z1))              The semantic-level translational dictionary obtained
<67>--->at                                              in the Section 4 forms the ground for the
НА Y1>Direkt(Y1:,РОД:Z1)  <100>--->on                 experimental Russian to English MT system. In
НА Y1>Direkt(Y1:,РОД:Z1)  <69>--->for                 order to achieve fluency on the target language side
...
ОБРАЗ (РОД:Z1)  <2>--->a way                          and to reduce the noise in the automatically
ОБЩЕМИРОВОЙ                                             generated semantic-level translation dictionary, we
A1>Rel(A1:НЕЧТО$1,ПОЛНЫЙ$12/207/05(МИР$1227)            have devised the Semantic Machine Translation
)
 <1>--->global
                                                        Model (SMTM) for translating sentence P onto the
...                                                     target language L2:

   Each dictionary entry contains semantic formula
corresponding to the original Russian word and its




                                                                                                           417
ICSOFT 2011 - 6th International Conference on Software and Data Technologies




 SMTM P =                                                               applying the method described in Section 4.
                                                     ,   (2)
 arg max Ω (t ,..., t ) = arg max ∑ δ i (tk , tl )
                  S                       s
          i 1        m                                             We have presented the MT system that can be
   i =1,n                  k =1,m −1
                             l =2 ,m
                                     i
                                                               categorized as a semantic feature system according
                                                               to Bennett, 1990, which has complex semantic
where:                                                         analysis system under the hood.
                      1, t k tl ∈ L M
δ i (t , t ) = {
    S                              2                     (3)
            k l       0, t k tl ∉ L M                          REFERENCES
                                   2
Index S in (2), (3) suggests that the definitional             Al-Onaizan Y., Curin J., Jahr M., Knight K., Lafferty J.,
                         S        S                               Melamed D., Och F. J., Purdy J., Smith N. A.,
domain of functions δ      and Ω is the set of                    Yarowsky D. 1999. Statistical Machine Translation.
                       i          i
                                                                  Final report. Statistical Machine Translation, JHU
semantic formulas coded with symbols tl ∈ L2 in                   Workshop.
the translational dictionary. L2M in (2) is the                Bennett W. S. 1990. How Much Semantics is Necessary
statistical model of L2. The translation algorithm                for MT systems? Proceedings of the third International
starts with translation of an input Russian sentence              Conference on Theoretical and Methodological Issues
semantic representation. It then extracts the                     in Machine Translation of Natural Languages,
                                                                  Linguistic Research Center, University of Texas,
calculated semantic alternatives for each of the
                                                                  Austin, TX:261-270.
words and maps them onto their English                         Brown P. F., Della Pietra V. J., Della Pietra S. A., Mercer
translations. Form the final sentence in English                  R.L. 1993. The Mathematics of Statistical Machine
using the model (2). Some of the translations show                Translation: Parameter Estimation. Computational
the advantage of semantic processing over statistical             linguistics, 19(2):263-311.
modelling Moses by Koehn, 2007 in that the system              Klueva N., Bojar O. 2008. UMC 0.1: Chech-Russian-
picks the correct semantic alternatives of polysemic              English Multilingual Corpus. Proceedings of the
words and their correct translations. The                         conference “Corpora 2008”, Saint-Petersburg.
experimental MT system also translates more words              Klueva N. 2007. Semantics in Machine Translation.
                                                                  WDS’07 Proceedings of Contributed Papers, Part
than Moses from Russian to English, because it
                                                                  I:141-144.
operates with the word lemmas while Moses is                   Koehn P. et al. 2007. Moses: Open Source Toolkit for
sensible to word surfaces.                                        Statistical Machine Translation. Annual meeting of
                                                                  the Association for Computational Linguistics,
                                                                  demonstration session, Prague, Chech Re-public.
6       FEATURES OF THE                                        Och F. J. 1999. An Efficient Method for Determining
                                                                  Bilingual Word Classes. Ninth conf. of the Europ.
        EXPERIMENTAL MT SYSTEM                                    Chapter of the Association for Computational
                                                                  Linguistics, EACL’99:71-76.
The presented MT system is in its early stage of               Tuzov V., 2004. Computer Semantics of the Russian
incorporating the semantic component into the                     Language,       Saint-Petersburg     State    University
process of MT. We have utilized statistical methods               Publishing House. Saint-Petersburg.
for automating the construction of a semantic-level
translational dictionary and functional theory of
natural language to resolve the ambiguity and
introduce semantic context. Here is the list of the
main features of the MT system:
     • Dictionary entries contain semantic
          attributes of the Russian words;
        •    Each entry represents a sample of a context
             extracted using statistical word alignment
             and coded with the corresponding semantic
             formula;

        •    The MT system is automatically extendable
             through acquiring new parallel corpora and




418

Contenu connexe

Tendances

CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERijnlc
 
Corpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianCorpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianIDES Editor
 
Some alternative ways to find m ambiguous binary words corresponding to a par...
Some alternative ways to find m ambiguous binary words corresponding to a par...Some alternative ways to find m ambiguous binary words corresponding to a par...
Some alternative ways to find m ambiguous binary words corresponding to a par...ijcsa
 
Lecture 2: From Semantics To Semantic-Oriented Applications
Lecture 2: From Semantics To Semantic-Oriented ApplicationsLecture 2: From Semantics To Semantic-Oriented Applications
Lecture 2: From Semantics To Semantic-Oriented ApplicationsMarina Santini
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONijnlc
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textijnlc
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodijnlc
 
Lectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersLectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersMatias Menendez
 
Building of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert SystemBuilding of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert SystemWaqas Tariq
 
Review of Parsing in Modern Greek - A New Approach
Review of Parsing in Modern Greek - A New ApproachReview of Parsing in Modern Greek - A New Approach
Review of Parsing in Modern Greek - A New ApproachCSCJournals
 
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUEIMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUEcsandit
 
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAMORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAijnlc
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityIDES Editor
 

Tendances (16)

CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMER
 
Presentation1
Presentation1Presentation1
Presentation1
 
P-6
P-6P-6
P-6
 
Corpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianCorpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of Persian
 
Some alternative ways to find m ambiguous binary words corresponding to a par...
Some alternative ways to find m ambiguous binary words corresponding to a par...Some alternative ways to find m ambiguous binary words corresponding to a par...
Some alternative ways to find m ambiguous binary words corresponding to a par...
 
1 pos chunker -6-10
1 pos chunker -6-101 pos chunker -6-10
1 pos chunker -6-10
 
Lecture 2: From Semantics To Semantic-Oriented Applications
Lecture 2: From Semantics To Semantic-Oriented ApplicationsLecture 2: From Semantics To Semantic-Oriented Applications
Lecture 2: From Semantics To Semantic-Oriented Applications
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic text
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming method
 
Lectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersLectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducers
 
Building of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert SystemBuilding of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert System
 
Review of Parsing in Modern Greek - A New Approach
Review of Parsing in Modern Greek - A New ApproachReview of Parsing in Modern Greek - A New Approach
Review of Parsing in Modern Greek - A New Approach
 
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUEIMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
IMPROVING RULE-BASED METHOD FOR ARABIC POS TAGGING USING HMM TECHNIQUE
 
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAMORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
 

Similaire à Icsoft 2011 51_cr

gemini model research paper by google is great
gemini model research paper by google is greatgemini model research paper by google is great
gemini model research paper by google is greatAdityaChourasiya9
 
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...Lifeng (Aaron) Han
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERTAbdurrahimDerric
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...ijnlc
 
Cross-lingual event-mining using wordnet as a shared knowledge interface
Cross-lingual event-mining using wordnet as a shared knowledge interfaceCross-lingual event-mining using wordnet as a shared knowledge interface
Cross-lingual event-mining using wordnet as a shared knowledge interfacepathsproject
 
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663Yafi Azhari
 
Developing an architecture for translation engine using ontology
Developing an architecture for translation engine using ontologyDeveloping an architecture for translation engine using ontology
Developing an architecture for translation engine using ontologyAlexander Decker
 
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Waqas Tariq
 
Ijarcet vol-2-issue-2-323-329
Ijarcet vol-2-issue-2-323-329Ijarcet vol-2-issue-2-323-329
Ijarcet vol-2-issue-2-323-329Editor IJARCET
 
Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesAlgoscale Technologies Inc.
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...ijaia
 
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKijistjournal
 
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKijistjournal
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlpeSAT Journals
 
Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of e...
Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of e...Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of e...
Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of e...FACE
 

Similaire à Icsoft 2011 51_cr (20)

gemini model research paper by google is great
gemini model research paper by google is greatgemini model research paper by google is great
gemini model research paper by google is great
 
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERT
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
 
Cross-lingual event-mining using wordnet as a shared knowledge interface
Cross-lingual event-mining using wordnet as a shared knowledge interfaceCross-lingual event-mining using wordnet as a shared knowledge interface
Cross-lingual event-mining using wordnet as a shared knowledge interface
 
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
 
Developing an architecture for translation engine using ontology
Developing an architecture for translation engine using ontologyDeveloping an architecture for translation engine using ontology
Developing an architecture for translation engine using ontology
 
AICOL2015_paper_16
AICOL2015_paper_16AICOL2015_paper_16
AICOL2015_paper_16
 
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
 
Ijarcet vol-2-issue-2-323-329
Ijarcet vol-2-issue-2-323-329Ijarcet vol-2-issue-2-323-329
Ijarcet vol-2-issue-2-323-329
 
Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian Languages
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
 
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
 
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
 
ijcai11
ijcai11ijcai11
ijcai11
 
P99 1067
P99 1067P99 1067
P99 1067
 
semeval2016
semeval2016semeval2016
semeval2016
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of e...
Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of e...Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of e...
Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of e...
 

Plus de Dmitry Kan

London IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesLondon IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesDmitry Kan
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural searchDmitry Kan
 
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Dmitry Kan
 
IR: Open source state
IR: Open source stateIR: Open source state
IR: Open source stateDmitry Kan
 
SentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social mediaSentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social mediaDmitry Kan
 
Social spam detection by SemanticAnalyzer Group
Social spam detection by SemanticAnalyzer GroupSocial spam detection by SemanticAnalyzer Group
Social spam detection by SemanticAnalyzer GroupDmitry Kan
 
Lucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeupLucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeupDmitry Kan
 
Starget sentiment analyzer for English
Starget sentiment analyzer for EnglishStarget sentiment analyzer for English
Starget sentiment analyzer for EnglishDmitry Kan
 
Linguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian languageLinguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian languageDmitry Kan
 
Linguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian languageLinguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian languageDmitry Kan
 
Linguistic component Sentiment Analyzer for the Russian language
Linguistic component Sentiment Analyzer for the Russian languageLinguistic component Sentiment Analyzer for the Russian language
Linguistic component Sentiment Analyzer for the Russian languageDmitry Kan
 
Solr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsSolr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsDmitry Kan
 
MTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine TranslationMTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine TranslationDmitry Kan
 
Rule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slidesRule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slidesDmitry Kan
 
Machine translation course program (in English)
Machine translation course program (in English)Machine translation course program (in English)
Machine translation course program (in English)Dmitry Kan
 
Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Dmitry Kan
 
Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Dmitry Kan
 
Semantic feature machine translation system
Semantic feature machine translation systemSemantic feature machine translation system
Semantic feature machine translation systemDmitry Kan
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopDmitry Kan
 
Introduction To Machine Translation 1
Introduction To Machine Translation 1Introduction To Machine Translation 1
Introduction To Machine Translation 1Dmitry Kan
 

Plus de Dmitry Kan (20)

London IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesLondon IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use cases
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
 
IR: Open source state
IR: Open source stateIR: Open source state
IR: Open source state
 
SentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social mediaSentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social media
 
Social spam detection by SemanticAnalyzer Group
Social spam detection by SemanticAnalyzer GroupSocial spam detection by SemanticAnalyzer Group
Social spam detection by SemanticAnalyzer Group
 
Lucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeupLucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeup
 
Starget sentiment analyzer for English
Starget sentiment analyzer for EnglishStarget sentiment analyzer for English
Starget sentiment analyzer for English
 
Linguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian languageLinguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian language
 
Linguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian languageLinguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian language
 
Linguistic component Sentiment Analyzer for the Russian language
Linguistic component Sentiment Analyzer for the Russian languageLinguistic component Sentiment Analyzer for the Russian language
Linguistic component Sentiment Analyzer for the Russian language
 
Solr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsSolr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwords
 
MTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine TranslationMTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine Translation
 
Rule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slidesRule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slides
 
Machine translation course program (in English)
Machine translation course program (in English)Machine translation course program (in English)
Machine translation course program (in English)
 
Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011
 
Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...
 
Semantic feature machine translation system
Semantic feature machine translation systemSemantic feature machine translation system
Semantic feature machine translation system
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache Hadoop
 
Introduction To Machine Translation 1
Introduction To Machine Translation 1Introduction To Machine Translation 1
Introduction To Machine Translation 1
 

Dernier

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Dernier (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Icsoft 2011 51_cr

  • 1. METHOD FOR AN AUTOMATIC GENERATION OF A SEMANTIC-LEVEL CONTEXTUAL TRANSLATIONAL DICTIONARY Dmitry Kan Department of Technology of Programming, Saint-Petersburg State University, Universitetsky prosp. 35, Peterhof, Russia dmitry.kan@gmail.com Keywords: Translational Dictionary, Semantic Analyzer, Computer Semantics, Machine Translation, Disambiguation. Abstract: In this paper we demonstrate the semantic feature machine translation (MT) system as a combination of two fundamental approaches, where the rule-based side is supported by the functional model of the Russian language and the statistical side utilizes statistical word alignment. The MT system relies on a semantic- level contextual translational dictionary as its key component. We will present the method for an automatic generation of the dictionary where disambiguation is done on a semantic level. 1 INTRODUCTION semantic-level contextual translational dictionary in Section 4. Finally, we present the experimental MT There are two fundamental approaches to Machine system in Section 5. We list the main features of the Translation (MT): rule-based approach and presented MT system in Section 6. statistical approach, which is based on streams of Classic MT triangle (cf. Fig. 1) separates input data. Each of these two approaches has their semantic and syntactic transfers. advantages: the rule-based approach deeply studies and formalizes the linguistic rules of a natural language; statistical approach gives an opportunity to rapidly prototype new algorithms with application to several different natural languages using data streams. Along with it, there are as well disadvantages attributed to each of the two fundamental approaches to MT: rule-based approach commonly lacks automation of language formalization process and is usually bound to one natural language (or very few similar languages); statistical approach generally avoids deep view into the properties of a natural language giving away the Figure 1: MT triangle adopted from Klueva, 2007. task of language formalization to a numerical algorithm. In this paper we would like to present an The functional theory of the Russian language ongoing project of an MT system, that merges rule- Tuzov, 2004 however shows that these two levels based and statistic approaches in its components are interconnected. A morphological surface may where it is possible. point to two or more parts of speech. All of them can Since the main component of the system is be equally considered as candidates on the syntactic translational dictionary, we prepare the grounds of level. However a word’s semantics is required for a its automatic generation in Section 2. The rule-based successful final resolution of a sentence meaning side of the system is supported by the functional (see Section 3 for more detail). model of Russian language in Tuzov, 2004. It is Bennett, 1990 questions the need of a full-blown described in brief in Section 3. We utilize the results semantic analysis for MT and instead suggests of Sections 2 and 3 for automatic creation of a advancing the „semantic feature system‟ to achieve 415
  • 2. ICSOFT 2011 - 6th International Conference on Software and Data Technologies cost effectiveness. In this paper we present the MT стремлении ({2}) удержать ({}) власть system, which has semantics coded on the level of ({5}) , dictionary entries and uses semantic analyzer to ({6}) Первез ({7}) Мушарраф ({8}) отверг ({9 10}) represent the meaning of an input sentence. конституционную ({14 15}) The related work Homola, 2009 shows how to систему ({}) build a translational dictionary between Chech and Пакистана ({11 12 13}) и ({16}) English with the statistical word alignment. объявил ({17}) о ({18}) However Homola, 2009 does not provide a method введении ({}) of resolving the word ambiguities. The main чрезвычайного ({19 21}) challenge during generation of a translation положения ({}) . ({22}) dictionary is resolving ambiguity of numerous word The above structure represents a mapping of words sequence pairs. In this work we suggest an approach in Russian sentence into sequences of words in its which allowed us to solve the task by semantic English translation. Table 1 contains the mapping interpretation of each dictionary entry. for the above example. Table 1: Word alignment for English and Russian sentences. 2 TASK OF WORD ALIGNMENT Russian English NULL of To build an MT system which uses statistic отчаянном Desperate to hold modelling of a natural language one needs a parallel стремлении to corpus. Based on the corpus a model of translation is власть power built and it contains phrase translational dictionary. , , In the process of constructing the dictionary, phrases Первез Pervez of the parallel corpus get mapped together through Мушарраф Musharraf maximizing the probability of their co-occurrence in отверг has discarded the parallel corpus. Maximization is conducted over конституционную constitutional framework all possible word sequences of two aligned sentences Пакистана Pakistan ´ s in two languages (cf. Fig. 2). и and объявил declared о a чрезвычайного state emergency . . Figure 2: Example of one word-level alignment of English and French sentences adopted from Och, 1999. 3 COMPUTER SEMANTICS The algorithm of word alignment is described in Al- According to the theory in Tuzov, 2004 any natural Onaizan et al, 1999 and Och, 1999, while the full language is functional in strict mathematical sense. mathematical formulation of statistical MT is given Each sentence can be represented in a form of in Brown, Della Pietra, Della Pietra and Mercer, superposition of its functions-words: 1993. The toolset Moses by Koehn, 2007 allows building a statistical MT system and includes GIZA++ as one of its component. GIZA++ S = F ( f1 ( w11 ,..., w1k ),..., f n ( wn1 ,..., wnl )), (1) implements the above algorithms for word wij ≠ whm , ∀i ≠ h, j ≠ m alignment and outputs the following structure for each pair of sentences in the parallel corpus: Definitional domain and values domain of F,f1,..,fn Desperate to hold onto power , Pervez belong to reality. In (1) word inequalities mean, that Musharraf has we count each word only once. This holds even if discarded Pakistan ' s constitutional there are several repetitions of a word with the same framework and or different semantics in the input sentence. In the declared a state of emergency . process of a sentence analysis, semantic analyzer NULL ({20}) В ({}) implemented by Tuzov, 2004 operates with the word отчаянном ({1 3 4}) senses extracted both from the hierarchical ontology 416
  • 3. METHOD FOR AN AUTOMATIC GENERATION OF A SEMANTIC-LEVEL CONTEXTUAL TRANSLATIONAL DICTIONARY with more than 2000 classes and from the syntactic- English analogue. One important property of the semantic dictionary. Since a dictionary word is dictionary is that its entries are context dependent. generally represented as n-ary function with This is provided by two circumstances: 1) each semantic constraints, its final semantics in the sentence in Russian had its expert translation into sentence depends on the exact words in its context English and 2) each Russian word has been within the sentence. The Tuzov’s syntactic-semantic attributed with a semantic formula that was a result dictionary contains more than 150, 000 semantic of semantic assembling of the corresponding formulas. Russian sentence. Consider one entry of the above extract in more detail: 4 TRANSLATIONAL В Y1>HabU(Y1:,ПРЕД:Z1) DICTIONARY <149>--->Within The semantic formula on the left of ---> sign has For creation of the MT system that is based on the several components: the word “В” (the Russian functional theory in Tuzov, 2004 we need to preposition with a lot of meanings, roughly translate the semantic dictionary onto the target corresponding to the English prepositions in, at, language. For the process automation we have within, into, to, of etc); the basis function HabU(x,y) chosen the method described in the Section 2. For with its arguments (the function defines that x the parallel corpus we have used list of parallel possesses y), which in the case of HabU are Y1 and sentences in Russian and English from the package Z1 (prepositional case); sign followed by the order UMC Klyueva and Bojar, 2008. GIZA++ has number of the semantic alternative in the semantic- generated 1,3 million of phrase pairs, including syntactic dictionary. duplicates. Applying the semantic analyzer on each Another example: Russian sentence we have obtained the semantic Y1>Loc(Y1:,ВНУТРИ$12/313/05(ПРЕД:Z1)) alternatives of each of the words in the built pairs, <146>--->at which correspond to their local context. As a result, each word in the original translational phrase The second argument of Loc(x,y) basis function dictionary was substituted with its semantic formula. (defines, that x is located in / at y) is itself a This solves the disambiguation on semantic level. function-preposition that takes one argument in After removing the duplicates from the modified prepositional case. The second argument has the dictionary, the final version of semantic-level name “ВНУТРИ” which is appended with contextual translational dictionary has been built ontological class number $12/313/05. In this class with about 18, 000 word pairs. The dictionary is number, 12 refers to physical objects, 313 refers to subject to further clean up procedures and inhabited locality, 05 refers to physical position of enrichment. Here is an extract from the final an object in prepositional case which is expressed as dictionary: adverb in Russian (“ВНУТРИ” is both “where?” В Y1>HabU(Y1:,ПРЕД:Z1) <149>--->Within and “how?”). В Y1>Loc(Y1:,ВНУТРИ$12/313/05(ПРЕД:Z1)) <146>--->at В Y1>Loc(Y1:,Oper01(#,ПРЕД:Z1)) <208>--- >In 5 EXPERIMENTAL MT SYSTEM В Y1>Loc(Y1:,ПРЕД:Z1) <224>--->Throughout ... НА Y1>Direkt(Y1:,ВЕРХ$12/141/05(ВИН:Z1)) The semantic-level translational dictionary obtained <67>--->at in the Section 4 forms the ground for the НА Y1>Direkt(Y1:,РОД:Z1) <100>--->on experimental Russian to English MT system. In НА Y1>Direkt(Y1:,РОД:Z1) <69>--->for order to achieve fluency on the target language side ... ОБРАЗ (РОД:Z1) <2>--->a way and to reduce the noise in the automatically ОБЩЕМИРОВОЙ generated semantic-level translation dictionary, we A1>Rel(A1:НЕЧТО$1,ПОЛНЫЙ$12/207/05(МИР$1227) have devised the Semantic Machine Translation ) <1>--->global Model (SMTM) for translating sentence P onto the ... target language L2: Each dictionary entry contains semantic formula corresponding to the original Russian word and its 417
  • 4. ICSOFT 2011 - 6th International Conference on Software and Data Technologies SMTM P = applying the method described in Section 4. , (2) arg max Ω (t ,..., t ) = arg max ∑ δ i (tk , tl ) S s i 1 m We have presented the MT system that can be i =1,n k =1,m −1 l =2 ,m i categorized as a semantic feature system according to Bennett, 1990, which has complex semantic where: analysis system under the hood. 1, t k tl ∈ L M δ i (t , t ) = { S 2 (3) k l 0, t k tl ∉ L M REFERENCES 2 Index S in (2), (3) suggests that the definitional Al-Onaizan Y., Curin J., Jahr M., Knight K., Lafferty J., S S Melamed D., Och F. J., Purdy J., Smith N. A., domain of functions δ and Ω is the set of Yarowsky D. 1999. Statistical Machine Translation. i i Final report. Statistical Machine Translation, JHU semantic formulas coded with symbols tl ∈ L2 in Workshop. the translational dictionary. L2M in (2) is the Bennett W. S. 1990. How Much Semantics is Necessary statistical model of L2. The translation algorithm for MT systems? Proceedings of the third International starts with translation of an input Russian sentence Conference on Theoretical and Methodological Issues semantic representation. It then extracts the in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, calculated semantic alternatives for each of the Austin, TX:261-270. words and maps them onto their English Brown P. F., Della Pietra V. J., Della Pietra S. A., Mercer translations. Form the final sentence in English R.L. 1993. The Mathematics of Statistical Machine using the model (2). Some of the translations show Translation: Parameter Estimation. Computational the advantage of semantic processing over statistical linguistics, 19(2):263-311. modelling Moses by Koehn, 2007 in that the system Klueva N., Bojar O. 2008. UMC 0.1: Chech-Russian- picks the correct semantic alternatives of polysemic English Multilingual Corpus. Proceedings of the words and their correct translations. The conference “Corpora 2008”, Saint-Petersburg. experimental MT system also translates more words Klueva N. 2007. Semantics in Machine Translation. WDS’07 Proceedings of Contributed Papers, Part than Moses from Russian to English, because it I:141-144. operates with the word lemmas while Moses is Koehn P. et al. 2007. Moses: Open Source Toolkit for sensible to word surfaces. Statistical Machine Translation. Annual meeting of the Association for Computational Linguistics, demonstration session, Prague, Chech Re-public. 6 FEATURES OF THE Och F. J. 1999. An Efficient Method for Determining Bilingual Word Classes. Ninth conf. of the Europ. EXPERIMENTAL MT SYSTEM Chapter of the Association for Computational Linguistics, EACL’99:71-76. The presented MT system is in its early stage of Tuzov V., 2004. Computer Semantics of the Russian incorporating the semantic component into the Language, Saint-Petersburg State University process of MT. We have utilized statistical methods Publishing House. Saint-Petersburg. for automating the construction of a semantic-level translational dictionary and functional theory of natural language to resolve the ambiguity and introduce semantic context. Here is the list of the main features of the MT system: • Dictionary entries contain semantic attributes of the Russian words; • Each entry represents a sample of a context extracted using statistical word alignment and coded with the corresponding semantic formula; • The MT system is automatically extendable through acquiring new parallel corpora and 418