RUSSIAN LEARNER TRANSLATOR CORPUS:
design, research potential and
applications
Andrey Kutuzov
National Research University Higher School of Economics
Maria Kunilovskaya
Tyumen State University
17th International Conference on Text, Speech and Dialogue
Brno, Czech Republic, September 8–12 2014
General description
• inspired by MeLLANGE
• online and downloadable http://rus-ltc.org
• 1.3 million tokens
• translations from 10 universities
• 11 source text genres (incl. essays, educational,
informational)
• multiple: 263 source texts, 1,952 translations
• bi-directional:
approx. 200 English STs (≈300 thousand tokens) with their 1,300
Russian translations (≈700 thousand tokens), and
over 40 Russian STs and approx. 600 English translations
• 10 types of linguistic and extralinguistic metadata
• Lexical and POS query interface (Freeling-based linguistic
mark-up)
RusLTC at TSD-2014 2
Corpus design
1) Txt-archive structured by file-naming conventions
RU_1_23.txt and EN_1_23_9.txt
RU_1_23.head.txt and EN_1_23_9.head.txt
2) TMX file
• pair-wise alignment with LF Aligner in batch mode
• manual correction (Olifant/Heartsome TMX editors)
• merging TUVs with identical source segments + adding XML tags to
link segments to head files (a homegrown script)
3) Error-tagged subcorpus
• a collection of 265 annotated translations (for 33 sources);
• stand-off, machine-readable annotation
• pre-defined error classification
• 6,471 error tags
• online tag editor based on brat http://brat.nlplab.org/index.html
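The stand-off annotation mentioned above can be illustrated with a minimal sketch that reads brat-style text-bound annotations and counts tags per error type. It assumes the subcorpus uses brat's standard `.ann` layout (tab-separated ID, "label start end", covered text); the labels `lexical` and `syntax` and the sample lines are hypothetical and are not taken from the RusLTC error classification.

```python
from collections import Counter


def parse_brat_ann(ann_text):
    """Parse text-bound annotations ("T" lines) from brat stand-off text."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip notes, relations, attributes and blank lines
        ann_id, type_and_offsets, covered = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous spans are written as "start end;start end".
        fragments = [tuple(int(x) for x in frag.split())
                     for frag in offsets.split(";")]
        spans.append({"id": ann_id, "label": label,
                      "fragments": fragments, "text": covered})
    return spans


# Hypothetical annotation content; labels are placeholders.
SAMPLE_ANN = (
    "T1\tlexical 0 5\tHello\n"
    "T2\tsyntax 10 14;20 24\tsome text\n"
    "#1\tAnnotatorNotes T1\twrong collocation\n"
    "T3\tlexical 30 36\tanswer\n"
)

spans = parse_brat_ann(SAMPLE_ANN)
tag_counts = Counter(span["label"] for span in spans)
print(tag_counts.most_common())
```

Because the annotation lives in a separate file and refers to the text only by character offsets, the translations themselves stay unmodified and the same text can carry several annotation layers.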
Query interface
BRAT-based online error tag editor
Application and Research
RusLTC is a general-purpose data source for translation studies
and translator education research, including the study of
1. variation and choice in translation;
2. ‘translationese’ and the translator interlanguage;
3. interdependence between the translation characteristics
and various meta data (direction and conditions of
translation, source text genre);
4. translation-related “problem areas” or rich points in source
texts;
5. translation quality and translation quality assessment (TQA)
Direct use
• in curriculum and materials design
• as a teaching and learning aid.
RusLTC research: gender asymmetry
in translated texts
1) The same gender asymmetry in male and
female translations as in Russian originals
(based on lexical variety)
2) Sentence length figures for female
translations contradict the corresponding statistics for
originals
Research based on RusLTC: splitting in
EN-RU translation
1) types of syntactic structures that undergo
splitting in English-Russian translation:
– coordination with “, and”
– non-restrictive relative clauses
2) most frequent mistakes associated with splitting:
– loss or misinterpretation of semantic relations
between propositions,
– issues with anaphora resolution and
– greater communicative value acquired by upgraded
sentences.
Error-tagged part: inter-rater reliability
AIM: to gauge the reliability of mark-up results based on the
proposed error classification and to establish the areas of
disagreement
[Figure: chart for the inter-rater reliability experiment; labels not recoverable]
α = 0.734 versus α = 0.569
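The slide does not say which α coefficient is reported. As a rough illustration only, the sketch below computes Krippendorff's alpha for nominal labels, which is an assumption about the measure and not a confirmed detail of the study. The function name, the input layout (one list of labels per annotated item) and the toy labels are hypothetical; the code does not reproduce the figures above.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one list per annotated item, holding the labels that
    different annotators assigned to that item.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with a single label are not pairable
        # Every ordered pair of labels from different annotators
        # contributes 1 / (m - 1) to the coincidence matrix.
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one item labelled by two annotators")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        return 1.0  # only one label was ever used, so no disagreement is possible
    return 1 - observed / expected


# Toy data: two annotators, four items, hypothetical labels.
toy_units = [["a", "a"], ["b", "b"], ["a", "b"], ["b", "a"]]
alpha = krippendorff_alpha_nominal(toy_units)
print(round(alpha, 3))
```

Alpha equals 1 for perfect agreement and approaches 0 when agreement is no better than chance, which is why the two values on the slide can be compared directly.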
Error statistics analysis to inform translation
didactics
Hypothesis 1: The better one knows L1, the better one
understands the source / the better one's transfer skills.
Hypothesis 2: Final-year students make fewer mistakes than
4th-year students
Hypothesis 3: Test translations show better results than routine
translations because students are more motivated to
perform better
Hypothesis 4: The quantitative results of the error annotation
depend on the order of translations in the set (“order
effect”)
Use in the classroom
1) Students have online access to:
• their own error-tagged and commented translations;
• peer translations;
• error statistics that reflect their individual
progress and difficulties.
2) Students’ rating based on the
quality of final translation
Quality parameters used for consecutive ranking to arrive at relative evaluation:
1. number of critical errors,
2. number of content errors and
3. total number of mistakes.
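One way to read "consecutive ranking" is as a lexicographic sort: translations are ordered by critical errors first, ties are broken by content errors, and remaining ties by the total number of mistakes. The sketch below shows that reading; the class, field and student names are hypothetical, and "fewer is better" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TranslationScore:
    student: str          # hypothetical identifier
    critical_errors: int
    content_errors: int
    total_mistakes: int


def rank_translations(scores):
    """Order translations best-first by the three parameters in turn."""
    return sorted(
        scores,
        key=lambda s: (s.critical_errors, s.content_errors, s.total_mistakes),
    )


# Invented numbers for illustration only.
scores = [
    TranslationScore("student_a", 1, 2, 9),
    TranslationScore("student_b", 0, 3, 12),
    TranslationScore("student_c", 0, 3, 7),
    TranslationScore("student_d", 0, 1, 15),
]
ranked = rank_translations(scores)
print([s.student for s in ranked])
```

Under this reading a single critical error outweighs any number of minor ones, so student_d ranks first despite having the highest total.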
3) Follow students’ individual
progress over the year
(based on the total number of mistakes normalized by the text
size)
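Normalizing by text size makes translations of different lengths comparable. A minimal sketch, assuming size is measured in tokens and the rate is expressed per 100 tokens (both the unit and the scaling factor are assumptions; the slide only says "normalized by the text size"):

```python
def mistakes_per_100_tokens(total_mistakes, token_count):
    """Return the number of mistakes per 100 tokens of translated text."""
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return 100 * total_mistakes / token_count


# Invented example: 12 mistakes in a 400-token translation.
rate = mistakes_per_100_tokens(12, 400)
print(rate)
```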
4) Think of remedial activities
The top ten mistakes in the sample
1) Theory-based exercises utilizing multiple
concordances
• discussing translation strategies, identifying translation problems
and comparing/evaluating solutions
• developing skills to overcome known transfer issues in English-
Russian translation which are due to interlingual typological
differences
2) Corpus-driven exercises to prevent most
common mistakes
• developing L1 competence through building up corpus-querying
and documentary research skills;
• extending the scope of world knowledge through information
search and developing text analysis and text comprehension
aptitude.
5) Design materials and teaching aids
Summary
1) Russian Learner Translator Corpus is an available and
extensive source of data for translation studies and
translator education research (http://www.rus-ltc.org/);
2) The error-tagged subcorpus (http://dev.rus-ltc.org/brat/#/rusltc/)
is a way to give students extensive feedback on their translations
3) and a means of accumulating research data on TQA;
4) RusLTC content is used in designing teaching materials.
Thank you!

RusLTC at TSD-2014 (Brno)

  • 1. RUSSIAN LEARNER TRANSLATOR CORPUS: design, research potential and applications Andrey Kutuzov National Research University Higher School of Economics Maria Kunilovskaya Tyumen State University 17th International Conference on Text, Speech and Dialogue Brno, Czech Republic, September 8–12 2014
  • 2. General description • inspired by MeLLANGE • online and downloadable http://rus-ltc.org • 1.3 mln tokens • translations from 10 universities • 11 source text genres (inc. essays, educational, informational) • multiple: 263 sources, 1952 translations • bi-directional: approx. 200 English ST(≈300K tokens) with their 1300 Russian translations (≈700 thousand tokens), and over 40 Russian ST and approx. 600 English translations • 10 types of linguistic and extralinguistic meta data • Lexical and POS query interface (Freeling-based linguistic mark-up) RusLTC at TSD-2014 2
  • 3. Corpus design
      1) Txt-archive structured by file-naming conventions
          RU_1_23.txt and EN_1_23_9.txt
          RU_1_23.head.txt and EN_1_23_9.head.txt
      2) TMX file
          • pair-wise alignment with LF Aligner in batch mode
          • manual correction (Olifant / Heartsome TMX editors)
          • merging TUVs with identical source segments and adding XML tags to link segments to head files (a homegrown script)
      3) Error-tagged subcorpus
          • a collection of 265 annotated translations (for 33 sources)
          • stand-off machine-readable annotation
          • pre-defined error classification
          • 6,471 error tags
          • online tag editor based on brat: http://brat.nlplab.org/index.html
  • 5. BRAT-based online error tag editor
  • 6. Application and Research
      RusLTC is a general-purpose data source for translation studies and translation education research, incl. the study of
      1. variation and choice in translation;
      2. ’translationese’ and the translator interlanguage;
      3. interdependence between the translation characteristics and various metadata (direction and conditions of translation, source text genre);
      4. translation-related “problem areas” or rich points in source texts;
      5. translation quality and translation quality assessment (TQA).
      Direct use
      • in curriculum and materials design
      • as a teaching and learning aid.
  • 7. RusLTC research: gender asymmetry in translated texts
      1) Male and female translations show the same gender asymmetry as Russian originals (based on lexical variety)
      2) Sentence length figures for female translations contradict similar statistics for originals
  • 8. Research based on RusLTC: splitting in EN-RU translation
      1) types of syntactic structures that undergo splitting in English-Russian translation:
          – coordination with “, and”
          – non-restrictive relative clauses
      2) most frequent mistakes associated with splitting:
          – loss or misinterpretation of semantic relations between propositions,
          – issues with anaphora resolution and
          – greater communicative value acquired by upgraded sentences.
  • 9. Error-tagged part: inter-rater reliability
      AIM: to gauge the reliability of mark-up results based on the proposed error classification and to establish the areas of disagreement
      [Figure: inter-rater agreement diagram; reported values α=0.734 versus α=0.569]
  • 10. Error statistics analysis to inform translation didactics
      Hypothesis 1: The better one knows L1, the better she understands the source / the better her transfer skills.
      Hypothesis 2: Final-year students make fewer mistakes than 4th-year students.
      Hypothesis 3: Test translations show better results than routine translations because students are more motivated to perform better.
      Hypothesis 4: The quantitative results of the error annotation depend on the order of translations in the set (“order effect”).
  • 11. Use in the classroom
      1) Students have online access to:
          • their own error-tagged and commented translations;
          • peer translations;
          • mistake statistics that reflect their individual progress and difficulties.
  • 12. 2) Students’ rating based on the quality of the final translation
      Quality parameters used for consecutive ranking to arrive at relative evaluation:
      1. number of critical errors,
      2. number of content errors and
      3. total number of mistakes.
  • 13. 3) Follow students’ individual progress over the year (based on the total number of mistakes normalized by the text size)
  • 14. 4) Think of remedial activities
      [Figure: The top ten mistakes in the sample]
  • 15. 5) Design materials and teaching aids
      1) Theory-based exercises utilizing multiple concordances
          • discussing translation strategies, identifying translation problems and comparing/evaluating solutions
          • developing skills to overcome known transfer issues in English-Russian translation which are due to interlingual typological differences
      2) Corpus-driven exercises to prevent the most common mistakes
          • developing L1 competence through building up corpus-querying and documentary research skills;
          • extending the scope of world knowledge through information search and developing text analysis and text comprehension aptitude.
  • 16. Summary
      1) The Russian Learner Translator Corpus is an available and extensive source of data for translation studies and translator education research (http://www.rus-ltc.org/);
      2) the error-tagged subcorpus (http://dev.rus-ltc.org/brat/#/rusltc/) is a way to give students extensive feedback on their translations
      3) and a means of accumulating research data on TQA;
      4) RusLTC content is used in designing teaching materials.
      Thank you!
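
Slides 12 and 13 describe two classroom computations: a consecutive ranking by three quality parameters and a mistake count normalized by text size. The sketch below is one possible reading of them, not the authors' implementation. The class name, field names, the per-100-tokens scale, and the assumption that fewer errors rank higher are all illustrative choices that do not come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TranslationScore:
    """Error counts for one student translation (all field names hypothetical)."""
    student: str
    critical_errors: int
    content_errors: int
    total_mistakes: int
    tokens: int  # translation size; tokens as the unit is an assumption


def rank_translations(scores: List[TranslationScore]) -> List[TranslationScore]:
    """Consecutive ranking: compare critical errors first, then content
    errors, then the total number of mistakes (fewer is assumed better)."""
    return sorted(
        scores,
        key=lambda s: (s.critical_errors, s.content_errors, s.total_mistakes),
    )


def mistakes_per_100_tokens(score: TranslationScore) -> float:
    """Total mistakes normalized by text size; the scale of 100 tokens
    is an illustrative choice, not stated on the slides."""
    if score.tokens <= 0:
        raise ValueError("tokens must be positive")
    return 100.0 * score.total_mistakes / score.tokens


if __name__ == "__main__":
    # Invented sample data, for demonstration only.
    sample = [
        TranslationScore("student_a", 1, 2, 5, 200),
        TranslationScore("student_b", 0, 3, 6, 300),
        TranslationScore("student_c", 0, 3, 4, 250),
    ]
    for position, s in enumerate(rank_translations(sample), start=1):
        print(position, s.student, round(mistakes_per_100_tokens(s), 2))
```

Sorting on a tuple key gives the consecutive behaviour directly: a later parameter only matters when all earlier ones tie. Tracking the normalized figure per student across successive translations would give the progress curve that slide 13 refers to.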