Pre and post-editing
environment for Apertium




                                   Lluís Villarejo
                           Learning Technologies
                                     March 2012



                 What is GSoC?
• It's a global program that offers student developers stipends
  to write code for various open source software projects.
• Since 2005

• Inspire young developers to participate in OSS projects.
• Give students more exposure to real-world software development
  scenarios.
• Get more open source code created and released.
• Help open source projects identify and bring in new developers.



             Some participants
•   Apache Software Foundation   •   Sakai Foundation
•   Debian                       •   Mozilla
•   Facebook                     •   Inclusive Design Institute
•   Drupal                       •   The Linux Foundation
•   Creative Commons             •   The GNU Project
•   DocBook project              •   Wikimedia Foundation
•   GCC                          •   WordPress
•   GNOME
•   ...                          •   ...



                How does it work?
•   Orgs present themselves as mentoring agents.
•   Orgs present a list of potential projects and mentors.
•   Accepted orgs should try to attract students' interest.
•   Students build project proposals.
•   Google finances slots for each org (5,000 + 500 USD per slot).
•   The project community decides how students are assigned to slots.
•   Runs between the end of May and the end of August.



               GSoC'11 statistics
• $7.2M budget

• 1115 students accepted from 68 countries

• 2096 mentors and co-mentors from 55 countries

• 175 Open Source organizations

• 18.1% of students have participated in previous years

• 97 countries with student applicants

• 88% overall success rate



Accepted Students GSoC'11



Why participate with Apertium?
• Strategically:
   – Apertium is a strategic agent inside UOC.
   – Developing Apertium means further developing
     internationalization aids for UOC.
   – Attract and onboard new developers for Apertium.
   – Collaboration with Google's Open Source initiatives.

• Functionally:
   – Opportunity to further develop specific UOC needs with
     external funding.
   – Capitalize on specific user feedback on translation quality.



              The Apertium case
• 20 proposed tasks
• 17 tasks got interest from students [1-9]
   – The pre and post-editing environment attracted 11 interested
     students.

• The Apertium community ranked the 17 tasks
   – The pre and post-editing environment ranked 4th.

• Google assigned 9 slots to Apertium (49,500 USD)
  – Our task went through, and Camille Mougey from the
    Grenoble Institute of Technology was selected.
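
As a quick check, the total for Apertium follows from the per-slot amounts given on the "How does it work?" slide. The sketch below is only illustrative: the constant and function names are hypothetical, and reading the two amounts as a student stipend plus an organization payment is an assumption, since the slides only give "5.000 + 500 USD".

```python
# Per-slot amounts as stated on the slides (USD).
# Assumption: 5,000 is the student stipend and 500 goes to the organization.
STUDENT_STIPEND_USD = 5000
ORG_PAYMENT_USD = 500
APERTIUM_SLOTS = 9  # slots assigned to Apertium in GSoC 2011


def total_funding(slots, per_student=STUDENT_STIPEND_USD, per_org=ORG_PAYMENT_USD):
    """Return the total funding in USD for a given number of slots."""
    return slots * (per_student + per_org)


print(total_funding(APERTIUM_SLOTS))  # 49500
```

Nine slots at 5,500 USD each give the 49,500 USD quoted for Apertium.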



      Pre and post-editing, why?
• A significant share of the errors you get when translating a
  document is due to deficiencies in the original.
• The integration of existing resources can help to ease this
  burden:
   – Digital knowledge sources (digital dictionaries... )
   – Automatic tools (spell-checker, grammar checker, translation
     memory generation, search & replace...)
• These processes should be integrated naturally in the
  translation workflow → the need for an integrated web interface
  to Apertium.
• To improve the system we need to have access to the human
  post-editing process.



     Pre and post-editing, features
•   Pre and post-editing web interface integrated with the Apertium translation toolbox.
•   Spell checking on source and target languages, through integration with Aspell.
•   Grammar checking on source and target languages, through integration with
    LanguageTool.
•   Integration with several external dictionaries.
•   Search & replace on source and target languages.
•   Ability to deal with formatted text.
•   Logging system. All events are logged as they happen, i.e. at the very moment
    the user inserts or deletes text. This allows a later data mining process to
    be run on the logs to detect commonly modified structures or vocabulary.
•   Translation memory generation, through integration with Maligna.
•   PDF translation through pdftohtml.
•   Image translation through Tesseract.
                                                                        Final report 2010
                                                                        Final report 2011
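
The logging feature described above can be pictured with a minimal sketch. It is not the project's actual implementation: the `EditEvent` schema, the `EditLog` class and the word-level counting are all hypothetical, chosen only to show how insert and delete events recorded as they happen could later be mined for commonly modified vocabulary.

```python
import time
from collections import Counter
from dataclasses import dataclass


@dataclass
class EditEvent:
    """One logged user action (hypothetical schema)."""
    timestamp: float  # when the edit happened
    action: str       # "insert" or "delete"
    position: int     # character offset in the edited text
    text: str         # the text that was inserted or deleted


class EditLog:
    """Collects edit events and offers a very simple mining step."""

    def __init__(self):
        self.events = []

    def record(self, action, position, text, timestamp=None):
        """Log an event at the moment the user inserts or deletes text."""
        if action not in ("insert", "delete"):
            raise ValueError(f"unknown action: {action!r}")
        if timestamp is None:
            timestamp = time.time()
        self.events.append(EditEvent(timestamp, action, position, text))

    def most_modified(self, n=3):
        """Return the n most frequent (action, word) pairs in the log."""
        counts = Counter()
        for event in self.events:
            for word in event.text.lower().split():
                counts[(event.action, word)] += 1
        return counts.most_common(n)


log = EditLog()
log.record("delete", 10, "the house")
log.record("insert", 10, "home")
log.record("delete", 42, "the")
print(log.most_modified(1))  # [(('delete', 'the'), 2)]
```

Counting at word level is the simplest possible choice; detecting commonly modified structures, as the slide mentions, would need a richer analysis than this sketch attempts.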



        Results & lessons learned
• Fully functional environment, goals accomplished.
• Automatic availability of feedback on human post-editing
  behaviour.

•   Jointly defined task (flexible framework provided).
•   Interest in developing great empathy with the student.
•   Motivated and pro-active student.
•   Student engagement.
•   Very frequent feedback.
•   Mentoring team with access to ABSOLUTELY ALL the
    information regarding the project.



                   Further work
• Proof of concept accomplished.
• Base platform developed so further work can be easily
  added.
• Integration of other resources (more external dictionaries).
• Extension of currently used resources (additional
  grammar rules, improved dictionaries, wider range of
  formats).
• Mining of the logged information to gain deeper knowledge of
  the human post-editing process.
• Use of this mining process to improve the Apertium translation
  engine.



                    GSoC 2012




• Mining of the logged information to gain deeper knowledge of
  the human post-editing process.
• Use of this mining process to improve the Apertium translation
  engine.
• Post-editing of formatted text.




   Thanks
Questions & answers

Google Summer of Code 2011: UOC & Apertium
