Welcome to the Cloud!
Terminology as a Service
Andrejs Vasiļjevs
Tilde
tekom 2013 / Wiesbaden / 07.11.2013.
Complexity of terminology work
• Identifying terms in the source text
• Consulting online databases and local files for translation equivalents
• Creating and maintaining terminology glossaries
• Sharing term glossaries and involving others in refining them
• Structuring data in industry-standard formats
• Integrating term glossaries into CAT and other productivity tools
• Keeping terminology up to date
• etc.
Terminology as a Service

cloud-based platform for acquiring, cleaning up,
sharing, and reusing multilingual terminological data
TaaS User Needs Survey Results:
Importance of terminology work
[Pie chart. Categories: Very important, Quite important, Less important, Not important. Values shown: 43.5%, 39.9%, 14.8%, 1.8%.]
TaaS User Needs Survey:
willingness to share
[Chart. Overall split shown: 60.5% and 39.5%.]
Yes, provided that…
• Joint contribution to the DB
• Access control
• Legal aspects
• External quality control
• Little effort
• Anonymity
• Other
No, because…
• Legal restrictions
• Poor quality/Lack of time
• Own asset
• Risk of misunderstanding
[Breakdown values shown, without category labels: 16.7%, 8.3%, 24.9%, 6.0%, 4.6%, 16.5%, 48.6%, 7.6%, 19.2%, 11.4%, 14.2%, 22.0%.]
TaaS Partners
• Tilde, Latvia (Coordinator)
• TAUS, Netherlands
• Kilgray, Hungary
• Cologne University of Applied Sciences, Germany
• University of Sheffield, UK
TaaS Mission
• Simplify the process for language workers to prepare, store, and share task-specific multilingual term glossaries
• Provide professional translators with instant access to term translation equivalents and translation candidates through CAT tools
• Enable domain adaptation of statistical machine translation systems through dynamic integration with TaaS-provided terminology data
Key services of TaaS
• Automatic extraction of monolingual term candidates from user-uploaded documents
• Automatic retrieval of translation equivalents from different public and industry terminology databases
• Translation candidate acquisition from multilingual web data
• Facilities for users to clean up automatically acquired terminological data
• Data sharing and integration facilities through APIs and export tools
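The first service, automatic extraction of monolingual term candidates, can be illustrated with a deliberately simple sketch. This is not the TaaS extraction method, which the slides do not describe. It only counts stopword-free word n-grams, and the function name, the stopword list, and the thresholds are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real extractors rely on
# language-specific resources and linguistic filters).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "with"}


def extract_term_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs for frequent word n-grams.

    A candidate is a run of 1..max_len consecutive tokens that contains
    no stopword and occurs at least min_freq times. Punctuation is
    ignored, so n-grams may span sentence boundaries.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            ngram = tokens[i:i + n]
            if any(tok in STOPWORDS for tok in ngram):
                continue
            counts[" ".join(ngram)] += 1
    frequent = [(term, freq) for term, freq in counts.items() if freq >= min_freq]
    # Most frequent first; among ties, longer candidates first, then A-Z.
    return sorted(frequent, key=lambda item: (-item[1], -len(item[0].split()), item[0]))


if __name__ == "__main__":
    sample = "The brake pedal is stiff. Check the brake pedal and the brake fluid."
    for term, freq in extract_term_candidates(sample):
        print(freq, term)
```

Candidates produced this way are noisy, which is where the clean-up facilities listed above come in: users review the list and keep only genuine terms.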
Focus areas
Research
• Term extraction
• Collection of domain-specific multilingual corpora
• Max(FTC)
Development
• Quality
• Performance
• Scalability
• Interoperability
Usage
• Usability
• Outreach
• Sustainability
TaaS Services
Target Repositories
• TAUS Data: repository of multilingual translation memories
• EuroTermBank: databank of federated multilingual terminology
• IATE: inter-institutional termbank of the European Union
• META-SHARE: distributed pan-European repository of language resources
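Retrieving translation equivalents from several repositories amounts to a federated lookup followed by a merge. The sketch below shows only that merge step, under stated assumptions: each repository is represented by an in-memory dictionary, the source names and the second translation are made-up placeholders, and no real API of the repositories above is called.

```python
def lookup_translations(term, sources):
    """Collect translation equivalents for `term` from several sources.

    `sources` maps a source name to a dict {source_term: [translations]}.
    Returns a dict translation -> list of source names that proposed it,
    in first-seen order and without duplicate translations.
    """
    merged = {}
    key = term.strip().lower()
    for source_name, glossary in sources.items():
        for translation in glossary.get(key, []):
            merged.setdefault(translation, []).append(source_name)
    return merged


if __name__ == "__main__":
    # Hypothetical stand-ins for remote repositories.
    sources = {
        "source_a": {"timber": ["koks"]},
        "source_b": {"timber": ["koks", "alt-candidate"]},
    }
    for translation, found_in in lookup_translations("timber", sources).items():
        print(translation, found_in)
```

Recording which sources proposed each translation gives a simple signal for ranking candidates before a user reviews them.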
Integration
• Support for industry-standard formats
• Integration into CAT and productivity tools
• API for integrating TaaS services into various software applications
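TBX, mentioned on a later slide, is one such industry-standard format. As a minimal sketch, a glossary could be serialized to a TBX-style document as follows. The element layout follows the commonly used TBX structure (martif, termEntry, langSet, tig, term), but the output is not validated against a schema. The function name, the entry ids, and the header text are assumptions, and this is not the TaaS export code.

```python
import xml.etree.ElementTree as ET

# Qualified name of the built-in xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def glossary_to_tbx(entries, source_lang="en"):
    """Serialize entries [{lang: term, ...}, ...] to a minimal TBX string."""
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: source_lang})
    header = ET.SubElement(martif, "martifHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Illustrative glossary export"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for index, entry in enumerate(entries, start=1):
        # One termEntry per concept, one langSet per language.
        term_entry = ET.SubElement(body, "termEntry", {"id": f"entry-{index}"})
        for lang, term in entry.items():
            lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")


if __name__ == "__main__":
    print(glossary_to_tbx([{"en": "timber", "lv": "koks"}]))
```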
Term identification and annotation
HTML Term Annotation
Term entries for terms identified in EuroTermBank are stored in TBX format
in a <script> element that is placed in the HTML5 document.
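Inline term marking of this kind can be sketched with the ITS 2.0 Terminology attributes for HTML5, `its-term` and `its-term-info-ref`. The sketch is illustrative rather than the TaaS implementation: the function name and the entry ids are hypothetical, and each id is assumed to point at a term entry stored elsewhere in the same document.

```python
import html
import re


def annotate_terms(text, term_ids):
    """Wrap known terms in ITS 2.0 Terminology markup for HTML5.

    `term_ids` maps a term to the (hypothetical) id of its term entry.
    Matching is case-insensitive and on whole words; where terms overlap,
    the longer term wins. All other text is HTML-escaped.
    """
    if not term_ids:
        return html.escape(text)
    lookup = {term.lower(): entry_id for term, entry_id in term_ids.items()}
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        entry_id = lookup[match.group(0).lower()]
        parts.append(
            '<span its-term="yes" its-term-info-ref="#{}">{}</span>'.format(
                html.escape(entry_id, quote=True), html.escape(match.group(0))
            )
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(annotate_terms("Check the brake pedal.", {"brake pedal": "t1"}))
```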
XLIFF Term Annotation
Identifying and marking terms
New W3C standard: Internationalization Tag Set (ITS) 2.0
[Diagram; recoverable labels:]
• Inputs: web page, plaintext, ITS 2.0 enriched content
• Showcase: Terminology Annotation Web Service API, backed by TaaS Terminology Services
• Human users (e.g., translators, terminologists): ITS 2.0 term-annotated content, export / visualisation
• Machine users (CAT tools, MT systems): ITS 2.0 term-annotated content
TaaS Architecture
[Architecture diagram; recoverable labels:]
• Connections: CAT tools and MT (HTTPS, REST); web browsers (HTTP/HTTPS, HTML); external TDBs (HTTPS, REST)
• Presentation Layer: Public API, Web Page UI
• Application Logic Layer: terminology collection management, user management, terminology collection search, terminology collection creation
• Data Storage Layer (Shared Term Repository)
• Term extraction workflows: full collection creation workflow, monolingual collection creation, translation candidate extraction
• High-performance Computing (HPC) Cluster: File Store, HPC frontend, SGE, CPU nodes
• Modules: term extraction (TXT extractor, TWSC, Kilgray Term Extractor, term normalizer), collection creator, statistical DB acquisition, text tagging with terms, parameter retriever, Bilingual Term Extraction System, statistical DB feeding, translation lookup (ETB & STR, IATE, TAUS API, Statistical DB), collection merger, result processing, Collection Importer, marked text enrichment
• Databases: Statistical DB, Shared Term Repository DB
koks timber

How to instruct SMT
to use the right terms?
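One common answer in phrase-based SMT is to mark known terms in the source sentence and attach the preferred translation, which decoders such as Moses can read as XML input when started with the `-xml-input` option. The slides do not say which mechanism TaaS uses, so the sketch below is an assumption-laden illustration. The function name is hypothetical and the tag name `term` is arbitrary.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def mark_terms_for_smt(sentence, glossary):
    """Wrap glossary terms in XML markup that suggests a translation.

    `glossary` maps source terms to the preferred target term. Matching
    is case-insensitive and on whole words; longer terms win.
    """
    if not glossary:
        return escape(sentence)
    lookup = {src.lower(): tgt for src, tgt in glossary.items()}
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in alternatives) + r")\b",
        re.IGNORECASE,
    )
    parts, last = [], 0
    for match in pattern.finditer(sentence):
        parts.append(escape(sentence[last:match.start()]))
        target = lookup[match.group(0).lower()]
        parts.append(
            "<term translation={}>{}</term>".format(
                quoteattr(target), escape(match.group(0))
            )
        )
        last = match.end()
    parts.append(escape(sentence[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(mark_terms_for_smt("The timber is dry", {"timber": "koks"}))
```

A fixed suggestion like this ignores target-side inflection, which matters for a morphologically rich language such as Latvian, so a production system would need more than plain string substitution.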
Put TaaS at the service of MT
Do-it-yourself MT factory in the cloud
Boost in the quality of machine translation
Narrow-domain automotive MT, English – Latvian
DATA
• 2 M unique parallel sentences
• 1.9 M monolingual sentences
• 0.2 M in-domain monolingual sentences
QUALITY
• 16% improvement from terminology integration
Come & Try
demo.taas-project.eu
Thank you!
andrejs@tilde.com

The research within the project TaaS leading to these results has received funding from the European
Union Seventh Framework Programme (FP7/2007-2013), Grant Agreement no 296312

Welcome to the Cloud! Terminology as a Service, CHAT2013
