The document discusses the potential for machine translation (MT) to help address the language diversity challenges of e-government in Europe. It provides background on the multilingual nature of the EU and goals of e-government, before examining how MT could bridge language barriers and allow governments to serve citizens in their own languages. The talk uses Latvia as a case study and addresses the data challenges in developing high-quality MT for smaller European languages.
MT's Role in Latvia's Multilingual e-Government Services
1.
Thursday, 22 May / 12:40–13:10
Is MT Ready for e-Government? The Latvian Story
Indra Samite, Tilde
TAUS ROUNDTABLE 2014
22 May / Moscow (Russia)
2.
Indra
Sāmīte
indra.samite@tilde.com
Is MT ready for e-gov?
The Latvian story
3.
• about Tilde
• eGovernment and the challenge of language diversity
• European language landscape
• promise of MT
• MT for eGov in Latvia
• data challenge and META-NET
• MT for Europe
outline
4.
• Language technology developer
• Localization service provider
• Leadership in smaller languages
• Offices in Riga (Latvia), Tallinn (Estonia) and Vilnius (Lithuania)
• 130 employees
• Strong R&D team
• 9 PhDs and candidates
• Trusted partner of the EU for significant research projects
5.
G2C: Government to Citizens
G2B: Government to Businesses
G2E: Government to Employees
G2G: Government to Governments
C2G: Citizens to Governments
eGovernment
6.
transparency
customers on-line, NOT in line
efficiency
increase participation
reach marginalized groups
Goals of eGovernment
7.
• providing information: regulatory services, public hearing schedules, issue briefs, notifications, etc.
• two-way communications with the citizen, a business, or another government agency
• dialogue with agencies to post problems, comments, or requests
• conducting transactions: lodging tax returns, applying for services and grants, etc.
• enabling citizen transition from passive information access to active participation by:
– informing the citizen
– representing the citizen
– encouraging the citizen to vote
– consulting the citizen
– involving the citizen
eGov activities
8.
80 European languages
23 official languages
New countries and languages joining soon: Croatia, Iceland
Multilingual Europe
9.
98% Luxembourg
95% Latvia
94% Netherlands
93% Malta
92% Slovenia, Lithuania
91% Sweden
EU countries where people can speak at least one language in addition to their mother tongue
10.
65% Hungary
62% Italy
61% United Kingdom, Portugal
60% Ireland
EU countries where the majority of people cannot speak any foreign language
11.
What role does translation play in your everyday life?
Is translation important?
12.
The importance of languages was emphasized in the Council Resolution of 14 February 2002 on linguistic diversity, which acknowledged the part played by languages in social, economic and political integration, particularly in an enlarged Europe.
Linguistic diversity is one of the operating principles of the European institutions. The Treaty on European Union entitles every citizen to write to any of the institutions in one of these languages and to have an answer in the same language (Article 21).
EU Policy
13.
Preserving the European cultural and linguistic diversity in the united information and knowledge society
Securing at affordable costs the free flow of information and thought across language boundaries in the resulting single information space
Providing each language community with the most advanced technologies for communication, information and knowledge management so that maintaining their mother tongue does not turn into a disadvantage
Challenge
Credits: Hans Uszkoreit
14.
EU MULTILINGUALITY IN PRACTICE: CASE STUDY
In October 2010, a Spanish lawyer turned to the Ombudsman, complaining that many public consultations are only published in English, for example, consultations concerning a new partnership to help small and medium-sized enterprises and concerning the freedom of movement of workers.
15.
16.
17.
“The Commission should ensure that all European citizens are able to understand its public consultations, which should [..] be published in all the official languages. Its failure to do so is an instance of maladministration.”
4 October 2012
The European Ombudsman, P. Nikiforos Diamandouros
18.
[European] Commission [has] to ensure that every EU citizen's right to address the EU institutions in any of the EU official languages is fully respected and implemented by ensuring that public consultations are available in all EU official languages, [..] and that there is no language-based discrimination [..]
European Parliament resolution 2012/2676(RSP)
19.
Fulfill the vision of e-Government AND the promise of language diversity
The eGovernment Challenge
20.
MT
machine translation
21.
Bridge the language barrier
Speak your citizens’ languages
Promote diversity
Overcome barriers to communication
Grow business
World peace :)
MT for eGov
22.
What MT serves best
short shelf life
immediacy
large volume
multiple languages
23.
Where it works
embedded in web pages
multilingual online services
social media
multilingual chat
mobile devices
24.
customizable, trainable
domain specific
on-demand
in the cloud
real-time
security
privacy
Specific requirements
25.
• EU official languages: 23
• EC procedural languages: 3 (EN, FR, DE)
• DGT: 1750 linguists and 600 support
• Where: in Brussels, Luxembourg and in local offices in Member States
DGT Translation
26.
The past: ECMT
• Rule-based machine translation
• Developed between 1975 and 1998
• 28 language pairs available (ten languages)
• Since 2006 only linguistic maintenance work on a couple of systems
• Suspended in 12/2010
The future: MT@EC
• 05/2010 Commission Task Force confirmed need for new MT for the Commission
• 06/2010 Action plan approved by management
• 09/2010 Work started for MT@EC
Machine Translation @ European Commission
27.
• Based on data-driven MT technology
• Making best use of Commission language resources
• Making best use of internal linguistic expertise (1700 translators for 23 languages)
• Open and flexible
• Ensuring technological independence
• Being built by DG Translation
• Started: summer 2010
• Deploy: summer 2013
MT@EC
28.
Open to the market
Language technology watch (continuous)
Linguistic interventions - demonstration projects in 2011
Comparison of baseline engines to market offerings - 2012
… and to research
Using Moses
A major institutional user of MT
Involvement in projects (e.g. Multilingual web)
Conferences for EU institutions staff (e.g. EM+ workshop)
Provider of language resources…
MT@EC
Credits: Spyros Pilos, DGT
29.
DATA
30.
The DGT Multilingual Translation Memory of the Acquis Communautaire
http://langtech.jrc.it/DGT-TM.html
31.
JRC-Acquis
The total body of European Union law applicable in the EU Member States
http://langtech.jrc.it/JRC-Acquis.html
32.
Data for SMT training
33.
34.
META-NET Language Whitepapers
30 European languages
Analyzing language readiness for the digital age
21 languages under long-term threat due to inadequate technological support
www.meta-net.eu/whitepapers
35.
Strategic Research Agenda
36.
• Europe-wide social and business networking in native language
• Mobile and internet services in native language for e-Commerce, education, travel, entertainment, etc.
• eGovernment reaching all linguistic groups and enabling political discussion across borders
• Unlimited TV/movie cross-language subtitling/interpretation
• Ever present Personal Interpreter
• Translingual Spaces: dedicated locations for ambient interpretation
Vision: Applications needed by EU citizens and businesses
37.
• A ubiquitous online platform combining automatic translation, language checking, post-editing, as well as human creativity and quality assurance
• for generic and special-purpose services
• free for small volume use and for high-volume baseline quality
• involve providers of computer supported HQ human translation
• business opportunities for a wide range of service and technology providers
• Assured privacy, confidentiality and security provided by trusted service centers
• Quality upscale models: instant quality upgrades
• Domain and Task specialization
Vision: Services for the EU Society and Citizens
38.
Ubiquitous translation services for a full range of quality levels, fast, affordable
Covers written and spoken language from formal language to chats and social networking
Multi-media multi-language content delivery
• On mobiles, tablets, PCs, etc.
Vision: Services for the EU Society and Citizens
39.
Large cooperating projects
Sharing infrastructures: resources, evaluation
Smaller projects – providing building blocks
National languages (resources, technologies)
Component technologies
Combined funding (EC, national, private)
Inclusion of industry and translation professionals in the entire research and innovation process
Solving legal hurdles on using data for research
Connection to CEF
Infrastructural support (selected areas)
Resources, evaluation suites, organisation of challenges
Organisation of Research and Innovation
40.
Case Study: LATVIA
The Latvian Story
41.
population 2.1 M
1.6 million native Latvian speakers
large Russian-speaking population (36%)
the situation in Latvia
42.
less than 10m speakers
lack of parallel data (corpora)
complex language structure
highly inflected
language components
under developed
terminology
Under-resourced
43.
provide e-services to all the population / linguistic groups
develop technologies for supporting Latvian in information society
facilitate access to the information of European Union institutions
integrate in the infrastructure of EU multilingual services
MT @ eGov.LV
44.
large corpus of parallel data
large corpus of monolingual data
MT core system
Infrastructure
language specific tools such as morphology tools
What is necessary to develop statistical MT
45.
online translation service
translation widget for integration in eGov service sites
standardized API for universal integration
Integration
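One way to picture the "standardized API" idea is a single translation entry point shared by the widget, the online service and any other client. The sketch below is an illustration only, not the interface used in the Latvian eGov project: `TranslationService`, `TranslationResult` and `toy_backend` are hypothetical names, and the backend is a two-word glossary standing in for a real MT engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical backend signature: (text, source_lang, target_lang) -> translation.
Backend = Callable[[str, str, str], str]


@dataclass
class TranslationResult:
    text: str
    machine_translated: bool  # lets a page label MT output for the reader


class TranslationService:
    """Hypothetical single entry point that widgets and apps could share."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend
        self._cache: Dict[Tuple[str, str, str], str] = {}

    def translate(self, text: str, source: str, target: str) -> TranslationResult:
        # Nothing to do for empty text or identical languages.
        if source == target or not text.strip():
            return TranslationResult(text, machine_translated=False)
        key = (text, source, target)
        if key not in self._cache:
            try:
                self._cache[key] = self._backend(text, source, target)
            except Exception:
                # Fall back to the original text rather than breaking the page.
                return TranslationResult(text, machine_translated=False)
        return TranslationResult(self._cache[key], machine_translated=True)


def toy_backend(text: str, source: str, target: str) -> str:
    """Stand-in for an MT engine: a tiny glossary, KeyError if unknown."""
    glossary = {("lv", "en"): {"Sveiki": "Hello", "Paldies": "Thank you"}}
    return glossary[(source, target)][text]


service = TranslationService(toy_backend)
known = service.translate("Sveiki", "lv", "en")
unknown = service.translate("Labdien", "lv", "en")
print(known.text, known.machine_translated)
print(unknown.text, unknown.machine_translated)
```

Keeping the backend behind one small interface means the same call could serve a web widget, a chat client or a mobile app, and the engine could be swapped without touching the callers.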
49.
custom machine translation as easy and affordable as a cup of coffee
50.
• upload your data: TMX, XLIFF, DOC, PDF, XLZ, TXT
• combine it with the data on the LetsMT public repository
• generate your custom MT with a few mouse clicks
• run your MT system on the LetsMT cloud
• use it in your CAT tool with LetsMT plug-in
• integrate through LetsMT API in your online or desktop app
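To make the "upload your data" step more concrete, the sketch below shows what a TMX translation memory contains from an SMT point of view: aligned sentence pairs. It is a minimal illustration using Python's standard library, not LetsMT's ingestion code; the sample file and the `read_pairs` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up miniature TMX document (English/Latvian) for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello</seg></tuv>
      <tuv xml:lang="lv"><seg>Sveiki</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you</seg></tuv>
      <tuv xml:lang="lv"><seg>Paldies</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>No Latvian side here</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_pairs(tmx_text, source, target):
    """Return (source, target) segment pairs from a TMX string.

    Language codes are matched exactly after lower-casing, so "en-GB"
    would not match "en" in this simplified version.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text inside inline markup elements.
                segments[lang] = "".join(seg.itertext()).strip()
        # Keep only units that have both sides of the requested pair.
        if segments.get(source) and segments.get(target):
            pairs.append((segments[source], segments[target]))
    return pairs


pairs = read_pairs(SAMPLE_TMX, "en", "lv")
for src, tgt in pairs:
    print(src, "->", tgt)
```

Translation units missing one side are skipped, since only complete pairs are useful as parallel training data.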
51.
1.4 billion parallel sentences
102 languages
129 MT systems trained
*status on 12-10-2012
52.
custom terminology
incremental data
custom MT
53.
cloud-based terminology services for term extraction and multilingual term glossary creation for human and machine translation