1) Machine translation is playing an increasingly important role in e-discovery, compliance, and large-scale data analysis due to the rapid growth of multilingual data.
2) The amount of data in the world is growing exponentially and is projected to reach nearly 45 zettabytes by 2020, making manual translation of all that data impossible.
3) In e-discovery cases, machine translation lets teams review large volumes of foreign-language documents in English more quickly and cost-effectively than traditional translation methods.
Role of translation software in e-discovery, compliance and big data
1. The growing role of translation software in e-discovery, compliance, and big data
John Tinsley
CEO, Iconic Translation Machines
eDiscovery Ireland. Dublin. 17th November 2017
2. 80% of all litigation will be multilingual by 2020
3. Data Explosion
[Chart: Data in zettabytes (ZB), 2008 to 2020, with a "We are here" marker]
Data is growing at a 40% compound annual rate
Nearly 45 zettabytes (ZB) by 2020
That’s 45 billion terabytes
Source: Craig Ball, https://ballinyourcourt.wordpress.com/
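The headline figures on this slide can be sanity-checked with a little arithmetic. The sketch below is an illustration rather than part of the talk: it assumes decimal (SI) units, where 1 ZB = 10^21 bytes and 1 TB = 10^12 bytes, and treats the 40% figure as a constant compound annual rate. `zb_to_tb` and `project` are hypothetical helper names.

```python
# Sanity check of the "Data Explosion" figures.
# Assumption: decimal (SI) units, 1 ZB = 10**21 bytes and 1 TB = 10**12 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21


def zb_to_tb(zettabytes: float) -> float:
    """Convert zettabytes to terabytes."""
    return zettabytes * BYTES_PER_ZB / BYTES_PER_TB


def project(start: float, annual_rate: float, years: int) -> float:
    """Compound a starting volume forward at a constant annual growth rate."""
    return start * (1 + annual_rate) ** years


if __name__ == "__main__":
    # 45 ZB is 45 billion TB, matching the slide
    print(f"45 ZB = {zb_to_tb(45):,.0f} TB")
    # At 40% a year, volume grows by a factor of 1.4**3 over three years
    print(f"3-year growth factor at 40%: {project(1.0, 0.40, 3):.2f}x")
```

Run as a script, it prints `45 ZB = 45,000,000,000 TB` and a three-year growth factor of `2.74x`, i.e. at that rate the volume nearly triples between 2017 and 2020.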
4. Introducing: Iconic Translation Machines
• What?
– Translation and Language Technology Software and Solutions provider
• Where?
– Dublin, Ireland. Operating globally.
• Who?
– Information providers, Government organisations, Enterprises
www.iconictranslation.com/ediscovery
5. Snapshot of Iconic partners and users
6. Machine Translation in 60 seconds
• History
– started in the late ’40s with US government funding
• How it works
– learns from previous examples of translation
• Why it’s hard
– languages are complex!
• Hard languages
– not all languages are equal (more on this later)
• State of the art
– Neural Machine Translation
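As a loose illustration of "learns from previous examples", the sketch below retrieves the stored translation of the most similar sentence seen before, which is the "I've seen this before…" intuition from the talk. It is a fuzzy translation-memory lookup built on Python's standard `difflib`, not how statistical or neural MT work internally (those generalise from millions of examples rather than retrieving whole sentences). The memory contents and the 0.75 threshold are made up for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical memory of previously seen Italian -> English translations.
MEMORY: Dict[str, str] = {
    "buongiorno a tutti": "good morning everyone",
    "grazie mille": "thank you very much",
    "allego il contratto firmato": "I attach the signed contract",
}


def lookup(sentence: str, memory: Dict[str, str],
           threshold: float = 0.75) -> Optional[Tuple[str, float]]:
    """Return (stored translation, similarity) for the closest past example,
    or None if nothing in memory is similar enough."""
    query = sentence.lower().strip()
    best_source, best_score = None, 0.0
    for source in memory:
        # ratio() is 1.0 for identical strings and 0.0 for no overlap
        score = SequenceMatcher(None, query, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is None or best_score < threshold:
        return None
    return memory[best_source], best_score
```

For instance, `lookup("Allego il contratto firmato.", MEMORY)` returns `"I attach the signed contract"` with a similarity of about 0.98, while a string unlike anything in memory returns `None`.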
7. The Ensemble Architecture™
Innovative, state-of-the-art, proprietary architecture
[Diagram, Input to Output. Components: Document Parser; File Format Handler; Language Identification; Training Data; Client terminology; Statistical Machine Translation; Neural MT; Multi-output Combination; Statistical Post-editing; language-specific modules: Chinese pre-ordering rules, Spanish place name recognizer, Korean email detection, Japanese writing normalisation, German compounding rules]
More info: http://iconictranslation.com/machine-translation/ensemble_architecture/
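The slide lists the components but not how they connect. Purely to illustrate the general idea (language-specific pre-processing, several engines, a combination step, and client terminology), the sketch below wires hypothetical stand-ins together. Every name here is an assumption: the actual Ensemble Architecture is proprietary and not described in this deck, and "keep the highest-confidence output" is only the simplest possible combination strategy.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an engine maps source text to (translation, confidence).
Engine = Callable[[str], Tuple[str, float]]
Preprocessor = Callable[[str], str]


def normalise_whitespace(text: str) -> str:
    """Collapse runs of whitespace; stands in for real language-specific rules."""
    return " ".join(text.split())


# Language-specific pre-processing keyed by an assumed language code.
# Modules such as German compound splitting or Chinese pre-ordering would plug in here.
PREPROCESSORS: Dict[str, List[Preprocessor]] = {
    "de": [normalise_whitespace],
    "zh": [normalise_whitespace],
}


def translate(text: str, lang: str, engines: List[Engine],
              terminology: Dict[str, str]) -> str:
    # 1. Apply any pre-processing registered for the detected language.
    for step in PREPROCESSORS.get(lang, []):
        text = step(text)
    # 2. Run every engine (e.g. statistical and neural MT) on the same input.
    candidates = [engine(text) for engine in engines]
    # 3. Naive multi-output combination: keep the highest-confidence candidate.
    best, _confidence = max(candidates, key=lambda c: c[1])
    # 4. Enforce client terminology on the chosen output.
    for term, preferred in terminology.items():
        best = best.replace(term, preferred)
    return best
```

With two dummy engines scoring 0.6 and 0.8, `translate` returns the second engine's output with the client's preferred term substituted. A real combination step would more likely merge or rescore candidates than simply pick one.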
8. Translation and e-discovery
When does the need for translation arise?
Incidental
• things that come up from time to time, low volumes
• mix of languages
• snippets of text, single documents, emails
Projects / Cases
• known quantity based on a project, e.g. case, M&A, etc.
• one or two languages only
• large volumes of files, all formats
9. What are your options?
A case comes into the firm and the documents contain around 300 emails in Italian
Goal: identify relevant/privileged documents
1. Do nothing...
2. Look internally
– can anyone in the firm help with review/translation?
3. Look externally
– contract Italian reviewers
4. Translate the documents
– review in English
10. Options: Pros & Cons
Look internally
Pros: Quick, effective
Cons: Doesn’t scale; availability not guaranteed; low coverage
Contract reviewers
Pros: Effective ++
Cons: Expensive; slow; QA is a challenge
Translation
Pros: Effective +
Cons: Very slow (3-step: procure translation, translate, review still to do); expensive; doesn’t scale
11. Most Effective. Most Expensive. Necessary?
[Chart: Breakdown of document review costs (scale 0 to 35), comparing Associate Reviewer; English Speaking Attorneys (Contract); English Speaking Attorneys (Internal); Foreign Speaking Attorneys (Contract); Foreign Speaking Attorneys (Internal); Human Translators; Paralegal Reviewers]
Source: https://www.iltanet.org/blogs/pete-afrasiabi/2017/01/19/finding-the-needle-in-the-digital-multilingual-haystack
12. Options: Pros & Cons (table repeated from slide 10)
Callout: We’re implying “human” translation here, but there are more options than that
13. First, let’s move the goalposts
A case comes into the firm and the documents contain around 30,000 emails in Italian
14. Translation Options: Pros & Cons
Human (professional human translation)
Pros: Best quality
Cons: Very slow; expensive; doesn’t scale
Online MT (“free” online machine translation)
Pros: Quick; cheap
Cons: Security; quality; limitations (integration/files)
Professional MT (professional machine translation)
Pros: Quick; cost-effective; secure; flexible
Cons: Quality?
15. “Not an exercise in perfection. An exercise in establishing a legally defensible position.”
- Eoghan Kenny, Senior Manager Discovery at A&L Goodbody
16. Machine Translation misconceptions!
“It’s rubbish” … it’s not!
• Defining performance based on “usability”
– relevant or not?
90% effectiveness
• Some fluctuation across languages
[Languages shown: French, German, Turkish, Finnish; Spanish, Chinese, Korean, Hungarian; Portuguese, Japanese, Thai, Basque]
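The deck doesn't say exactly how the 90% figure was measured. One plausible reading of "usability" (relevant or not?), assumed here rather than taken from the talk, is the share of documents where a reviewer's relevance call on the MT output matches the call made on the original or on a human translation. A minimal sketch of that metric, broken down by language to surface the fluctuation the slide mentions; the function names and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def usability_effectiveness(calls_on_mt: Sequence[bool],
                            calls_on_reference: Sequence[bool]) -> float:
    """Share of documents where the relevant/not-relevant call made on the
    MT output matches the call made on the reference version."""
    if len(calls_on_mt) != len(calls_on_reference):
        raise ValueError("both sequences must cover the same documents")
    if not calls_on_mt:
        raise ValueError("no documents to compare")
    agree = sum(a == b for a, b in zip(calls_on_mt, calls_on_reference))
    return agree / len(calls_on_mt)


def by_language(records: Iterable[Tuple[str, bool, bool]]) -> Dict[str, float]:
    """records: (language, call on MT output, call on reference)."""
    grouped: Dict[str, Tuple[List[bool], List[bool]]] = defaultdict(lambda: ([], []))
    for lang, mt_call, ref_call in records:
        grouped[lang][0].append(mt_call)
        grouped[lang][1].append(ref_call)
    return {lang: usability_effectiveness(mt, ref)
            for lang, (mt, ref) in grouped.items()}
```

Under this definition, a translation can be clumsy and still score as fully "usable" as long as the reviewer reaches the same relevance decision.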
17. Machine Translation misconceptions!
“It’s not secure” … it is! (in most cases)
• Using free online tools? That’s not secure!
• Professional machine translation
– private cloud services can rigorously control where data goes, protected by state-of-the-art security protocols
• Worst case, bring it in-house
18. Machine Translation misconceptions! (slide 17 repeated)
[Map: Iconic data centre locations on AWS]
19. Making things more practical
Features introduced to cover as many bases as possible
• File formats: MS Office, PDFs, EML/MSG, audio
• Language identification: pre-detect, multilingual threads
• Workflow integration: review platform connectors, API
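Pre-detecting languages and splitting multilingual threads can be sketched with a toy stopword-counting identifier. This is an assumption-laden illustration, not the feature described on the slide: the three tiny word lists and the blank-line splitting rule are made up, and real identifiers typically use character n-gram models trained on far more text.

```python
import re
from typing import List, Tuple

# Tiny, hypothetical stopword lists for illustration only.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "we", "please", "with", "this"},
    "it": {"il", "la", "di", "che", "e", "per", "non", "sono", "con", "grazie"},
    "de": {"der", "die", "und", "ist", "nicht", "wir", "mit", "bitte", "das"},
}


def detect(text: str) -> str:
    """Guess the language of a block by counting known stopwords."""
    tokens = re.findall(r"[a-zà-ÿ]+", text.lower())
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # No stopword hits at all: don't pretend to know.
    return best if scores[best] > 0 else "unknown"


def segment_thread(thread: str) -> List[Tuple[str, str]]:
    """Split an email thread on blank lines and tag each block with a language."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", thread) if b.strip()]
    return [(detect(b), b) for b in blocks]
```

Tagging each block separately is what lets a mixed English/Italian/German thread be routed to the right engine per block, instead of translating the whole email as one language.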
21. Supported languages
All languages listed below are available for translation into and out of each other
Afrikaans
Arabic
Bosnian (Latin)
Bulgarian
Cantonese (Traditional)
Catalan
Chinese Simplified
Chinese Traditional
Croatian
Czech
Danish
Dutch
English
Estonian
Fijian
Filipino
Finnish
French
German
Greek
Haitian Creole
Hebrew
Hindi
Hungarian
Indonesian
Italian
Japanese
Kiswahili
Korean
Latvian
Lithuanian
Malagasy
Malay
Maltese
Norwegian
Persian
Polish
Portuguese
Romanian
Russian
Samoan
Serbian (Cyrillic)
Serbian (Latin)
Slovak
Slovenian
Spanish
Swedish
Tahitian
Thai
Tongan
Turkish
Ukrainian
Urdu
Vietnamese
Welsh
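Because every listed language translates into and out of every other, the number of supported directions is n × (n − 1). With the 55 languages listed, that is 2,970 directions, of which 54 translate into English, the direction that matters when review happens in English. A quick check in Python; the list is copied from the slide, and `directions` and `into_english` are illustrative names.

```python
from itertools import permutations

# The 55 languages listed on the "Supported languages" slide.
LANGUAGES = [
    "Afrikaans", "Arabic", "Bosnian (Latin)", "Bulgarian", "Cantonese (Traditional)",
    "Catalan", "Chinese Simplified", "Chinese Traditional", "Croatian", "Czech",
    "Danish", "Dutch", "English", "Estonian", "Fijian",
    "Filipino", "Finnish", "French", "German", "Greek",
    "Haitian Creole", "Hebrew", "Hindi", "Hungarian", "Indonesian",
    "Italian", "Japanese", "Kiswahili", "Korean", "Latvian",
    "Lithuanian", "Malagasy", "Malay", "Maltese", "Norwegian",
    "Persian", "Polish", "Portuguese", "Romanian", "Russian",
    "Samoan", "Serbian (Cyrillic)", "Serbian (Latin)", "Slovak", "Slovenian",
    "Spanish", "Swedish", "Tahitian", "Thai", "Tongan",
    "Turkish", "Ukrainian", "Urdu", "Vietnamese", "Welsh",
]

# "Into and out of each other" = every ordered (source, target) pair.
directions = list(permutations(LANGUAGES, 2))
into_english = [pair for pair in directions if pair[1] == "English"]

print(len(LANGUAGES), len(directions), len(into_english))  # 55 2970 54
```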
22. Translation software – cost models
Cloud Model
• Costs based on the amount of data translated
– pages
– GB
– words
• Covers all languages
On-premise Model
• Standard software license
– based on the number of languages required
– incl. support
• Unlimited usage annually
Professional Services
• Translations as a managed service
• Customization and development
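The two cost models scale differently: cloud cost grows with the amount translated, while the on-premise license is priced per language with unlimited usage. The sketch below finds the break-even volume under those two rules. All prices are hypothetical placeholders (the slide gives none); volume is measured in words, one of the three units the slide mentions; and support, hardware, and professional-services costs are ignored.

```python
def cloud_cost(words: int, price_per_1k_words: float) -> float:
    """Cloud model: pay for the amount translated (here, per 1,000 words)."""
    return words / 1000 * price_per_1k_words


def on_prem_cost(n_languages: int, license_per_language: float) -> float:
    """On-premise model: annual license priced by number of languages, unlimited usage."""
    return n_languages * license_per_language


def break_even_words(n_languages: int, license_per_language: float,
                     price_per_1k_words: float) -> float:
    """Annual word volume at which both models cost the same."""
    return on_prem_cost(n_languages, license_per_language) / price_per_1k_words * 1000


def cheaper_model(words: int, n_languages: int, license_per_language: float,
                  price_per_1k_words: float) -> str:
    cloud = cloud_cost(words, price_per_1k_words)
    on_prem = on_prem_cost(n_languages, license_per_language)
    return "cloud" if cloud < on_prem else "on-premise"


if __name__ == "__main__":
    # Hypothetical: 2 languages at 25,000 per year each, or 2.00 per 1,000 words.
    print(break_even_words(2, 25_000, 2.0))           # 25000000.0
    print(cheaper_model(10_000_000, 2, 25_000, 2.0))  # cloud
    print(cheaper_model(40_000_000, 2, 25_000, 2.0))  # on-premise
```

The shape of the result matters more than the numbers: occasional, low-volume "incidental" work tends to favour paying per volume, while sustained high volume in a few languages tends to favour a fixed license.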
24. Case Study: Machine translation in the loop
Law Firm eDiscovery
[Diagram: Firm; Litigation (foreign, 60,000 docs); Review Team; MT deployed on-site; Docs MT’ed and reviewed; Key docs human translated]
29. In summary
Solutions need to be flexible because your variables are many:
languages | file types | review platform | case requirements
Machine translation is in the tech stack and here to stay
It’s often the only practical option
It’s often the best option!
Globalisation – translation demand is only going to grow
The technology is only going to get better
Ireland demographics
Language changing
Multinationals in Ireland – employee demographics
EU cases – number of languages
The fact that we’re in Ireland and English-speaking doesn’t matter anymore…
Borders are fluid when it comes to business and staffing (though it remains to be seen what happens with our next-door neighbours)
https://ballinyourcourt.wordpress.com/
MT solutions and services provider, specializing in customised solutions with subject matter expertise for specific technical sectors, such as Patents/IP, life sciences, and financial.
We are the MT partner of choice for some of the world’s largest translation companies, information providers, and government and enterprise organisations.
For Translation Companies: We help translation companies to translate more content, more accurately for faster project turnaround, resulting in significant cost savings and increased revenue.
For Enterprise Clients: We help enterprises to translate more content in less time, resulting in faster products to market and enhanced global reach.
For Information Providers: We help information providers to translate knowledge, literature and documentary information faster and more accurately, resulting in broader knowledge offerings and faster time to market.
Range of different use cases for machine translation – you’d be surprised how pervasive it is. But as native English speakers in the audience, you’re probably not exposed to it as much.
These guys all have something in common – they’re involved in litigation and compliance!
No one is immune
In the course of working with us, questions kept arising about whether we could support a case that was on their plate
Could do 60 minutes but I don’t think Adrian would be happy!
Started in the US in the 1940s with a focus on Russian, writing rules and dictionaries, and grew from there
Now it uses clever approaches to learn from previous examples of translations, the same way a human thinks “I’ve seen this before…”
Can’t see everything! Also, there’s no such thing as a correct translation; language is always evolving. The more static the language, the easier
The closer two languages are, the easier: Spanish-Italian is dead easy, then Spanish to English, German to English, Korean to English
Now, like most things, artificial intelligence and neural networks (Neural MT) are the hot topic and have legitimately caused a significant leap forward
How do you use it? Online MT, open-source, customised, industry-specific... we’ll get into that, but there you have it: 60 seconds.
We don’t just do statistical, rule-based, or hybrid. We have a proprietary technology that combines the best of all approaches.
As Eoghan said, we can’t assume this isn’t an issue in Ireland any more. QUICK POLL: who has never had to deal with foreign language documents?
I’m dying for more information here so if you have other instances, cases, please let me know!
Do nothing (not what you should do, but it’s all you can do…)
See if someone in the company/firm speaks the language
send an email around and ask them to help with review/translation
Hire contract reviewers bilingual in English-(Foreign)
review in foreign language (with in-house support?)
Translate the documents
review in English
Your friend isn’t that good, no way to validate quality
contract reviewer = lawyer in the country
From a survey of US firms…
The biggest costs come from having to contract out review in the foreign language, where rates are higher
We’re going to focus on Translation. Those cons apply to “human translation” but we have more options than that. This is where the technology comes into play…
Let’s also move the goalposts a little bit: say we’re now dealing with 30,000 emails in Italian, which puts a different complexion on things…
HT (human translation) = gilding the lily
Did you really need a high-quality professional translation to be able to say that a doc was 100% nowhere near relevant? No! Overkill
Need a pre-filtering step, that’s MT
This is not an exercise in perfection at this stage
Exercise in legally defensible position
MT -> keyword -> identify important -> give both languages -> that’s defensible and saves money
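The note above compresses the defensible workflow into one line. A minimal Python sketch of those steps follows. `translate` is a placeholder for whatever MT engine is deployed (not a real product API), keyword matching is a plain case-insensitive substring test (so "contract" also matches "contractor"), and the flagged documents are the ones that would then go to human translation, as in the case study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class ReviewItem:
    doc_id: str
    original: str      # source-language text, kept for the defensible record
    translation: str   # machine-translated English text
    hits: List[str]    # keywords found in the translation


def triage(docs: Dict[str, str], translate: Callable[[str], str],
           keywords: Iterable[str]) -> List[ReviewItem]:
    """Machine-translate every document, keyword-search the English output,
    and return the hits with both language versions side by side."""
    terms = [k.lower() for k in keywords]
    flagged = []
    for doc_id, text in docs.items():
        english = translate(text)
        hits = [t for t in terms if t in english.lower()]
        if hits:
            flagged.append(ReviewItem(doc_id, text, english, hits))
    return flagged
```

Keeping the original next to the translation is the point: reviewers decide on the English, but anything relied on can be checked against the source text.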
Some common misconceptions
It’s as good as it’s ever been, and only getting better. In fact, we’re in the middle of a paradigm shift
Types of things that we’re doing
We’re aware that MT is a detour so we want to make that detour as seamless as possible
Our Relativity integration
In many of those cases, translation is an enabler: it allows companies to grow their business. While the same can be true in discovery, it is typically more of a roadblock!
We know where we fit!
At this point, I’d like to invite questions so I’ll leave a few minutes to enter them in the chat box here in the webinar software.
In the meantime, we already have a few so let’s start with those
multilingual language identification
where does