The document discusses approaches and application scenarios for integrating multilingual knowledge resources and web content within the FREME project. It provides background on the FREME project, which aims to bridge language and data technologies. It discusses challenges like integrating knowledge resources into different content formats and gaps in current solutions. The document proposes using the Natural Language Processing Interchange Format (NIF) to represent annotations and integrate world and terminology knowledge. It provides examples and discusses potential application scenarios like authoring multilingual ebooks and integrating semantic enrichment into translation.
Sasaki mlkrep-20150710
1. Sasaki – MLKRep – 10 July 2015 WWW.FREME-PROJECT.EU 1
Co-funded by the Horizon 2020
Framework Programme of the European Union
Grant Agreement Number 644771
MLKREP, 10 JULY 2015
Felix Sasaki
DFKI / W3C Fellow
APPROACHES AND APPLICATION
SCENARIOS FOR INTEGRATING
MULTILINGUAL KNOWLEDGE
RESOURCES AND WEB CONTENT
www.freme-project.eu
BACKGROUND: THE FREME PROJECT
THE FREME PROJECT
• Two-year H2020 Innovation action; start February 2015
• Industry partners leading four business cases around
digital content and (linked) data
• Technology development bridging language and data
• Outreach and business modelling demonstrating monetization of the multilingual
data value chain
CHALLENGE AND OPPORTUNITY: BIG DATA IS GROWING ACROSS
LANGUAGES, SECTORS AND DOMAINS
• BC: Digital publishing
• BC: Translation and localisation
• BC: Agriculture and food domain data
• BC: Web site personalisation
Agriculture
metadata, user
content, news
content, …
WHAT LIES AHEAD FOR SEVERAL INDUSTRIES? SEE THE FREME BUSINESS CASES
EN
ES
JA, ZH, ...
AR
CURRENT STATE OF SOLUTIONS
Machine
translation,
terminology
annotation, ...
Linked data
creation &
processing
GAPS THAT HINDER BUSINESS:
• Plethora of formats
• Adaptability and platform dependency
• Language coverage
• Usability: “The right tool for the right person
in given and new enterprises”;
technology influences job profiles
FREME TO THE RESCUE: ENRICHING DIGITAL CONTENT
Machine
translation,
terminology
annotation, ...
Linked data
creation &
processing
LT and LD as first class
citizens on the Web
A SET OF INTERFACES* - DESIGN DRIVEN
BY BUSINESS CASES
LT and LD for various
user types: (application)
developer, content
architect, content
author, …
* Graphical interfaces
* Software Interfaces
EACH SERVICE IN ONE SENTENCE
• e-Translation: “Translate from Dutch to English”
• e-Terminology: “Add terminology annotations”
• e-Entity: “Identify unique entities”
• e-Link: “Add information from (linked open) data sources”
• e-Publishing: “Publish as digital book content”
• e-Internationalisation: “Use standardised metadata for multilingual content
production”
A KEY ASPECT OF FREME: FREME will allow combining data and language technologies via
adequate software interfaces (APIs) and graphical user interfaces (GUIs)
CHALLENGES FOR MULTILINGUAL
KNOWLEDGE RESOURCES AND SOLUTIONS
PROVIDED BY FREME
CHALLENGE:
INTEGRATION OF KNOWLEDGE RESOURCES INTO CONTENT
• Content comes in a plethora of formats
• There is no standardised way to represent knowledge-related information in
widely used content formats
• Keynote from Michael Wetzel: too many competing formats!
◦ SKOS, OWL, TBX, …
• Solution by FREME:
◦ Using NIF to represent natural language processing workflows
◦ Enrich with interlinked information
◦ Linking => benefit from the network effect on the Web
WHAT IS NIF?
• Natural Language Processing Interchange Format
• See http://nlp2rdf.org/
• Linked Data format to store annotations & to organize NLP pipelines
• API specification to create NIF workflows
• Following slides: main roles for NIF
A POTENTIAL NIF WORKFLOW
Existing
content
Content analytics, e.g.
named entity
recognition
Conversion to
NIF
Deploying knowledge from the
Linguistic Linked Data (LLD) cloud
Integrating world knowledge and
terminological knowledge
INTEGRATING WORLD KNOWLEDGE AND
TERMINOLOGICAL KNOWLEDGE
{ "@graph" : [ {
"@id" : "p:char=0,21", …
"isString" : "I have a screwdriver.",
"referenceContext" : "p:char=0,21"
}, …] }
• Step 1: creating NIF
from existing content
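Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the FREME implementation: the function name `make_context` and the default `p:` prefix are hypothetical, and the keys simply mirror the abbreviated JSON-LD shown on the slide.

```python
import json


def make_context(text, prefix="p:"):
    """Build a NIF-style context node for a plain-text string.

    The identifier encodes character offsets (begin, end), with the end
    offset exclusive, mirroring the "char=0,21" identifier on the slide.
    """
    node_id = f"{prefix}char=0,{len(text)}"
    return {
        "@id": node_id,
        "isString": text,
        # The context node refers to itself as its reference context.
        "referenceContext": node_id,
    }


doc = {"@graph": [make_context("I have a screwdriver.")]}
print(json.dumps(doc, indent=2))
```

The example sentence has 21 characters, so the generated identifier matches the `p:char=0,21` value used on the slide.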
INTEGRATING WORLD KNOWLEDGE AND
TERMINOLOGICAL KNOWLEDGE
{ "@graph" : [ {
"@id" : "p:char=0,21", …
"isString" : "I have a screwdriver.",
"referenceContext" : "p:char=0,21"
}, {
"@id" : "p:char=9,20", …
"taIdentRef" : "http://dbpedia.org/resource/screwdriver" }, …] }
• Step 1: creating NIF
from existing content
• Step 2: adding world
knowledge based on
DBpedia
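Step 2 can be illustrated with a short Python sketch. It assumes the entity and its DBpedia URI have already been found by an entity recognition service; the helper `annotate_entity` is hypothetical, and the `referenceContext` key on the annotation node is an assumption about what the slide elides with "…".

```python
def annotate_entity(graph, context, surface, entity_uri, prefix="p:"):
    """Append an annotation node for the first occurrence of `surface`.

    Offsets are computed against the context string, end exclusive.
    """
    text = context["isString"]
    begin = text.find(surface)
    if begin < 0:
        raise ValueError(f"{surface!r} does not occur in the context string")
    end = begin + len(surface)
    node = {
        "@id": f"{prefix}char={begin},{end}",
        "referenceContext": context["@id"],  # assumed, elided on the slide
        "taIdentRef": entity_uri,
    }
    graph.append(node)
    return node


context = {
    "@id": "p:char=0,21",
    "isString": "I have a screwdriver.",
    "referenceContext": "p:char=0,21",
}
graph = [context]
node = annotate_entity(
    graph, context, "screwdriver", "http://dbpedia.org/resource/screwdriver"
)
print(node["@id"])
```

The word "screwdriver" starts at offset 9 and is 11 characters long, which gives the `p:char=9,20` identifier from the slide.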
INTEGRATING WORLD KNOWLEDGE AND
TERMINOLOGICAL KNOWLEDGE
{ "@graph" : [ {
"@id" : "p:char=0,21", …
"isString" : "I have a screwdriver.",
"referenceContext" : "p:char=0,21"
}, {
"@id" : "p:char=9,20", …
"taIdentRef" : "http://dbpedia.org/resource/screwdriver",
"termInfoRef" : "http://tbx2rdf.lider-project.eu/…/query=schraubendreher" },
…] }
• Step 1: creating NIF
from existing content
• Step 2: adding world
knowledge based on
DBpedia
• Step 3: adding
terminological
knowledge from IATE
• IATE is used as a linked data version, via
http://tbx2rdf.lider-project.eu
• The query to IATE uses the translation suggested from DBpedia
• The network effect: interlinking adds value
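The interlinking in Step 3 can be sketched as follows. Everything beyond the slide's keys is an assumption: `DBPEDIA_LABELS` is a hypothetical stand-in for the labels a DBpedia lookup would return, `TERM_QUERY_BASE` is a placeholder URL (the slide elides the real tbx2rdf path, so it is not reproduced here), and `add_term_info` is a hypothetical helper.

```python
from urllib.parse import quote

# Hypothetical stand-in for translated labels obtained from DBpedia.
DBPEDIA_LABELS = {
    "http://dbpedia.org/resource/screwdriver": {
        "en": "screwdriver",
        "de": "Schraubendreher",
    },
}

# Placeholder endpoint; not the real terminology service URL.
TERM_QUERY_BASE = "http://example.org/terms?query="


def add_term_info(node, target_lang):
    """Add a termInfoRef built from the DBpedia label in `target_lang`.

    The node is returned unchanged if no label is known for that language.
    """
    labels = DBPEDIA_LABELS.get(node.get("taIdentRef"), {})
    translation = labels.get(target_lang)
    if translation is not None:
        node["termInfoRef"] = TERM_QUERY_BASE + quote(translation.lower())
    return node


node = {
    "@id": "p:char=9,20",
    "taIdentRef": "http://dbpedia.org/resource/screwdriver",
}
add_term_info(node, "de")
print(node["termInfoRef"])
```

The point of the sketch is the chaining: the world-knowledge link supplies the translated label, and that label drives the terminology query.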
AUTHORING AND PUBLISHING MULTILINGUALLY AND SEMANTICALLY
ENRICHED EBOOKS
• Example: Integration into ePub editing mode of oXygen XML Editor
e-Entity: annotate named entities
INTEGRATING SEMANTIC ENRICHMENT INTO MULTILINGUAL
CONTENT IN TRANSLATION AND LOCALISATION
• Example: Integration into XLIFF 2.0 editing mode of oXygen XML Editor
• Combination of services
◦ e-Entity: annotate named entities; e-Terminology: fetch terminological information
◦ e-Link: fetch additional information from a linked data source like DBpedia, specific to the
type of entities (places, persons, …)
INTEGRATING SEMANTIC ENRICHMENT INTO MULTILINGUAL
CONTENT IN TRANSLATION AND LOCALISATION
• Enriching content with machine readable information – represented as JSON-LD
◦ Input: “Welcome to Berlin … Marlene Dietrich!”
◦ Output:
[
{
"@id": "dbpedia:Marlene_Dietrich",
"@type": "person",
"born": "1901-12-27"
}
]
May serve as a basis for further processing, e.g.
multilingual generation:
• “… born 1901”
• “… geboren 1901”
• “…1901年生まれ”
• …
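The multilingual generation idea can be sketched with simple per-language templates. This is only an illustration: the `TEMPLATES` table and the function `render_birth` are hypothetical, and real generation would need more than string templates.

```python
from datetime import date

# Hypothetical per-language templates matching the examples on the slide.
TEMPLATES = {
    "en": "born {year}",
    "de": "geboren {year}",
    "ja": "{year}年生まれ",
}


def render_birth(record, lang):
    """Render the birth year of a JSON-LD record in the given language."""
    year = date.fromisoformat(record["born"]).year
    return TEMPLATES[lang].format(year=year)


record = {
    "@id": "dbpedia:Marlene_Dietrich",
    "@type": "person",
    "born": "1901-12-27",
}
for lang in ("en", "de", "ja"):
    print(lang, render_birth(record, lang))
```

Because the date is stored once in machine-readable form, each language only needs its own template.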
DEMO
• Generating translation suggestions
• Knowledge being used
◦ World knowledge: DBpedia
◦ Terminological knowledge: IATE
• Storage in ePub based on Internationalization Tag Set (ITS) 2.0
◦ Standardised markup for multilingual content production
◦ Translation suggestions are stored here as ITS “Localization Note” annotations
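Storing a suggestion as an ITS 2.0 Localization Note can be sketched as below. The sketch uses the ITS 2.0 HTML attributes `its-loc-note` and `its-loc-note-type`; the helper `add_loc_note`, the sample markup and the wording of the note are assumptions, and the actual FREME output may look different.

```python
import xml.etree.ElementTree as ET


def add_loc_note(element, suggestion):
    """Attach a translation suggestion as an ITS 2.0 Localization Note."""
    element.set("its-loc-note", suggestion)
    # ITS 2.0 allows "alert" or "description" as the note type.
    element.set("its-loc-note-type", "description")
    return element


# Simplified XHTML fragment as it might appear in ePub content.
paragraph = ET.fromstring("<p>I have a <span>screwdriver</span>.</p>")
span = paragraph.find("span")
add_loc_note(span, "Translation suggestion (de): Schraubendreher")
print(ET.tostring(paragraph, encoding="unicode"))
```

Keeping the suggestion in standard ITS markup means that tools that understand ITS 2.0 can surface it to translators without FREME-specific code.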
WANT TO TRY THINGS OUT?
• Go to http://api.freme-project.eu/doc/0.1/
• Check out API demo calls
• Time line for next prototypes
◦ 0.2: mid July
◦ 0.3: end of August
◦ Feedback to GitHub: https://github.com/freme-project
- Repository will be made public in mid July
CONTACTS
Felix Sasaki, on behalf of the FREME consortium
E-mail: felix.sasaki@dfki.de
CONSORTIUM
Editor's notes
This slide probably needs no visualization.
BC 1 “Digital publishing”: Digital content itself is exploding and is losing value
BC 2 “Translation and Localisation”: Demand for speed and quality is increasing, prices are going down
BC 3 “Agriculture and food data”: Discovery of data is difficult due to missing multilingual metadata
BC 4 “Web site personalisation”: solutions are focusing on English speaking market
Robust language technologies
Machine translation, terminology extraction & annotation
Robust linked data (LD) technologies
Entity annotation, linking to data sources
More and more platforms act as silos that allow deploying certain parts of these technology stacks
GAPS that hinder businesses:
No easy to use interfaces to LT and LD tooling
Plethora of formats to process
Adaptability and scalability of solutions
Usability: “Give the adequate tool to the right person”