Software Architecture                        04/2008/KW




                    OPENTMS
SOFTWARE ARCHITECTURE




                     Roßtal, 29/08/2008
                    Doc.Nr.: HEA-1.1-2008
                         Version 1.3

Author: Dr. Klemens Waldhör / klemens.waldhoer@heartsome.de
      Location: OpenTMS_Software_Architecure_v1.3.doc




                          www.folt.de




1 VERSIONING INFORMATION



•        V0.1 – Version 0.1 – April/May/June 2008: Start version; Klemens Waldhör,
         Heartsome Europe - TOSS_Software_Architecure.doc;

•        V1.0 – Version 1.0 – 05.08.2008: Initial version; Klemens Waldhör,
         Heartsome Europe; based on discussion with Michael Schneider, beodoc,
         04.07.2008 - OpenTMS_Software_Architecure_v1.0.doc

•        V1.1 – Version 1.1 – 30.08.2008: Modifications based on the FOLT internal
         architecture discussion meeting, 29.08.2008, Acolada GmbH, Nürnberg.
         Participants: Ulrike Baral, beodoc; Torsten Kuprat; Michael Schneider,
         beodoc; Klemens Waldhör, Heartsome Europe; Thomas Wedde, euroscript;
         OpenTMS_Software_Architecure_v1.1.doc




Dok. Nr.: HEA-1-2008; Version 1.0; April/May/June/August 2008                2/72




2 PREFACE

This manual gives an overview of the OpenTMS software architecture. It is based
on the requirements defined in the FOLT Open Source Initiative (Folt, 2007b).

The architecture of OpenTMS is based on several models. These models
describe the key components of OpenTMS. Each model handles a specific aspect
of the translation process and its requirements. Together the models form a
framework which guides the construction of language specific software tools.

The following core models are identified:

    •   Security model

    •   Document model

    •   Process model

    •   User model

    •   Data model

    •   GUI model

    •   Interface model

On top of those models the application model organises real applications (like the
GUI model).

OpenTMS uses a data source in the data model which organises the access to a
database or to any kind of device which can store (TM or terminology) data.

The architecture also contains a description of some basic functions
which can form the core of translation tools. The architecture is
defined in such a way that it can easily be extended with new functions
or by combining existing functions into new functionality.








CONTENTS
1    VERSIONING INFORMATION
2    PREFACE
3    LIST OF TABLES AND FIGURES
4    DEFINITIONS
5    INTRODUCTION
 5.1     Arguments for an OpenTMS Software Architecture
 5.2     Basics
    5.2.1 Naming conventions
    5.2.2 Naming of OpenTMS specific functions/methods
 5.3     Character set
 5.4     Standards
 5.5     Basic Requirements
 5.6     Architecture
6    OPENTMS ARCHITECTURE AND MODELS
 6.1     Parameters in OpenTMS models
 6.2     Core Models of OpenTMS
 6.3     OpenTMS Core Library
 6.4     The Application Model
 6.5     Implementation Languages
7    SECURITY MODEL
 7.1     Security, OpenTMS and Programming Languages
 7.2     Communication Level
 7.3     Document Level
 7.4     Database Level
 7.5     Security Level
8    BASIC OPENTMS COMPONENTS
9    DOCUMENT MODEL
 9.1     Documents
 9.2     Character Sets
 9.3     XML document handling
 9.4     XLIFF Documents
    9.4.1 OpenTMS and Skeleton files
    9.4.2 Security and encryption in XLIFF – secureXLIFF
 9.5     TMX Documents
    9.5.1 Security and encryption in TMX – secureTMX
 9.6     TBX Documents
    9.6.1 Security and encryption in TBX – secure TBX
 9.7     Other Documents
 9.8     Basic Document Access Functionality
10   OPENTMS AS A CLIENT/SERVER ARCHITECTURE
11   DATA MODEL
 11.1    Data sources
 11.2    TM Matches
 11.3    Basic data source access functionality
 11.4    Databases
    11.4.1 Open source SQL databases
    11.4.2 Closed source SQL databases
    11.4.3 Alternatives
    11.4.4 Database Access
    11.4.5 Database and data source configuration
12   TRANSLATION OBJECTS
 12.1    Format information
 12.2    Terminology versus Translation Memory
 12.3    Variables, placeholders, replacement classes
13   PROCESS MODEL
 13.1    OpenTMS Process
 13.2    OpenTMS Scripting Language
 13.3    OpenTMSL Communication Methods
14   USER MODEL
 14.1    User roles
 14.2    Basic user functionality
15   GUI MODEL
16   INTERFACE MODEL
17   CONFIGURING OPENTMS
 17.1    Naming of the configuration file
 17.2    Structure of the configuration file
 17.3    Configuration Options
18   DMS INTERFACE
19   BIBLIOGRAPHY
20   APPENDIX
 20.1    Multiple translations for a linguistic concept







3        LIST OF TABLES AND FIGURES
Fig 1: OpenTMSName defined as a regular expression
Fig 2: Naming of OpenTMS functions for export
Fig 3: OpenTMS Procedure description
Fig 4: OpenTMS Models
Fig 5: Example securing XLIFF document exchange
Fig 6: OpenTMS Objects
Fig 7: XLIFF File
Fig 8: Some basic XLIFF File functions
Fig 9: Hierarchy of processes
Fig 10: Applications
Fig 11: Pipeline Architecture
Fig 12: Data sources and data components
Fig 13: Data sources with several data components
Fig 14: Data source access types
Fig 15: Data source access types
Fig 16: Configuring different database types
Fig 17: Representation of linguistic entities as General Linguistic Object
Fig 18: Conversions of linguistic entities
Fig 19: OpenTMS Scripting Language
Fig 20: OpenTMSL Inter-process and computer communication
Fig 21: Some basic user functions
Fig 22: Configuration of OpenTMS
Fig 23: Configuration file naming example
Fig 24: Configuration option structure
Fig 25: OpenTMS options table







4 DEFINITIONS

Client: A client is an application or system that accesses a (remote) service on
another computer system known as a server by way of a network. URL:
http://en.wikipedia.org/wiki/Client_%28computing%29

Client-Server: Client-server is a computing architecture which separates a client
from a server, and is almost always implemented over a computer network. A
client-server application is a distributed system that consists of both client and
server software. A client is a piece of software or a process that may initiate a
communication session, while a server cannot initiate sessions but waits for
requests from a client. The terms client and server may also refer to the host
computers connected to a network on which the client and server software
respectively reside. URL: http://en.wikipedia.org/wiki/Client-server

Doclet: By analogy with applets, doclets are modules that documentation tools
use to process and automatically generate documentation and possibly also code.
Doclets are best known in the context of the Java programming language, where
they are used as modules in the Javadoc documentation tool. URL:
http://de.wikipedia.org/wiki/Doclet.

GUI: Graphical User Interface. An application which allows a human user to
interact with a program through windows, menus etc.

“A graphical user interface (GUI) (IPA: /ˈguːiː/) is a type of user interface which al-
lows people to interact with electronic devices like computers, hand-held devices
(MP3 Players, Portable Media Players, Gaming devices), household appliances
and office equipment. A GUI offers graphical icons, and visual indicators as op-
posed to text-based interfaces, typed command labels or text navigation to fully
represent the information and actions available to a user. The actions are usually
performed through direct manipulation of the graphical elements.” URL:
http://en.wikipedia.org/wiki/GUI

FOLT: Forum Open Language Tools URL: www.folt.org

HTTP: Hypertext Transfer Protocol (HTTP) is a communications protocol for the
transfer of information on intranets and the World Wide Web. Its original purpose
was to provide a way to publish and retrieve hypertext pages over the Internet.
URL: http://en.wikipedia.org/wiki/HTTP

HTTPS: Hypertext Transfer Protocol over Secure Socket Layer or HTTPS is a URI
scheme used to indicate a secure HTTP connection. It is syntactically identical to
the http:// scheme normally used for accessing resources using HTTP. Using an
https: URL indicates that HTTP is to be used, but with a different default TCP port
(443) and an additional encryption/authentication layer between the HTTP and
TCP. This system was designed by Netscape Communications Corporation to
provide authentication and encrypted communication and is widely used on the
World Wide Web for security-sensitive communication such as payment transac-
tions and corporate logons. URL: http://en.wikipedia.org/wiki/Https

Open Source: Open source is a development methodology which offers practical
accessibility to a product's source (goods and knowledge). Some consider
open source as one of various possible design approaches, while others consider
it a critical strategic element of their operations. Before open source became
widely adopted, developers and producers used a variety of phrases to describe
the concept; the term open source gained popularity with the rise of the Internet,
which provided access to diverse production models, communication paths, and
interactive communities.

The open source model of operation and decision making allows concurrent input
of different agendas, approaches and priorities, and differs from the more closed,
centralized models of development. The principles and practices are commonly
applied to the development of source code for software that is made available for
public collaboration, and it is usually released as open-source software. URL:
http://en.wikipedia.org/wiki/Open_source

RPC: Remote procedure call (RPC) is a technology that allows a computer pro-
gram to cause a subroutine or procedure to execute in another address space
(commonly on another computer on a shared network) without the programmer
explicitly coding the details for this remote interaction. That is, the programmer
would write essentially the same code whether the subroutine is local to the exe-
cuting program, or remote. When the software in question is written using object-
oriented principles, RPC may be referred to as remote invocation or remote
method invocation. URL: http://en.wikipedia.org/wiki/Remote_procedure_call






Server: In information technology, a server is an application or device that per-
forms services for connected clients as part of a client-server architecture. A
server application, as defined by RFC 2616 (HTTP/1.1), is "an application program
that accepts connections in order to service requests by sending back responses."
Server computers are devices designed to run such an application or applications,
often for extended periods of time with minimal human direction. Examples of d-
class servers include web servers, e-mail servers, and file servers. URL:
http://en.wikipedia.org/wiki/Server_%28computing%29

Software Architecture: The software architecture of a program or computing sys-
tem is the structure or structures of the system, which comprise software
components, the externally visible properties of those components, and the
relationships between them. The term also refers to documentation of a sys-
tem's software architecture. Documenting software architecture facilitates com-
munication between stakeholders, documents early decisions about high-level de-
sign, and allows reuse of design components and patterns between projects. URL:
http://en.wikipedia.org/wiki/Software_architecture.

TOMCAT: Apache Tomcat is a Servlet container developed by the Apache Software
Foundation (ASF). Tomcat implements the Java Servlet and the JavaServer
Pages (JSP) specifications from Sun Microsystems, and provides a "pure Java"
HTTP web server environment for Java code to run. … Apache Tomcat includes
tools for configuration and management, but can also be configured by editing
configuration files that are normally XML-formatted. URL:
http://en.wikipedia.org/wiki/Apache_Tomcat

UML (Unified Modeling Language): In the field of software engineering, the Unified /
Universal Modeling Language (UML) is a standardized visual specification
language for object modeling. UML is a general-purpose modeling language that
includes a graphical notation used to create an abstract model of a system,
referred to as a UML model. UML is officially defined at the Object Management
Group (OMG) by the UML metamodel, a Meta-Object Facility metamodel (MOF).
Like other MOF-based specifications, UML has allowed software developers to
concentrate more on design and architecture. URL:
http://en.wikipedia.org/wiki/Unified_Modeling_Language







Unicode: In computing, Unicode is an industry standard allowing computers to
consistently represent and manipulate text expressed in most of the world's writing
systems. Developed in tandem with the Universal Character Set standard and
published in book form as The Unicode Standard, Unicode consists of a repertoire
of more than 100,000 characters, a set of code charts for visual reference, an en-
coding methodology and set of standard character encodings, an enumeration of
character properties such as upper and lower case, a set of reference data com-
puter files, and a number of related items, such as character properties, rules for
normalization, decomposition, collation, rendering and bidirectional display order
(for the correct display of text containing both right-to-left scripts, such as Arabic or
Hebrew, and left-to-right scripts). URL: http://en.wikipedia.org/wiki/Unicode

UTF-8: UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length
character encoding for Unicode. It is able to represent any character in the Uni-
code standard, yet the initial encoding of byte codes and character assignments
for UTF-8 is backwards compatible with ASCII. For these reasons, it is steadily
becoming the preferred encoding for e-mail, web pages, and other places where
characters are stored or streamed. URL: http://en.wikipedia.org/wiki/UTF-8

XML-RPC: XML-RPC is a remote procedure call protocol which uses XML to encode
its calls and HTTP as a transport mechanism. URL:
http://en.wikipedia.org/wiki/Xml-rpc








5 INTRODUCTION

5.1      Arguments for an OpenTMS Software Architecture
The arguments for an open source based localization tool have been discussed in
FOLT, 2007a.
Software design principles:
        For end users (translators): easy to install
        For translation providers: server version, networking
        For customers: running own servers; secure interfaces

5.2      Basics
5.2.1 Naming conventions
OpenTMS uses a standardized naming convention for variables, names in XML files
etc.
Each legal OpenTMS name (string, literal, variable name, function name) consists
of one or more words. Variable names start with an uppercase letter. Function
names (e.g. identifying processes) start with a lowercase letter. Every further
word starts with a character from [A-Z]. The remaining characters of a word are
either [a-z] or [0-9]. No blanks are allowed between words.

Word := [A-Z]([a-z]|[0-9])*
word := [a-z]([a-z]|[0-9])*
OpenTMSName := Word+
OpenTMSFunctionName := word Word*
Examples:
•        The variable: XliffDocument
•        The function: openXliffDocument

Fig 1: OpenTMSName defined as a regular expression
Exceptions from the naming conventions could be introduced if acronyms etc. are
used for words (e.g. TMX). Nevertheless it is not recommended to do this.
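
The grammar of Fig 1 can be checked mechanically. The following is a minimal Java sketch, not part of the OpenTMS code base: the class name OpenTmsNames and its method names are assumptions made for illustration. It translates the four grammar rules directly into java.util.regex patterns. Read literally, the grammar also accepts a string such as TMX, because every single uppercase letter is a valid Word.

```java
import java.util.regex.Pattern;

/** Illustrative validator for the naming grammar of Fig 1 (all names here are hypothetical). */
final class OpenTmsNames {
    // Word := [A-Z]([a-z]|[0-9])*
    private static final String WORD = "[A-Z][a-z0-9]*";
    // word := [a-z]([a-z]|[0-9])*
    private static final String LOWER_WORD = "[a-z][a-z0-9]*";

    // OpenTMSName := Word+
    private static final Pattern NAME = Pattern.compile("(?:" + WORD + ")+");
    // OpenTMSFunctionName := word Word*
    private static final Pattern FUNCTION_NAME =
            Pattern.compile(LOWER_WORD + "(?:" + WORD + ")*");

    private OpenTmsNames() { }

    /** Returns true if the whole string is a legal OpenTMSName. */
    static boolean isName(String s) {
        return s != null && NAME.matcher(s).matches();
    }

    /** Returns true if the whole string is a legal OpenTMSFunctionName. */
    static boolean isFunctionName(String s) {
        return s != null && FUNCTION_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isName("XliffDocument"));             // true
        System.out.println(isFunctionName("openXliffDocument")); // true
        System.out.println(isName("Xliff Document"));            // false: blanks are not allowed
    }
}
```

Using matches() rather than find() ensures that the complete string has to conform to the grammar, so a blank or an underscore anywhere in the name is rejected.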







5.2.2 Naming of OpenTMS specific functions/methods
It is suggested to use a consistent OpenTMS naming system for functions and
variables which are exported from OpenTMS. Exported functions are functions
which can be used in applications (similar to the public concept in Java or C++).
This immediately helps to identify OpenTMS code which is used in systems outside
of OpenTMS. The special string “OpenTMS_” is used for this purpose.

ExportOpenTMSName := “OpenTMS_” Word+
ExportOpenTMSFunctionName := “OpenTMS_” word Word*
Examples:
•        The variable: OpenTMS_Encoding
•        The function: OpenTMS_openXliffDocument

Fig 2: Naming of OpenTMS functions for export
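
As a rough illustration of the export rule, the following Java sketch derives an exported name from an internal one and checks a name against the two export patterns of Fig 2. The class OpenTmsExportNames and its methods are hypothetical helpers introduced only for this example.

```java
import java.util.regex.Pattern;

/** Illustrative helper for the export naming rule of Fig 2 (all names here are hypothetical). */
final class OpenTmsExportNames {
    static final String EXPORT_PREFIX = "OpenTMS_";

    // ExportOpenTMSName := "OpenTMS_" Word+
    private static final Pattern EXPORT_NAME =
            Pattern.compile(Pattern.quote(EXPORT_PREFIX) + "(?:[A-Z][a-z0-9]*)+");
    // ExportOpenTMSFunctionName := "OpenTMS_" word Word*
    private static final Pattern EXPORT_FUNCTION_NAME =
            Pattern.compile(Pattern.quote(EXPORT_PREFIX) + "[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*");

    private OpenTmsExportNames() { }

    /** Adds the export prefix to an internal name. */
    static String toExported(String internalName) {
        return EXPORT_PREFIX + internalName;
    }

    static boolean isExportName(String s) {
        return s != null && EXPORT_NAME.matcher(s).matches();
    }

    static boolean isExportFunctionName(String s) {
        return s != null && EXPORT_FUNCTION_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String exported = toExported("openXliffDocument");
        System.out.println(exported);                       // OpenTMS_openXliffDocument
        System.out.println(isExportFunctionName(exported)); // true
    }
}
```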



5.3 Character set
OpenTMS uses UTF-8 as its basic character encoding, especially for exchanging files.
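
In a Java implementation this suggests naming the encoding explicitly whenever an exchange file is written or read, because before JDK 18 the default charset depended on the platform. The following is a minimal sketch; the class Utf8Exchange and the temporary file are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative helper that always reads and writes exchange files as UTF-8 (hypothetical class). */
final class Utf8Exchange {
    private Utf8Exchange() { }

    /** Writes the text to the file, encoded as UTF-8. */
    static void write(Path file, String content) throws IOException {
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the whole file and decodes it as UTF-8. */
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("opentms-", ".txt");
        String text = "Ro\u00dftal, Waldh\u00f6r"; // non-ASCII characters written as Unicode escapes
        write(tmp, text);
        System.out.println(read(tmp).equals(text)); // true
        Files.delete(tmp);
    }
}
```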

5.4      Standards
FOLT builds heavily on the idea of Open Source and on the use of standards.
Therefore the FOLT requirements use well-established localization standards to
represent various types of localization information - based on XML.
•        XLIFF - XML based localization exchange format
•        TTX – Trados TM format
•        TMX - XML based localization translation memory exchange format
•        SRX - XML based format for describing segmentation rules
•        GMX – standard for measuring quantitative aspects in the translation
         process
•        TBX / MARTIF / OLIF – formats for representing terminology
•        CSV
•        Language Encoding ISO 639…
In general, the basic architecture makes heavy use of XML. XML based structures
are used as the basic mechanism to exchange information between different
applications (-> Translets). Using XML has the advantage that many (open source)
parsers are available for different programming languages, which makes it possible
to implement the core OpenTMS architecture in different languages and environments.

5.5      Basic Requirements
The following main requirements are taken from FOLT (2007b):

•   Software: Web based application; thin client; no installation; no proprietary
    runtime components; open source software preferred (FOLT, 2007b, p. 17)
•   Operating System: OS independent
•   Hardware: standard hardware (FOLT, 2007b, p. 17)
•   Interfaces: Integration into CMS, workflow management should be supported
    (FOLT, 2007b, p. 17).
•   Product interfaces: Exchange supported through XLIFF and TMX (FOLT,
    2007b, p. 18).
•   Database: Open source database (FOLT, 2007b, p. 21); basically all SQL
    databases should be supported, therefore a generic database interface is
    required.
•   Scalability: single and multi user requirement

5.6 Architecture
The architecture is described mainly in diagrams and text. The target audience of
this document is mainly non-technical readers. The document is therefore kept
as informal as possible without losing the necessary precision. Further documents
or versions of this document may add more details to the various items discussed.
Where possible, the basic methods and classes are written in Java, but this does
not imply that an implementation requires Java as its implementation language.

The various components described in this document are called models. A model
organizes a certain functionality or aspect of the OpenTMS system. An example
of a model is the security model of OpenTMS. This model describes all functions
and structures necessary to implement the OpenTMS security system.

There are several methods to describe the architecture, methods and objects of a
piece of software. Within this document mainly diagrams and block diagrams are
used to show the structure of the software. For describing methods and objects an
XML based methodology is used (taken from Tomcat).

The following is an example of a method call description using the Tomcat
interface description. The description will be extended to also describe the
possible return values.
<translet>
    <translet-name>ApplyTranslationMemoryToSegment</translet-name>
    <translet-class>com.OpenTMS.translet.translateSegment</translet-class>
    <init-param>
       <param-name>TMXDB</param-name>
       <param-value>OpenTMSexampledatabase</param-value>
    </init-param>
    <init-param>
       <param-name>SEGMENT</param-name>
       <param-value>This segment needs to be translated.</param-value>
    </init-param>
    <init-param>
       <param-name>FUZZYQUALITY</param-name>
       <param-value>70</param-value>
    </init-param>
</translet>

Fig 3: OpenTMS Procedure description

Annotation: In order to keep the text more compact, function names do not follow
the naming scheme described in chapter 5.2.2. This is only for readability
purposes. A real implementation should adhere to the naming scheme.
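
A procedure description like the one in Fig 3 can be read with any standard XML parser. The following Java sketch uses the DOM parser shipped with the JDK to extract the translet name, the class and the init parameters. The class TransletDescription is a hypothetical helper, not an OpenTMS API, and the sketch assumes that all expected elements are present.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative reader for a translet description as in Fig 3 (hypothetical class). */
final class TransletDescription {
    final String name;
    final String className;
    final Map<String, String> initParams; // parameter name -> parameter value, in document order

    private TransletDescription(String name, String className, Map<String, String> initParams) {
        this.name = name;
        this.className = className;
        this.initParams = initParams;
    }

    static TransletDescription parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        Map<String, String> params = new LinkedHashMap<>();
        NodeList initParams = root.getElementsByTagName("init-param");
        for (int i = 0; i < initParams.getLength(); i++) {
            Element param = (Element) initParams.item(i);
            params.put(text(param, "param-name"), text(param, "param-value"));
        }
        return new TransletDescription(
                text(root, "translet-name"), text(root, "translet-class"), params);
    }

    // Returns the trimmed text of the first descendant with the given tag;
    // assumes the element exists (a missing element causes a NullPointerException).
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<translet>"
                + "<translet-name>ApplyTranslationMemoryToSegment</translet-name>"
                + "<translet-class>com.OpenTMS.translet.translateSegment</translet-class>"
                + "<init-param><param-name>FUZZYQUALITY</param-name>"
                + "<param-value>70</param-value></init-param>"
                + "</translet>";
        TransletDescription d = parse(xml);
        System.out.println(d.name + " " + d.initParams);
    }
}
```

Trimming the element text means that a description in which names and values sit on their own lines yields the same parameters as the compact form.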








6 OPENTMS ARCHITECTURE AND MODELS

The OpenTMS architecture is composed of several models. Each model implements
a specific aspect and behavior of the OpenTMS system. Each model communicates
with the other models through parameters and values.

6.1 Parameters in OpenTMS models
Defining parameters, and especially their types, independently of a specific
programming language is not trivial, apart from simple types like characters,
strings, integers or other numbers. The transfer of more complex structured
information has to be organized on the basis of those primitive types. Programming
languages typically use “serialization” approaches to achieve at least a transfer
of data from one application instance to another.

OpenTMS tries to use a general parameter / value model which addresses both
programming language specific and programming language independent parameter /
value transfer. In order to make the integration of existing applications possible,
OpenTMS supports different options for parameter representation.

The following methods should be supported:

    •   XML based parameters: all values should be transferred through XML elements
        where the value is given by the element content (string), and the name and
        the type of the parameter are each given as an attribute. XML based
        parameter / value transfer is especially useful when transferring complex
        structured values (e.g. objects) between functions. Nevertheless complex
        parameters (objects) need to be serialized. It is suggested that OpenTMS
        defines some additional basic parameter types which often occur in
        translation tools (e.g. date type, TransUnits from XLIFF, tu or tuvs in TMX).

    •   Tomcat parameters: This follows the way the Tomcat server engine
        defines method calls with parameter values. It is also XML based.

    •   XML-RPC parameters: This follows the way XML-RPC defines method
        calls with parameter values. It supports some basic types like integer etc.
        More complex parameters have to be serialized.

    •   Programming language specific parameters: These parameters should
        be wrapped in a specific object through serialisation. This parameter type
        should only be used within a specific implementation where it is very
        unlikely that it will be used by other programming languages.

    •   Hash tables: Hash tables are supported by most programming languages,
        and their transfer to and from databases is often supported. Basically an
        entry in the table contains a key (the name of the parameter) and the value
        of the parameter (value of the key).

The kernel of each language specific OpenTMS implementation contains a basic
library which supports creating reading and writing OpenTMS parameters.
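
The XML based and hash table based representations described above could be mapped onto each other roughly as in the following Java sketch. It is only an illustration: the class name ParameterSketch, the element names params and param and the helper type TypedValue are assumptions made for this example and are not defined by OpenTMS. Only the idea that the value is the element content while name and type are attributes is taken from the text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: converts a hash table of parameters to XML and back. */
final class ParameterSketch {

    /** A parameter value together with its type name, e.g. "int" or "String". */
    static final class TypedValue {
        final String type;
        final String value;
        TypedValue(String type, String value) { this.type = type; this.value = value; }
    }

    /** Writes each entry as <param name="..." type="...">value</param>. */
    static String toXml(Map<String, TypedValue> params) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("params"); // element names are assumptions
            doc.appendChild(root);
            for (Map.Entry<String, TypedValue> e : params.entrySet()) {
                Element p = doc.createElement("param");
                p.setAttribute("name", e.getKey());
                p.setAttribute("type", e.getValue().type);
                p.setTextContent(e.getValue().value); // special characters are escaped on output
                root.appendChild(p);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize parameters", e);
        }
    }

    /** Reads the parameters back into a hash table keyed by parameter name. */
    static Map<String, TypedValue> fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, TypedValue> params = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("param");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                params.put(p.getAttribute("name"),
                        new TypedValue(p.getAttribute("type"), p.getTextContent()));
            }
            return params;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed parameter document", e);
        }
    }
}
```

Keeping the type as a plain string leaves room for the additional types suggested above (e.g. TransUnit or tu), whose values would themselves be serialized XML.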

Type              Comment

int               Integer as in Java
float             Float as in Java
char              Character as in Java
String            String as in Java
Time
Date
TransUnit         XML based XLIFF TransUnit structure
tu                XML based TMX tu structure
GLO               General Linguistic Object - see chapter 12
MoLo              Monolingual Object - see chapter 12
Mulo              Multilingual Object - see chapter 12

Fig 4: Table of Core OpenTMS parameter types

An example of how parameters are used is given in Fig. 2.






6.2 Core Models of OpenTMS
This chapter describes the core models of OpenTMS. The key idea is that OpenTMS
uses an extensible architecture which allows new models to be added to the kernel
architecture in an easy, yet compatible way. A new model has to fulfil some basic
requirements, e.g. that parameters are defined and used as described in
chapter 6.1.




Fig 5: OpenTMS Models and their relations

The OpenTMS models are arranged in a kind of “onion model”. The kernel is
represented by the process model, which in turn builds on the user, document and
data models; each of these models a specific aspect of the OpenTMS system. These
kernel models are “shielded” by the security model, which is responsible for
ensuring that only allowed operations are performed.







    •   Security Model: This model describes the security aspects and requirements
        of OpenTMS. Other models use the security model to allow or restrict access
        to OpenTMS specific functions. The security model secures the communication
        channel on the one hand and the data on the other (e.g. the values of
        elements in an XML file or the values in a property file).

    •   User Model: This model represents the user in OpenTMS. The user model works
        in close connection with the security model. “User” does not only mean
        human users, but also other processes. User models have rights attached to
        them, which in turn support the security model of OpenTMS.

    •   Process Model: This model implements the functions of OpenTMS (which are
        finally combined into applications – see application model), e.g. a
        converter or a translation memory search.

    •   Data Model: Basically this model implements the database side of OpenTMS.
        It uses a generalized database model, called data sources. Data sources
        are any kind of storage media for data, ranging from plain text files to
        SQL and other types of databases.

    •   Document Model: The document model describes the core documents used in
        OpenTMS. Basically it is based on XLIFF and TMX. The document model could
        also be seen as part of the data model, but because documents are one of
        the core outputs of the translation and localization process they are
        modeled separately.

    •   GUI Model: This model specifies editors and other functionality which re-
        quires a GUI. The GUI model is not further detailed in the architecture
        specification here. The GUI model should be defined in a separate docu-
        ment.

    •   Interface Model: This model describes how to extend OpenTMS with new
        models. The interface model is an abstract model and needs further
        inspection. An example of such an extension is the interface to CMS
        systems. Interface models are also quite important as they serve as the
        connection to other applications (e.g. web servers, CMS systems) and in
        general to scripting languages like Perl, PHP etc.

    •   Application Model: This model realizes programs which perform tasks like
        translation etc.




6.3 OpenTMS Core Library
In order to achieve a consistent and quick implementation, OpenTMS implements its
key functions in a core library. Functions implemented in the core library should
not be re-implemented (“reinvented”) in external functions or processes. Obviously
the set of key functions will evolve over time. Functionality and implementation
of the core should not be changed without important reasons (similar to the Linux
implementation process).

By using a core library OpenTMS ensures that certain functions behave in the same
way across applications. It also assures the developer and the user that
functionality does not change unforeseeably.

Core library functions should be the first ones to be realized when OpenTMS is
implemented in different programming languages.

6.4 The Application Model
The OpenTMS architecture just serves as a model of how the different aspects of
tools supporting the translation process can be implemented. As a model it is
independent of any programming language.

Applications need to be written in order to make the functionality of OpenTMS
accessible to users. This is realized in the application model. The GUI model can
be seen as an example of an application model.

Applications obviously depend on the existence of a concrete implementation in an
existing programming language (Java, C#, Perl or whatever). In this sense OpenTMS
provides a programming framework which allows language support tools to be
constructed.







In the beginning OpenTMS will come with some basic applications (editors etc.).
But the main idea is that a sound framework is defined and specified which allows
the construction of new language applications.

OpenTMS also supports its own scripting language (OpenTMSL). This language makes
the OpenTMS functions accessible through simple calls (similar to batch files).
This scripting language can also be used to construct applications.

6.5 Implementation Languages
As a first step it is suggested to implement a Java version of OpenTMS. Compared
to other languages, Java has the advantage that it runs on several operating
systems (which is one of the goals of FOLT and OpenTMS). Tools written in other
languages can be integrated because the basic model of OpenTMS is designed around
XML-RPC and similar communication modes.

The basic Java implementation can serve as the basis for other implementations
(C, C#, C++, Perl, PHP etc.).

For security issues associated with choosing a programming language see
chapter 7.








7 SECURITY MODEL

A key success factor of the OpenTMS system is security. As translation can always
involve documents of various security levels, proper handling of the documents
and of document transmission is required.

Depending on the security level data can be encoded/encrypted. It is suggested to
use three different levels (0 to 2); a further GUI related level (Level 4) is
described below.

    •   Level 0: No security procedures are applied; data are transferred as they
        are.

    •   Level 1: The communication channel is secured using standard secure
        protocols.

    •   Level 2: Security is applied at data level. Basically this means that
        strings are encrypted when they are communicated through a communication
        channel or are written to or retrieved from a database. This also involves
        encrypted XLIFF files (or parts of them).

    •   Level 4: GUI level related security

Levels 1 and 2 can be used together to achieve optimal security where necessary.

Security is attached to the OpenTMS User model.

A key feature of the OpenTMS architecture is that the security model is
transparent. When writing a (new) application the programmer does not need to
take care of the security aspect. The OpenTMS kernel provides all the functions
and interfaces to make those calls transparent; supplying the correct parameters
is sufficient.

A further security level (Level 4) can be introduced at GUI level. At this level
functions like copy and paste are secured in addition. This should prevent users
from copying and pasting the content of text windows (editing windows) into other
applications. Defining this security level is left to the GUI model definition.







The following diagram shows how several methods can be combined to achieve a high
level of security during the transmission of an XLIFF file. In this example the
XLIFF file is secured (encrypted) in a first step. Once the file has to be
transferred over the network, the channel as such is also secured. Once the XLIFF
file is received it is decoded by the OpenTMS system. From a programming
perspective this is realised simply by setting and defining the security to be
used.




Fig 6: Example securing XLIFF document exchange




7.1 Security, OpenTMS and Programming Languages
In the previous chapter the issue of programming languages has been discussed.
A commonly known problem with programming languages – more precisely with
applications written in those languages, and often only in association with
specific operating systems – is that security measures are often not properly
implemented (e.g. the very old problem of “buffer overflows” in C).

OpenTMS overcomes this problem by clearly defining specific modules which are
encapsulated and follow modern software development rules (e.g. access only
through well defined interfaces). In addition, a special security layer wraps the
various modules.

This architecture specification is mainly targeted towards the server part of
OpenTMS. Thus it is independent of any GUI application.

GUIs can use OpenTMS basically in two ways:

    a) Through the OpenTMS server functionality: This approach encapsulates all
       modules and functions and gives the highest possible level of security.
       Here only “public server side functionality” can be used.

    b) By directly calling functions from the OpenTMS library: Obviously this can
       cause problems if the GUI does not call the functions properly (especially
       in programming languages like C or C++).

Web based (browser based) applications are one of the target GUIs of OpenTMS.
They will call all the functionality through a web server, SOAP or XML-RPC
interfaces. This minimises the danger of introducing security problems on the
client side (e.g. for GUIs which have to follow requirements like ZDv 54/100
VS-NfD „IT-Sicherheit in der Bundeswehr“). By restricting the client to “plain
HTML” one can reduce the risk to a minimum. Obviously increasing the security
level goes with a decrease in comfort and user friendliness. This decision is up
to the end user and his organisation.

7.2 Communication Level
Communication which goes through TCP/IP should support (strong) encryption of the
data transmitted. This is done in addition to using protocols like HTTPS, secure
FTP etc.

7.3 Document Level
Documents are the basis of most activities in OpenTMS.

A key problem is the transfer of XLIFF files. The content of the segments is
normally readable by humans. If required, the segments in XLIFF files (as well as
in TMX or TBX files) can be encrypted (creating something like a secureXLIFF,
secureTMX, secureTBX). The segments can then only be read in conjunction with a
user name and password. The users who have regular access to the content can be
stored in encrypted form in the header of the XLIFF file or be supplied when
opening the XLIFF document.
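
Segment level encryption of this kind could look roughly like the following Java sketch. The text does not prescribe an algorithm; AES-GCM with a key derived from the password via PBKDF2 is an assumption chosen here because both are available in the standard Java library. The class name SegmentCipherSketch, the method names and the layout of the encoded value (salt, IV and ciphertext concatenated and Base64 encoded) are likewise hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper that encrypts the text of a single segment. */
final class SegmentCipherSketch {
    private static final int SALT_LEN = 16;
    private static final int IV_LEN = 12;
    private static final int TAG_BITS = 128;

    /** Derives a 256 bit AES key from the password (parameters are assumptions). */
    private static SecretKeySpec deriveKey(char[] password, byte[] salt)
            throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 256);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    /** Returns Base64(salt + iv + ciphertext), usable as XML element content. */
    static String encryptSegment(String text, char[] password) {
        try {
            SecureRandom random = new SecureRandom();
            byte[] salt = new byte[SALT_LEN];
            byte[] iv = new byte[IV_LEN];
            random.nextBytes(salt);
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(text.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[SALT_LEN + IV_LEN + ct.length];
            System.arraycopy(salt, 0, out, 0, SALT_LEN);
            System.arraycopy(iv, 0, out, SALT_LEN, IV_LEN);
            System.arraycopy(ct, 0, out, SALT_LEN + IV_LEN, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encryptSegment; fails if the password is wrong or the data was altered. */
    static String decryptSegment(String encoded, char[] password) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            byte[] salt = Arrays.copyOfRange(in, 0, SALT_LEN);
            byte[] iv = Arrays.copyOfRange(in, SALT_LEN, SALT_LEN + IV_LEN);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(password, salt),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] pt = cipher.doFinal(in, SALT_LEN + IV_LEN, in.length - SALT_LEN - IV_LEN);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("wrong password or corrupted segment", e);
        }
    }
}
```

Because only the element content is replaced, the XML structure of the file stays visible. Deriving a key per segment is slow; an implementation would more likely derive one key per document and reuse it for all segments.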

7.4 Database Level
Database entries follow the same procedure. If required the entries should be
encrypted. At this level database specific security functionality can and should
be applied.

Without knowledge of the user / password combination, an export etc. of the
database does not reveal any information in case of an attack.

In addition any database security layers need to be supported too.

7.5 Security Level
The following functions assume that each encryption and decryption process as-
sociates the relevant user and his roles with the security function. At this point no
function parameters are defined. This will be done in an implementation manual.








Function                              Comment

Encrypt / Decrypt                     General function which encrypts and decrypts any
                                      type of document.

Encrypt XLIFF / Decrypt XLIFF         Encrypts / decrypts the texts (segments) of an
                                      XLIFF document. The XML structure as such is still
                                      visible. Depending on the parameters supplied,
                                      attributes etc. are secured too.

Encrypt TMX / Decrypt TMX             Encrypts / decrypts the texts (segments) of a
                                      TMX document. The XML structure as such is still
                                      visible. Depending on the parameters supplied,
                                      attributes etc. are secured too.

Encrypt TBX / Decrypt TBX             Encrypts / decrypts the texts (segments) of a
                                      TBX document. The XML structure as such is still
                                      visible. Depending on the parameters supplied,
                                      attributes etc. are secured too.

Establish Secure Communication        Establishes a secure communication channel. The
                                      type of security depends on the supplied
                                      parameters.

Terminate Secure Communication        Terminates a secure communication channel.

Secure Data Source                    Enables the encryption / decryption of database
                                      entries.








8 BASIC OPENTMS COMPONENTS
The OpenTMS framework is organized around a set of basic components called
models (see chapter 6), which interact and to which processes can be applied. The
following is a brief overview of the basic models:

•   Documents: Documents form one key feature of the architecture. Basically a
    document is any form of text. Translations and other modification processes
    (e.g. segmentation) are applied to documents. A key document type in OpenTMS
    is the XLIFF document, which is the main paradigm for communicating text
    between various processes.
•   Database: Database refers to any kind of storage which can be used to
    retrieve a specific text or sub-text (like a paragraph or segment). Database
    in the OpenTMS context is understood widely, ranging from simple text files
    to highly sophisticated SQL or object oriented database systems. OpenTMS uses
    a general database object which can come in various flavors, e.g. a
    translation memory, a phrase database or a terminology database. The OpenTMS
    database architecture supports various security levels. Encryption of entries
    should be supported. OpenTMS uses the notion of “data source” for these
    generalized databases.
•   Processes: Processes apply operations to documents and databases. Operations
    could be: modifying, inserting, searching, editing, converting etc. A key
    process in OpenTMS is the translation process. OpenTMS processes are named
    “Translets” (or Translet in singular). An example of a Translet is a Doclet,
    a module which is applied for the conversion, modification etc. of documents.
    Processes in OpenTMS are normally accessible through the OpenTMS Scripting
    Language, a language which gives access to the core operations of the OpenTMS
    architecture (similar to JavaScript).








Fig 7: OpenTMS Objects

From a certain perspective processes can be seen as a special type of
communication. Within OpenTMS three different communication types can be
distinguished. Communication is used here in a broad sense.

•   Command (file) based process: Here an executable is run (batch mode).
    Command processes use XML based command files as input parameters.
•   Function based process: Here the specific process is called as a function or
    method within a piece of software.
•   Net (TCP/IP) based process: Here a process is run through a network (TCP/IP)
    using SOAP, RPC, XML-RPC or similar communication methods. The method is
    activated in one process while the actual execution runs in another process
    (which could be a server, a virtual machine, another thread or similar).


•   Workflow: A workflow is a set of processes which are applied in a specific
    sequence. A workflow may also involve humans. A typical workflow could be: PM
    receives document to translate – determines document characteristics –
    computes statistics – provides offer – client accepts offer – PM determines
    translator – converts document for translator – sends to translator – and so
    on. This means that a workflow can also contain purely human actions
    interwoven with computer processes. In any case, each human process must be
    mapped to a computer process.






As described later in the document, processes can be organized in pipelines. This
means that one process can take the output of another process, do some
computation on it and create a new output, which in turn can form the input to
another process.
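
The pipeline idea can be illustrated with a small Java sketch in which each process is modelled as a function from input to output. This is only an illustration under that assumption: the class PipelineSketch and the two example steps are hypothetical and do not correspond to actual OpenTMS Translets.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical sketch: processes chained into a pipeline. */
final class PipelineSketch {

    /** Chains the steps so that each one consumes the output of its predecessor. */
    static <T> Function<T, T> pipeline(List<Function<T, T>> steps) {
        Function<T, T> result = Function.identity();
        for (Function<T, T> step : steps) {
            result = result.andThen(step);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two stand-in processes: whitespace normalisation and lower-casing.
        Function<String, String> normalise = s -> s.trim().replaceAll("\\s+", " ");
        Function<String, String> lowerCase = s -> s.toLowerCase(Locale.ROOT);

        Function<String, String> process = pipeline(List.of(normalise, lowerCase));
        System.out.println(process.apply("  Das   ist ein  Segment "));
        // prints "das ist ein segment"
    }
}
```

In OpenTMS the value passed between steps would more likely be an XLIFF document than a plain string, but the composition principle is the same.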








9 DOCUMENT MODEL

9.1 Documents
Documents (“texts”) are a core concept in OpenTMS. Documents are normally the
main focus of interest because they are what needs to be translated. Documents
normally enter or leave OpenTMS as input or output. Within OpenTMS they are
normally processed through XLIFF (chapter 9.4): documents are converted into XLIFF
and back. Documents come in various formats, e.g.:

    •   WinWord

    •   RTF

    •   Plain text

    •   HTML

    •   XML

    •   OpenOffice

    •   program texts

    •   resource files

    •   property files

    •   database entries

    •   any other common localization industry formats

    •   any other document type

The simplest type of document is a string, a sequence of characters. For OpenTMS
processes strings are packed into XML structures, mainly a subset of XLIFF.

A key property of a document is the language associated with it, although the
language itself may vary within the document. If a document gets translated, at
least a second language is associated with it.





9.2 Character Sets
OpenTMS uses the Unicode character set for all (internal) representation
purposes. This has the advantage that most of the characters used worldwide can
be processed with OpenTMS. Most programming languages nowadays also use Unicode
as their internal character representation.

UTF-8 is used as the core encoding when OpenTMS produces and delivers files which
are some kind of final document (e.g. statistics output). Deviations occur if the
original character set differs.

The core library of OpenTMS contains basic functions to convert from one
character set to another. In addition the kernel library should contain some
functions which allow the character encoding of a document to be detected.
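
In a Java implementation such conversion and detection helpers could be built on the standard java.nio.charset package, as in the following sketch. The class name CharsetSketch and its two methods are hypothetical. Note that the "detection" shown here is only a validity check for one candidate encoding, not a full detection algorithm: single byte encodings such as ISO-8859-1 accept any byte sequence and can therefore never be ruled out this way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

/** Hypothetical core library helpers for character set handling. */
final class CharsetSketch {

    /** Re-encodes bytes from one character encoding to another. */
    static byte[] convert(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    /** Returns true if the bytes are well-formed in the given encoding. */
    static boolean isValid(byte[] input, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A typical use would be to test incoming bytes with isValid(bytes, StandardCharsets.UTF_8) and, if that fails, to fall back to a configured legacy encoding before calling convert.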

9.3 XML Document Handling
OpenTMS heavily uses XML based standards (XLIFF, TMX, TBX). There are several
good open source implementations for XML handling available (DOM, SAX, JDOM, just
to name a few). Obviously those implementations should be used to manipulate such
documents.

On top of the standard XML library functionality, functions are required to
support the manipulation of the translation / localization XML standards. Those
functions will also be part of the core library.

9.4 XLIFF Documents
XLIFF documents form the core document type on which most of the processes
are applied (segmentation, translation etc.). XLIFF documents are created by con-
verters. Converters take different document formats (rtf, xml, html etc.) and con-
vert them to the xml based XLIFF format (XLIFF, 2008).

The following shows a very simple example of an XLIFF document.







<?xml version="1.0" encoding="UTF-8" ?>
<xliff version="1.0">
<file datatype="XML" original="D:arayatestsimplexmlsimplexml.xml"
      source-language="de" target-language="es">
<header>
    <!-- Header of the XLIFF file -->
    <phase-group>
        <phase company-name="Araya" date="Sun May 11 11:29:11 CEST 2008"
               phase-name="1" process-name="pre-process" tool="XML2XLIFF version 2.0"/>
        <phase company-name="Araya" date="Sun May 11 11:29:11 CEST 2008"
               phase-name="2" process-name="Segmentation" tool="SEGMENTER version 2.0"/>
    </phase-group>
    <skl>
        <!-- Reference to an external file -->
        <external-file href="C:arayasklsimplexml.xml.27120.skl"/>
        <!-- Internal file -->
        <internal-file form="mimestring">PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiID8+DQo8c2ltcGxleG1sPg0KPHNl
Z21lbnQ+JSUlMCUlJQo8L3NlZ21lbnQ+DQo8c2VnbWVudD4lJSUxJSUlCjwvc2VnbWVudD4NCjwv
c2ltcGxleG1sPg==</internal-file>
    </skl>
    <!-- Properties of the XLIFF file -->
    <prop-group name="encoding"><prop prop-type="encoding">UTF-8</prop></prop-group>
    <prop-group name="xmlformat">
        <prop prop-type="donotresolveentitiesfile">C:arayainiedqm-ent.txt</prop>
        <prop prop-type="iniFile">c:/Araya/ini/config_simplexml.xml</prop>
    </prop-group>
    <prop-group name="specialinfo">
    </prop-group>
</header>
<body>
    <!-- Segments -->
    <trans-unit approved="no" help-id="0" id="0" xml:space="preserve">
        <source xml:lang="de">Das ist ein Segment</source>
        <target xml:lang="es" xml:space="preserve"/>
        <prop-group><prop prop-type="segmentid">1067381512</prop></prop-group>
    </trans-unit>
    <trans-unit approved="no" help-id="1" id="1" xml:space="preserve">
        <source xml:lang="de">Das ist ein <ph id="0">&lt;b&gt;</ph>Segment mit<ph id="1">&lt;/b&gt;</ph> Format</source>
        <target xml:lang="es" xml:space="preserve"/>
        <prop-group><prop prop-type="segmentid">1067381512</prop></prop-group>
    </trans-unit>
</body>
</file>
</xliff>

Fig 8: XLIFF File



9.4.1 OpenTMS and Skeleton files

Skeleton files are one of the key features of XLIFF. In order to reduce the size
of the content of a segment (trans-unit, source and target), most converters move
the non-relevant part (e.g. format information) of an (external) document into an
external representation. They then use a kind of referencing scheme to specify
where parts of the text and the segment come together (mainly for back
conversion). Skeleton files mainly contain the format (non-textual) part of a
document. Often this part is bigger than the core text.

One can distinguish between internal and external skeleton files (also called skl
files).

External skl files keep the XLIFF file small, while internal skl files create a bigger
XLIFF file. With external files the problem of back conversion is more complicated
as the back converter requires the skl file. One way to overcome this problem is to
compress the internal skl file and encode it appropriately.

OpenTMS supports the back conversion of a document independently of the place
where it was created. Thus XLIFF files in OpenTMS normally use internal skl
files. In cases where this is not possible or not wanted, a procedure must be
supplied which allows the skl file to be reintegrated into the XLIFF file before
it is transmitted to another machine, user etc.
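
Compressing and encoding an internal skeleton, as suggested above, could be sketched in Java as follows. GZIP compression followed by MIME style Base64 encoding is an assumption made for this example (the text only says "compress ... and encode it appropriately"), and the class SkeletonSketch with its methods pack and unpack is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for embedding a skeleton in an XLIFF file. */
final class SkeletonSketch {

    /** Compresses the skeleton and encodes it as text that is safe inside XML. */
    static String pack(String skeleton) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
                gzip.write(skeleton.getBytes(StandardCharsets.UTF_8));
            }
            return Base64.getMimeEncoder().encodeToString(buffer.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restores the original skeleton text for back conversion. */
    static String unpack(String packed) {
        byte[] compressed = Base64.getMimeDecoder().decode(packed);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The packed string could then be written as the content of an internal-file element. Since skeletons consist largely of repetitive markup, they usually compress well, which offsets part of the size increase caused by embedding them.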

9.4.2 Security and encryption in XLIFF – secureXLIFF

As described in the section about security, XLIFF documents must follow the
security architecture of OpenTMS. XLIFF documents are a potential security
threat. If they are transmitted via the web or by another transport method (USB
stick etc.), other persons may read the XLIFF document. In order to prevent
access by unauthorized users it is proposed to encrypt the relevant parts
(especially the source and target elements) of the document. Only specified
users with the correct password will gain access to the content of the XLIFF
document through an editor or similar. XLIFF editors reading the file must
support the OpenTMS security layer. Using such a security approach one could also
forbid copy and paste etc. for a given XLIFF document.

Annotation: Obviously an open source encryption method should be used.

A secureXLIFF may be a good argument for industrial users to adopt the OpenTMS
concept and architecture.

9.5 TMX Documents
TMX documents form the core document type on which database operations apply
(fuzzy search, word based search etc.). TMX documents, or rather their entries,
are stored in databases. Converters take different translation memory exchange
formats (Trados etc.) and convert them to the XML based TMX format (TMX, 2008).
Databases store the TMX entries. While there is no problem with the meta
information associated with each TMX entry (tu), the global TMX document meta
information creates a problem. As databases are organized around entries, this
meta information must be stored in separate tables and referenced by each entry.

TMX files are normally imported into databases to support high access speed (see
footnote 1).

9.5.1 Security and encryption in TMX – secureTMX

The same security architecture as for XLIFF should be applied to TMX.

9.6 TBX Documents
TBX documents form the core document type for terminology data. TBX documents
are imported into an OpenTMS database. TMX and TBX documents are internally
stored in the same entry structure. They can be distinguished by specific
markers.

The reason for storing both TMX and TBX documents in the same type of database
is that this allows both kinds of data to be reused in similar situations.
Obviously the database functions need to support reading and writing the entries
according to the context. Thus an (originally) TBX entry may be used as a TMX
entry (translation memory match) in one context, while a TMX entry could be used
as a terminology match in another context. This internally identical handling
does not imply that both entry types are the same, but practice shows that usage
patterns often require that they can be used interchangeably.

9.6.1 Security and encryption in TBX – secureTBX

The same security architecture as for XLIFF should be applied to TBX.



1   A key question is whether OpenTMS should also allow direct access to TMX files
    (like Star text files) without the need to import them into a database. The
    advantage would be that, especially for small TMX files, there is no real need
    to store them in a database. It would also not require any database drivers;
    XML access functions would be sufficient. One could see this as a special type
    of database.





9.7 Other Documents
OpenTMS is required to process all other types of documents. Once those files
are brought into the OpenTMS system they are converted to XLIFF (except in the
cases discussed above). Once processed, the XLIFF documents are converted back to
their original format.

Ideally OpenTMS should contain or interact with a CMS system which provides a
convenient way of storing all kinds of documents. Interfaces to CMS systems will
be defined, although the implementation of these interfaces is not part of the
OpenTMS implementation. See chapter 18.

9.8 Basic Document Access Functionality
In the following some basic XLIFF file functions are described. Those functions
should go into the core library of OpenTMS. They are by far not exhaustive. A
more detailed function library for XLIFF will be defined later. Although most of
the functions can be realised by using DOM functionality, a function library
which makes it easy to handle XLIFF files should be provided.

As the functions will involve complex parameter combinations, the parameters will
be supplied as XML constructs. For performance reasons one will not actually
supply flat XML files, but an in-memory version of the XML file (nodes etc.).
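
How such functions might operate on an in-memory document can be sketched with the standard Java DOM API. The class TransUnitSketch and its method names are hypothetical and merely mirror two of the functions listed in the table below (RetrieveTransUnit and the in-memory variant of ModifyTransUnit); the actual OpenTMS function signatures are not defined in this document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of XLIFF access functions working on a DOM tree. */
final class TransUnitSketch {

    /** Parses XLIFF text into the in-memory representation used by the other methods. */
    static Document parse(String xliff) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xliff)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XLIFF document", e);
        }
    }

    /** Returns the whole trans-unit element with the given id, or null if there is none. */
    static Element retrieveTransUnit(Document doc, String id) {
        NodeList units = doc.getElementsByTagName("trans-unit");
        for (int i = 0; i < units.getLength(); i++) {
            Element unit = (Element) units.item(i);
            if (id.equals(unit.getAttribute("id"))) {
                return unit;
            }
        }
        return null;
    }

    /** Sets the target text in memory only; saving the document is a separate step. */
    static boolean setTargetText(Document doc, String id, String translation) {
        Element unit = retrieveTransUnit(doc, id);
        if (unit == null) {
            return false;
        }
        NodeList targets = unit.getElementsByTagName("target");
        Element target;
        if (targets.getLength() > 0) {
            target = (Element) targets.item(0);
        } else {
            // Simplification: a missing target is appended at the end of the unit.
            target = doc.createElement("target");
            unit.appendChild(target);
        }
        target.setTextContent(translation);
        return true;
    }
}
```

Passing the Document object around, rather than re-reading a file for every call, corresponds to the in-memory approach described above.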

Basic Translation Functions           Comment
for XLIFF documents

Convert Document                      Converts a given document to XLIFF

Backconvert Document                  Converts a given document back from XLIFF

CreateXLIFFDocument                   Creates an empty XLIFF document. This function
                                      may be questionable as XLIFF documents normally
                                      have just a temporary status. They normally
                                      come into existence through a converter call.
                                      Nevertheless such a function may be helpful.
                                      Pure to text conversion can be achieved anyway.

GetProperties                         Retrieves the (general) properties of the XLIFF
                                      document

SetProperties                         Sets the (general) properties of the XLIFF
                                      document


Segment                               Segments the XLIFF document based on some
                                      SRX rules (configuration file)

AddTransUnit                          Adds a new TransUnit at a certain position. This
                                      function also depends on the original format. De-
                                      pending on the format this function may cause
                                      problems in the back conversion process.

RetrieveTransUnit                     Retrieves a segment of the XLIFF document; this
                                      includes all the information of the segment (thus
                                      the whole trans-unit is received)

RemoveTransUnit                       Removes a TransUnit; here one could distinguish
                                      between executing the operation immediately (and
                                      therefore permanently) and just making the change
                                      in memory and saving the changes later.

ModifyTransUnit                       Modifies a TransUnit; here one could distinguish
                                      between executing the operation immediately (and
                                      therefore permanently) and just making the change
                                      in memory and saving the changes later.

TranslateTransUnit                    The TransUnit is translated based on the
                                      parameters supplied. This can include TM
                                      translation, term translation, machine translation
                                      or basically any other kind of translation or
                                      invocation.

SplitTransUnit                        Splits the source part of a TransUnit. Care has to
                                      be taken with regard to validity.

CombineTransUnit                      Combines the source parts of a TransUnit. Care
                                      has to be taken with regard to validity.

SaveDocument                          Saves the XLIFF document

GetStatistics                         Returns some statistics of the translation process
                                      (GMX based)

Fig 9: Some basic XLIFF File functions
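
A few of the functions listed above can be sketched on top of an in-memory XML tree. The following is only a minimal illustration, assuming XLIFF 1.2 and Python's standard `xml.etree.ElementTree`; the function names and signatures are hypothetical and are not the OpenTMS core library API.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2 namespace; the source document does not fix a version.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)


def q(tag):
    """Return the namespace-qualified tag name used by ElementTree."""
    return f"{{{XLIFF_NS}}}{tag}"


def create_xliff_document(source_lang, target_lang, original="unknown"):
    """CreateXLIFFDocument: build an empty in-memory XLIFF tree."""
    root = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "datatype": "plaintext",
        "source-language": source_lang,
        "target-language": target_lang,
    })
    ET.SubElement(file_el, q("body"))
    return root


def add_trans_unit(doc, unit_id, source_text, position=None):
    """AddTransUnit: append a trans-unit, or insert it at a given position."""
    body = doc.find(f".//{q('body')}")
    unit = ET.Element(q("trans-unit"), {"id": unit_id})
    ET.SubElement(unit, q("source")).text = source_text
    if position is None:
        body.append(unit)
    else:
        body.insert(position, unit)
    return unit


def retrieve_trans_unit(doc, unit_id):
    """RetrieveTransUnit: return the whole trans-unit element, or None."""
    for unit in doc.iter(q("trans-unit")):
        if unit.get("id") == unit_id:
            return unit
    return None


def remove_trans_unit(doc, unit_id):
    """RemoveTransUnit: change the in-memory tree only; saving is a separate step."""
    unit = retrieve_trans_unit(doc, unit_id)
    if unit is None:
        return False
    doc.find(f".//{q('body')}").remove(unit)
    return True
```

The sketch follows the "change in memory, save later" variant mentioned for RemoveTransUnit and ModifyTransUnit; an immediately persisting variant would write the tree back after each call.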




10 OPENTMS AS A CLIENT/SERVER ARCHITECTURE

The kernel OpenTMS architecture is based on the client/server principle. Using
a client/server architecture brings many advantages, the most important being
that processes can be spread over several computers, or over several threads on
modern operating systems and hardware architectures. This does not imply that the
OpenTMS architecture can only be implemented on a client/server basis. All the
processes (Translets) can also run in a single-user environment (e.g. through a
procedure call within an editor). But using a client/server framework avoids
having to re-program or re-implement software which was designed to run in a
single-threaded environment only, for example because its implementation relies
on global or static variables.

Each procedure developed for OpenTMS should be designed with multithreading
in mind. Each procedure should be encapsulated in such a way that it can be
surrounded by a process wrapper which allows it either to run as a thread in the
same software or computer environment or to be distributed over several
computers. In practice this means that “globally defined variables” should be
avoided as far as possible. As described before, the key functions are
implemented in the OpenTMS core library.

All (main) procedures should also be written in such a way that they can be called
easily by the OpenTMS scripting language.
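
The encapsulation requirement can be illustrated with a small sketch. It assumes Python's standard `concurrent.futures`; the procedure `count_words` and both wrappers are hypothetical examples, not OpenTMS functions. The point is that the same procedure runs unchanged as a direct call or inside a thread wrapper, because it keeps no global or static state.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(segment, separator=" "):
    """A procedure without global state: all input arrives as parameters."""
    return len([word for word in segment.split(separator) if word])


def run_direct(procedure, *args, **kwargs):
    """Single-user case: a plain procedure call, e.g. from within an editor."""
    return procedure(*args, **kwargs)


def run_threaded(procedure, inputs, max_workers=4):
    """Process wrapper: run the same procedure on several inputs in threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the inputs in its results.
        return list(pool.map(procedure, inputs))
```

A wrapper that distributes the calls over several computers could expose the same interface, which is why the procedure itself should not need to know how it is executed.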




Fig 10: Hierarchy of processes

Processes have to adhere to the security concept of OpenTMS. Processes can
only be executed if they (and the user associated with the process) have the
appropriate rights (gained through the security model). This applies especially
to processes which use network connections.




Fig 11: Applications

Most of the processes are based on XLIFF exchange: in terms of functions and
variables, the parameters of the functions are XLIFF documents or substructures
of XLIFF. The processes thus mainly operate on XLIFF-based XML structures, which
they add to or modify. In principle the operations should be non-destructive,
that is, information is not deleted or removed but only added. In some cases this
cannot be fully maintained: e.g. if a translator modifies a translation (in a
destructive way), the older information is lost. The same may apply to database
entries. Whether older information survives also depends on the use of a proper
versioning system. As a consequence of using XLIFF-related structures internally,
conversions to related XML-based formats like TMX, TBX etc. must be supported.
This can be realized by attaching import and export procedures to the OpenTMS
kernel.




Exceptions are, for example, converters, which take a document in an arbitrary
format as input and produce an XLIFF document. The same applies to back
conversion.

Please note that the above figure also represents some kind of workflow. Basic
workflows can be part of the OpenTMS architecture (e.g. each process applying
changes to an XLIFF document should document this in the XLIFF header). But it
is not intended that OpenTMS as such comes with its own workflow solution. More
complex workflow procedures should be modeled either using proprietary or open
source software.

OpenTMS also follows the “old style” of UNIX pipelining. Processes (see the
chapter about the process model) take an input and produce an output. The next
process takes the output of the previous process, applies some further
transformation and creates new output. Nevertheless there is a difference: as
parameters can become quite complex, the UNIX style of interpreting the input
just as “a string” is extended here to support input and output in the form of
the parameters described before.
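
The pipelining idea can be sketched as function composition over a structured parameter instead of a plain string. This is a minimal sketch under stated assumptions: a Python dictionary stands in for the XLIFF structure, and the three Translets (`convert`, `segment`, `pretranslate`) are hypothetical toy implementations. Each step only adds information and records itself in a history list, in the spirit of the non-destructive operations described above.

```python
from functools import reduce


def convert(doc):
    """Toy converter: marks the document as XLIFF and starts the history."""
    return {**doc, "format": "xliff",
            "history": doc.get("history", []) + ["convert"]}


def segment(doc):
    """Toy segmenter: splits at full stops (real segmentation would use SRX rules)."""
    segments = [s.strip() for s in doc["text"].split(".") if s.strip()]
    return {**doc, "segments": segments,
            "history": doc["history"] + ["segment"]}


def pretranslate(doc):
    """Toy pre-translation: exact lookup in a dictionary standing in for a TM."""
    memory = doc.get("memory", {})
    targets = [memory.get(s) for s in doc["segments"]]
    return {**doc, "targets": targets,
            "history": doc["history"] + ["pretranslate"]}


def pipeline(*translets):
    """Chain Translets: the output of one step is the input of the next."""
    return lambda doc: reduce(lambda current, step: step(current), translets, doc)
```

For example, `pipeline(convert, segment, pretranslate)` yields a single callable that can be applied to one document or mapped over a set of documents.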




Fig 12: Pipeline Architecture

Figure 12 shows a typical pipelining of several processes (Translets) during a
translation process. OpenTMS differentiates between two basic types of Translets.

•        Human Initiated Translets: These are Translets which are invoked and
         (fully) controlled by humans. Examples are a translation editor or
         operations which insert or update entries in a database.

•        Automated Translets: These are processes which normally run
         automatically and do not require human interaction. Examples are the
         steps conversion, segmentation and pre-translation. Automated
         procedures (e.g. pre-translating a project, i.e. Translets applied to a
         set of documents) also belong here.




11 DATA MODEL

11.1 Data sources
Data (mostly databases) are modeled through data sources. Data sources are the
basic objects which give access to all kinds of data, especially databases. Data
sources mainly store segments from TMX files or TBX entries. Data sources are
XML oriented: depending on the XML document supplied, a data source converts the
entry in such a way that it can be transferred to a data component.




Fig 13: Data sources and data components

Why not refer directly to databases? The basic idea behind using a data source
as the core data object in OpenTMS (representing databases etc.) is that such a
layer between the real databases (e.g. MySQL) and the OpenTMS software makes
adding new types of data quite easy. The various types of data are referred to
as data components. Thus an SQL database is a data component, but a TMX file
could also be seen as a data component if the relevant access operations are
supported. Similarly, an Excel file can be considered a data component. With
this approach OpenTMS is not restricted to SQL databases, but can use flat
files, spreadsheets etc. too. It can also support direct access to vendor
specific databases or systems. A server-side installation of OpenTMS can also
act as a data source.




[Figure: The OpenTMS software accesses data sources through a standardised
interface. The OpenTMS data source layer maps the OpenTMS access functions to
the specific data component, using data type specific access functions. The data
components can be of various kinds, e.g. files.]




Fig 14: Data sources with several data components




A data component which is connected through a data source must support a core
functionality. This core functionality is divided into three types of functions
(methods):

•        Read methods: These comprise all functions retrieving data from a data
         component. Read methods also map the results to the form in which the
         caller needs the data (e.g. TBX or TMX).

•        Write methods: These comprise all functions writing, updating and
         deleting data in a data component. Write methods also take into account
         which input format is used (e.g. TMX or TBX) and convert the input into
         the internal data source format.

•        Select methods: These methods are part of the read methods and allow
         specific entries to be selected from the data source.

The security level chosen has to be taken into account: depending on the level,
the data have to be encrypted and decrypted.

Two types of data components can be distinguished:

•        Read only data components: This type of component can only retrieve
         data, but not store data. An example could be if a plain TMX file is used as
         data component.

•        Full data components: Here both read and write methods are supported.

Depending on the user configuration, data components can be configured to
behave differently. A component can appear as a read only data component for one
user, while for another user it can be accessible as a full data component.
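
The relation between a data source and its data components can be sketched as an interface with read, select and write methods. All class and method names below are hypothetical; entries are plain dictionaries rather than TMX or TBX structures, and the per-user configuration is reduced to a set of user names.

```python
from abc import ABC, abstractmethod


class DataComponent(ABC):
    """Core functionality every data component has to offer."""

    @abstractmethod
    def read(self):
        """Read method: return all entries."""

    def select(self, predicate):
        """Select method: part of the read methods, filters entries."""
        return [entry for entry in self.read() if predicate(entry)]

    def write(self, entry):
        """Write method: rejected unless a subclass supports writing."""
        raise PermissionError("read only data component")


class ReadOnlyComponent(DataComponent):
    """E.g. a plain TMX file used as a data component."""

    def __init__(self, entries):
        self._entries = list(entries)

    def read(self):
        return list(self._entries)


class FullComponent(ReadOnlyComponent):
    """Supports both read and write methods."""

    def write(self, entry):
        self._entries.append(entry)


class DataSource:
    """Layer between OpenTMS and a component, with a per-user configuration."""

    def __init__(self, component, read_only_users=()):
        self._component = component
        self._read_only_users = set(read_only_users)

    def select(self, user, predicate):
        return self._component.select(predicate)

    def write(self, user, entry):
        if user in self._read_only_users:
            raise PermissionError(f"{user} has read only access")
        self._component.write(entry)
```

Encryption and decryption according to the security level would sit inside the data source layer as well; they are left out here to keep the sketch short.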

11.2 TM Matches
OpenTMS differentiates between three types of matches:

•        Perfect Match: This is a match where the segment to be searched
         matches the segment in TM both with regard to the text content and
         the format

•        Exact Match: In this case only the text part of the segment matches
         the database entry perfectly; the format information differs.



•        Fuzzy Match: In this case there are some deviations between the search
         segment and the match in the TM. The deviation is usually expressed as
         a percentage value. This type of match is also often called an inexact
         match.

One may consider other types of matches in the future too, e.g. replacement class
matches where only the “blank characters” (white space) differ. For this see also
chapter 12.3.
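
The three match types can be told apart with a few lines of code. The sketch below is an illustration only: the function `classify_match` is hypothetical, format information is represented as a simple list of tag names, and similarity is computed with `difflib.SequenceMatcher` from the Python standard library, which is an assumption and not the similarity measure prescribed by OpenTMS. The threshold of 70% is likewise an arbitrary example value.

```python
from difflib import SequenceMatcher


def classify_match(search_text, search_format, tm_text, tm_format, threshold=0.7):
    """Return (match type, similarity in %) for a search segment and a TM entry."""
    if search_text == tm_text:
        # Same text: the format information decides between perfect and exact.
        kind = "perfect" if search_format == tm_format else "exact"
        return kind, 100
    similarity = SequenceMatcher(None, search_text, tm_text).ratio()
    percent = round(similarity * 100)
    if similarity >= threshold:
        return "fuzzy", percent
    return "none", percent
```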

11.3 Basic data source access functionality
The following (read and write) access functions are the core functions needed.
Access results in matches. A basic idea is that the function decides, based on
the input supplied, how the entry is interpreted and written into the database.
This means that TMX entries are handled differently from TBX entries etc.

Please note that the description of the functions makes no explicit reference to
the security model. It is assumed that the security level is set before or
together with the invocation of the database function.




           Access Type                                          Comment

Exact Access                            A given entry is found by the “string” (= segment)
                                        supplied, independently of the format.

Exact Format Access                     A given entry is found by the “string” supplied tak-
                                        ing format information into account.

Fuzzy Access                            A given entry is found by using a similarity search.
                                        Similarity is measured in %, where 100% is iden-
                                        tical to an exact access.

Fuzzy Format Access                     A given entry is found by using a similarity search
                                        – taking the format into account. Similarity is
                                        measured in %, where 100% is identical to an
                                        exact format access.

Word Based Access                       A search is done by splitting the string into
                                        individual words. The word identification is
                                        language dependent. The words could be searched
                                        using either OR or AND (see footnote 2). Word
                                        based access could be enhanced by supporting
                                        stemming (e.g. the Porter stemming algorithm).

Regular Expression Access               A regular expression is used to retrieve the result
                                        set. Actually such a function is quite resource
                                        consuming.

Sub segment Access                      Segments are retrieved based on some sub seg-
                                        ments of a given search string. Actually this could
                                        be seen as a more specialized form of the regular
                                        expression search or word based search. This
                                        type of search is esp. important if a segment ac-
                                        tually represents a paragraph and may contain
                                        several sentences.

Fig 15: Data source access types




2   It is suggested to use a logical representation of the query similar to Google (www.google.com).
         Here + denotes “word must exist”, while - denotes that the word is not allowed to exist in the
         result set.
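
Word based access with the + / - notation suggested in footnote 2 could look roughly as follows. This is a sketch under simplifying assumptions: words are identified by splitting at white space and lower-casing, whereas real word identification is language dependent; words without a prefix are treated as OR terms. The function names are hypothetical.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-) and optional words."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional


def word_search(entries, query):
    """Return the entries which satisfy the query."""
    required, excluded, optional = parse_query(query)
    hits = []
    for entry in entries:
        words = set(entry.lower().split())
        if not required <= words:      # every + word must exist (AND)
            continue
        if excluded & words:           # no - word may exist
            continue
        if optional and not required and not (optional & words):
            continue                   # plain words alone act as OR
        hits.append(entry)
    return hits
```

Stemming, as mentioned in the table, would be applied to both the query words and the entry words before the set comparison.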


Access Functions for TM               Comment
and TBX data

RetrieveTMMatch                       Get a match from the Translation Memory. The
                                      actual result depends on the data source access
                                      type chosen. Parameters involve match quality
                                      etc.

RetrieveTBXMatch                      Get a TBX match from the terminology database.
                                      The actual result depends on the data source ac-
                                      cess type chosen.

AddEntry                              This is a generic function adding data (e.g. TMX
                                      entries) to data sources. The function is generic in
                                      the sense that it decides, based on the type of the
                                      XML document to be added, how the entry is stored
                                      (TMX, TBX etc.).

CreateEntry                           Creates an empty data source entry of a specific
                                      type

AddTMEntry                            Adds a TM entry; actually a specialization of
                                      AddEntry

AddTBXEntry                           Adds a TBX entry; actually a specialization of
                                      AddEntry

RemoveEntry                           This is a generic function removing data (e.g.
                                      TMX entries) from data sources. The function is
                                      generic in the sense that it decides, based on the
                                      type of the XML document supplied, how the entry
                                      is located and removed (TMX, TBX etc.).

ModifyEntry                           This is a generic function modifying data (e.g.
                                      TMX entries) in data sources. The function is
                                      generic in the sense that it decides, based on the
                                      type of the XML document supplied, how the entry
                                      is modified (TMX, TBX etc.).

CopyEntry                             This is a generic function copying data (e.g. TMX
                                      entries) to data sources. The function is generic in
                                      the sense that it decides, based on the type of the
                                      XML document supplied, how the entry is stored
                                      (TMX, TBX etc.).

Fig 16: Data source access functions
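
The generic behaviour of AddEntry can be sketched as a dispatch on the root element of the XML fragment supplied: a TMX translation unit is a `<tu>` element, a TBX entry a `<termEntry>` element. The class `InMemoryDataSource` and its method names are hypothetical, and the storage is just two Python lists.

```python
import xml.etree.ElementTree as ET


class InMemoryDataSource:
    """Toy data source which keeps TM and term entries in lists."""

    def __init__(self):
        self.tm_entries = []
        self.tbx_entries = []

    def add_tm_entry(self, tu):
        """AddTMEntry: specialization for TMX translation units."""
        self.tm_entries.append(tu)

    def add_tbx_entry(self, term_entry):
        """AddTBXEntry: specialization for TBX term entries."""
        self.tbx_entries.append(term_entry)

    def add_entry(self, xml_text):
        """AddEntry: decide from the XML supplied how the entry is stored."""
        element = ET.fromstring(xml_text)
        if element.tag == "tu":
            self.add_tm_entry(element)
            return "TMX"
        if element.tag == "termEntry":
            self.add_tbx_entry(element)
            return "TBX"
        raise ValueError(f"unsupported entry type: {element.tag}")
```

RemoveEntry, ModifyEntry and CopyEntry could use the same dispatch, so that callers never have to state explicitly which kind of entry they are passing.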




11.4 Databases
A key principle of the OpenTMS architecture is its independence from database
products. OpenTMS defines a core subset of access functions (based on SQL)
which can be implemented by nearly all database systems.

The following gives a (non-exhaustive) list of database types which should be
supported (see footnote 3).

11.4.1 Open source SQL databases

•          MySQL - www.mysql.de

•          PostgreSQL - www.postgresql.org

•          H2 - www.h2database.com

•          Cloudscape - www.ibm.com/software/data/cloudscape (IBM)

•          …

11.4.2 Closed source SQL databases

•          SQL Server (different flavors) - www.microsoft.com/germany/sql/default.mspx

•          Oracle - www.oracle.com

•          …

11.4.3 Alternatives

SQL databases are not the only databases out there. Other database formats
could be:

•          Spreadsheets (like SQL)




3   A key question at this point is whether OpenTMS should implement something like an “internal
       database”, which would simply mean storing the database as “simple hash tables” which can be
       serialised and de-serialised. See also the discussion of TMX documents (Footnote 1).
       Alternatively the internal database could just consist of an XML file.


•        Object oriented databases

•        XML database systems (e.g. XINDICE)

•        Plain text files




11.4.4 Database Access

Internally all main access functions of OpenTMS are based on specific objects
(see chapter 12) and all access happens through these objects. This additional
abstraction level (interfaces, as they are called in most programming languages
nowadays) makes OpenTMS independent even of SQL and keeps it open for future
advances in database development.

All access functions are mapped to SQL statements (or their equivalents) which
are not hardcoded but stored in XML database configuration files.

Up to this point there is no real necessity to realize the database in SQL only.
The advantages of using SQL as the language describing the access functions are
that it is a) widespread and b) standardized.




Fig 17: Configuring different database types

11.4.5 Database and data source configuration

As OpenTMS needs to support many different database / data source types, adding
a new database type should not require changing the data source kernel code.



Therefore, for each data source type a configuration file defines the main
parameters of the database. Depending on the security requirements, the
configuration file can be secured using the security model functions for
documents. The configuration file includes:

•        Database class driver – e.g. com.mysql.jdbc.Driver

•        Connection String – e.g. jdbc:mysql:

•        Any other connection string specific commands (e.g. buffer size)

•        Commit support

•        Unicode support

•        Server Address

•        Port

•        User (encrypted)

•        Password (encrypted)

•        Mapping of OpenTMS database access functions to database specific
         access code (e.g. SQL code like
         <command step="1">DROP TABLE IF EXISTS MONO</command>).
         Access functions can be organized in groups if a specific functionality
         requires running several database functions (e.g. creating all the
         necessary tables for a new database). This is mainly important for SQL
         databases, as the supported SQL dialects vary.

•        Reference to code (e.g. jar file, dll etc.) if a specific function needs
         to run at a specific point in time (e.g. when creating a new database).
         This should make it possible to inject specific implementation code for
         specific tasks (e.g. if some functionality cannot be executed through
         SQL commands).

In addition a more generic interface can be called if a database cannot be inte-
grated with the configuration file specifications above. In this case the whole inter-
face for the new database needs to be implemented and made available to
OpenTMS.
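
How access functions could be mapped to SQL statements held in a configuration file is sketched below. Only the `<command step="…">` element is taken from the description above; the `<datasource>` and `<function>` elements, the function names and the table layout are hypothetical, and SQLite (via Python's standard `sqlite3` module) merely stands in for whichever database is configured.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical configuration: groups of commands, ordered by their step attribute.
CONFIG = """
<datasource type="sqlite-demo">
  <function name="createDatabase">
    <command step="2">CREATE TABLE MONO (id INTEGER PRIMARY KEY, lang TEXT, segment TEXT)</command>
    <command step="1">DROP TABLE IF EXISTS MONO</command>
  </function>
  <function name="addEntry">
    <command step="1">INSERT INTO MONO (lang, segment) VALUES (?, ?)</command>
  </function>
</datasource>
"""


def load_functions(config_xml):
    """Map each access function name to its SQL statements, sorted by step."""
    root = ET.fromstring(config_xml)
    functions = {}
    for function in root.findall("function"):
        commands = sorted(function.findall("command"),
                          key=lambda command: int(command.get("step")))
        functions[function.get("name")] = [c.text.strip() for c in commands]
    return functions


def run_function(connection, functions, name, parameters=()):
    """Execute all statements of one access function, then commit."""
    for statement in functions[name]:
        # Only statements with placeholders receive the parameters.
        connection.execute(statement, parameters if "?" in statement else ())
    connection.commit()
```

Supporting another SQL dialect would then mean supplying a different configuration file while `load_functions` and `run_function` stay unchanged.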




12 TRANSLATION OBJECTS

Translations are a key entity in the translation process. Translations
(inherently multilingual) usually consist of segments (monolingual) and the
languages associated with those segments.

As a consequence the architecture uses three types of language related entities.
These objects are used by processes to create the translation functionality.

A “General Linguistic Object” (GLO) contains information (features, attributes)
which is common to all linguistic information types. Examples are: unique id,
creation and modification dates, authors etc. Linguistic Objects can always be
serialized to XML. The main supported formats are XLIFF, TMX and TBX.

From that object two objects are derived:

•           A “Monolingual Object” (MoLO) represents a linguistic entity for a
            given language. It inherits all the features of the GLO and adds, for
            example, the language of the entity (segment).

•           A “Multilingual Object” (MuLO) represents translations by linking one
            or more MoLOs into one object. A MuLO consists of at least one MoLO
            and can contain up to n MoLOs. It is not required that each MoLO of
            a MuLO has a different language (see footnote 4).

Each of those object types contains a unique id; in addition a MoLO carries the
id of its MuLO so that it can easily be associated with its translations.
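
The three object types can be sketched as a small class hierarchy. The field names (`author`, `created`, `segment`, `mulo_id` etc.) and the use of UUIDs as unique ids are assumptions made for illustration; the requirement that a MuLO contains at least one MoLO is not enforced in this sketch.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class GeneralLinguisticObject:
    """GLO: features common to all linguistic information types."""
    author: str = "unknown"
    unique_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class MonolingualObject(GeneralLinguisticObject):
    """MoLO: a segment in one language, linked to its MuLO by id."""
    language: str = ""
    segment: str = ""
    mulo_id: Optional[str] = None


@dataclass
class MultilingualObject(GeneralLinguisticObject):
    """MuLO: links one or more MoLOs; languages may repeat."""
    molos: List[MonolingualObject] = field(default_factory=list)

    def add(self, molo):
        """Attach a MoLO and record the MuLO id on it."""
        molo.mulo_id = self.unique_id
        self.molos.append(molo)
        return molo

    def translations(self, language):
        """Return all segments stored for the given language."""
        return [m.segment for m in self.molos if m.language == language]
```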




4   The behaviour of multilingual objects can be configured. One option is to treat all entries as
      bilingual objects only. Thus one MuLO would contain only two MoLOs, a source and a target MoLO.
      Normally options like this should be used with caution, as they introduce problems in managing
      real multilingual databases. This is especially true if one source segment may have several
      translations (target MoLOs). Nevertheless there may be cases where several translations are
      required for a source segment, e.g. something like a temporary translation. In this case it is
      suggested to associate “status attributes” with the MoLO. These could then be used on the one
      hand as a sorting criterion for matches and on the other hand for identifying problem
      translations.


Obviously attributes are associated with Linguistic Objects. As several standards
are used (TMX, XLIFF and TBX) a mapping of the attributes between the different
types is required. Within the object the attributes may be identified through their
name space.




Fig 18: Representation of linguistic entities as General Linguistic Object

12.1 Format information
Format information (e.g. transported through the <ph> tag in XLIFF) and its
correct handling is a key kernel function of OpenTMS. The core OpenTMS library
contains all the necessary functions to handle format information correctly.

OpenTMS should aim at providing the highest possible support in format handling.

12.2 Terminology versus Translation Memory


Within computational linguistics a key distinction is made between terminology
and translation memory. The two concepts are clearly used in different contexts.
This is also reflected in the fact that there are (at least) two standards: TMX
(TMX, 2008) and TBX (TBX, 2008). Nevertheless, from a conceptual and software
engineering point of view the two concepts share more than separates them. Both
have “strings” as their basic representation, either as terms or as segments,
and their meta information also matches in most cases. A main difference is the
context in which they are used: TMs are normally applied at segment level (and
their entries normally consist of more characters), while terms are used at a
sub segment (word, phrase) level.

As these differences only appear at the usage level, OpenTMS consequently
implements the same underlying (database) structure for TM and term entries.
Using special markers, a distinction can be made at run time (= usage time). The
immediate advantage of this approach is that both concepts can be used in
different usage contexts. Search and retrieval functionality is available for
both concepts (e.g. fuzzy search is rarely available for term databases; with a
common internal representation this drawback is overcome).




Fig 19: Conversions of linguistic entities

12.3 Variables, placeholders, replacement classes
Translation memory entries, and sometimes also terminology entries, often contain
textual parts which can act as placeholders. Typical examples of placeholders are
numbers, month names, acronyms etc. In many cases it is possible to replace those
“variable parts” automatically with their actual counterparts in a segment. This
is especially useful in matching, e.g. by just replacing the numbers in a match
with their correct values one can achieve a better match, even a perfect match.

For this reason OpenTMS supports the concept of replacement classes. A
replacement class is a specific construct which generalizes a certain type of
string or information. A replacement class basically consists of two parts, a
class name and a procedure:

    •   A class name (e.g. number)

    •   A procedure describing the replacement class. In many cases the
        procedure can be defined through a regular expression. Another option
        may be that specific strings (e.g. terms from a terminology database)
        act as the replacement class.

    •   A procedure may be language dependent. If a procedure is language
        dependent, transformation rules have to be defined which describe how a
        value of language A is transformed into language B.

Example:

Class: GeneralNumber
Procedures:
General:
      Definition: ([0-9]+)\.([0-9]+)
      Transform: $1.$2
German:
      Definition: ([0-9]+),([0-9]+)
      Transform: $1,$2

The basic idea is that a language specific procedure involves two parts:

    •   a definition part which describes how to detect (evaluate) an instance of a
        replacement class

    •   a transformation part which describes how to compute the instance of a
        replacement class given that a replacement class has been detected (e.g.
        in another language)
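
The interplay of definition and transformation can be sketched with Python's `re` module. The sketch assumes that each definition has two capture groups (integer part and fractional part), which is an interpretation of the GeneralNumber example; the data layout and the function `transfer` are hypothetical. Python writes back references as `\1` and `\2` where the example uses `$1` and `$2`.

```python
import re

# One replacement class with language specific procedures (assumed layout).
GENERAL_NUMBER = {
    "general": {"definition": r"([0-9]+)\.([0-9]+)", "transform": r"\1.\2"},
    "german": {"definition": r"([0-9]+),([0-9]+)", "transform": r"\1,\2"},
}


def transfer(text, replacement_class, source_language, target_language):
    """Detect instances with the source definition, rewrite them with the target transform."""
    definition = re.compile(replacement_class[source_language]["definition"])
    transform = replacement_class[target_language]["transform"]
    return definition.sub(transform, text)
```

With these definitions, `transfer("The width is 3.14 cm", GENERAL_NUMBER, "general", "german")` rewrites the decimal point as a decimal comma and leaves the rest of the segment untouched.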




When a replacement class matches part of a segment, the matching part is
replaced with the replacement class, carrying forward the class name and the
original value.

Replacement classes pose two main challenges:

    •   A key problem in defining replacement classes is the order in which they
        are applied (checked). Depending on the definition of the regular
        expressions, several expressions may match (e.g. numbers without and with
        decimal points). OpenTMS should apply a strict linear order: the first
        matching expression is applied and used.

    •   The other key problem is checking whether all the replacement classes
        appear a) in both the source and the target of the match and b) in the
        source segment (the one which requires translation). For OpenTMS the
        proposed solution is that the replacement classes in the source and
        target of the match have to match exactly. If this is the case, the
        replacement classes also have to match those of the source segment to be
        translated. It has to be noted that another approach could be used too:
        removing the non-matching replacement classes in all three strings
        involved.
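
Both challenges can be illustrated in a few lines. The sketch below makes several assumptions: the two classes `decimal` and `integer` are examples, the placeholder syntax `{name}` is invented, and "matching exactly" is interpreted as "the same class names occur the same number of times", which is only one possible reading of the proposed solution.

```python
import re
from collections import Counter

# Strict linear order: the first class whose expression matches is used,
# so the more specific decimal class has to come before the integer class.
REPLACEMENT_CLASSES = [
    ("decimal", re.compile(r"[0-9]+[.,][0-9]+")),
    ("integer", re.compile(r"[0-9]+")),
]


def generalise(segment):
    """Replace instances by {class name}; return the new text and (name, value) pairs."""
    output, found, position = [], [], 0
    while position < len(segment):
        for name, pattern in REPLACEMENT_CLASSES:
            match = pattern.match(segment, position)
            if match:
                found.append((name, match.group()))
                output.append("{" + name + "}")
                position = match.end()
                break
        else:
            # No class matched at this position: copy the character unchanged.
            output.append(segment[position])
            position += 1
    return "".join(output), found


def classes_consistent(tm_source, tm_target, segment):
    """Check that all three strings contain the same replacement classes."""
    def names(text):
        return Counter(name for name, _ in generalise(text)[1])
    return names(tm_source) == names(tm_target) == names(segment)
```

If the order of the two classes were reversed, a value such as 2.5 would be split into two integers, which is the ordering problem described in the first bullet.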




13 PROCESS MODEL

13.1 OpenTMS Process
An OpenTMS process realizes the functionality of the OpenTMS system, mainly
supporting the translation process. Examples of processes are converters,
segmenters, translation memories, machine translation, statistics modules etc.

OpenTMS processes build on the core library functions and move them into a
process environment. In many cases this does not mean that a process is
created in the strict sense of the word; it could also mean that a function of
the core library (or any other function defined in another OpenTMS context) is
called from an application.

13.2 OpenTMS Scripting Language
Most OpenTMS processes are available through the OpenTMS Scripting Language
(OpenTMSL). OpenTMSL enables developers and users to write their own scripts
to adapt the OpenTMS processes to their needs.

OpenTMSL is defined independently of any programming language and should be
implemented in different programming languages. It makes the functions defined
in the core library accessible to the public through an easy-to-learn scripting
language.
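
The idea of exposing core library functions through a scripting layer can be sketched as follows. This is an illustration only and does not show actual OpenTMSL syntax, which this section does not define. The line-oriented script format, the function names segment_text, count_words and run_script, and the CORE_FUNCTIONS table are all hypothetical.

```python
# Hypothetical stand-ins for core library functions; the names and
# behaviour are illustrative and not taken from the OpenTMS core library.
def segment_text(text):
    """Split text into sentence-like segments at '. ' boundaries."""
    parts = [part.strip() for part in text.split(". ")]
    return [part for part in parts if part]


def count_words(text):
    """Count whitespace-separated words."""
    return len(text.split())


# Table that maps script command names to the functions they expose.
CORE_FUNCTIONS = {
    "segment": segment_text,
    "count_words": count_words,
}


def run_script(script):
    """Run a line-oriented script.

    Each non-empty line has the assumed form '<command> <argument text>'.
    Lines starting with '#' are treated as comments.
    """
    results = []
    for line in script.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, argument = line.partition(" ")
        if name not in CORE_FUNCTIONS:
            raise ValueError(f"unknown function: {name}")
        results.append(CORE_FUNCTIONS[name](argument))
    return results
```

Because the script layer only looks up names in a table, the same script format could be implemented on top of a core library written in any programming language, which is the kind of independence the paragraph above describes.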




Fig 20: OpenTMS Scripting Language




Server: In information technology, a server is an application or device that performs services for connected clients as part of a client-server architecture. A server application, as defined by RFC 2616 (HTTP/1.1), is "an application program that accepts connections in order to service requests by sending back responses." Server computers are devices designed to run such an application or applications, often for extended periods of time with minimal human direction. Examples of d-class servers include web servers, e-mail servers, and file servers. URL: http://en.wikipedia.org/wiki/Server_%28computing%29

Software Architecture: The software architecture of a program or computing system is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships between them. The term also refers to documentation of a system's software architecture. Documenting software architecture facilitates communication between stakeholders, documents early decisions about high-level design, and allows reuse of design components and patterns between projects. URL: http://en.wikipedia.org/wiki/Software_architecture

TOMCAT: Apache Tomcat is a Servlet container developed by the Apache Software Foundation (ASF). Tomcat implements the Java Servlet and the JavaServer Pages (JSP) specifications from Sun Microsystems, and provides a "pure Java" HTTP web server environment for Java code to run. … Apache Tomcat includes tools for configuration and management, but can also be configured by editing configuration files that are normally XML-formatted. URL: http://en.wikipedia.org/wiki/Apache_Tomcat

UML (Unified Modeling Language): In the field of software engineering, the Unified Modeling Language (UML) is a standardized visual specification language for object modeling. UML is a general-purpose modeling language that includes a graphical notation used to create an abstract model of a system, referred to as a UML model. UML is officially defined at the Object Management Group (OMG) by the UML metamodel, a Meta-Object Facility metamodel (MOF). Like other MOF-based specifications, UML has allowed software developers to concentrate more on design and architecture. URL: http://en.wikipedia.org/wiki/Unified_Modeling_Language
Unicode: In computing, Unicode is an industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a repertoire of more than 100,000 characters, a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic or Hebrew, and left-to-right scripts). URL: http://en.wikipedia.org/wiki/Unicode

UTF-8: UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed. URL: http://en.wikipedia.org/wiki/UTF-8

XML-RPC: XML-RPC is a remote procedure call protocol which uses XML to encode its calls and HTTP as a transport mechanism. URL: http://en.wikipedia.org/wiki/Xml-rpc
5 INTRODUCTION

5.1 Arguments for an OpenTMS Software Architecture

The arguments for an open source based localization tool have been discussed in FOLT, 2007a.

Software design principles:

• For end users (translators): easy to install
• For translation providers: server version, networking
• For customers: running own servers; secure interfaces

5.2 Basics

5.2.1 Naming conventions

OpenTMS uses a standardized naming convention scheme for variables, names in XML files etc.

Each legal OpenTMS name (string, literal, variable name, function name) consists of one or more words. Variable names start with an uppercase letter. Function names (e.g. identifying processes) start with a lowercase letter. Every word after the first starts with an uppercase letter [A-Z]. The remaining characters of a word are either [a-z] or [0-9]. No blanks are allowed between words.

    Word := [A-Z]([a-z]|[0-9])*
    word := [a-z]([a-z]|[0-9])*
    OpenTMSName := Word+
    OpenTMSFunctionName := word Word*

Examples:

• The variable: XliffDocument
• The function: openXliffDocument

Fig 1: OpenTMSName defined as a regular expression

Exceptions from the naming conventions could be introduced if acronyms etc. are used for words (e.g. TMX). Nevertheless this is not recommended.
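The grammar in Fig 1 can be transcribed almost literally into regular expressions. The following Java sketch is only an illustration: the class name `OpenTmsNames` and its methods are hypothetical and not part of any OpenTMS library.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenTMS: checks names against the Fig 1 grammar.
final class OpenTmsNames {
    // Word := [A-Z]([a-z]|[0-9])*
    private static final String WORD = "[A-Z][a-z0-9]*";
    // word := [a-z]([a-z]|[0-9])*
    private static final String LOWER_WORD = "[a-z][a-z0-9]*";

    // OpenTMSName := Word+
    private static final Pattern NAME = Pattern.compile("(?:" + WORD + ")+");
    // OpenTMSFunctionName := word Word*
    private static final Pattern FUNCTION_NAME =
            Pattern.compile(LOWER_WORD + "(?:" + WORD + ")*");

    private OpenTmsNames() { }

    /** True if the whole string is a legal OpenTMSName. */
    static boolean isName(String s) {
        return s != null && NAME.matcher(s).matches();
    }

    /** True if the whole string is a legal OpenTMSFunctionName. */
    static boolean isFunctionName(String s) {
        return s != null && FUNCTION_NAME.matcher(s).matches();
    }
}
```

Because `Word` allows a single uppercase letter, an acronym such as TMX formally parses as three one-letter words, so a check like this cannot detect the acronym exception mentioned above.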
5.2.2 Naming of OpenTMS specific functions/methods

A consistent OpenTMS naming system is suggested for functions and variables which are exported from OpenTMS. Exported functions are functions which can be used in applications (similar to the public concept in Java or C++). This immediately helps to identify code which is used in systems outside of OpenTMS. The special string “OpenTMS_” is used for this purpose.

    ExportOpenTMSName := “OpenTMS_” Word+
    ExportOpenTMSFunctionName := “OpenTMS_” word Word*

Examples:

• The variable: OpenTMS_Encoding
• The function: OpenTMS_openXliffDocument

Fig 2: Naming of OpenTMS functions for export

5.3 Character set

OpenTMS uses UTF-8 as its basic character set, especially for exchanging files.

5.4 Standards

FOLT builds heavily on the idea of Open Source and on using standards. Therefore the FOLT requirements use well-established, XML based localization standards to represent various types of localization information.

• XLIFF - XML based localization exchange format
• TTX – Trados TM format
• TMX - XML based localization translation memory exchange format
• SRX - XML based format for describing segmentation rules
• GMX – standard for measuring quantitative aspects in the translation process
• TBX / MARTIF / OLIF – formats for representing terminology
• CSV
• Language Encoding ISO 639…

In general the basic architecture makes heavy use of XML. XML based structures are used as the basic mechanism to exchange information between different applications (→ Translets). Using XML has the advantage that many (open source) parsers are available for different programming languages, which enables implementing the core OpenTMS architecture in different languages and environments.

5.5 Basic Requirements

The following main requirements are taken from FOLT (2007b):

• Software: Web based application; thin client; no installation; no proprietary run-time components; preferably open source software (FOLT, 2007b, p. 17)
• Operating System: OS independent
• Hardware: standard hardware (FOLT, 2007b, p. 17)
• Interfaces: Integration into CMS and workflow management should be supported (FOLT, 2007b, p. 17).
• Product interfaces: Exchange supported through XLIFF and TMX (FOLT, 2007b, p. 18).
• Database: Open source database (FOLT, 2007b, p. 21); basically all SQL databases should be supported, therefore a generic database interface is required.
• Scalability: single and multi user requirement

5.6 Architecture

The architecture is described mainly in diagrams and text. The target group of this document is mainly non-technicians. Therefore the document is kept as informal as possible without losing the necessary precision. Further documents or versions of this document may add more details to the various items discussed. Where possible the basic methods and classes have been written in Java, but this does not imply that the implementation requires Java as an implementation language.

The various components described in the document are called models. A model organizes a certain functionality or aspect of the OpenTMS system. An example of a model is the security model of OpenTMS. This model describes all necessary functions and structures to implement the OpenTMS security system.

There are several methods to describe the architecture, methods and objects of a piece of software. Within this document mainly diagrams and block diagrams are used to show the structure of the software. For describing methods and objects an XML based methodology is used (taken from Tomcat).

The following is an example of a method call description using the Tomcat interface description. The method will be enhanced by also describing the possible return values.

    <translet>
      <translet-name>ApplyTranslationMemoryToSegment</translet-name>
      <translet-class>com.OpenTMS.translet.translateSegment</translet-class>
      <init-param>
        <param-name> TMXDB </param-name>
        <param-value> OpenTMSexampledatabase </param-value>
      </init-param>
      <init-param>
        <param-name> SEGMENT </param-name>
        <param-value> This segment needs to be translated. </param-value>
      </init-param>
      <init-param>
        <param-name> FUZZYQUALITY </param-name>
        <param-value> 70 </param-value>
      </init-param>
    </translet>

Fig 3: OpenTMS Procedure description

Annotation: In order to keep the text more compact, function naming in this document does not follow the naming scheme described in chapter 5.2.2. This is just for readability purposes. The real implementation should adhere to the naming scheme.
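A procedure description like the one in Fig 3 could be read with any standard XML parser. The following Java sketch uses the DOM parser shipped with the JDK; the class `TransletDescription` and its fields are hypothetical names chosen for illustration, and the element names are simply those that appear in Fig 3.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical holder for a parsed <translet> description (illustration only).
final class TransletDescription {
    final String name;
    final String className;
    final Map<String, String> params;

    private TransletDescription(String name, String className, Map<String, String> params) {
        this.name = name;
        this.className = className;
        this.params = params;
    }

    /** Parses a <translet> element given as a string; values are whitespace-trimmed. */
    static TransletDescription parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so that external entities cannot be loaded.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> params = new LinkedHashMap<>();
            NodeList inits = root.getElementsByTagName("init-param");
            for (int i = 0; i < inits.getLength(); i++) {
                Element init = (Element) inits.item(i);
                params.put(text(init, "param-name"), text(init, "param-value"));
            }
            return new TransletDescription(
                    text(root, "translet-name"), text(root, "translet-class"), params);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid translet description", e);
        }
    }

    // Returns the trimmed text of the first descendant with the given tag, or "".
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

Parameters are kept in insertion order, so a caller sees them in the sequence in which they were declared in the description.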
6 OPENTMS ARCHITECTURE AND MODELS

The OpenTMS architecture is composed of several models. Each model implements a specific aspect and behavior of the OpenTMS system. The models communicate with each other through parameters and values.

6.1 Parameters in OpenTMS models

Realizing parameters, and especially their types, independently of a specific programming language is not trivial, apart from simple types like characters, strings, integers or other numbers. Transferring more complex structured information has to be organized on the basis of those primitive types. Programming languages typically use “serialization” approaches to achieve at least a transfer of data from one application instance to another.

OpenTMS tries to use a general parameter / value model which addresses both programming language specific and programming language independent parameter / value transfer. In order to make the integration of existing applications possible, OpenTMS supports different options for parameter representation.

The following methods should be supported:

• XML based parameters: All values should be transferred through XML elements where the value is given by the element content (string), and the name and the type of the parameter are each given as an attribute. XML based parameter / value transfer is especially useful when transferring complex structured values (e.g. objects) between functions. Nevertheless complex parameters (objects) need to be serialized. It is suggested that OpenTMS defines some additional basic parameter types which often occur in translation tools (e.g. date type, TransUnits from XLIFF, tu or tuvs in TMX).

• Tomcat parameters: This follows the way the TOMCAT server engine defines method calls with parameter values. This is also XML based.

• XML-RPC parameters: This follows the way XML-RPC defines method calls with parameter values. It supports some basic types like integer etc. More complex parameters have to be serialized.
• Programming language specific parameters: Those parameters should be wrapped in a specific object through serialisation. This parameter type should only be used within a specific implementation where it is very unlikely that it will be used by other programming languages.

• Hash tables: Hash tables are supported by most programming languages, and transferring them to and from databases is often supported. Basically an entry in the table contains a key (the name of the parameter) and the value of the parameter (value of the key).

The kernel of each language specific OpenTMS implementation contains a basic library which supports creating, reading and writing OpenTMS parameters.

    Type        Comment
    int         Integer as in Java
    float       Float as in Java
    char        Character as in Java
    String      String as in Java
    Time        Date
    TransUnit   XML based XLIFF TransUnit structure
    tu          XML based TMX tu structure
    GLO         General Linguistic Object - see chapter 12
    MoLo        Monolingual Object - see chapter 12
    Mulo        Multilingual Object - see chapter 12

Fig 4: Table of Core OpenTMS parameter types

An example of how parameters are used is given in Fig. 3.
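The XML based parameter option described above (value as element content, name and type as attributes) might be sketched in Java as follows. The element name `param`, the attribute names `name` and `type`, and the class `XmlParameter` are assumptions made for this illustration; the specification does not fix them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical XML based parameter: <param name="..." type="...">value</param>
final class XmlParameter {
    final String name;
    final String type;
    final String value;

    XmlParameter(String name, String type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }

    /** Serializes the parameter; the value becomes the element content. */
    String toXml() {
        return "<param name=\"" + escape(name) + "\" type=\"" + escape(type) + "\">"
                + escape(value) + "</param>";
    }

    /** Reads a parameter back from its XML form. */
    static XmlParameter fromXml(String xml) {
        try {
            Element e = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new XmlParameter(
                    e.getAttribute("name"), e.getAttribute("type"), e.getTextContent());
        } catch (Exception ex) {
            throw new IllegalArgumentException("Not a valid parameter element", ex);
        }
    }

    // Escapes the characters that are special in XML content and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

A hash table based transfer could reuse the same class by storing each `XmlParameter` under its name in a `java.util.Map`.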
6.2 Core Models of OpenTMS

This chapter describes the core models of OpenTMS. The key idea is that OpenTMS uses an extensible architecture which allows new models to be added to the kernel architecture in an easy, yet compatible way. A new model has to fulfill some basic requirements, e.g. that parameters are defined and used as described in chapter 6.1.

Fig 5: OpenTMS Models and their relations

The OpenTMS models are arranged in a kind of “onion model”. The kernel is represented by the process model, which in turn builds on the user, document and data models, each of which models specific aspects of the OpenTMS system. These kernel models are “shielded” by the security model, which is responsible for assuring that only allowed operations are performed.
• Security Model: This model describes the security aspects and requirements of OpenTMS. Other models use the security model to allow or restrict access to OpenTMS specific functions. OpenTMS uses a security model which on the one hand secures the communication channel and on the other hand secures data (e.g. the value of elements in an XML file or the values in a property file).

• User Model: This model realizes the user and its representation in OpenTMS. The user model works in tight connection with the security model. User does not only mean human users, but also other processes. User models have rights attached to them, which in turn support the security model of OpenTMS.

• Process Model: This model implements the functions of OpenTMS (finally combined into applications – see application model), e.g. a converter or a translation memory search.

• Data Model: Basically this model implements the database side of OpenTMS. It uses a generalized database model, called data sources. Data sources are any kind of storage media for data, ranging from plain text files to SQL and other types of databases.

• Document Model: The document model describes the core documents used in OpenTMS. Basically this is based on XLIFF and TMX. The document model could also be seen as part of the data model, but because documents are one of the core outputs of the translation and localization process they are modeled separately.

• GUI Model: This model specifies editors and other functionality which requires a GUI. The GUI model is not further detailed in this architecture specification. It should be defined in a separate document.

• Interface Model: This model describes how to extend OpenTMS with new models. The interface model is an abstract model and needs further inspection. An example of such an extension is the interface to CMS systems. Interface models are also of considerable importance as they serve as the connection to other applications (e.g. Web servers, CMS systems) and in general to scripting languages like Perl, PHP etc.

• Application Model: This model realizes programs which perform tasks like translation etc.

6.3 OpenTMS Core Library

In order to achieve a consistent and quick implementation, OpenTMS implements its key functions in a core library. Functions implemented in the core library should not be re-implemented (“reinvented”) in external functions or processes. Obviously the set of key functions will evolve over time. Functionality and implementation of the core should not be changed without important reasons (similar to the LINUX implementation process).

By using a core library OpenTMS ensures that certain functions behave in the same way across applications. It also assures the developer and the user that functionality does not change unforeseeably.

Core library functions should be the first ones to be realized if OpenTMS is implemented in different programming languages.

6.4 The Application Model

The OpenTMS architecture just serves as a model of how the different aspects of tools supporting the translation process can be implemented. As a model it is independent of any programming language.

Applications need to be written in order to make the functionality of OpenTMS accessible to users. This is realized in the application model. The GUI model can be seen as an example of an application model. Applications obviously depend on the existence of a concrete implementation in an existing programming language (Java, C#, Perl or whatever). In this sense OpenTMS provides a programming framework which allows language support tools to be constructed.
In the beginning OpenTMS will come with some basic applications (editors etc.). But the main idea is that a solid framework is defined and specified which allows the construction of new language applications.

OpenTMS also supports its own scripting language (OpenTMSL). This language makes the OpenTMS functions accessible through simple calls (similar to batch files). This scripting language can also be used to construct applications.

6.5 Implementation Languages

As a first step it is suggested to implement a Java version of OpenTMS. Compared to other languages Java has the advantage that it runs on several operating systems (which is one of the goals of FOLT and OpenTMS). Tools written in other languages can be integrated because OpenTMS is designed from its basic model onwards to use XML-RPC and similar communication modes.

The basic Java implementation can serve as the basis for other implementations (C, C#, C++, Perl, PHP etc.).

For security issues associated with choosing a proper programming language see chapter 7.
7 SECURITY MODEL

A key success factor of the OpenTMS system is security. As translation can always involve documents of various security levels, proper handling of the documents and of document transmission is required. Depending on the security level, data can be encoded/encrypted. It is suggested to use three different levels.

• Level 0: No security procedures are applied; data are transferred as they are.

• Level 1: The communication channel is secured using standard secure protocols.

• Level 2: Encoding for security is done at data level. Basically this means that strings are encrypted when they are communicated through a communication channel or are written to or retrieved from a database. This also involves encrypted XLIFF files (or parts of them).

• Level 4: GUI level related security

Levels 1 and 2 can be used together to achieve optimal security where necessary.

Security is attached to the OpenTMS User model.

A key feature of the OpenTMS architecture is that the security model is transparent. When writing a (new) application the programmer does not need to take care of the security aspects. The OpenTMS kernel provides all the functions and interfaces to make those calls transparent; supplying the correct parameters is sufficient.

Another type of security level (Level 4) can be introduced at GUI level. At this level functions like copy and paste are secured in addition. This should prevent users from copying and pasting the content of text windows (editing windows) into other applications. Defining this security level is left to the GUI model definition.
The following diagram shows how several methods can be combined to achieve high security during the transmission of an XLIFF file. In this example the XLIFF file is secured (encrypted) in a first step. Once a transfer of the file over the network is required, the channel as such is also secured. Once the XLIFF file is received, it is decoded by the OpenTMS system. From a programming point of view this is realised simply by setting and defining the security to be used.

Fig 6: Example securing XLIFF document exchange

7.1 Security, OpenTMS and Programming Languages

In the previous chapter the issue of programming languages has been discussed. A commonly known problem with programming languages, or more precisely with applications written in those languages and often only in connection with specific operating systems, is that security measures are often not properly implemented (e.g. the very old problem of “buffer overflows” in C). OpenTMS overcomes this problem by clearly defining specific modules which are encapsulated and follow modern software development rules (e.g. access only through well defined interfaces). A special security layer wraps the various modules.

This architecture specification is mainly targeted towards the server part of OpenTMS. Thus it is independent of any GUI application. GUIs can use OpenTMS basically in two ways:

a) Through the OpenTMS server functionality: This approach encapsulates all modules and functions and gives the highest possible level of security. Here only “public server sided functionality” can be used.

b) By directly calling functions from the OpenTMS library: Obviously this can cause problems if the GUI does not call the functions properly (especially in programming languages like C or C++).

One kind of OpenTMS target GUI is the web based (browser based) application. Such applications call all the functionality through a web server, SOAP or XML-RPC interfaces. This minimises the danger of introducing security problems on the client side (e.g. for GUIs which have to follow requirements like ZDv 54/100 VS-NfD „IT-Sicherheit in der Bundeswehr“). By restricting the GUI to “plain HTML” one can reduce the risk to a minimum. Obviously increasing the security level goes along with a decrease in comfort and user friendliness. This decision is up to the end user and his organisation.

7.2 Communication Level

Communication which goes through TCP/IP should support (strong) encryption of the data transmitted. This is done in addition to using protocols like https, secureFTP etc.

7.3 Document Level

The basis of most activities in OpenTMS are documents.
A key problem is the transfer of XLIFF files. The content of the segments is normally readable by human readers. If required the segments in the XLIFF files (as well as in TMX or TBX files) can be encrypted (creating something like a secureXLIFF, secureTMX, secureTBX). The segments can then only be read in conjunction with a user name and password. The users who have regular access to the content can be stored in encrypted form in the header of the XLIFF file or be supplied when opening the XLIFF document.

7.4 Database Level

Database entries follow the same procedure. If required the entries should be encrypted. At this level database specific security functionality can and should be applied too. Without knowledge of the user / password combination an export etc. of the database does not reveal any information in case of an attack.

In addition any database security layers need to be supported too.

7.5 Security Level

The following functions assume that each encryption and decryption process associates the relevant user and his roles with the security function. At this point no function parameters are defined. This will be done in an implementation manual.
    Function                        Comment
    Encrypt / Decrypt               General function which encrypts and decrypts any type of document.
    Encrypt XLIFF / Decrypt XLIFF   Encrypts or decrypts the texts (segments) of an XLIFF document. The XML structure as such is still visible. Depending on the parameters supplied, attributes etc. are secured too.
    Encrypt TMX / Decrypt TMX       Encrypts or decrypts the texts (segments) of a TMX document. The XML structure as such is still visible. Depending on the parameters supplied, attributes etc. are secured too.
    Encrypt TBX / Decrypt TBX       Encrypts or decrypts the texts (segments) of a TBX document. The XML structure as such is still visible. Depending on the parameters supplied, attributes etc. are secured too.
    Establish Secure Communication  Establishes a secure communication channel. The type of security depends on the supplied parameters.
    Terminate Secure Communication  Terminates a secure communication channel.
    Secure Data Source              Enables the encryption / decryption of database entries.
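As an illustration of data level security for a single segment, the following Java sketch encrypts and decrypts a segment text with a password. The specification does not name any algorithms; the use of AES-GCM with a PBKDF2-derived key, the parameter sizes, and the class name `SegmentCrypto` are all assumptions made for this example. Only the segment text is processed, so the surrounding XML structure of an XLIFF, TMX or TBX file would remain visible.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper (illustration only): password based encryption of one segment.
final class SegmentCrypto {
    private static final int SALT_LEN = 16;   // bytes, assumed
    private static final int IV_LEN = 12;     // bytes, usual size for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns Base64(salt || iv || ciphertext) for the given segment text. */
    static String encrypt(String segment, char[] password) {
        try {
            byte[] salt = new byte[SALT_LEN];
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(salt);
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(segment.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[SALT_LEN + IV_LEN + ct.length];
            System.arraycopy(salt, 0, out, 0, SALT_LEN);
            System.arraycopy(iv, 0, out, SALT_LEN, IV_LEN);
            System.arraycopy(ct, 0, out, SALT_LEN + IV_LEN, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    /** Reverses encrypt(); fails if the password is wrong or the data was altered. */
    static String decrypt(String encoded, char[] password) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            byte[] salt = Arrays.copyOfRange(in, 0, SALT_LEN);
            byte[] iv = Arrays.copyOfRange(in, SALT_LEN, SALT_LEN + IV_LEN);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(password, salt),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] pt = cipher.doFinal(in, SALT_LEN + IV_LEN, in.length - SALT_LEN - IV_LEN);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    // Derives a 128-bit AES key from the password (iteration count is an assumption).
    private static SecretKeySpec deriveKey(char[] password, byte[] salt)
            throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 65_536, 128);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }
}
```

A fresh salt and nonce are generated per segment, so encrypting the same text twice yields different output; whether that is desirable for translation memory lookups is a design question this sketch does not address.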
8 BASIC OPENTMS COMPONENTS

The OpenTMS framework is organized around a set of basic components called models (see chapter 6) which interact and to which processes can be applied. The following is a brief overview of the basic models:

• Documents: Documents form one key feature of the architecture. Basically documents are every form of text. Translations and other modification processes (e.g. segmentation) are applied to documents. A key document type in OpenTMS is the XLIFF document, which is the main paradigm for communicating text between various processes.

• Database: Database refers to any kind of storage which can be used to retrieve a specific text or sub-text (like a paragraph or segment). Database in the OpenTMS context is understood broadly, ranging from simple text files to highly sophisticated SQL or object oriented database systems. OpenTMS uses a general database object which can come in various flavors, e.g. a translation memory, a phrase database or a terminology database. The OpenTMS database architecture supports various security levels. Encryption of entries should be supported. OpenTMS uses the notion of “data source” for these generalized databases.

• Processes: Processes apply operations to documents and databases. Operations could be: modifying, inserting, searching, editing, converting etc. A key process in OpenTMS is the translation process. OpenTMS processes are named “Translets” (Translet in the singular). An example of a Translet is a Doclet, a module which is applied for the conversion, modification etc. of documents. Processes in OpenTMS are normally accessible through the OpenTMS Scripting Language, a language which gives access to the core operations of the OpenTMS architecture (similar to JavaScript).
Fig 7: OpenTMS Objects

From a certain perspective processes can be seen as a special type of communication. Within OpenTMS three different communication types can be distinguished. Communication is used here in a broad sense.

• Command (file) based process: Here an executable is run (batch mode). Command processes use XML based command files as input parameters.

• Function based process: Here the specific process is called either as a function or as a method within a piece of software.

• Net (TCP/IP) based process: Here a process is run through a network (TCP/IP) using SOAP, RPC, XML-RPC or similar communication methods. The method is activated in one process while the actual execution runs in another process (which could be a server, a virtual machine, another thread or similar).

• Workflow: A workflow is a set of processes which are applied in a specific sequence. A workflow may also involve humans. A typical workflow could be: PM receives document to translate – determines document characteristics – computes statistics – provides offer – client accepts offer – PM determines translator – converts document for translator – sends to translator – and so on. This means that a workflow can also contain purely human actions interwoven with computer processes. In any case each human process must be mapped to a computer process.
Later in the document it is mentioned that processes can be organized in pipelines. This means that one process can take the output of another process, do some computation on this output and create a new output, which in turn can form the input to another process.
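The pipeline idea can be illustrated with a minimal sketch. The two processes below (`normalize_whitespace`, `segment`) and the helper `run_pipeline` are hypothetical stand-ins, not OpenTMS Translets; they operate on plain lists of strings instead of XLIFF structures, purely to show how the output of one process becomes the input of the next.

```python
from functools import reduce
from typing import Callable, List

# Hypothetical process type: takes a list of text units, returns a new list.
Process = Callable[[List[str]], List[str]]


def normalize_whitespace(units: List[str]) -> List[str]:
    """Collapse runs of whitespace inside every unit."""
    return [" ".join(unit.split()) for unit in units]


def segment(units: List[str]) -> List[str]:
    """Split every unit at '. ' (a deliberately naive segmentation rule)."""
    result: List[str] = []
    for unit in units:
        result.extend(part.strip() for part in unit.split(". ") if part.strip())
    return result


def run_pipeline(processes: List[Process], data: List[str]) -> List[str]:
    """Feed the output of each process into the next one."""
    return reduce(lambda current, process: process(current), processes, data)


segments = run_pipeline(
    [normalize_whitespace, segment],
    ["Das ist   ein Satz. Das ist noch einer."],
)
print(segments)
```

Because every process has the same input and output type, processes can be reordered or inserted without changing the others, which is the property the pipeline description relies on.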
9 DOCUMENT MODEL

9.1 Documents

Documents (“texts”) are a core concept in OpenTMS. Documents are normally the main interest, as documents need to be translated. Documents normally come into OpenTMS as input or leave it as output. Documents are normally processed in OpenTMS through XLIFF (chapter 9.4): they are converted into XLIFF and back. Documents come in various formats, e.g.:

• WinWord
• RTF
• Plain text
• HTML
• XML
• OpenOffice
• program texts
• resource files
• property files
• database entries
• any other common localization industry formats
• any other document type

The simplest type of document is a string, a sequence of characters. For OpenTMS processes strings are packed into XML structures, mainly a subset of XLIFF.

A key property of a document is the language associated with it, although the language itself may vary within the document. If a document gets translated, at least a second language is associated with it.
9.2 Character Sets

OpenTMS uses the Unicode character set for all (internal) representation purposes. This has the advantage that most of the characters used worldwide can be processed with OpenTMS. Most programming languages nowadays also use Unicode as their internal character representation.

UTF-8 encoded text is used as the core character set if OpenTMS produces and delivers files which are some kind of final document (e.g. for statistics output). Deviations occur if the original character set differs.

The core library of OpenTMS contains basic functions to convert from one character set to another. In addition, the kernel library should contain some functions which allow the character encoding of a document to be detected.

9.3 XML document handling

OpenTMS heavily uses XML-based standards (XLIFF, TMX, TBX). There are several good open source implementations for XML handling available (DOM model, SAX parser, JDOM, to name a few). Obviously those functions should be used to manipulate those documents. On top of the standard XML library functionality, functions are required to support the manipulation of the translation / localization XML standards. Those functions will also be part of the core library.

9.4 XLIFF Documents

XLIFF documents form the core document type to which most of the processes are applied (segmentation, translation etc.). XLIFF documents are created by converters. Converters take different document formats (RTF, XML, HTML etc.) and convert them to the XML-based XLIFF format (XLIFF, 2008).

The following shows a very simple example of an XLIFF document.
<?xml version="1.0" encoding="UTF-8" ?>
<xliff version="1.0">
  <file datatype="XML" original="D:arayatestsimplexmlsimplexml.xml" source-language="de" target-language="es">
    <!-- Header of the XLIFF file -->
    <header>
      <phase-group>
        <phase company-name="Araya" date="Sun May 11 11:29:11 CEST 2008" phase-name="1" process-name="pre-process" tool="XML2XLIFF version 2.0"/>
        <phase company-name="Araya" date="Sun May 11 11:29:11 CEST 2008" phase-name="2" process-name="Segmentation" tool="SEGMENTER version 2.0"/>
      </phase-group>
      <skl>
        <!-- Reference to an external file -->
        <external-file href="C:arayasklsimplexml.xml.27120.skl"/>
        <!-- Internal file -->
        <internal-file form="mimestring">PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiID8+DQo8c2ltcGxleG1sPg0KPHNlZ21lbnQ+JSUlMCUlJQo8L3NlZ21lbnQ+DQo8c2VnbWVudD4lJSUxJSUlCjwvc2VnbWVudD4NCjwvc2ltcGxleG1sPg==</internal-file>
      </skl>
      <!-- Properties of the XLIFF file -->
      <prop-group name="encoding"><prop prop-type="encoding">UTF-8</prop></prop-group>
      <prop-group name="xmlformat">
        <prop prop-type="donotresolveentitiesfile">C:arayainiedqm-ent.txt</prop>
        <prop prop-type="iniFile">c:/Araya/ini/config_simplexml.xml</prop>
      </prop-group>
      <prop-group name="specialinfo">
      </prop-group>
    </header>
    <body>
      <!-- Segments -->
      <trans-unit approved="no" help-id="0" id="0" xml:space="preserve">
        <source xml:lang="de">Das ist ein Segment</source>
        <target xml:lang="es" xml:space="preserve"/>
        <prop-group><prop prop-type="segmentid">1067381512</prop></prop-group>
      </trans-unit>
      <trans-unit approved="no" help-id="1" id="1" xml:space="preserve">
        <source xml:lang="de">Das ist ein <ph id="0">&lt;b&gt;</ph>Segment mit<ph id="1">&lt;/b&gt;</ph> Format</source>
        <target xml:lang="es" xml:space="preserve"/>
        <prop-group><prop prop-type="segmentid">1067381512</prop></prop-group>
      </trans-unit>
    </body>
  </file>
</xliff>

Fig 8: XLIFF File

9.4.1 OpenTMS and Skeleton files

Skeleton files are one of the key features of XLIFF.
In order to reduce the size of the content of a segment (trans-unit, source and target), most converters move the non-relevant part (e.g. format information) of an (external) document into an external representation. They then use a kind of referencing scheme to specify where parts of the text and the segment come together (mainly for back conversion). Skeleton files mainly contain the format (non-textual) part of a document. Often this part is bigger than the core text.

One can distinguish between internal and external skeleton files (also called skl files). External skl files keep the XLIFF file small, while internal skl files create a bigger XLIFF file. With external files back conversion is more complicated, as the back converter requires the skl file. One way to overcome this problem is to compress the internal skl file and encode it appropriately.

OpenTMS supports the back conversion of a document independently of the place where it was created. Thus XLIFF files in OpenTMS normally use internal skl files. Where this is not possible or not wanted, a procedure must be supplied which allows the skl file to be reintegrated into the XLIFF file before it is transmitted to another machine, user etc.

9.4.2 Security and encryption in XLIFF – secureXLIFF

As described in the section about security, XLIFF documents must follow the security architecture of OpenTMS. XLIFF documents are a potential security threat. If they are transmitted via the web or by another transport method (USB stick etc.), other persons may read the XLIFF document. In order to prevent access by unauthorized users it is proposed to encrypt the relevant parts (especially the source and target elements) of the document. Only specified users with the correct password will gain access to the content of the XLIFF document through an editor or similar. XLIFF editors reading the file must support the OpenTMS security layer. Using such a security approach one could also forbid copy and paste etc. for a given XLIFF document.

Annotation: Obviously an open source encryption method should be used. A secureXLIFF may be a good argument for industrial users to adopt the OpenTMS concept and architecture.

9.5 TMX Documents
TMX documents form the core document type to which database operations apply (fuzzy search, word based search etc.). TMX documents, or rather their entries, are stored in databases. Converters take different translation memory exchange formats (Trados etc.) and convert them to the XML-based TMX format (TMX, 2008). Databases store the TMX entries. While there is no problem with the meta information associated with each TMX entry (tu), the global TMX document meta information creates a problem. As databases are organized around entries, this meta information must be stored in separate tables and referenced by each entry.

TMX files are normally imported into databases to support high access speed (see footnote 1).

9.5.1 Security and encryption in TMX – secureTMX

The same security architecture as for XLIFF should be applied to TMX.

9.6 TBX Documents

TBX documents form the core document type for terminology data. TBX documents are imported into an OpenTMS database. TMX and TBX documents are internally stored in the same entry structure. They can be distinguished by specific markers.

The reason for storing both TMX and TBX documents in the same type of database is that this allows both kinds of data to be reused in similar situations. Obviously the database functions need to support reading and writing the entries according to the context. Thus an (originally) TBX entry may be used as a TMX entry (translation memory match) in one context, while a TMX entry could be used as a terminology match in another context. This internally identical handling does not imply that both entry types are the same, but practice shows that the usage patterns often require that they can be used interchangeably.

9.6.1 Security and encryption in TBX – secureTBX

The same security architecture as for XLIFF should be applied to TBX.

Footnote 1: A key question is whether OpenTMS should also allow direct access to TMX files (like Star text files) without the need to import them into a database. The advantage would be that, especially for small TMX files, there is no real need to store them in a database. It would also not require any database drivers. XML access functions would be sufficient. One could see this as a special type of database.
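The direct TMX access raised in footnote 1 can be sketched with nothing but standard XML functions. The following is a minimal illustration, not part of the OpenTMS specification: the function names `read_tmx` and `lookup` are hypothetical, and the sample TMX content (including the Spanish translation) is made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under its namespace-qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample data for illustration only.
SAMPLE_TMX = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="de" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="sketch" creationtoolversion="0"
          o-tmf="none"/>
  <body>
    <tu tuid="1">
      <tuv xml:lang="de"><seg>Das ist ein Segment</seg></tuv>
      <tuv xml:lang="es"><seg>Esto es un segmento</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(tmx_bytes):
    """Return one dict per <tu>, mapping language code to segment text."""
    entries = []
    for tu in ET.fromstring(tmx_bytes).iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            variants[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        entries.append(variants)
    return entries


def lookup(entries, source_lang, text, target_lang):
    """Exact lookup of a source segment; returns all target segments found."""
    return [
        entry[target_lang]
        for entry in entries
        if entry.get(source_lang) == text and target_lang in entry
    ]


entries = read_tmx(SAMPLE_TMX)
print(lookup(entries, "de", "Das ist ein Segment", "es"))
```

A plain file read like this has no index, so every lookup scans all entries; that is acceptable for small TMX files, which is exactly the case the footnote has in mind.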
9.7 Other Documents

OpenTMS is required to process all types of other documents. Once those files are brought into the OpenTMS system they are converted to XLIFF (except in the cases discussed above). Once processed, those XLIFF documents are converted back to their original format.

Ideally OpenTMS should contain or interact with a CMS which provides a convenient way of storing all kinds of documents. Interfaces to CMS will be defined, although the implementation of the interface is not part of the OpenTMS implementation. See chapter 18.

9.8 Basic Document Access Functionality

In the following some basic XLIFF file functions are described. Those functions should go into the core library of OpenTMS. They are by far not exhaustive. A more detailed function library for XLIFF will be defined later. Although most of the functions can be realised using DOM functionality, a function library which makes it easy to handle XLIFF files should be realised. As the functions will involve complex parameter combinations, the parameters will be supplied as XML constructs. For performance reasons one will not really supply flat XML files, but an in-memory version of the XML file (nodes etc.).

Basic Translation Functions for XLIFF documents | Comment
Convert Document | Converts a given document to XLIFF.
Backconvert Document | Back converts a given document from XLIFF.
CreateXLIFFDocument | Creates an empty XLIFF document. This function may be questionable, as XLIFF documents normally have just a temporary status. They normally come into existence through a converter call. Nevertheless such a function may be helpful. Pure text conversion can be achieved anyway.
GetProperties | Retrieves the (general) properties of the XLIFF document.
SetProperties | Sets the (general) properties of the XLIFF document.
Segment | Segments the XLIFF document based on some SRX rules (configuration file).
AddTransUnit | Adds a new TransUnit at a certain position. This function also depends on the original format. Depending on the format this function may cause problems in the back conversion process.
RetrieveTransUnit | Retrieves a segment of the XLIFF document; this includes all the information of the segment (thus the whole trans-unit is returned).
RemoveTransUnit | Removes a TransUnit; here one could distinguish between executing the operation immediately (and therefore permanently) and just making the change in memory and saving the changes later.
ModifyTransUnit | Modifies a TransUnit; here one could distinguish between executing the operation immediately (and therefore permanently) and just making the change in memory and saving the changes later.
TranslateTransUnit | The TransUnit is translated based on some parameters supplied. This can include TM translation, term translation, machine translation or basically any other kind of translation or invocation.
SplitTransUnit | Splits the source part of a TransUnit. Care has to be taken with regard to validity.
CombineTransUnit | Combines the source parts of two or more TransUnits. Care has to be taken with regard to validity.
SaveDocument | Saves the XLIFF document.
GetStatistics | Returns some statistics of the translation process (GMX based).

Fig 9: Some basic XLIFF File functions
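A few of the functions in Fig 9 can be sketched on an in-memory XML tree, which is the representation the text above suggests for performance reasons. The Python names below (`retrieve_trans_unit`, `modify_trans_unit`, `save_document`) are hypothetical counterparts of RetrieveTransUnit, ModifyTransUnit and SaveDocument; the real signatures are left open by the document. The sketch takes the "change in memory, save later" variant and uses a shortened, made-up XLIFF sample.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the structure of Fig 8 (illustration only).
SAMPLE_XLIFF = b"""<xliff version="1.0">
  <file datatype="XML" original="simplexml.xml" source-language="de" target-language="es">
    <body>
      <trans-unit approved="no" id="0">
        <source xml:lang="de">Das ist ein Segment</source>
        <target xml:lang="es"/>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def retrieve_trans_unit(root, unit_id):
    """Return the whole <trans-unit> element with the given id, or None."""
    for unit in root.iter("trans-unit"):
        if unit.get("id") == unit_id:
            return unit
    return None


def modify_trans_unit(root, unit_id, target_text):
    """Set the target text of a trans-unit in memory; returns False if not found."""
    unit = retrieve_trans_unit(root, unit_id)
    if unit is None:
        return False
    target = unit.find("target")
    if target is None:
        target = ET.SubElement(unit, "target")
    target.text = target_text
    return True


def save_document(root):
    """Serialize the in-memory tree; a real implementation would write a file."""
    return ET.tostring(root, encoding="unicode")


root = ET.fromstring(SAMPLE_XLIFF)
modify_trans_unit(root, "0", "Esto es un segmento")
print(save_document(root))
```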
10 OPENTMS AS A CLIENT/SERVER ARCHITECTURE

The kernel OpenTMS architecture is based on the client/server principle. Using a client/server architecture brings many advantages, among them the critical one that processes can be spread over several computers or over threads in modern operating systems and hardware architectures.

This does not imply that the OpenTMS architecture can only be implemented on a client/server basis. All the processes (Translets) can also run in a single user environment (e.g. by a procedural call within an editor). But by using a client/server framework one avoids the problem of having to re-program or re-implement a piece of software which was designed to run in a single threaded environment only. From an implementation point of view this concerns the use of global or static variables etc.

Each procedure developed for OpenTMS should be designed with multithreading in mind. Each procedure should be encapsulated in such a way that it can be surrounded by a process wrapper which allows it either to run as a thread in the same software or computer environment or to be distributed over several computers. In practice this means that globally defined variables should be avoided as far as possible.

As has been described before, the key functions are implemented in the OpenTMS core library. All (main) procedures should also be written in such a way that they can be called easily from the OpenTMS scripting language.
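The design rule above (no global state, so that a wrapper can decide how a procedure is run) can be sketched as follows. The procedure `count_words` and the two wrappers are hypothetical examples, not OpenTMS functions; the sketch only shows that a procedure without global variables gives the same result whether it is called directly or through a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(segment):
    """A procedure without global or static state: the result depends only on its input."""
    return len(segment.split())


def run_directly(procedure, inputs):
    """Single user environment: plain procedural calls."""
    return [procedure(item) for item in inputs]


def run_in_threads(procedure, inputs, workers=4):
    """Process wrapper: the same procedure, executed by a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order.
        return list(pool.map(procedure, inputs))


segments = ["Das ist ein Segment", "Noch eins"]
print(run_directly(count_words, segments))
print(run_in_threads(count_words, segments))
```

A third wrapper that forwards the call over the network could be added in the same way without touching `count_words`.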
Fig 10: Hierarchy of processes

Processes have to adhere to the security concept of OpenTMS. Processes can only be executed if they (and the user associated with the process) have appropriate rights (gained through the security model). This applies especially to processes which use network connections.

Fig 11: Applications

Most of the processes are XLIFF exchange based (in terms of functions and variables this means that the parameters of functions are XLIFF documents or substructures of XLIFF). This means that the processes mainly operate on XLIFF-based XML structures. They add or modify XLIFF structures. In principle the operations should be non-destructive, that is, information is not deleted or removed but only added. In some cases this cannot be fully maintained: e.g. if a translator modifies a translation (in a destructive way), the older information is lost. The same may apply to database entries. This also depends on the usage of a proper versioning system. As a consequence of using XLIFF-related structures internally, conversions to related XML-based formats like TMX, TBX etc. must be supported. This can be realized by attaching import and export procedures to the OpenTMS kernel.
Exceptions are, for example, converters, which take a document in whatever format as input and produce an XLIFF document. The same applies to back conversion.

Please note that the above figure also represents some kind of workflow. Basic workflows can be part of the OpenTMS architecture (e.g. each process applying changes to an XLIFF document should document this in the XLIFF header). But it is not intended that OpenTMS as such comes with its own workflow solution. More complex workflow procedures should be modeled using either proprietary or open source software.

OpenTMS also follows the “old style” of UNIX pipelining. Processes (see the chapter about the process model) take an input and produce an output. The next process takes the output of the previous process, applies some further transformation to it and creates new output. Nevertheless there is a difference. As parameters can become quite complex, the UNIX style of interpreting the input just as “a string” is opened up here to support input and output in the form of the parameters described before.
Fig 12: Pipeline Architecture

Figure 12 shows a typical pipelining of several processes (Translets) during a translation process.

OpenTMS can differentiate between two basic types of Translets.

• Human Initiated Translets: These are Translets which are invoked and (fully) controlled by humans. Examples are a translation editor or operations which insert or update entries in a database.

• Automated Translets: These are processes which normally run automatically and do not require human interaction. Examples are the steps conversion – segmentation – pre-translation. Automated procedures applied to a set of documents (e.g. pre-translating a project) also have to be mentioned here.
11 DATA MODEL

11.1 Data sources

Data (mostly databases) are modeled through data sources. Data sources are the basic objects which allow access to all kinds of data, especially databases. Data sources mainly store segments from TMX files or TBX entries. Data sources are XML oriented, that is, depending on the XML document supplied a data source converts the entry in such a way that it can be transferred to a data component.

Fig 13: Data sources and data components

Why not refer directly to databases? The basic idea behind the usage of a data source as the core data object in OpenTMS (representing databases etc.) is that creating such a layer between the real databases (e.g. MySQL) and the OpenTMS software makes adding new types of data quite easy. The various types of data are referred to as data components. Thus an SQL database is a data component, but a TMX file could also be seen as a data component if the relevant access operations are supported. Similarly, an Excel file can be considered a data source. Using this approach OpenTMS is not restricted to SQL databases, but can also use flat files, spreadsheets etc. It can also support direct access to vendor specific databases or systems. A server-side installation of OpenTMS can also act as a data source.

[Figure: the OpenTMS software accesses data sources through a standardised interface; the OpenTMS Data Source Layer maps the OpenTMS access functions to the data type specific access functions of the specific data component; below it are the various data components like files etc.]

Fig 14: Data sources with several data components
A data component which is connected through a data source must support a core functionality. This core functionality is divided into three types of functions (methods):

• Read methods: This involves all functions retrieving data from a data component. Read methods also map the results into the form in which the caller needs the data (e.g. TBX or TMX).

• Write methods: This involves all functions writing, updating and deleting data in a data component. Write methods also take into account which input format is used (e.g. TMX or TBX) and convert the input into the internal data source format.

• Select methods: These methods are part of the read methods and allow specific entries to be selected from the data source.

Care has to be taken regarding the security level chosen. Depending on the level, the data have to be encrypted and decrypted.

Two types of data components can be distinguished:

• Read only data components: This type of component can only retrieve data, but not store data. An example is a plain TMX file used as a data component.

• Full data components: Here both read and write methods are supported.

Depending on the user configuration, data components can be configured to behave differently. A component can appear as a read only data component for one user, while for another user it is accessible as a full data component.

11.2 TM Matches

OpenTMS differentiates between three types of matches:

• Perfect Match: This is a match where the segment to be searched matches the segment in the TM both with regard to the text content and the format.

• Exact Match: In this case only the text part of the segment matches the database entry perfectly; the format information differs.

• Fuzzy Match: In this case there are some deviations between the search segment and the match in the TM. The difference is usually stated as a percentage. This type of match is also often called an inexact match.

In the future one may consider other types of matches too, e.g. replacement class matches where only the “blank characters” (white space) differ. For this see also chapter 12.3.

11.3 Basic data source access functionality

The following (read and write) access functions are the core functions needed. Access results in matches. A basic idea is that the function decides, based on the input supplied, how the entry is interpreted and written into the database. This means that TMX entries are handled differently from TBX entries etc.

Please note that in the description of the functions no explicit reference is made to the security model. It is assumed that the security level is set before or together with the invocation of the database function.
Access Type | Comment
Exact Access | A given entry is found by the “string=segment” supplied, independently of the format.
Exact Format Access | A given entry is found by the “string” supplied, taking format information into account.
Fuzzy Access | A given entry is found by using a similarity search. Similarity is measured in %, where 100% is identical to an exact access.
Fuzzy Format Access | A given entry is found by using a similarity search, taking the format into account. Similarity is measured in %, where 100% is identical to an exact format access.
Word Based Access | A search is done by splitting the string into individual words. The word identification is language dependent. The words could be searched using either OR or AND (see footnote 2). Word based access could be enhanced by supporting stemming (e.g. the Porter stemming algorithm).
Regular Expression Access | A regular expression is used to retrieve the result set. Such a function is quite resource consuming.
Sub segment Access | Segments are retrieved based on some sub segments of a given search string. This could be seen as a more specialized form of the regular expression search or the word based search. This type of search is especially important if a segment actually represents a paragraph and may contain several sentences.

Fig 15: Data source access types

Footnote 2: It is suggested to use a logical representation of the query similar to Google (www.google.com). Here + denotes “word must exist”, while – denotes that the word is not allowed to exist in the result set.
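Three of the access types in Fig 15 can be sketched over a plain list of segments. Everything here is an assumption made for illustration: the function names, the treatment of anything in angle brackets as format information, and in particular the similarity measure. The document does not say how similarity is computed; the sketch uses the ratio from Python's `difflib.SequenceMatcher` as a stand-in.

```python
import re
from difflib import SequenceMatcher

# Assumption: format information appears as inline tags such as <b>...</b>.
TAG = re.compile(r"<[^>]+>")


def strip_format(segment):
    """Remove inline tags and normalize white space."""
    return " ".join(TAG.sub("", segment).split())


def similarity(a, b):
    """Similarity in percent; difflib's ratio is a stand-in measure."""
    return SequenceMatcher(None, a, b).ratio() * 100.0


def exact_access(entries, query):
    """Exact access: text must match, format is ignored."""
    wanted = strip_format(query)
    return [entry for entry in entries if strip_format(entry) == wanted]


def exact_format_access(entries, query):
    """Exact format access: text and format must both match."""
    return [entry for entry in entries if entry == query]


def fuzzy_access(entries, query, threshold=70.0):
    """Fuzzy access: (score, entry) pairs at or above the threshold, best first."""
    wanted = strip_format(query)
    scored = [(similarity(wanted, strip_format(entry)), entry) for entry in entries]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


entries = [
    "Das ist ein Segment",
    "Das ist ein <b>Segment mit</b> Format",
    "Etwas ganz anderes",
]
print(exact_access(entries, "Das ist ein Segment mit Format"))
print(fuzzy_access(entries, "Das ist kein Segment"))
```

With this layout a score of 100% in `fuzzy_access` coincides with a hit in `exact_access`, which mirrors the relationship stated in the table.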
Access Functions for TM and TBX data | Comment
RetrieveTMMatch | Gets a match from the translation memory. The actual result depends on the data source access type chosen. Parameters involve match quality etc.
RetrieveTBXMatch | Gets a TBX match from the terminology database. The actual result depends on the data source access type chosen.
AddEntry | This is a generic function adding data (e.g. TMX entries) to data sources. The function is generic in the sense that it decides from the type of the XML document to be added how the entry is stored (TMX, TBX etc.).
CreateEntry | Creates an empty data source entry of a specific type.
AddTMEntry | Adds a TM entry; a specialization of AddEntry.
AddTBXEntry | Adds a TBX entry; a specialization of AddEntry.
RemoveEntry | This is a generic function removing data (e.g. TMX entries) from data sources. The function is generic in the sense that it decides from the type of the XML document supplied how the entry is handled (TMX, TBX etc.).
ModifyEntry | This is a generic function modifying data (e.g. TMX entries) in data sources. The function is generic in the sense that it decides from the type of the XML document supplied how the entry is handled (TMX, TBX etc.).
CopyEntry | This is a generic function copying data (e.g. TMX entries) to data sources. The function is generic in the sense that it decides from the type of the XML document supplied how the entry is stored (TMX, TBX etc.).

Fig 16: Data source access functions
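The generic behaviour of AddEntry, deciding from the XML document how the entry is stored, can be sketched as a dispatch on the root element. The class `DataSource`, its method names and the marker strings are hypothetical; the only facts taken from the standards are that a TMX entry is a `tu` element and a TBX entry is a `termEntry` element. Both kinds end up in the same entry structure with a marker, as described in chapter 9.6.

```python
import xml.etree.ElementTree as ET


class DataSource:
    """Hypothetical in-memory data source holding TM and term entries alike."""

    def __init__(self):
        # Common entry structure: (marker, serialized XML of the entry).
        self.entries = []

    def add_entry(self, xml_text):
        """Generic add: inspects the root element to choose the specialization."""
        root = ET.fromstring(xml_text)
        if root.tag == "tu":
            return self.add_tm_entry(root)
        if root.tag == "termEntry":
            return self.add_tbx_entry(root)
        raise ValueError("unsupported entry type: " + root.tag)

    def add_tm_entry(self, element):
        self.entries.append(("TMX", ET.tostring(element, encoding="unicode")))
        return "TMX"

    def add_tbx_entry(self, element):
        self.entries.append(("TBX", ET.tostring(element, encoding="unicode")))
        return "TBX"


source = DataSource()
print(source.add_entry("<tu><tuv xml:lang='de'><seg>Das ist ein Segment</seg></tuv></tu>"))
print(source.add_entry("<termEntry id='t1'><langSet xml:lang='de'/></termEntry>"))
```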
11.4 Databases

A key principle of the OpenTMS architecture is its independence from database products. OpenTMS defines a core subset of access functions (based on SQL) which can be implemented by nearly all database systems.

The following gives a (non-exhaustive) list of database types which should be supported (see footnote 3).

11.4.1 Open source SQL databases

• MySQL - www.mysql.de
• Postgres - www.postgresql.org
• H2 - www.h2database.com
• Cloudscape - www.ibm.com/software/data/cloudscape (IBM)
• …

11.4.2 Closed source SQL databases

• SQL Server (different flavors) - www.microsoft.com/germany/sql/default.mspx
• Oracle - www.oracle.com
• …

11.4.3 Alternatives

SQL databases are not the only databases out there. Other database formats could be:

• Spreadsheets (like SQL)
• Object oriented databases
• XML database systems (e.g. XINDICE)
• Plain text files

Footnote 3: A key question at this point is whether OpenTMS should implement something like an “internal database”, which would just mean storing the database as “simple hash tables” which can be serialised and de-serialised. See also the discussion of TMX documents (footnote 1). Alternatively the internal database could just consist of an XML file.
11.4.4 Database Access

Internally all main access functions of OpenTMS are based on specific objects (see chapter 12) and all access happens through these objects. By using this additional abstraction level (interfaces, as they are called in most programming languages nowadays) one even becomes independent of SQL and is open for future advances in the area of database development. All access functions are mapped to SQL statements (or their equivalents) which are not hardcoded but stored in XML database configuration files. Up to this point there is no real necessity to realize the database only in SQL. The advantage of using SQL as the language describing the access functions is that it is a) widespread and b) standardized.

Fig 17: Configuring different database types

11.4.5 Database and data source configuration

As OpenTMS needs to support many different database / data source types, adding a new database type should not require changing the data source code kernel. Therefore, for each data source type a configuration file defines the main parameters of the database. Depending on the security requirements, the configuration file can be secured using the security model functions for documents. The configuration includes:

• Database class driver – e.g. com.mysql.jdbc.Driver
• Connection String – e.g. jdbc:mysql:
• Any other connection string specific commands (e.g. buffer size)
• Commit support
• Unicode support
• Server Address
• Port
• User (encrypted)
• Password (encrypted)
• Mapping of OpenTMS database access functions to database specific access code (e.g. SQL code like <command step="1">DROP TABLE IF EXISTS MONO</command>). The access code can be organized in groups if a specific functionality requires several database functions to be run (e.g. creating all the necessary tables for a new database). This is mainly important for SQL databases, as the SQL dialects supported vary.
• Reference to code (e.g. jar file, dll etc.) if a specific function needs to run at a specific point in time (e.g. when creating a new database). This should make it possible to inject specific implementation code for specific tasks (e.g. if some functionality cannot be executed through SQL commands).

In addition, a more generic interface can be used if a database cannot be integrated with the configuration file specifications above. In this case the whole interface for the new database needs to be implemented and made available to OpenTMS.
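The mapping of access functions to database specific code can be sketched with a small configuration and a loader. Only the `<command step="…">` element comes from the text above; the surrounding element names (`datasource-config`, `function`), the function names, the table layout and the helpers `load_commands` and `run` are hypothetical. SQLite is used here only because it ships with Python; it stands in for any of the SQL databases listed in chapter 11.4.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; only <command step="..."> is from the text.
CONFIG = """<datasource-config>
  <function name="createDatabase">
    <command step="1">DROP TABLE IF EXISTS MONO</command>
    <command step="2">CREATE TABLE MONO (id INTEGER PRIMARY KEY, lang TEXT, segment TEXT)</command>
  </function>
  <function name="addEntry">
    <command step="1">INSERT INTO MONO (lang, segment) VALUES (?, ?)</command>
  </function>
  <function name="exactAccess">
    <command step="1">SELECT segment FROM MONO WHERE lang = ? AND segment = ?</command>
  </function>
</datasource-config>"""


def load_commands(config_text):
    """Map each access function name to its SQL commands, ordered by step."""
    commands = {}
    for function in ET.fromstring(config_text).iter("function"):
        steps = sorted(function.findall("command"), key=lambda c: int(c.get("step")))
        commands[function.get("name")] = [step.text for step in steps]
    return commands


def run(connection, commands, name, params=()):
    """Execute all steps of an access function; returns the rows of the last step."""
    rows = []
    for sql in commands[name]:
        rows = connection.execute(sql, params).fetchall()
    return rows


connection = sqlite3.connect(":memory:")
commands = load_commands(CONFIG)
run(connection, commands, "createDatabase")
run(connection, commands, "addEntry", ("de", "Das ist ein Segment"))
print(run(connection, commands, "exactAccess", ("de", "Das ist ein Segment")))
```

Supporting another SQL dialect would then mean supplying a different configuration text, while `load_commands` and `run` stay unchanged.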
  • 51. 12 TRANSLATION OBJECTS

    A key entity in the translation process is the translation. Translations (inherently multilingual) usually consist of segments (monolingual) and the languages associated with those segments. As a consequence, the architecture uses three types of language-related entities. These objects are used by processes to create the translation functionality.

    A “General Linguistic Object” (GLO) contains information (features, attributes) which is common to all linguistic information types. Examples are: unique id, creation and modification dates, authors etc. Linguistic Objects can always be serialized to XML. The main supported formats are XLIFF, TMX and TBX.

    From that object two objects are derived:

    • A “Monolingual Object” (MoLO), which represents a linguistic entity for a given language. It inherits all the features of the GLO and adds, for example, the language of the entity (segment).
    • A “Multilingual Object” (MuLO), which represents translations by linking one or more MoLOs into one object. A MuLO consists of at least one MoLO and can contain up to n MoLOs. It is not required that each MoLO of a MuLO has a different language.⁴

    Each of those object types contains a unique id; in addition, a MoLO carries the id of the MuLO it belongs to, so that it can easily be associated with its translations.

    ⁴ The behaviour of multilingual objects can be configured. One option is to treat all entries as bilingual objects only, so that one MuLO contains only two MoLOs: a source and a target MoLO. Normally, options like this should be used with caution, as they introduce problems in managing real multilingual databases. This is especially true if one source segment may have several translations (target MoLOs). Nevertheless, there may be cases where several translations for a source segment are required, e.g. a temporary translation. In this case it is suggested to associate “status attributes” with the MoLO. These could be used on the one hand as a sorting criterion for matches and on the other hand for identifying problem translations.
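The relationship between the three object types can be illustrated with a small sketch. All class, field and method names below are assumptions chosen for illustration; they are not taken from the OpenTMS code base, and XML serialization (XLIFF, TMX, TBX) is left out.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

def new_id() -> str:
    """Create a unique id for a linguistic object."""
    return uuid.uuid4().hex

@dataclass
class GeneralLinguisticObject:
    """GLO: features shared by all linguistic objects."""
    uid: str = field(default_factory=new_id)
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    author: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class MonoLingualObject(GeneralLinguisticObject):
    """MoLO: a segment in one language, linked to its owning MuLO."""
    language: str = ""
    segment: str = ""
    mulo_id: Optional[str] = None

@dataclass
class MultiLingualObject(GeneralLinguisticObject):
    """MuLO: links one or more MoLOs; languages may repeat."""
    molos: List[MonoLingualObject] = field(default_factory=list)

    def add(self, molo: MonoLingualObject) -> MonoLingualObject:
        molo.mulo_id = self.uid  # the MoLO remembers which MuLO it belongs to
        self.molos.append(molo)
        return molo

    def translations(self, language: str) -> List[MonoLingualObject]:
        return [m for m in self.molos if m.language == language]

entry = MultiLingualObject(author="kw")
entry.add(MonoLingualObject(language="en", segment="Good morning"))
entry.add(MonoLingualObject(language="de", segment="Guten Morgen"))
# A second German MoLO, marked through a hypothetical "status" attribute.
entry.add(MonoLingualObject(language="de", segment="Guten Tag",
                            attributes={"status": "temporary"}))
print([m.segment for m in entry.translations("de")])
```

The sketch does not enforce the rule that a MuLO contains at least one MoLO; a real implementation would have to decide where to check it.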
  • 52. Attributes are associated with Linguistic Objects. As several standards are used (TMX, XLIFF and TBX), a mapping of the attributes between the different types is required. Within the object the attributes may be identified through their namespace.

    Fig 18: Representation of linguistic entities as General Linguistic Object

    12.1 Format information

    Format information (e.g. transported through the <ph> tag in XLIFF) and its correct handling is a key kernel function of OpenTMS. The core OpenTMS library contains all the necessary functions to handle format information correctly. OpenTMS should aim at providing the highest possible support for format handling.

    12.2 Terminology versus Translation Memory
  • 53. Within computational linguistics a key distinction is made between terminology and translation memory. The two concepts are clearly used in different contexts. This is also reflected in the existence of (at least) two standards: TMX (TMX, 2008) and TBX (TBX, 2008). Nevertheless, from a conceptual and software engineering point of view the two concepts share more than what distinguishes them. Both have “strings” as their basic representation, either as terms or as segments, and their meta information also matches in most cases. A main difference is their usage context: TM entries are normally applied at segment level (and normally consist of more characters), while terms are used at a sub-segment (word, phrase) level.

    As these differences only appear at the usage level, OpenTMS consequently implements the same underlying (database) structure for TM and term entries. Using special markers, a distinction can be made at run time (= usage time). The immediate advantage of this approach is that both concepts can be used in different usage contexts. Search and retrieval functionality is available for both concepts (e.g. fuzzy search is rarely available for term databases; using a common internal representation overcomes this drawback).

    Fig 19: Conversions of linguistic entities

    12.3 Variables, placeholders, replacement classes

    Translation memory entries, and sometimes also terminology entries, often contain textual parts which can act as placeholders. Typical examples of placeholders are
  • 54. numbers, month names, acronyms etc. In many cases it is possible to automatically replace those “variable parts” with their actual counterparts in a segment. This is especially useful in matching: just by replacing the numbers in a match with their correct values, a better or even a perfect match can be achieved.

    For this reason OpenTMS supports the concept of replacement classes. A replacement class is a specific construct which generalizes a certain type of string or information. A replacement class basically consists of two parts:

    • A class name (e.g. number)
    • A procedure describing the replacement class. In many cases the procedure can be defined through a regular expression. Another option may be that specific strings (e.g. terms from a terminology database) act as a replacement class.

    A procedure may be language dependent. If a procedure is language dependent, transformation rules have to be defined which describe how a value of language A is transformed into a value of language B.

    Example:

        Class: GeneralNumber
        Procedures:
            General:
                Definition: ([0-9]+)\.([0-9]+)
                Transform: $1.$2
            German:
                Definition: ([0-9]+),([0-9]+)
                Transform: $1,$2

    The basic idea is that a language specific procedure involves two parts:

    • a definition part which describes how to detect (evaluate) an instance of a replacement class
    • a transformation part which describes how to compute the instance of a replacement class given that a replacement class has been detected (e.g. in another language)
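The definition and transformation parts described above can be tried out with a minimal sketch. The dictionary layout and the helper names `transform_value` and `localise` are assumptions made for illustration, and Python's `re` module writes group references as `\1` where the example uses `$1`.

```python
import re

# Hypothetical in-memory form of the GeneralNumber replacement class.
REPLACEMENT_CLASSES = {
    "GeneralNumber": {
        "General": {"definition": r"([0-9]+)\.([0-9]+)", "transform": r"\1.\2"},
        "German": {"definition": r"([0-9]+),([0-9]+)", "transform": r"\1,\2"},
    }
}

def transform_value(class_name, value, source_procedure, target_procedure):
    """Detect value with the source definition, rebuild it with the target transform."""
    procedures = REPLACEMENT_CLASSES[class_name]
    match = re.fullmatch(procedures[source_procedure]["definition"], value)
    if match is None:
        return None  # value is not an instance of the replacement class
    return match.expand(procedures[target_procedure]["transform"])

def localise(class_name, segment, source_procedure, target_procedure):
    """Rewrite every instance of the class found in a segment."""
    procedures = REPLACEMENT_CLASSES[class_name]
    pattern = re.compile(procedures[source_procedure]["definition"])
    template = procedures[target_procedure]["transform"]
    return pattern.sub(lambda m: m.expand(template), segment)

print(transform_value("GeneralNumber", "3.14", "General", "German"))
print(localise("GeneralNumber", "Das kostet 12,50 Euro", "German", "General"))
```

Both procedures capture the integer and the fractional part as groups 1 and 2, which is what lets a value detected under one procedure be rebuilt with the transform of the other.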
  • 55. When a replacement class matches part of a segment, the matching part is replaced with the replacement class, carrying forward the class name and the original value.

    Replacement classes raise two main challenges:

    • A key problem in defining replacement classes is the order in which they are invoked (checked). Depending on the definition of the regular expressions, several expressions may match (e.g. numbers without and with decimal points). OpenTMS should apply a strict linear order procedure: the first matching expression is applied and used.
    • The other key problem is checking whether all the replacement classes appear a) in both the source and the target of the match and b) in the source segment (the one which requires translation). For OpenTMS the proposed solution is that the replacement classes in both source and target have to match exactly. If this is the case, the replacement classes also have to match those of the source segment to be translated. It has to be noted that another approach could be used too: removing the non-matching replacement classes in all three involved strings.
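Both challenges can be illustrated with a short sketch. The class names `DecimalNumber` and `Integer`, the `{ClassName}` marker syntax and the helper names are assumptions for illustration only; the consistency check compares which classes occur and how often, since the actual values are expected to differ.

```python
import re
from collections import Counter

# Strict linear order: decimal numbers are tried before plain integers.
ORDERED_CLASSES = [
    ("DecimalNumber", re.compile(r"[0-9]+\.[0-9]+")),
    ("Integer", re.compile(r"[0-9]+")),
]

def generalise(segment):
    """Replace each detected instance with {ClassName}; keep (class, value) pairs."""
    found, result, position = [], [], 0
    while position < len(segment):
        for name, pattern in ORDERED_CLASSES:
            match = pattern.match(segment, position)
            if match:  # the first matching expression wins
                found.append((name, match.group()))
                result.append("{" + name + "}")
                position = match.end()
                break
        else:
            result.append(segment[position])
            position += 1
    return "".join(result), found

def classes_consistent(source_match, target_match, new_source):
    """True if all three strings contain the same replacement classes equally often."""
    counts = [Counter(name for name, _ in generalise(text)[1])
              for text in (source_match, target_match, new_source)]
    return counts[0] == counts[1] == counts[2]

print(generalise("Page 12 of 3.5"))
print(classes_consistent("Page 12 of 30", "Seite 12 von 30", "Page 7 of 9"))
```

If the two entries of `ORDERED_CLASSES` were swapped, "3.5" would be split into two integers, which is why a fixed, documented order matters.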
  • 56. 13 PROCESS MODEL

    13.1 OpenTMS Process

    An OpenTMS process realizes the functionality of the OpenTMS system, mainly supporting the translation process. Examples of processes are converters, segmenters, translation memories, machine translation, statistics modules etc. OpenTMS processes build on the core library functions and move them into a process environment. In many cases this does not mean that a process is created in the strict sense of the word; it could also mean that a function of the core library (or any other function defined in another OpenTMS context) is called from an application.

    13.2 OpenTMS Scripting Language

    Most OpenTMS processes are available through the OpenTMS Scripting Language (OpenTMSL). OpenTMSL enables developers and users to write their own scripts to adapt the OpenTMS processes to their needs. OpenTMSL is defined in a programming language independent way and should be implemented in different programming languages. It basically makes the functions defined in the core library accessible to the public through an easy-to-learn scripting language.

    Fig 20: OpenTMS Scripting Language