Jax, FL Admin Community Group 05.14.2024 Combined Deck
TEI ODD based development
1. Welcome to my presentation on ODD. I have been working with ODD for about 4
years, first because I wanted Universal Dependencies based linguistics in Frisian
corpora, later because I wanted a strict dictionary format based on TEI and
Universal Dependencies.
I want to show you how ODD has helped me to deliver reliable, interoperable
solutions.
First I will show you what ODD is, then how ODD can be used in development
pipelines. We will look at how ODDs can inherit from each other and have a glance
at TEI Publisher. Finally I will share some pros and cons and the reasons why we at
the Fryske Akademy stick to ODD as a basis for solution development.
TEI and ODD for
LINGUISTICS
A solid basis for development?
edrenth@fryske-akademy.nl
2. So what is an ODD? An ODD is a regular TEI document in which you define your data
model using a schemaSpec. In the ODD you can document your data model using TEI
elements such as div, p, gloss and def. In this documentation you can include the
actual specifications, to which you can refer from within schemaSpec. This gives you a
nice way to specify your data model in a documented manner.
Inside the specification of an element you can indicate how this element is to be
processed. We don't use this yet, but since the latest versions of TEI Publisher it has
become a very interesting mechanism. Once you have your ODD you can generate
validation schemas, documentation and more.
What is ODD
• One Document Does it all
• It is a TEI document
• Holding one schemaSpec element
• It is the mechanism to customize TEI
• TEI is designed to be customized
• What can it do
• Generate validation
• Generate documentation
• Describe processing model
• https://tei-c.org/guidelines/customization/
• https://tei-c.org/release/doc/tei-p5-doc/en/html/USE.html#IM-unified
• https://tei-c.org/release/doc/tei-p5-doc/en/html/TD.html#TDmodules
• https://tei-c.org/release/doc/tei-p5-doc/en/html/TD.html#TDPMPM
3. Here you see a basic example of a schema definition in ODD. Often the structure will
be a schemaSpec with references to the modules you want to work with and
specifications of the elements you want to change.
The start attribute on schemaSpec tells which root element(s) are allowed. Each
moduleRef points to an existing module, available in TEI (online) or elsewhere. A
source attribute, left out in this example, allows you to point to the file where
module definitions can be found. I will show you more on this later.
To limit the vast number of elements in modules you use the include or except
attributes on moduleRef. NOTE that if you omit an element from the include attribute
and refer to it later from an elementRef, schema generation will not fail. Instead
the element will just not be there. After the module references you usually list some
elementSpecs for elements that you want to change.
NOTE that omitting the mode attribute on elementSpec means add, not change.
Adding an already existing element is weird, but again it often does not make the
transformation fail!
Including a content element in an elementSpec will overwrite the existing content
model. What goes inside content resembles for example XSD: basically you can use
sequence and alternate.
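A minimal sketch of such a schemaSpec is given here for readers who do not have the slide image at hand. The ident "myTei" and the selection of modules and elements are illustrative assumptions, not the example shown in the talk.

```xml
<!-- Sketch only: ident and the chosen modules/elements are assumptions. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myTei" start="TEI">
  <!-- Modules to pull in; "include" limits a module to the listed elements. -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core" include="p head"/>
  <moduleRef key="textstructure" include="TEI text body div"/>

  <!-- mode="change" modifies the existing div; without it the spec counts as an addition. -->
  <elementSpec ident="div" module="textstructure" mode="change">
    <!-- This content model replaces the original one for div. -->
    <content>
      <sequence>
        <elementRef key="head" minOccurs="0"/>
        <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/>
      </sequence>
    </content>
  </elementSpec>
</schemaSpec>
```

Note that head and p are listed in the include attribute of the core module. Leaving them out there would silently drop them from the generated schema, as described in the notes.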
4. A very nice feature, I think, is the ability to use constraints in element
specifications. Most people will use Schematron for this, with assert, report and
XPath tests.
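As an illustration, a constraint on the w element could look like the sketch below. The rule itself (a lemma is required and must not contain spaces) and the ident are assumptions made up for this example. The value of the scheme attribute depends on the TEI version in use; older releases use "isoschematron".

```xml
<!-- Sketch only: the rule and its ident are hypothetical. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="w" module="analysis" mode="change">
  <constraintSpec ident="w-lemma" scheme="schematron">
    <constraint>
      <!-- assert: reported when the test is false for a w element. -->
      <sch:assert xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                  test="@lemma">A w element must carry a lemma attribute.</sch:assert>
      <!-- report: reported when the test is true for a w element. -->
      <sch:report xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                  test="contains(@lemma, ' ')">The lemma should not contain spaces.</sch:report>
    </constraint>
  </constraintSpec>
</elementSpec>
```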
5. Besides elements, attributes can also be specified. In this case I add linguistic
feature attributes to the analysis module. These attributes can be defined in their
own namespace.
You can define the datatype of an attribute, which can be an XML Schema datatype
using the name attribute of dataRef or, like here, a TEI datatype using key.
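A sketch of an attribute class along these lines follows. The class name, the namespace URI and the attribute with its values are assumptions for illustration only; the actual definitions from the talk are not reproduced here.

```xml
<!-- Sketch only: class ident, namespace and values are hypothetical. -->
<classSpec xmlns="http://www.tei-c.org/ns/1.0"
           ident="att.linguistic.features" type="atts" module="analysis" mode="add">
  <desc>Linguistic feature attributes.</desc>
  <attList>
    <!-- ns puts the attribute in its own namespace. -->
    <attDef ident="Number" ns="http://example.org/ns/features" usage="opt">
      <desc>Grammatical number.</desc>
      <!-- key refers to a TEI datatype; name would refer to an XML Schema datatype. -->
      <datatype>
        <dataRef key="teidata.enumerated"/>
      </datatype>
      <valList type="closed">
        <valItem ident="Sing"/>
        <valItem ident="Plur"/>
      </valList>
    </attDef>
  </attList>
</classSpec>
```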
6. You can refer to previously defined attributes via memberOf. When an element is a
member of an attribute class, the attributes defined in this class are allowed for that
element.
NOTE the mode="change" on classes: if you omit it the default will be "replace",
meaning you will lose all other class memberships.
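In ODD this could be written as in the sketch below. The class name att.linguistic.features is a hypothetical attribute class, assumed to be defined elsewhere in the same ODD.

```xml
<!-- Sketch only: att.linguistic.features is a hypothetical class. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="w" module="analysis" mode="change">
  <!-- mode="change" keeps the existing memberships of w and adds one more. -->
  <classes mode="change">
    <memberOf key="att.linguistic.features"/>
  </classes>
</elementSpec>
```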
7. Some more possibilities that are worth mentioning, but not in detail. The first two
keep things organized, while model specifies how elements are processed.
More possibilities
• specGrp – specGrpRef: grouping specs
• macroSpec – macroRef: expanding spec content
• model: define behaviour of elements
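The grouping mechanism can be sketched as follows: a specGrp sits in the documentation next to the prose that explains it, and a specGrpRef inside schemaSpec pulls it in. The xml:id, the prose and the changed attribute are assumptions for illustration.

```xml
<!-- Sketch only: ids, prose and the modified attribute are hypothetical. -->
<body xmlns="http://www.tei-c.org/ns/1.0">
  <div>
    <head>Words</head>
    <p>Every word must carry a lemma.</p>
    <!-- Specifications grouped together with their documentation. -->
    <specGrp xml:id="words">
      <elementSpec ident="w" module="analysis" mode="change">
        <attList>
          <attDef ident="lemma" mode="change" usage="req"/>
        </attList>
      </elementSpec>
    </specGrp>
  </div>
  <div>
    <head>Schema</head>
    <schemaSpec ident="myTei" start="TEI">
      <moduleRef key="tei"/>
      <moduleRef key="header"/>
      <moduleRef key="core"/>
      <moduleRef key="textstructure"/>
      <moduleRef key="analysis"/>
      <!-- Pulls in the grouped specifications from the documentation above. -->
      <specGrpRef target="#words"/>
    </schemaSpec>
  </div>
</body>
```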
8. Now, this is where the benefits really start. Once you have your ODD you can
construct a pipeline, without the need for coding, that will give you validation and
documentation, which you can use in for example editing environments like Oxygen.
The first thing to do is "compile", or better said expand, your ODD using the
available TEI stylesheet. The necessary parts from modules in the TEI source will be
combined with your specifications.
After this is done you can transform to RNG, again using the available stylesheet, or
transform to a separate Schematron file. RNG can be transformed to XSD, which you
may want in order to generate JAXB classes. Last but not least there is a nice
library that deals with the complexity of transforming Schematron to XSLT, the
execution of validation and the processing of validation results.
You can also use Oxygen or OxGarage to transform, or you can use Roma to construct
an ODD online. The downside is that this gives you less control and insight, and you
get the version of the TEI source and stylesheets available in these tools at the time.
ODD, processing
• odd → odd2odd → .compiled
• odd2rng → rng with schematron
• odd2sch → .sch
• trang → .xsd
• dmaus schxslt → xslt and/or java validation
1. Maven: https://bitbucket.org/fryske-akademy/online-dictionaries/src/master/pom.xml
2. Oxygen
3. https://oxgarage.tei-c.org/
4. https://roma.tei-c.org/
5. Command line / maven: https://github.com/TEIC/Stylesheets/tags
Options 2 – 4 use a version you may not want!
9. This makes me really happy! Recently I discovered it is possible, though verbose, to
define a Maven pipeline that implements a lot of the steps I mostly performed by hand
before. Now I can just run mvn verify, with no Ant needed and no dependencies on
online sources.
ODD, processing, maven
<transformationSet>
  <stylesheet>src/main/Stylesheets-${stylesheetversion}/odds/odd2odd.xsl</stylesheet>
  <parameters>….</parameters>
  <outputDir>src/main/resources/odd</outputDir>
  <fileMappers>….</fileMappers>
</transformationSet>
<transformationSet>
  <stylesheet>src/main/Stylesheets-${stylesheetversion}/odds/odd2relax.xsl</stylesheet>
  <parameters>….</parameters>
</transformationSet>
<transformationSet>
  <stylesheet>src/main/Stylesheets-${stylesheetversion}/odds/extract-isosch.xsl</stylesheet>
  <outputDir>src/main/resources/schematron</outputDir>
  <fileMappers>….</fileMappers>
</transformationSet>
<plugin>
  <groupId>net.sigmalab.trang</groupId>
  <artifactId>trang-maven-plugin</artifactId>
  <version>1.2</version>
<dependency>
  <groupId>name.dmaus.schxslt</groupId>
  <artifactId>java</artifactId>
  <version>2.0.3</version>
1. https://bitbucket.org/fryske-akademy/online-dictionaries/src/master/pom.xml
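The fragments on the slide leave out the surrounding plugin declaration. The element names (transformationSet, stylesheet, fileMappers) match the configuration of the Codehaus xml-maven-plugin, so the sketch below assumes that plugin, with Saxon added because the TEI stylesheets need XSLT 2.0 or later. The input directory, file extensions and version numbers are assumptions, and the stylesheet parameters that the slide elides are deliberately not filled in. The linked pom.xml is the authoritative version.

```xml
<!-- Sketch only: assumes xml-maven-plugin; directories, extensions and versions are hypothetical. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>xml-maven-plugin</artifactId>
  <version>1.0.2</version>
  <executions>
    <execution>
      <phase>generate-resources</phase>
      <goals>
        <goal>transform</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <transformationSets>
      <!-- Step 1: expand ("compile") every ODD in the input directory. -->
      <transformationSet>
        <dir>src/main/odd</dir>
        <includes>
          <include>*.odd</include>
        </includes>
        <stylesheet>src/main/Stylesheets-${stylesheetversion}/odds/odd2odd.xsl</stylesheet>
        <outputDir>src/main/resources/odd</outputDir>
        <!-- Rename the output, for example my.odd becomes my.compiled. -->
        <fileMappers>
          <fileMapper implementation="org.codehaus.plexus.components.io.filemappers.FileExtensionMapper">
            <targetExtension>.compiled</targetExtension>
          </fileMapper>
        </fileMappers>
      </transformationSet>
    </transformationSets>
  </configuration>
  <dependencies>
    <!-- XSLT processor used by the plugin instead of the JDK default. -->
    <dependency>
      <groupId>net.sf.saxon</groupId>
      <artifactId>Saxon-HE</artifactId>
      <version>10.6</version>
    </dependency>
  </dependencies>
</plugin>
```

Further transformationSet entries for the RNG and Schematron steps would follow the same pattern, each taking the output directory of the previous step as its dir.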
10. On top of the available transformations from the TEI community I found it very
useful to write my own transformations from ODD, for example to generate a
configuration file for BlackLab, which in turn is used to build Lucene indexes.
Transformations like that help to stay consistent and in control, for example when
the data model changes. Naturally they can be included in Maven pipelines.
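To give an idea of what a transformation from ODD looks like, here is a minimal XSLT sketch that lists the elements defined in a compiled ODD, one per line. It does not produce a BlackLab configuration; the actual stylesheets are in the repositories linked on the slide.

```xml
<!-- Sketch only: prints the ident of every elementSpec in a compiled ODD. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Visit the element specifications in alphabetical order. -->
    <xsl:for-each select="//tei:schemaSpec//tei:elementSpec">
      <xsl:sort select="@ident"/>
      <xsl:value-of select="@ident"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the stylesheet reads the same ODD that generates the schemas, its output changes together with the data model.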
ODD, generation
https://search.maven.org/search?q=a:TeiLinguisticsFa
https://bitbucket.org/fryske-akademy/tei-encoding/src/master/reusables/
11. Something about inheritance now. I must admit I recently abandoned it, because of
the added complexity and the lack of a use case. The basics are simple: write an ODD,
compile it, then write another ODD that uses the compiled first one. The source
attribute is crucial. You can specify it on schemaSpec, which means every moduleRef
without its own source attribute will retrieve its content from that source. Every
moduleRef with a source attribute will retrieve its content from the source it names.
An elementRef can also have a source attribute, allowing you for example to re-add an
element left out by the parent ODD.
Despite these simple basics it is cumbersome to find out exactly which elements and
modules come from where, and how they are defined and modified.
Rule of thumb: use fixed versions and keep it simple.
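A sketch of the second ODD in such a chain could look like this. All file paths and the choice of modules and elements are assumptions; the linked howtoChain document describes the mechanism in detail.

```xml
<!-- Sketch only: paths, modules and elements are hypothetical. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="odd2" start="TEI"
            source="../compiled/odd1.compiled">
  <!-- No source attribute: content comes from the compiled odd1 named on schemaSpec. -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Own source attribute: content comes from this fixed, local TEI source instead. -->
  <moduleRef key="analysis" source="../tei-source/p5subset.xml"/>
  <!-- Re-add a single element that odd1 left out. -->
  <elementRef key="note" source="../tei-source/p5subset.xml"/>
</schemaSpec>
```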
ODD, chaining
Inherit from other ODDs
• Compile odd1: odd2odd.xsl
• Create odd2 using compiled odd1: @source=...
http://teic.github.io/PDF/howtoChain.pdf
12. Now, a glance at perhaps one of the most promising possibilities of ODD, especially
when looking at the TEI Publisher implementation of it. You can specify a processing
model for elements.
This allows you to decouple the definition of an element from its visual behaviour.
A model defines behaviour and can do so conditionally. You can provide parameters
for the processing. Parameter values originate from the actual element at the time of
processing.
I think outputRendition should be avoided. Instead, rendition definitions should be
external, like (S)CSS and classes.
TEI Publisher takes the processing model a step further through the use of templates,
web components and XQuery instead of XPath. We will probably be using it for digital
editions.
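A small sketch of a processing model in plain TEI follows. The choice of the hi element, the predicate and the class name are assumptions; TEI Publisher specific extensions are not shown.

```xml
<!-- Sketch only: element, predicate and class name are hypothetical. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="hi" module="core" mode="change">
  <!-- Conditional model: applies only when the predicate holds for the element. -->
  <!-- cssClass points to an external style rule instead of an inline outputRendition. -->
  <model predicate="@rend='bold'" behaviour="inline" cssClass="bold">
    <!-- The parameter value is an XPath evaluated against the element being processed. -->
    <param name="content" value="."/>
  </model>
  <!-- Fallback for every other hi element. -->
  <model behaviour="inline"/>
</elementSpec>
```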
ODD, processing model
https://tei-c.org/release/doc/tei-p5-doc/en/html/TD.html#TDPM
https://teipublisher.com
https://e-editiones.org/
Very promising!
13. These are some examples of solutions at the Fryske Akademy. For corpora we
generate BlackLab configuration and JavaScript from ODD, and we use the HTML
stylesheet from TEI to build a fully functional corpus query system.
For dictionaries we generate RNG, XSD and Schematron that are used in a
validation helper which is published to Maven Central. This library is then used in an
app that publishes approved dictionary articles. An eXist-db app allows querying the
dictionary and presents results in either JSON or HTML.
Another example is a library for linguistics in corpora, where the generated XSD is
translated into JAXB classes using a bind.xml that is also generated. This library is
used in a Frisian lexicon service.
Usage in applications
• ODD (corpora) → blacklab config, js, docs → corpus
• ODD (linguistics) → rng/xsd, bind.xml → jaxb2/xjc → jaxb classes → maven central
  → eclipse moxy, jax-rs, json rest; apache cxf wsdl2java, jaxb, jax-ws, soap ws
  → Frisian lexicon
• ODD (dictionaries) → rng/xsd/schematron → validationhelper → maven central
  → publish app, json service, gui
https://web2.fa.knaw.nl/corpus-frontend
https://web2.fa.knaw.nl/exist/apps/onfw/index.html (TEST!)
https://web2.fa.knaw.nl/foarkarswurdlist-ws/
14. Wrapping up, I give you a list of pros and cons of ODD based development. The pros
weigh heavier for us. Perhaps the most problematic in practice is the complexity of
the development pipelines, which often consist of multiple generation and publication
steps and possibly inherited dependencies.
For me as a Java adept it is a pity that the TEI focus is on RNG, not XSD. I really like
and benefit from JAXB and still hope XSD 1.1 will be a success and find its way into a
follow-up for JAXB.
pros
• Reliable build processes that guarantee interoperability
• Maintain data logic in one place
• Generation of rng, schematron, xsd
• Generation using xslt
• Sticking close to TEI, benefit from updates and tools
• Limit knowledge and technologies to maintain
cons
• Niche (complex) knowledge
• Stylesheets may not generate what you want
• Chaining (inheritance) can be confusing
• Hard to debug and test
• ODD change may cascade updates of libs and applications
• Xsd support (via trang) less stable than rng
15. For us at the Fryske Akademy there are a lot of reasons to stick to our ODD based
approach. Perhaps I raised some curiosity that will lead to increased use of ODD,
which in turn will lead to a load of GitHub issues on ODD that will be solved,
improving the usability of ODD.
To ODD or not to ODD
• It is possible to maintain stable build processes based on ODD
• With code generation
• Active community, active maintenance of stylesheets
• It is possible to build reusable libraries based on ODD
• Over the past 4 years few problems
• ODD syntax is rather simple
• ODD with teipublisher for digital editions and integration in blacklab
16. Thank you for watching, my live version will now be available if you have any
questions.
Thanks
Eduard Drenth
edrenth@fryske-akademy.nl
I would like odd to get a more prominent
place in the TEI stack and community. It
could be a well known goldmine