The document describes sTeX+, an extension of sTeX that allows formalizing and annotating technical documents with semantic metadata. sTeX+ enables defining ad hoc vocabularies to describe project-specific concepts and annotate documents accordingly. It produces output in PDF, OMDoc+RDFa, and XHTML+MathML+RDFa to enable interactive services. sTeX+ aims to balance formalization with flexibility for existing authoring practices.
sTeX+ – a System for Flexible Formalization of Linked Data
1. sTeX+ – a System for Flexible Formalization of Linked Data
Andrea Kohlhase and Michael Kohlhase and Christoph Lange
ch.lange@jacobs-university.de
Computer Science
Jacobs University
Bremen, Germany
September 1, 2010
Christoph Lange: sTeX+, September 1, 2010
2. SAMS: A Case Study for Semantic Authoring
• SAMS: Safety Component for Autonomous Mobile Systems [FHL+08]
• Goal: develop a safety component for autonomous mobile service robots and get it certified as SIL-3 compliant
• Follow the V-Model Discipline
1 Implement in MISRA C
2 Verify safety properties in Isabelle
3 Submit to Certification Agency (TÜV)
• Document Collection SAMSDocs (Word, LaTeX)
• Idea: turn documents into Linked [Closed] Data
3. Linked Data in Software Engineering: Motivation
Information needs in a software engineering project:
Programmer: Symbol: defined where?
Specification: how much already implemented?
Proof: already verified?
Project Manager: Has the code been verified? (needs high-level figures)
Changes since last certification? Affected parts?
Who is in charge of what? How can one employee be replaced?
Certifier: Get an overview
Follow links through specification and implementation
What needs to be re-certified?
All: Whom to ask for details about something?
Goal: Answer questions from distributed document collection
• Partially formalize LaTeX documents (sTeX+)
• Linked Data also works in the [enterprise] intranet [Ser08]
(it’s just an architecture after all)
4. Agenda: From LaTeX to Linked Data
1 SAMSDocs were available as LaTeX (state of the art for technical writing)
2 We had a semantic extension of LaTeX: sTeX (formalizing mathematics)
3 Too rigid w.r.t. existing layout, no support for project-specific metadata
4 “sTeX+”: identify objects, define ad hoc metadata vocabulary, annotate objects with these metadata terms (RDF)
5 Output targets: PDF, OMDoc+RDFa, and XHTML+MathML+RDFa
6 Offer interactive services in XHTML+MathML+RDFa documents
5. sTeX, a Semantic Variant of TeX/LaTeX
• Problem: Need content markup formats for semantic services, but mathematicians write LaTeX
• Idea: Enable the author to make structure explicit and disambiguate meanings
• use the TeX macro mechanism for this (well established)
• the author knows the semantics best (at least she understands)
• the burden is alleviated by manageability savings (MKM on TeX/LaTeX)
• Definition 1 (sTeX Approach): Semantic pre-loading of TeX/LaTeX documents.
• Introduce semantic macros: e.g. \union{a,b,c} renders as a∪b∪c
• Mark up discourse structure: (largely invisible)
e.g. \begin{sproof}[id=Wiles,for=Fermat] ... \end{sproof}
• Generate PDF and OMDoc from that (via LaTeXML [Mil])
http://trac.kwarc.info/sTeX/
6. sTeXIDE: Integrated sTeX Development Environment
http://stexide.googlecode.com [JK10]
8. Coping with SAMS Practices in sTeX I
• Problem: vanilla sTeX is not enough for this project
SAMS has its own structures (but we want to preserve their appearance)
• Example 2 (Definition tables)
Definitions in tables (OMDoc only allows sequences of definitions)
Idea: Extend sTeX to sTeX-SD with custom macros and environments to cope.
9. Coping with SAMS Practices in sTeX II
• Idea: Tabular environment that only outputs definitions to OMDoc (LaTeXML bindings)
10. Non-Logical Relations in SAMSDocs
• The V-Model introduces relations
between document fragments
• formalize V-Model vocabulary:
• refines
• implements
• describesUse
• ...
• Idea: Mark up these secondary (non-logical) relations as OMDoc metadata
• OMDoc allows flexibly extensible metadata [LK09]:
• annotate relations via RDFa (RDF in XML)
• specify their meaning via ontologies (also expressible in OMDoc)
• Example 3: \SemVMrel[cd=reqspec,refid=R12,rel=refines] generates the RDFa metadata
<link rel="svm:refines" href="../requirementsspec#R12"/>
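The mapping in Example 3, from the macro's key-value options to an RDFa link element, can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual LaTeXML binding: the function name `semvmrel_to_rdfa` and the lookup table `CD_PATHS` are hypothetical, and the table contains just the one pair that the example shows.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical lookup from a cd key to a document path; the slide only
# shows the single pair used here (reqspec -> ../requirementsspec).
CD_PATHS = {"reqspec": "../requirementsspec"}


def semvmrel_to_rdfa(cd, refid, rel, prefix="svm"):
    """Build an RDFa <link> element from SemVMrel-style options."""
    # The target is the document path plus the fragment identifier.
    href = f"{CD_PATHS[cd]}#{refid}"
    # quoteattr adds the surrounding quotes and escapes special characters.
    return f"<link rel={quoteattr(prefix + ':' + rel)} href={quoteattr(href)}/>"


print(semvmrel_to_rdfa("reqspec", "R12", "refines"))
```

Keeping the relation name (`refines`, `implements`, `describesUse`) as a plain parameter mirrors the slide's point that the V-Model vocabulary is an open set of terms rather than a fixed list of macros.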
11. The sTeX-SD Vocabulary Extensions Classified
12. Vocabulary Extensions Without LaTeXML Hacking
Dual role of sTeX:
• define modular, domain-specific vocabularies
• . . . and use them to annotate documents
So far: mathematical theories (“vocabularies” of mathematical symbols)
use semantic macros for symbols in formulae
(otherwise fixed vocabulary of mathematical structures)
Now (sTeX+): ontologies (RDF vocabularies)
annotate documents with metadata
13. Defining RDF Vocabularies in sTeX+
Our ad hoc vocabulary so far
We want something that
• ... is more scalable and reusable than hand-crafted LaTeXML bindings
• ... translates to a real RDFS/OWL ontology (via OMDoc, or directly)
\begin{module}[id=certification]
\metalanguage[../background/rdfs]{rdfs}
% metadata property with domain:
\keydef{document}{hasState}
\symdef{state-doc-rd}[1]{rd. #1}
\symdef{tuev}{\text{T\"UV}}
\begin{definition}[for=hasState]
A document \defi[hasState]{has state} $x$, iff the project manager
decrees it so. \end{definition}
\begin{definition}[for=state-doc-rd]
A document has state \definiendum[state-doc-rd]{rd. $x$},
iff it has been submitted to
$x$ for certification. \end{definition}
\begin{definition}[for=tuev]
The $\tuev$ (Technischer
\"Uberwachungsverein) is a
well-known certification agency
in Germany. \end{definition}
\end{module}
% Usage
\importmodule[../onto/cert]{certification}
\begin{document}[hasState=$\statedocrd{\tuev}$] ...
\end{document}
14. Our Infrastructure
http://kwarc.info/LinkedLectures/ [DKL+10]
15. Finding a Substitute for an Employee via the V-Model
• Harvest the RDFa from OMDoc into an RDF triple store (standard)
• Send the following query to a SPARQL endpoint:
PREFIX vm: <http://www.sams-projekt.de/ontologies/VersionManagement#>
PREFIX omdoc: <http://omdoc.org/ontology#> # OMDoc
PREFIX semVM: <http://www.sams-projekt.de/ontologies/V-model#>
PREFIX dc: <http://purl.org/dc/elements/1.1/> # Dublin Core
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> # XML Schema datatypes
PREFIX foaf: <http://xmlns.com/foaf/0.1/> # FOAF (needed for foaf:name below)
SELECT ?potentialSubstituteName WHERE {
  # parts of the documents Alice is responsible for
  ?document vm:responsible <.../employees#Alice> ;
            omdoc:hasPart ?object .
  # objects related to those parts via the V-Model or via definitions
  { ?object semVM:refines ?relatedObject }
  UNION
  { ?object omdoc:occursInDefinitionOf ?relatedObject }
  # recent documents that contain a related object, and who is responsible
  ?otherDocument omdoc:hasPart ?relatedObject ;
                 dc:date ?date ;
                 vm:responsible ?potentialSubstitute .
  FILTER (?date > "2009-01-01"^^xsd:date)
  ?potentialSubstitute foaf:name ?potentialSubstituteName .
}
• Present the result to the user.
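To make the join behind this query easier to follow, here is a minimal Python sketch that walks the same graph pattern over a handful of in-memory triples. It assumes the date filter applies to each document's dc:date. Everything in it is hypothetical: the toy triples, the abbreviated identifiers such as `emp:Alice`, and the helper names do not come from SAMSDocs, and a real setup would query a triple store through a SPARQL endpoint instead.

```python
from datetime import date

# Hypothetical toy data standing in for RDFa harvested from OMDoc.
TRIPLES = {
    ("spec.omdoc", "vm:responsible", "emp:Alice"),
    ("spec.omdoc", "omdoc:hasPart", "spec#S1"),
    ("spec#S1", "semVM:refines", "req#R12"),
    ("req.omdoc", "omdoc:hasPart", "req#R12"),
    ("req.omdoc", "dc:date", date(2010, 3, 1)),
    ("req.omdoc", "vm:responsible", "emp:Bob"),
    ("emp:Bob", "foaf:name", "Bob"),
    ("old.omdoc", "omdoc:hasPart", "req#R12"),
    ("old.omdoc", "dc:date", date(2008, 5, 1)),
    ("old.omdoc", "vm:responsible", "emp:Carol"),
    ("emp:Carol", "foaf:name", "Carol"),
}


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the store."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the store."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def substitutes(employee, since):
    """Names of people responsible for recent documents related to the employee's."""
    names = set()
    for doc in subjects("vm:responsible", employee):
        for part in objects(doc, "omdoc:hasPart"):
            # UNION of the two relation patterns in the query
            related = (objects(part, "semVM:refines")
                       | objects(part, "omdoc:occursInDefinitionOf"))
            for rel in related:
                for other in subjects("omdoc:hasPart", rel):
                    # keep only documents dated after the cutoff
                    if any(d > since for d in objects(other, "dc:date")):
                        for person in objects(other, "vm:responsible"):
                            names |= objects(person, "foaf:name")
    return names


print(substitutes("emp:Alice", date(2009, 1, 1)))
```

With this toy data only the document dated 2010 passes the cutoff, so only its responsible person is returned; moving the cutoff earlier also brings in the person responsible for the older document.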
16. RDFa and MathML Annotations as Anchors for Interactive Services/Mashups
Here: an example in mathematical lecture notes
In SAMSDocs: e.g. information on the processing state of a document
17. Conclusions
• Software engineering documents: Contracts, Requirements, Mathematical
Models, Manuals
• Wanted to exploit these structures as Linked Data
• Formalized them from LaTeX to sTeX+, making project-specific semantic structures explicit
• We can define RDF vocabularies and annotate documents in the same language
document and ontology co-development (reduces formalization barriers)
• We obtain OMDoc+RDFa output, and further . . .
Plain RDF: SPARQL queries: How much is implemented/verified? How to
replace an employee? What needs to be re-certified?
XHTML+MathML+RDFa: hook interactive (lookup) services right into the
annotations
• Ongoing work: stabilize the setup (current focus: CS/math lecture notes)
18. References
[DKL+10] Catalin David, Michael Kohlhase, Christoph Lange, Florian Rabe, Nikita Zhiltsov, and Vyacheslav Zholudev. Publishing math lecture notes as linked data. In Lora Aroyo, Grigoris Antoniou, Eero Hyvönen, Annette ten Teije, Heiner Stuckenschmidt, Liliana Cabral, and Tania Tudorache, editors, The Semantic Web: Research and Applications (Part II), number 6089 in Lecture Notes in Computer Science, pages 370–375. Springer Verlag, 2010.
[FHL+08] Udo Frese, Daniel Hausmann, Christoph Lüth, Holger Täubig, and Dennis Walter. The importance of being formal. In Hardi Hungar, editor, International Workshop on the Certification of Safety-Critical Software Controlled Systems SafeCert’08, volume 238 of Electronic Notes in Theoretical Computer Science, pages 57–70, September 2008.
[JK10] Constantin Jucovschi and Michael Kohlhase. sTeXIDE: An integrated development environment for sTeX collections. In Serge Autexier, Jacques Calmet, David Delahaye, Patrick D. F. Ion, Laurence Rideau, Renaud Rioboo, and Alan P. Sexton, editors, Intelligent Computer Mathematics, number 6167 in LNAI. Springer Verlag, 2010.
[LK09] Christoph Lange and Michael Kohlhase. A mathematical approach to ontology authoring and documentation. In Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt, editors, MKM/Calculemus Proceedings, number 5625 in LNAI, pages 389–404. Springer Verlag, July 2009.
[Mil] Bruce Miller. LaTeXML: A LaTeX to XML converter. Web Manual at http://dlmf.nist.gov/LaTeXML/. Seen September 2011.
[Ser08] François-Paul Servant. Linking enterprise data. In Christian Bizer, Tom Heath, Kingsley Idehen, and Tim Berners-Lee, editors, Linked Data on the Web (LDOW), number 369 in CEUR Workshop Proceedings, Aachen, April 2008.