Embedding semantic annotations within texts: the FRETTA approach

Gioele Barabucci - barabucc@cs.unibo.it
Silvio Peroni - essepuntato@cs.unibo.it
Francesco Poggi - fpoggi@cs.unibo.it
Fabio Vitali - fabio@cs.unibo.it

http://creativecommons.org/licenses/by-sa/3.0

Outline

•   Conversion from an XML format into another

•   Overlapping markup

•   Abstract conversion framework

•   FRETTA

•   Evaluation

•   Conclusions

Converting XML vocabularies that use syntactic workarounds

•   Converting OpenOffice Writer documents (ODT) into Microsoft Word documents (DOCX), and vice versa, is not a straightforward operation

•   Converters exist and are included as core components of word processors

•   These converters do not implement mechanisms for a full and effective document conversion, especially when particular features are needed, e.g., the information that tracks changes made to a document over time

What happens to markup

OpenOffice (ODT), without change tracking:

    <text:p>
        The beginning
        and the end.
    </text:p>

OpenOffice (ODT), with the insertion tracked as change S1:

    <text:tracked-changes>
        <text:changed-region text:id="S1">
            <text:insertion><office:change-info>
                <dc:creator>John Smith</dc:creator>
                <dc:date>2009-10-27T18:45:00</dc:date>
            </office:change-info></text:insertion>
        </text:changed-region>
    </text:tracked-changes>
    […]
    <text:p>The beginning and
        <text:change-start text:change-id="S1"/></text:p>
    <text:p>also
        <text:change-end text:change-id="S1"/>
        the end.</text:p>

Microsoft Word (DOCX), without change tracking:

    <w:p>
        <w:r>
            <w:t>
                The beginning
                and the end.
            </w:t>
        </w:r>
    </w:p>

Microsoft Word (DOCX), with the same insertion tracked:

    <w:p>
        <w:pPr><w:rPr>
            <w:ins w:id="0" w:author="John Smith"
                w:date="2009-10-27T18:50:00Z"/>
        </w:rPr></w:pPr>
        <w:r><w:t>The beginning and </w:t></w:r></w:p>
    <w:p>
        <w:ins w:id="1" w:author="John Smith"
            w:date="2009-10-27T18:50:00Z">
            <w:r><w:t>also </w:t></w:r></w:ins>
        <w:r><w:t>the end.</w:t></w:r></w:p>

ODT records the insertion with a pair of empty milestone elements (text:change-start and text:change-end) that cross the paragraph boundary, while DOCX splits it into one piece per paragraph: a w:ins marker on the first paragraph's mark and a w:ins element around "also ".
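
Recovering what change S1 covers in the ODT markup means following the milestone pair across the paragraph break. The Python sketch below does this with the standard library's xml.etree.ElementTree on a cut-down version of the ODT snippet above; the wrapper element `body` and the helper `covered_text` are hypothetical, and only milestones that are direct children of text:p are handled.

```python
import xml.etree.ElementTree as ET

# ODF text namespace; a real content.xml also declares office:, dc:, etc.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
T = "{" + TEXT_NS + "}"

# Cut-down, hypothetical body holding only the two paragraphs from the slide.
ODT_BODY = (
    f'<body xmlns:text="{TEXT_NS}">'
    '<text:p>The beginning and <text:change-start text:change-id="S1"/></text:p>'
    '<text:p>also <text:change-end text:change-id="S1"/>the end.</text:p>'
    "</body>"
)


def covered_text(root, change_id):
    """Collect the text between the change-start/change-end milestones of
    change_id, emitting a newline wherever the range crosses a paragraph break.
    Simplification: milestones must be direct children of text:p."""
    inside = False
    pieces = []
    for para in root.iter(T + "p"):
        if inside:
            pieces.append("\n")  # the tracked range spans this paragraph break
            if para.text:
                pieces.append(para.text)
        for child in para:
            if child.get(T + "change-id") == change_id:
                if child.tag == T + "change-start":
                    inside = True
                elif child.tag == T + "change-end":
                    inside = False
            # text following a milestone belongs to the range only while inside
            if inside and child.tail:
                pieces.append(child.tail)
    return "".join(pieces)


root = ET.fromstring(ODT_BODY)
print(repr(covered_text(root, "S1")))  # '\nalso '
```

The result, a paragraph break followed by "also ", is the content that DOCX represents as two separate w:ins markers, one per paragraph.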

Overlapping markup

•       Overlapping markup is needed when different markup items cover portions of the same document that partially overlap, so that neither is contained in the other
        The previous example written with overlapping tags (not well-formed XML):
        <p>The beginning and <ins></p>
        <p>also </ins> the end</p>

        The same structure expressed in XML via a workaround (milestones):
        <p>The beginning and <ins start="foo"/></p>
        <p>also <ins end="foo"/>the end</p>

•       Different techniques to embed overlapping structures in XML hierarchies:
    ✦     milestones: a pair of empty elements representing the start and end tags, connected to each other by special attributes
    ✦     fragmentation: an element is split into several fragments, each nested within the primary hierarchy and connected to the others by special attributes
    ✦     twin documents: each hierarchy is represented by a different document containing the same textual content
    ✦     stand-off: overlapping elements are placed in a separate resource (e.g., another file) that specifies the position (down to the individual character) of each start and end location within the main structure
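
The difference between milestones and fragmentation shows up when one overlapping range is linearised over two paragraphs. In this Python sketch the range is given stand-off style, as (paragraph index, character offset) positions; the function names, the example range (covering "and " and "also "), and the ref/part/of attributes used to chain fragments are hypothetical and not taken from any particular vocabulary.

```python
from xml.sax.saxutils import escape

Pos = tuple[int, int]  # (paragraph index, character offset in that paragraph)


def as_milestones(paras: list[str], start: Pos, end: Pos, ident: str = "foo") -> str:
    """Milestones: two empty <ins> elements mark where the range begins and ends."""
    out = []
    for p, text in enumerate(paras):
        cuts = []  # (offset, milestone tag) pairs that fall in this paragraph
        if start[0] == p:
            cuts.append((start[1], f'<ins start="{ident}"/>'))
        if end[0] == p:
            cuts.append((end[1], f'<ins end="{ident}"/>'))
        body, prev = "", 0
        for off, tag in sorted(cuts, key=lambda c: c[0]):  # stable: start before end
            body += escape(text[prev:off]) + tag
            prev = off
        out.append(f"<p>{body}{escape(text[prev:])}</p>")
    return "".join(out)


def as_fragments(paras: list[str], start: Pos, end: Pos, ident: str = "foo") -> str:
    """Fragmentation: one real <ins> element per paragraph the range touches,
    chained together by hypothetical ref/part/of attributes."""
    total = end[0] - start[0] + 1
    out = []
    for p, text in enumerate(paras):
        if start[0] <= p <= end[0]:
            lo = start[1] if p == start[0] else 0
            hi = end[1] if p == end[0] else len(text)
            part = p - start[0] + 1
            frag = (f'<ins ref="{ident}" part="{part}" of="{total}">'
                    f"{escape(text[lo:hi])}</ins>")
            out.append(f"<p>{escape(text[:lo])}{frag}{escape(text[hi:])}</p>")
        else:
            out.append(f"<p>{escape(text)}</p>")
    return "".join(out)


paras = ["The beginning and ", "also the end."]
print(as_milestones(paras, (0, 14), (1, 5)))
# <p>The beginning <ins start="foo"/>and </p><p>also <ins end="foo"/>the end.</p>
print(as_fragments(paras, (0, 14), (1, 5)))
```

Both outputs are well-formed: milestones keep the overlapping item as two empty markers, while fragmentation turns it into two real elements that a consumer has to reassemble.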

Abstract conversion framework

From: XML format 1 with overlapping workarounds (e.g., ODT + change tracking)
To:   XML format 2 with overlapping workarounds (e.g., DOCX + change tracking)

XML document (format 1)
    → Step 1: identification of XML overlapping workarounds and creation of a document with explicit overlap
EARMARK document (format 1)
    → Step 2: syntactic and semantic conversion from format 1 into format 2
EARMARK document (format 2)
    → Step 3: linearisation into an XML document with overlapping workarounds (today's contribution)
XML document (format 2)

EARMARK is a non-XML markup metalanguage used as the intermediate language for the conversion. It allows markup structures to be organized both as trees and as generic graphs with no particular limitations.
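
An intermediate non-XML model helps because overlap stops being a syntactic problem once markup items point into a shared text instead of nesting inside each other. The Python sketch below is only loosely inspired by EARMARK's separation of text (the docuverse), ranges and markup items; the class names, the `overlaps` helper and the example offsets are hypothetical and do not reproduce EARMARK's actual ontology.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Range:
    """A stretch of the shared text, as [start, end) character offsets."""
    start: int
    end: int


@dataclass
class Element:
    """A markup item whose children are Range or Element objects.
    Nothing forces two elements to nest, so overlap is representable."""
    name: str
    children: list = field(default_factory=list)

    def span(self) -> Range:
        """Smallest range covering everything this element contains
        (assumes at least one child)."""
        spans = [c if isinstance(c, Range) else c.span() for c in self.children]
        return Range(min(s.start for s in spans), max(s.end for s in spans))


def overlaps(a: Element, b: Element) -> bool:
    """True if a and b partially overlap, i.e. neither contains the other."""
    x, y = a.span(), b.span()
    return x.start < y.start < x.end < y.end or y.start < x.start < y.end < x.end


docuverse = "The beginning and also the end."
p1 = Element("p", [Range(0, 18)])      # "The beginning and "
p2 = Element("p", [Range(18, 31)])     # "also the end."
ins = Element("ins", [Range(14, 23)])  # hypothetical range: "and also "

print(overlaps(ins, p2), overlaps(p1, p2))  # True False
```

Both paragraphs and the insertion refer to the same text, so the insertion can overlap two paragraphs without any workaround; a workaround is only needed when this graph is linearised back into a single XML tree.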

FRETTA

 •   FRETTA (From EARMARK To Tag) is a general and extensible Java framework for expressing EARMARK documents in an embedded XML syntax

 •   Users who want to convert EARMARK documents into XML document formats must indicate which workarounds the target format uses

 •   FRETTA performs the requested conversion in four consecutive steps:
    ✦     workaround specification: the user specifies which workaround to use to represent an (EARMARK) overlapping element in XML
    ✦     structural conversion: a purely structural conversion that produces a new EARMARK document in which overlapping elements are transformed appropriately according to the specified workarounds
    ✦     semantic conversion: a conversion that may change the current structure of the EARMARK document according to how the target XML format handles the specified workarounds
    ✦     linearisation: generation of the resulting XML tree with the requested workarounds

EARMARK document → workaround specification → structural conversion → semantic conversion → linearisation → XML document
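
The first two steps can be pictured with plain character ranges: a workaround specification maps each overlapping element name to a workaround, and the structural conversion rewrites that element's range so that every resulting piece nests inside the primary hierarchy. This Python sketch is not FRETTA's Java API; the `Workaround` enum, the `SPEC` dictionary, `structural_conversion` and the example offsets are hypothetical.

```python
from enum import Enum


class Workaround(Enum):
    MILESTONES = "milestones"
    FRAGMENTATION = "fragmentation"


# Step 1 (workaround specification): hypothetical user choice per element name.
SPEC = {"ins": Workaround.FRAGMENTATION}

Span = tuple[int, int]  # [start, end) character offsets


def structural_conversion(item: Span, containers: list[Span],
                          workaround: Workaround) -> list[Span]:
    """Step 2 sketch: rewrite one overlapping range into pieces that never
    cross the boundaries of the primary-hierarchy containers."""
    start, end = item
    if workaround is Workaround.MILESTONES:
        # Two zero-length markers: nothing is left to cross a boundary.
        return [(start, start), (end, end)]
    # Fragmentation: intersect the range with every container it touches.
    pieces = []
    for c_start, c_end in containers:
        lo, hi = max(start, c_start), min(end, c_end)
        if lo < hi:
            pieces.append((lo, hi))
    return pieces


paragraphs = [(0, 18), (18, 31)]  # "The beginning and " / "also the end."
ins = (14, 23)                    # hypothetical range: "and also "
print(structural_conversion(ins, paragraphs, SPEC["ins"]))            # [(14, 18), (18, 23)]
print(structural_conversion(ins, paragraphs, Workaround.MILESTONES))  # [(14, 14), (23, 23)]
```

The semantic conversion and linearisation steps would then adapt these pieces to the target vocabulary and serialise them as XML; they are left out of this sketch.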

Evaluation

•       FRETTA's outputs were compared against a set of twelve TEI documents (TEIDocs) written by markup experts

•       The evaluation took into account four different principles:
    ✦     well-formedness (WF): whether the framework returns well-formed XML documents
    ✦     validity (V): whether the framework returns valid XML documents according to the particular target XML vocabulary
    ✦     naturalness (N): how structurally similar the XML documents returned by the framework are to TEIDocs
    ✦     minimality (M): how much the number of nodes (i.e., elements, attributes and text nodes) in the XML documents returned by the framework differs from TEIDocs

    document          workarounds      WF   V    N    M
    agrippine         fragmentation    ✓    ✓    ✓    ✓
    agrippine         milestones       ✓    ✓    ✓    ✓
    drivemycar        fragmentation    ✓    ✓    X    X
    johnlovesmary     fragmentation    ✓    ✓    ✓    ✓
    johnlovesmary     milestones       ✓    ✓    ✓    ✓
    peergynt          fragmentation    ✓    ✓    ✓    ✓
    peergynt          milestones       ✓    ✓    ✓    ✓
    peterpaulhammer   milestones       ✓    ✓    ✓    ✓
    thoughtalice      fragmentation    ✓    ✓    ✓    ✓
    titwillow         fragmentation    ✓    ✓    X    ✓
    titwillow         fragmentation    ✓    ✓    X    X
    titwillow         milestones       ✓    ✓    X    ✓

•       Results: 100% of the returned documents are well-formed and valid; 67% remain natural (N) and 83% remain minimal (M) with respect to TEIDocs
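
The summary percentages follow from the table: 12 of 12 rows pass WF and V, 8 of 12 pass N and 10 of 12 pass M. The Python sketch below recomputes them and adds a node counter of the kind the minimality criterion relies on; the counting rules (one node per element, per attribute and per non-whitespace text chunk) are an assumption, since the slides do not say how whitespace or namespace declarations were treated.

```python
import xml.etree.ElementTree as ET

# Rows transcribed from the table above: (document, workaround, WF, V, N, M)
RESULTS = [
    ("agrippine", "fragmentation", 1, 1, 1, 1),
    ("agrippine", "milestones", 1, 1, 1, 1),
    ("drivemycar", "fragmentation", 1, 1, 0, 0),
    ("johnlovesmary", "fragmentation", 1, 1, 1, 1),
    ("johnlovesmary", "milestones", 1, 1, 1, 1),
    ("peergynt", "fragmentation", 1, 1, 1, 1),
    ("peergynt", "milestones", 1, 1, 1, 1),
    ("peterpaulhammer", "milestones", 1, 1, 1, 1),
    ("thoughtalice", "fragmentation", 1, 1, 1, 1),
    ("titwillow", "fragmentation", 1, 1, 0, 1),
    ("titwillow", "fragmentation", 1, 1, 0, 0),
    ("titwillow", "milestones", 1, 1, 0, 1),
]
COLUMNS = {"WF": 2, "V": 3, "N": 4, "M": 5}


def share(criterion: str) -> int:
    """Rounded percentage of rows that satisfy the given criterion."""
    col = COLUMNS[criterion]
    return round(100 * sum(row[col] for row in RESULTS) / len(RESULTS))


def count_nodes(xml_text: str) -> int:
    """Count elements, attributes and non-whitespace text chunks (assumed rules)."""
    total = 0
    for el in ET.fromstring(xml_text).iter():
        total += 1 + len(el.attrib)
        total += sum(1 for chunk in (el.text, el.tail) if chunk and chunk.strip())
    return total


print({c: share(c) for c in COLUMNS})  # {'WF': 100, 'V': 100, 'N': 67, 'M': 83}
print(count_nodes('<p>The beginning <ins start="foo"/>and </p>'))  # 5
```

Comparing such counts between a generated document and the corresponding TEIDoc gives one possible reading of "how much the number of nodes differs".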

Conclusions

•       Converting XML documents with overlaps expressed via XML workarounds is not a straightforward task

•       We propose an abstract framework to address this issue, composed of three consecutive steps

•       FRETTA implements the third step of the conversion framework. It enables one to convert any EARMARK document (which allows multiple overlapping hierarchies at the same time) into one or more embedded XML markup structures

•       Future work:
    ✦    developing algorithms that autonomously select the workarounds to adopt in the conversions
    ✦    integrating FRETTA into the broader framework for the semi-automatic, round-trip conversion from any supported XML format into another

Thanks for your attention
