SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
XML in software development
  Technical overview




  Lars Marius Garshol
  Development Manager, Ontopia
  <larsga@ontopia.net>

  2004-09-29

© 2003-2004 Ontopia AS           1   http://www.ontopia.net/
Who speaks?

  •   Lars Marius Garshol
       – Development manager at Ontopia, and one of the founders
       – Author of Definitive XML Application Development, published by
           Prentice-Hall
       –   Wrote the xmlproc validating parser in Python
       –   Responsible for translation of SAX to Python
       –   Member of ISO SC34, which developed SGML
       –   Editor of parts of the topic map standard (ISO 13250-2 and 13250-3)
       –   Editor of the TMQL standard (topic map query language, ISO 18048)
       –   Maintainer of Free XML Tools web site
  •   Ontopia
       – Leading vendor of topic map software
       – “The Oracle of Topic Maps”
       – Norwegian company with partners world-wide

© 2003-2004 Ontopia AS                   2                         http://www.ontopia.net/
My personal XML history

  • Started with XML in 1997
       – started my MSc thesis on content management just as XML work
         was taking off
       – followed the XML process from the start
       – believed all the promises that XML would make it possible to find
         information and exchange anything with anyone
  • Now I work with topic maps
       –   XML turned out not to be what I was looking for
       –   many of the supporting standards I do not think good enough
       –   am now a bitter and disappointed man
       –   will return to this at the end




© 2003-2004 Ontopia AS                   3                        http://www.ontopia.net/
Overview

  • Introduction
  • XML and application architecture
       – impedance mismatch
       – web services
  • Common XML-related tasks
       – XML tools and standards
  • Conclusion




© 2003-2004 Ontopia AS             4   http://www.ontopia.net/
Introduction


                         What is XML really?
                         Data models
                         Interchange and storage




© 2003-2004 Ontopia AS                 5           http://www.ontopia.net/
XML is a way to represent data

  • XML is one of many ways to do this
  • XML is a data format (or syntax)
       – used when storing XML in files
       – also used when transmitting XML
  • XML has several data models
       – used in APIs, XML databases, and query languages
       – some support for this, not main usage




© 2003-2004 Ontopia AS                6                     http://www.ontopia.net/
Some data representations

  • Relational
       – tabular, rows and columns
       – used by relational databases
       – primary focus on storage, limited interchange with CSV files
  • Object-oriented
       –   objects with properties and methods
       –   used by most programming languages today
       –   primary focus on application-internal representation
       –   some interchange, also some database support
  • XML
       – tree of labeled nodes
       – primary focus on interchange
       – some database support

© 2003-2004 Ontopia AS                    7                        http://www.ontopia.net/
So, what is XML good for?

  • Well, it was created for documents...
       – <p>allows <term>mixed content</term>, which is unusual</p>
       – also strictly preserves order everywhere (except for attributes)
  • XML works very well for documents
  • XML also works for data
       – however, the document features make it more complicated than
         necessary for implementors
       – in use it is quite straightforward, though weak on references
       – for storage it is not optimal
       – for interchange it is still the best alternative




© 2003-2004 Ontopia AS                   8                          http://www.ontopia.net/
Why XML is good for interchange

  • Standard is done right
       –   short, implementable, precise, formal, readable, hackable
       –   everything is Unicode all the way: no internationalization problems
       –   Draconian error handling forces users to do things right
       –   schema languages make validation simple and effective
  • Everyone agrees on the standard
       – Microsoft, Sun, IBM, Oracle, you-name-it
  • Lots of high-quality tools
       – parsers tend to be fast, highly conformant, and robust
       – lots and lots of higher-level tools make life easier
       – tools available for all languages and platforms



© 2003-2004 Ontopia AS                    9                          http://www.ontopia.net/
XML and architecture


                         Traditional information systems
                         The impedance mismatch
                         An example XML application




© 2003-2004 Ontopia AS                   10                http://www.ontopia.net/
Information systems

  • Information-centric computing has traditionally been about
    information systems
  • Typically, these were clusters of applications with a database
    at the center
  • Originally, the business logic would reside in the database
  • With n-tier architecture it was encapsulated by an object
    layer
  • The basic concept has remained the same, however




© 2003-2004 Ontopia AS           11                     http://www.ontopia.net/
Traditional 2-tier architecture



                             Application #2
       Application #3




                                              Application #1

   Application #4        Database


© 2003-2004 Ontopia AS       12                  http://www.ontopia.net/
XML enters the picture




© 2003-2004 Ontopia AS    13   http://www.ontopia.net/
Impedance mismatch

  • The OO/RDBMS impedance mismatch                               Objects
       – object-oriented languages use objects with properties
       – RDBMSs use tables
       – these two data models do not match, and mapping
         between them requires substantial effort
  • Common solutions                                                   mismatch
       – attempt to isolate RDBMS interaction in an
         application module
       – use object-relational mapping tools
       – give up, just plunge in, and create a horrible mess
  • Conclusion
       – the problem is real, but with effort it can be handled
                                                                  RDBMS
© 2003-2004 Ontopia AS                   14                       http://www.ontopia.net/
A very common architecture
                         Objects                  <?xml version="1.0" encoding="iso-8859-1"?>
                                                  <!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN"
                                                   "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [
                                                  <!--ArborText, Inc., 1988-2000, v.4002-->
                                                  <!ENTITY http-ident "http://www.w3.org/TR/2000/REC-xml">
                                                  <!ENTITY draft.month "October">
                                                  <!ENTITY draft.day "6">
                                                  <!ENTITY iso6.doc.date "20001006">
                                                  <!ENTITY draft.year "2000">
                                                  <!ENTITY versionOfXML "1.0">
                                                  <!ENTITY pio "'&lt;?xml'">
                                                  <!ENTITY doc.date "10 February 1998">
                                                  <!ENTITY w3c.doc.date "02-Feb-1998">
                                                  <!ENTITY WebSGML "WebSGML Adaptations Annex to ISO 8879">
                                                  <!ENTITY pic "'?>'">
                                                  <!ENTITY br "n">
                                                  <!ENTITY cellback "#c0d9c0">
                                                  <!ENTITY mdash "--">
                                                  <!ENTITY com "--">




                                       mismatch
                                                  <!ENTITY como "--">
                                                  <!ENTITY comc "--">
                                                  <!ENTITY hcro "&amp;#x">
                                                  <!ENTITY nbsp "&#160;">
                                                  <!ENTITY magicents "<code>amp</code>,
                                                  <code>lt</code>,
                                                  <code>gt</code>,
                                                  <code>apos</code>,
                                                  <code>quot</code>">
                                                  <!ENTITY doc.audience "public review and discussion">
                                                  <!ENTITY doc.distribution "may be distributed freely, as long as
                                                  all text and legal notices remain intact">
                                                  ]>
                                                  <spec w3c-doctype="rec">




                            mismatch                            XML
                                       mismatch




                         RDBMS
© 2003-2004 Ontopia AS                 15                                                                 http://www.ontopia.net/
The brave new world of XML

  • Originally we had the OO-RDBMS mismatch
  • XML adds the OO-XML and XML-RDBMS mismatches
       – in other words: yet another issue for developers to deal with
  • Solutions are much the same
       – use data binding tools (we'll return to these)
       – restrict XML code to a specific module
       – give up and create a mess
  • Conclusion
       – interchange is complicated, and there is no silver bullet




© 2003-2004 Ontopia AS                   16                          http://www.ontopia.net/
But is XML really all bad?

  • Imagine an RDBMS/object interchange format
       – would support RDBMS/object export and import with no effort
       – already exists in XML form
       – still need transformations, because different applications are unlikely
         to use, or want to use, the same internal structure
  • XML enforces loose coupling of data
       – since internal and external representation are so different
  • Other benefits
       – it supports easy XML-to-XML transformations, which makes
         information interchange easier
       – it is human-readable, which makes debugging and understanding
         easier
  • However, life could still be easier than it is
© 2003-2004 Ontopia AS                   17                            http://www.ontopia.net/
So, what to do?

  • XML is already here
       – all the big vendors are pushing it
       – government standards and customers require it
       – the open source community has embraced it
  • In short, we just have to live with it now




© 2003-2004 Ontopia AS                18                 http://www.ontopia.net/
An example application

  • From January 2003 the EU required all member states to
    submit individual case safety reports for drugs
  • Basically, every time someone suffers side-effects from a
    drug, this is to be reported to EMEA in London
  • A standardized XML format is used for this
  • Ontopia developed the solution used by Norwegian
    authorities




© 2003-2004 Ontopia AS           19                    http://www.ontopia.net/
Architecture of the application




© 2003-2004 Ontopia AS   20        http://www.ontopia.net/
The internals of the application




© 2003-2004 Ontopia AS   21         http://www.ontopia.net/
The XML part




                                           Obj.model




                         Export   Import




© 2003-2004 Ontopia AS               22                http://www.ontopia.net/
Native XML databases

  • XML databases have been on the rise for the past few years
       – these are databases whose storage model is XML
       – in other words, they store XML directly
       – query languages tend to be XPath and/or XQuery
  • Reasons for using XML databases include
       – supports semi-structured data
       – may be faster when only specific views wanted (fewer joins)
       – well suited to document storage
  • Reasons not to use them are
       –   have to choose between poor architecture and impedance mismatch
       –   few mature products yet
       –   SQL and RDBMSs usually do the same job better
       –   unfamiliar technology

© 2003-2004 Ontopia AS                 23                         http://www.ontopia.net/
Using an XML database




© 2003-2004 Ontopia AS   24   http://www.ontopia.net/
Other considerations

  • Using an XML database would have simplified the regional
    applications
       – no need for the object model, since application is a simple editor
       – however, validation would have been somewhat awkward to add
  • The central application is different, however
       –   limited need for editing
       –   main need is advanced reporting
       –   advanced reporting means complex queries and joins
       –   XML databases are not well suited for this
       –   solution also needs support for replication, which few XML DBs have
  • Architecturally, this means going back to the 2-tier model
       – might work in this case, unlikely to work in the general case


© 2003-2004 Ontopia AS                   25                         http://www.ontopia.net/
A different kind of information system

  • RSS is
       – a simple XML format for newsfeeds
       – probably the simplest useful XML application there is
       – probably the most widespread XML application
  • Today there are
       – tens of thousands of RSS feeds
       – lots of news aggregation sites using RSS
       – lots of desktop tools for reading RSS feeds directly




© 2003-2004 Ontopia AS                  26                       http://www.ontopia.net/
Information system?
    topicmaps.bond.edu.au                                                                                                                            <?xml version="1.0" encoding="iso-8859-1"?>
                                                                                                                                                     <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi
                                                                                                                                                      "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [
                                                                                                                                                     <!--ArborText, Inc., 1988-2000, v.4002-->
                                                                                                                                                     <!ENTITY http-ident "http://www.w3.org/TR/2000/
                                                                                                                                                     <!ENTITY draft.month "October">
                                                                                                                                                                                                           <?xml version="1.0" encoding="iso-8859-1"?>
                                                                                                                                                                                                           <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi
                                                                                                                                                                                                            "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [
                                                                                                                                                                                                           <!--ArborText, Inc., 1988-2000, v.4002-->
                                                                                                                                                                                                           <!ENTITY http-ident "http://www.w3.org/TR/2000/
                                                                                                                                                                                                           <!ENTITY draft.month "October">
                                                                                                                                                                                                           <!ENTITY draft.day "6">
                                                                                                                                                     <!ENTITY draft.day "6">
                                                                                                                                                     <!ENTITY iso6.doc.date "20001006">                    <!ENTITY iso6.doc.date "20001006">
                                                                                                                                                     <!ENTITY draft.year "2000">                           <!ENTITY draft.year "2000">
                                                                                                                                                                                                           <!ENTITY versionOfXML "1.0">
                                                                                                                                                     <!ENTITY versionOfXML "1.0">
                                                                                                                                                     <!ENTITY pio "'&lt;?xml'">                            <!ENTITY pio "'&lt;?xml'">
                                                                                                                                                     <!ENTITY doc.date "10 February 1998">                 <!ENTITY doc.date "10 February 1998">
                                                                                                                                                                                                           <!ENTITY w3c.doc.date "02-Feb-1998">
                                                                                                                                                     <!ENTITY w3c.doc.date "02-Feb-1998">
                                                                                                                                                     <!ENTITY WebSGML "WebSGML Adaptations A               <!ENTITY WebSGML "WebSGML Adaptations A
                                                                                                                                                     <!ENTITY pic "'?>'">                                  <!ENTITY pic "'?>'">
                                                                                                                                                                                                           <!ENTITY br "n">
                                                                                                                                                     <!ENTITY br "n">
                                                                                                                                                     <!ENTITY cellback "#c0d9c0">                          <!ENTITY cellback "#c0d9c0">
                                                                                                                                                     <!ENTITY mdash "--">                                  <!ENTITY mdash "--">
                                                                                                                                                                                                           <!ENTITY com "--">
                                 <?xml version="1.0" encoding="iso-8859-1"?>          <?xml version="1.0" encoding="iso-8859-1"?>                    <!ENTITY com "--">
                                                                                                                                                     <!ENTITY como "--">                                   <!ENTITY como "--">
                                 <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi           <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi
                                                                                                                                                     <!ENTITY comc "--">                                   <!ENTITY comc "--">
                                  "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [    "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [




      Publishing
                                                                                                                                                                                                           <!ENTITY hcro "&amp;#x">
                                 <!--ArborText, Inc., 1988-2000, v.4002-->            <!--ArborText, Inc., 1988-2000, v.4002-->                      <!ENTITY hcro "&amp;#x">
                                                                                                                                                     <!ENTITY nbsp "&#160;">                               <!ENTITY nbsp "&#160;">
                                 <!ENTITY http-ident "http://www.w3.org/TR/2000/      <!ENTITY http-ident "http://www.w3.org/TR/2000/
                                 <!ENTITY draft.month "October">                      <!ENTITY draft.month "October">
                                 <!ENTITY draft.day "6">                              <!ENTITY draft.day "6">
                                 <!ENTITY iso6.doc.date "20001006">                   <!ENTITY iso6.doc.date "20001006">
                                 <!ENTITY draft.year "2000">                          <!ENTITY draft.year "2000">
                                 <!ENTITY versionOfXML "1.0">                         <!ENTITY versionOfXML "1.0">
                                 <!ENTITY pio "'&lt;?xml'">                           <!ENTITY pio "'&lt;?xml'">




      application
                                 <!ENTITY doc.date "10 February 1998">                <!ENTITY doc.date "10 February 1998">
                                 <!ENTITY w3c.doc.date "02-Feb-1998">                 <!ENTITY w3c.doc.date "02-Feb-1998">
                                 <!ENTITY WebSGML "WebSGML Adaptations A              <!ENTITY WebSGML "WebSGML Adaptations A
                                 <!ENTITY pic "'?>'">                                 <!ENTITY pic "'?>'">
                                 <!ENTITY br "n">                                    <!ENTITY br "n">
                                 <!ENTITY cellback "#c0d9c0">                         <!ENTITY cellback "#c0d9c0">
                                 <!ENTITY mdash "--">                                 <!ENTITY mdash "--">
                                 <!ENTITY com "--">                                   <!ENTITY com "--">
                                 <!ENTITY como "--">                                  <!ENTITY como "--">
                                 <!ENTITY comc "--">                                  <!ENTITY comc "--">
                                 <!ENTITY hcro "&amp;#x">                             <!ENTITY hcro "&amp;#x">
                                 <!ENTITY nbsp "&#160;">                              <!ENTITY nbsp "&#160;">




                            weblogs.rss weblogs.html




                    bloogz.com                                                                        weblogs.com



                         User desktop

                                                                 RSS                                                                         Web
                                                                reader                                                                     browser

© 2003-2004 Ontopia AS                                                                                         27                                                                                         http://www.ontopia.net/
Web services


                         What they are
                         The promise of web services




© 2003-2004 Ontopia AS                  28             http://www.ontopia.net/
What is a web service, anyway?

  • Basically any software service made available over http
       – must be intended to be invoked by another piece of software
       – line is somewhat blurry: is Google a web service? MapQuest?
  • Two schools of thought:
       – REST holds that http + XML has all that is needed
       – the SOAP camp wants special protocols and standards
  • In practice we see both
       – REST is good because it fits seamlessly into the existing web
       – SOAP is good because it has better tool support
  • Make your choice based on what is important for you




© 2003-2004 Ontopia AS                 29                         http://www.ontopia.net/
SOAP

  • Essentially a wrapper for XML messages
  • Consists of
       – a header (with routing information etc)
       – a body (which holds the message)
  • Very little is defined in terms of message structure
  • Effectively, SOAP encapsulates XML, and you must figure
    out how to deal with the XML yourself
       – this is not very different from how HTTP would encapsulate XML




© 2003-2004 Ontopia AS                  30                      http://www.ontopia.net/
Web services and architecture

                         Web service               <?xml version="1.0" encoding="iso-8859-1"?>
                                                   <!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN"
                                                    "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [
                                                   <!--ArborText, Inc., 1988-2000, v.4002-->
                                                   <!ENTITY http-ident "http://www.w3.org/TR/2000/REC-xml">
                                                   <!ENTITY draft.month "October">
                                                   <!ENTITY draft.day "6">
                                                   <!ENTITY iso6.doc.date "20001006">
                                                   <!ENTITY draft.year "2000">




                                       http
                                                   <!ENTITY versionOfXML "1.0">
                                                   <!ENTITY pio "'&lt;?xml'">
                                                   <!ENTITY doc.date "10 February 1998">
                                                   <!ENTITY w3c.doc.date "02-Feb-1998">
                                                   <!ENTITY WebSGML "WebSGML Adaptations Annex to ISO 8879">
                                                   <!ENTITY pic "'?>'">
                                                   <!ENTITY br "n">
                                                   <!ENTITY cellback "#c0d9c0">
                                                   <!ENTITY mdash "--">
                                                   <!ENTITY com "--">
                                                   <!ENTITY como "--">
                                                   <!ENTITY comc "--">
                                                   <!ENTITY hcro "&amp;#x">
                                                   <!ENTITY nbsp "&#160;">
                                                   <!ENTITY magicents "<code>amp</code>,
                                                   <code>lt</code>,
                                                   <code>gt</code>,
                                                   <code>apos</code>,
                                                   <code>quot</code>">
                                                   <!ENTITY doc.audience "public review and discussion">
                                                   <!ENTITY doc.distribution "may be distributed freely, as long as
                                                   all text and legal notices remain intact">
                                                   ]>
                                                   <spec w3c-doctype="rec">




                                                                 XML




© 2003-2004 Ontopia AS                        31                                                          http://www.ontopia.net/
The promise of web services

  • Connect legacy applications
  • Create services anyone can connect to and use
  • Integrate disparate applications across the enterprise
  • Publish your service in a web service marketplace
       – people can find it using UDDI and bind to it dynamically with WSDL
       – you will, of course, charge them for this




© 2003-2004 Ontopia AS                 32                        http://www.ontopia.net/
A word of caution

  • We've heard all this before
  • CORBA was widely touted as doing the same thing in the
    '90s
       –   applications connecting to each other over the net
       –   CORBA as the enterprise-wide “bus” connecting all applications
       –   directory services and dynamic service binding
       –   component brokers and online trading
  • CORBA did the first, but not the last three
       – political, economic, and legal issues intruded
       – information integration turns out to be difficult
       – dynamic service binding was harder than anyone thought
  • In short, exposing services on the net works
       – be skeptical about the rest
© 2003-2004 Ontopia AS                  33                         http://www.ontopia.net/
Another caution

  • Integrating applications is not really the issue
       – what is necessary is to integrate the information
       – XML is about information, but it's not really designed for integration
  • XML has no notion of identity
       – no way to say when two elements represent the same thing
       – nothing tells you what to do when two elements do represent the
         same thing
  • Knowledge technologies are about identity
       – they have rules for identity and merging
       – better suited for information integration
       – thus also for application integration




© 2003-2004 Ontopia AS                   34                          http://www.ontopia.net/
What web services are, second try

  • Web services are an idea more than anything else
  • In some cases new technology makes it easier to apply
  • The idea is what matters, however
       – seeing the possibilities and trying to make use of them
       – which way you do it always matters less than doing it




© 2003-2004 Ontopia AS                  35                         http://www.ontopia.net/
Web services?
    topicmaps.bond.edu.au

                                 <?xml version="1.0" encoding="iso-8859-1"?>          <?xml version="1.0" encoding="iso-8859-1"?>
                                 <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi           <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi
                                  "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [    "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [




      Publishing
                                 <!--ArborText, Inc., 1988-2000, v.4002-->            <!--ArborText, Inc., 1988-2000, v.4002-->
                                 <!ENTITY http-ident "http://www.w3.org/TR/2000/      <!ENTITY http-ident "http://www.w3.org/TR/2000/
                                 <!ENTITY draft.month "October">                      <!ENTITY draft.month "October">
                                 <!ENTITY draft.day "6">                              <!ENTITY draft.day "6">
                                 <!ENTITY iso6.doc.date "20001006">                   <!ENTITY iso6.doc.date "20001006">
                                 <!ENTITY draft.year "2000">                          <!ENTITY draft.year "2000">
                                 <!ENTITY versionOfXML "1.0">                         <!ENTITY versionOfXML "1.0">
                                 <!ENTITY pio "'&lt;?xml'">                           <!ENTITY pio "'&lt;?xml'">




      application
                                 <!ENTITY doc.date "10 February 1998">                <!ENTITY doc.date "10 February 1998">
                                 <!ENTITY w3c.doc.date "02-Feb-1998">                 <!ENTITY w3c.doc.date "02-Feb-1998">
                                 <!ENTITY WebSGML "WebSGML Adaptations A              <!ENTITY WebSGML "WebSGML Adaptations A
                                 <!ENTITY pic "'?>'">                                 <!ENTITY pic "'?>'">
                                 <!ENTITY br "n">                                    <!ENTITY br "n">
                                 <!ENTITY cellback "#c0d9c0">                         <!ENTITY cellback "#c0d9c0">
                                 <!ENTITY mdash "--">                                 <!ENTITY mdash "--">
                                 <!ENTITY com "--">                                   <!ENTITY com "--">
                                 <!ENTITY como "--">                                  <!ENTITY como "--">
                                 <!ENTITY comc "--">                                  <!ENTITY comc "--">
                                 <!ENTITY hcro "&amp;#x">                             <!ENTITY hcro "&amp;#x">
                                 <!ENTITY nbsp "&#160;">                              <!ENTITY nbsp "&#160;">




                            weblogs.rss weblogs.html




                    bloogz.com                                                                        weblogs.com



                         User desktop

                                                                 RSS                                                                         Web
                                                                reader                                                                     browser

© 2003-2004 Ontopia AS                                                                                         36                                    http://www.ontopia.net/
Common XML challenges


                         Import/export
                         Important groups of tools
                         Validation
                         Using XML databases




© 2003-2004 Ontopia AS                   37          http://www.ontopia.net/
Deserialization

  • That is, building an object structure from XML
  • Usually involves some level of validation as well
  • Several ways to do this
       –   use SAX, which is low-level but fast
       –   use DOM, which is high-level and awful
       –   use XPath, which lets you extract information easily
       –   use a data binding tool




© 2003-2004 Ontopia AS                   38                       http://www.ontopia.net/
SAX

  • Standard for event-based parser APIs
       –   passes the document to the application piece by piece
       –   somewhat like staring at a parade through a keyhole
       –   very fast, consumes no memory at all
       –   suitable for applications where
             • documents may be big
             • documents require heavy processing

  • De-facto standard created by self-appointed group
       – supported by pretty much every parser there is
       – effectively the foundation for all XML work in Java
       – less standardized in other languages




© 2003-2004 Ontopia AS                    39                       http://www.ontopia.net/
DOM

  • Presents the document as an object structure
  • W3C Recommendation
       – widely supported and widely derided
       – in most programming languages better alternatives are found
       – in Java JDOM and XOM are good alternatives
  • Downsides
       – this approach requires the entire document to be loaded into memory
       – using an API is awkward, whether tree-based or event-based




© 2003-2004 Ontopia AS                 40                        http://www.ontopia.net/
SAX vs DOM

  • Or, rather, event-based vs tree-based
       – most XML technologies use one of these two approaches
       – understanding the difference is important in order to choose correctly
  • Essentially the difference is this
       –   event-based solutions require less resources
       –   however, they make many common operations too hard to be practical
       –   tree-based solutions are slower and use more memory
       –   but there is no limit on what you can do
  • Which approach is the right one depends on the requirements




© 2003-2004 Ontopia AS                  41                         http://www.ontopia.net/
XPath

  • A simple query language for XML
       – remarkably simple to learn given its expressive power
       – graph-traversal semantics
  • Simplifies extracting information from XML enormously
       – probably the single most important XML specification
       – used in query languages, mapping tools, schema languages, ...
  • Much less powerful than SQL
       – can't return structured results, only a list of values
       – limited support for handling reference relationships
       – no support for aggregate functions




© 2003-2004 Ontopia AS                    42                      http://www.ontopia.net/
Data binding tools

  • Tools that simplify serialization and deserialization
       – automate as much as possible of those tasks
       – some generate the object model for you
       – others let you map the XML to your object model
  • Most such tools have limitations
       –   no support for mixed content
       –   no support for element order
       –   ignore comments, processing instructions, and entities
       –   limited support for references
  • When suitable they can simplify development considerably
       – some event-based, others tree-based
  • The biggest hurdle is picking the right one and learning it

© 2003-2004 Ontopia AS                   43                         http://www.ontopia.net/
Validation
  • Validation is to ensure the correctness of incoming data
       –   that every <person> has a <birth-date>
       –   that every <birth-date> is a valid date
       –   that every <death-date> is later than the <birth-date>
       –   that the <birth-date> is actually correct
       –   ...
  • These three constraints can be grouped into
       –   structural constraints
       –   type constraints
       –   “semantic” constraints
       –   “existential” constraint
  • Schema languages can be used to define the first two
       – application logic must usually be used for the semantic ones
       – the last category is for human beings
© 2003-2004 Ontopia AS                    44                        http://www.ontopia.net/
Schema languages

  • DTDs
       – part of XML 1.0, but only supports structural constraints
       – serious problem: the document says which schema to use
  • XML Schema
       – has both structural and type constraints
       – W3C Recommendation, widely supported and widely criticized
  • RELAX-NG
       – has very strong structural and type constraints
       – ISO standard, growing support, and widely praised
  • Schematron
       – weak structural and type constraints, strong on semantic constraints
       – constraints specified with XPath
       – about to become an ISO standard
© 2003-2004 Ontopia AS                  45                        http://www.ontopia.net/
Serialization

  • The opposite of deserialization: writing XML from objects
  • Straightforward, but some pitfalls
       – remember to quote special XML characters everywhere
       – handling character encodings correctly
       – handling namespaces correctly
  • Validation usually part of testing, but otherwise not an issue
       – one assumes the object structure is already valid
  • Again several ways to do it
       –   use simple print statements, and do all the above yourself
       –   use a SAX2XML tool, which will handle the above for you
       –   build a DOM instance, then write it out (slow and awkward)
       –   use a data binding tool (has limitations)

© 2003-2004 Ontopia AS                   46                        http://www.ontopia.net/
Importing XML to an RDBMS

  • A form of deserialization, but with issues of its own
  • Typical issues are
       –   how to represent mixed content, if allowed
       –   dealing with referential integrity
       –   data typing
       –   recognizing null values
       –   validation
  • Again, there are many ways to do this
       – just hack it in
       – having an XML-to-OO mapper and an OO-to-RDBMS mapper
       – using a data binding tool



© 2003-2004 Ontopia AS                   47                 http://www.ontopia.net/
Writing XML from an RDBMS

  • A special kind of serialization
  • Much easier than going the other way
  • Main problem is matching the desired output format
  • Several tools to do this
       – template-based approaches where SQL is embedded in the XML
       – extensions to SQL that allow XML element constructors in SELECT
       – some allow XSLT transformations of the initial output




© 2003-2004 Ontopia AS                48                       http://www.ontopia.net/
XQuery

  • The query language for XML databases in the future
  • Embeds XPath inside a functional programming language
  • Much more powerful than XPath
  • Progress on XQuery is slow, but language highly regarded
       – XPath 2.0, which is used in XQuery, has usability problems with the
         data typing approach taken
  • Likely to become an important tool in the future




© 2003-2004 Ontopia AS                 49                         http://www.ontopia.net/
SQL/XML

  • ISO SC32 is working on adding XML support to SQL
       – this involves columns whose data type is XML
       – one assumes XPath expressions can be applied to these
       – probably also support for XML output
  • RDBMS vendors are committed to this
  • SQL/XML is likely to be a key building block in the future
       – simplifies XML storage in databases
       – does not, however, remove the impedance mismatch
  • SQL/XML may well become an XQuery killer




© 2003-2004 Ontopia AS               50                          http://www.ontopia.net/
Wrapping up


                         What XML means for developers
                         Resources to learn more




© 2003-2004 Ontopia AS                 51                http://www.ontopia.net/
XML and software development

  • The possibilities for interchange and integration are not new
       – XML makes them easier to achieve
       – XML makes us think of these possibilities in ways we didn't before
  • In practice, this means more work for developers
       – new lists of acronyms to learn and master
       – new kinds of tasks compared to earlier
  • XML makes life harder, but it's worth it




© 2003-2004 Ontopia AS                  52                         http://www.ontopia.net/
Where to learn more

  • http://www.xml.com
  • http://www.cafeconleche.org
  • The XML-DEV mailing list
  • http://www.w3.org/TR/
  • “Definitive XML Application Development” by me, published
    by Prentice-Hall




© 2003-2004 Ontopia AS            53                 http://www.ontopia.net/

Contenu connexe

Similaire à XML in software development

Metadata for web ontologies and rules: current practices and perspectives
Metadata for web ontologies and rules: current practices and perspectivesMetadata for web ontologies and rules: current practices and perspectives
Metadata for web ontologies and rules: current practices and perspectivesCarlos Tejo-Alonso
 
Web Introduction
Web IntroductionWeb Introduction
Web Introductionasim78
 
NASA SensorWeb Enterprise Services
NASA SensorWeb Enterprise ServicesNASA SensorWeb Enterprise Services
NASA SensorWeb Enterprise ServicesPat Cappelaere
 
MySQL JSON Document Store - A Document Store with all the benefits of a Trans...
MySQL JSON Document Store - A Document Store with all the benefits of a Trans...MySQL JSON Document Store - A Document Store with all the benefits of a Trans...
MySQL JSON Document Store - A Document Store with all the benefits of a Trans...Olivier DASINI
 
Introduction to Web Standards
Introduction to Web StandardsIntroduction to Web Standards
Introduction to Web StandardsJussi Pohjolainen
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Lucas Jellema
 
OrientDB the database for the web 1.1
OrientDB the database for the web 1.1OrientDB the database for the web 1.1
OrientDB the database for the web 1.1Luca Garulli
 
O2 presentation jan 09 - v1.00
O2  presentation   jan 09 - v1.00O2  presentation   jan 09 - v1.00
O2 presentation jan 09 - v1.00Dinis Cruz
 
Why SKOS should be a Focal Point of your Linked Data Strategy
Why SKOS should be a Focal Point of your Linked Data StrategyWhy SKOS should be a Focal Point of your Linked Data Strategy
Why SKOS should be a Focal Point of your Linked Data StrategySemantic Web Company
 
WebGUI And The Semantic Web
WebGUI And The Semantic WebWebGUI And The Semantic Web
WebGUI And The Semantic WebWilliam McKee
 
MySQL Day Paris 2018 - MySQL JSON Document Store
MySQL Day Paris 2018 - MySQL JSON Document StoreMySQL Day Paris 2018 - MySQL JSON Document Store
MySQL Day Paris 2018 - MySQL JSON Document StoreOlivier DASINI
 
OrientDB for real & Web App development
OrientDB for real & Web App developmentOrientDB for real & Web App development
OrientDB for real & Web App developmentLuca Garulli
 
Intro to XML in libraries
Intro to XML in librariesIntro to XML in libraries
Intro to XML in librariesKyle Banerjee
 

Similaire à XML in software development (20)

Introduction to JavaScript
Introduction to JavaScriptIntroduction to JavaScript
Introduction to JavaScript
 
Metadata for web ontologies and rules: current practices and perspectives
Metadata for web ontologies and rules: current practices and perspectivesMetadata for web ontologies and rules: current practices and perspectives
Metadata for web ontologies and rules: current practices and perspectives
 
Web Introduction
Web IntroductionWeb Introduction
Web Introduction
 
NASA SensorWeb Enterprise Services
NASA SensorWeb Enterprise ServicesNASA SensorWeb Enterprise Services
NASA SensorWeb Enterprise Services
 
MySQL JSON Document Store - A Document Store with all the benefits of a Trans...
MySQL JSON Document Store - A Document Store with all the benefits of a Trans...MySQL JSON Document Store - A Document Store with all the benefits of a Trans...
MySQL JSON Document Store - A Document Store with all the benefits of a Trans...
 
XML Pipelines
XML PipelinesXML Pipelines
XML Pipelines
 
Intro to Web Standards
Intro to Web StandardsIntro to Web Standards
Intro to Web Standards
 
Metadata is back!
Metadata is back!Metadata is back!
Metadata is back!
 
Introduction to Web Standards
Introduction to Web StandardsIntroduction to Web Standards
Introduction to Web Standards
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
OrientDB the database for the web 1.1
OrientDB the database for the web 1.1OrientDB the database for the web 1.1
OrientDB the database for the web 1.1
 
Linked Open Data & XIMDEX CMS (OKIOconf 2014)
Linked Open Data & XIMDEX CMS (OKIOconf 2014)Linked Open Data & XIMDEX CMS (OKIOconf 2014)
Linked Open Data & XIMDEX CMS (OKIOconf 2014)
 
O2 presentation jan 09 - v1.00
O2  presentation   jan 09 - v1.00O2  presentation   jan 09 - v1.00
O2 presentation jan 09 - v1.00
 
Why SKOS should be a Focal Point of your Linked Data Strategy
Why SKOS should be a Focal Point of your Linked Data StrategyWhy SKOS should be a Focal Point of your Linked Data Strategy
Why SKOS should be a Focal Point of your Linked Data Strategy
 
WebGUI And The Semantic Web
WebGUI And The Semantic WebWebGUI And The Semantic Web
WebGUI And The Semantic Web
 
MySQL Day Paris 2018 - MySQL JSON Document Store
MySQL Day Paris 2018 - MySQL JSON Document StoreMySQL Day Paris 2018 - MySQL JSON Document Store
MySQL Day Paris 2018 - MySQL JSON Document Store
 
OrientDB for real & Web App development
OrientDB for real & Web App developmentOrientDB for real & Web App development
OrientDB for real & Web App development
 
Intro to XML in libraries
Intro to XML in librariesIntro to XML in libraries
Intro to XML in libraries
 
Sem web tutorial general
Sem web tutorial generalSem web tutorial general
Sem web tutorial general
 
Markup As An Api
Markup As An ApiMarkup As An Api
Markup As An Api
 

Plus de Lars Marius Garshol

JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformationLars Marius Garshol
 
Data collection in AWS at Schibsted
Data collection in AWS at SchibstedData collection in AWS at Schibsted
Data collection in AWS at SchibstedLars Marius Garshol
 
NoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityNoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityLars Marius Garshol
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engineLars Marius Garshol
 
Linked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLinked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLars Marius Garshol
 
NoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativityNoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativityLars Marius Garshol
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Hafslund SESAM - Semantic integration in practice
Hafslund SESAM - Semantic integration in practiceHafslund SESAM - Semantic integration in practice
Hafslund SESAM - Semantic integration in practiceLars Marius Garshol
 

Plus de Lars Marius Garshol (20)

JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformation
 
Data collection in AWS at Schibsted
Data collection in AWS at SchibstedData collection in AWS at Schibsted
Data collection in AWS at Schibsted
 
Kveik - what is it?
Kveik - what is it?Kveik - what is it?
Kveik - what is it?
 
Nature-inspired algorithms
Nature-inspired algorithmsNature-inspired algorithms
Nature-inspired algorithms
 
Collecting 600M events/day
Collecting 600M events/dayCollecting 600M events/day
Collecting 600M events/day
 
History of writing
History of writingHistory of writing
History of writing
 
NoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityNoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativity
 
Norwegian farmhouse ale
Norwegian farmhouse aleNorwegian farmhouse ale
Norwegian farmhouse ale
 
Archive integration with RDF
Archive integration with RDFArchive integration with RDF
Archive integration with RDF
 
The Euro crisis in 10 minutes
The Euro crisis in 10 minutesThe Euro crisis in 10 minutes
The Euro crisis in 10 minutes
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
 
Linked Open Data for the Cultural Sector
Linked Open Data for the Cultural SectorLinked Open Data for the Cultural Sector
Linked Open Data for the Cultural Sector
 
NoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativityNoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativity
 
Bitcoin - digital gold
Bitcoin - digital goldBitcoin - digital gold
Bitcoin - digital gold
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Hops - the green gold
Hops - the green goldHops - the green gold
Hops - the green gold
 
Big data 101
Big data 101Big data 101
Big data 101
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Hafslund SESAM - Semantic integration in practice
Hafslund SESAM - Semantic integration in practiceHafslund SESAM - Semantic integration in practice
Hafslund SESAM - Semantic integration in practice
 
Approximate string comparators
Approximate string comparatorsApproximate string comparators
Approximate string comparators
 

Dernier

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Dernier (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

XML in software development

  • 1. XML in software development Technical overview Lars Marius Garshol Development Manager, Ontopia <larsga@ontopia.net> 2004-09-29 © 2003-2004 Ontopia AS 1 http://www.ontopia.net/
  • 2. Who speaks? • Lars Marius Garshol – Development manager at Ontopia, and one of the founders – Author of Definitive XML Application Development, published by Prentice-Hall – Wrote the xmlproc validating parser in Python – Responsible for translation of SAX to Python – Member of ISO SC34, which developed SGML – Editor of parts of the topic map standard (ISO 13250-2 and 13250-3) – Editor of the TMQL standard (topic map query language, ISO 18048) – Maintainer of Free XML Tools web site • Ontopia – Leading vendor of topic map software – “The Oracle of Topic Maps” – Norwegian company with partners world-wide © 2003-2004 Ontopia AS 2 http://www.ontopia.net/
  • 3. My personal XML history • Started with XML in 1997 – started my MSc thesis on content management just as XML work was taking off – followed the XML process from the start – believed all the promises that XML would make it possible to find information and exchange anything with anyone • Now I work with topic maps – XML turned out not to be what I was looking for – many of the supporting standards I do not think good enough – am now a bitter and disappointed man – will return to this at the end © 2003-2004 Ontopia AS 3 http://www.ontopia.net/
  • 4. Overview • Introduction • XML and application architecture – impedance mismatch – web services • Common XML-related tasks – XML tools and standards • Conclusion © 2003-2004 Ontopia AS 4 http://www.ontopia.net/
  • 5. Introduction What is XML really? Data models Interchange and storage © 2003-2004 Ontopia AS 5 http://www.ontopia.net/
  • 6. XML is a way to represent data • XML is one of many ways to do this • XML is a data format (or syntax) – used when storing XML in files – also used when transmitting XML • XML has several data models – used in APIs, XML databases, and query languages – some support for this, not main usage © 2003-2004 Ontopia AS 6 http://www.ontopia.net/
  • 7. Some data representations • Relational – tabular, rows and columns – used by relational databases – primary focus on storage, limited interchange with CSV files • Object-oriented – objects with properties and methods – used by most programming languages today – primary focus on application-internal representation – some interchange, also some database support • XML – tree of labeled nodes – primary focus on interchange – some database support © 2003-2004 Ontopia AS 7 http://www.ontopia.net/
  • 8. So, what is XML good for? • Well, it was created for documents... – <p>allows <term>mixed content</term>, which is unusual</p> – also strictly preserves order everywhere (except for attributes) • XML works very well for documents • XML also works for data – however, the document features make it more complicated than necessary for implementors – in use it is quite straightforward, though weak on references – for storage it is not optimal – for interchange it is still the best alternative © 2003-2004 Ontopia AS 8 http://www.ontopia.net/
  • 9. Why XML is good for interchange • Standard is done right – short, implementable, precise, formal, readable, hackable – everything is Unicode all the way: no internationalization problems – Draconian error handling forces users to do things right – schema languages make validation simple and effective • Everyone agrees on the standard – Microsoft, Sun, IBM, Oracle, you-name-it • Lots of high-quality tools – parsers tend to be fast, highly conformant, and robust – lots and lots of higher-level tools make life easier – tools available for all languages and platforms © 2003-2004 Ontopia AS 9 http://www.ontopia.net/
  • 10. XML and architecture Traditional information systems The impedance mismatch An example XML application © 2003-2004 Ontopia AS 10 http://www.ontopia.net/
  • 11. Information systems • Information-centric computing has traditionally been about information systems • Typically, these were clusters of applications with a database at the center • Originally, the business logic would reside in the database • With n-tier architecture it was encapsulated by an object layer • The basic concept has remained the same, however © 2003-2004 Ontopia AS 11 http://www.ontopia.net/
  • 12. Traditional 2-tier architecture Application #2 Application #3 Application #1 Application #4 Database © 2003-2004 Ontopia AS 12 http://www.ontopia.net/
  • 13. XML enters the picture © 2003-2004 Ontopia AS 13 http://www.ontopia.net/
  • 14. Impedance mismatch • The OO/RDBMS impedance mismatch Objects – object-oriented languages use objects with properties – RDBMSs use tables – these two data models do not match, and mapping between them requires substantial effort • Common solutions mismatch – attempt to isolate RDBMS interaction in an application module – use object-relational mapping tools – give up, just plunge in, and create a horrible mess • Conclusion – the problem is real, but with effort it can be handled RDBMS © 2003-2004 Ontopia AS 14 http://www.ontopia.net/
  • 15. A very common architecture Objects <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN" "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/REC-xml"> <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> <!ENTITY versionOfXML "1.0"> <!ENTITY pio "'&lt;?xml'"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations Annex to ISO 8879"> <!ENTITY pic "'?>'"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY com "--"> mismatch <!ENTITY como "--"> <!ENTITY comc "--"> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp "&#160;"> <!ENTITY magicents "<code>amp</code>, <code>lt</code>, <code>gt</code>, <code>apos</code>, <code>quot</code>"> <!ENTITY doc.audience "public review and discussion"> <!ENTITY doc.distribution "may be distributed freely, as long as all text and legal notices remain intact"> ]> <spec w3c-doctype="rec"> mismatch XML mismatch RDBMS © 2003-2004 Ontopia AS 15 http://www.ontopia.net/
  • 16. The brave new world of XML • Originally we had the OO-RDBMS mismatch • XML adds the OO-XML and XML-RDBMS mismatches – in other words: yet another issue for developers to deal with • Solutions are much the same – use data binding tools (we'll return to these) – restrict XML code to a specific module – give up and create a mess • Conclusion – interchange is complicated, and there is no silver bullet © 2003-2004 Ontopia AS 16 http://www.ontopia.net/
  • 17. But is XML really all bad? • Imagine an RDBMS/object interchange format – would support RDBMS/object export and import with no effort – already exists in XML form – still need transformations, because different applications are unlikely to use, or want to use, the same internal structure • XML enforces loose coupling of data – since internal and external representation are so different • Other benefits – it supports easy XML-to-XML transformations, which makes information interchange easier – it is human-readable, which makes debugging and understanding easier • However, life could still be easier than it is © 2003-2004 Ontopia AS 17 http://www.ontopia.net/
  • 18. So, what to do? • XML is already here – all the big vendors are pushing it – government standards and customers require it – the open source community has embraced it • In short, we just have to live with it now © 2003-2004 Ontopia AS 18 http://www.ontopia.net/
  • 19. An example application • From January 2003 the EU required all member states to submit individual case safety reports for drugs • Basically, every time someone suffers side-effects from a drug, this is to be reported to EMEA in London • A standardized XML format is used for this • Ontopia developed the solution used by Norwegian authorities © 2003-2004 Ontopia AS 19 http://www.ontopia.net/
  • 20. Architecture of the application © 2003-2004 Ontopia AS 20 http://www.ontopia.net/
  • 21. The internals of the application © 2003-2004 Ontopia AS 21 http://www.ontopia.net/
  • 22. The XML part Obj.model Export Import © 2003-2004 Ontopia AS 22 http://www.ontopia.net/
  • 23. Native XML databases • XML databases have been on the rise for the past few years – these are databases whose storage model is XML – in other words, they store XML directly – query languages tend to be XPath and/or XQuery • Reasons for using XML databases include – supports semi-structured data – may be faster when only specific views wanted (fewer joins) – well suited to document storage • Reasons not to use them are – have to choose between poor architecture and impedance mismatch – few mature products yet – SQL and RDBMSs usually do the same job better – unfamiliar technology © 2003-2004 Ontopia AS 23 http://www.ontopia.net/
  • 24. Using an XML database © 2003-2004 Ontopia AS 24 http://www.ontopia.net/
  • 25. Other considerations • Using an XML database would have simplified the regional applications – no need for the object model, since application is a simple editor – however, validation would have been somewhat awkward to add • The central application is different, however – limited need for editing – main need is advanced reporting – advanced reporting means complex queries and joins – XML databases are not well suited for this – solution also needs support for replication, which few XML DBs have • Architecturally, this means going back to the 2-tier model – might work in this case, unlikely to work in the general case © 2003-2004 Ontopia AS 25 http://www.ontopia.net/
  • 26. A different kind of information system • RSS is – a simple XML format for newsfeeds – probably the simplest useful XML application there is – probably the most widespread XML application • Today there are – tens of thousands of RSS feeds – lots of news aggregation sites using RSS – lots of desktop tools for reading RSS feeds directly © 2003-2004 Ontopia AS 26 http://www.ontopia.net/
  • 27. Information system? topicmaps.bond.edu.au <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY draft.month "October"> <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> <!ENTITY draft.year "2000"> <!ENTITY versionOfXML "1.0"> <!ENTITY versionOfXML "1.0"> <!ENTITY pio "'&lt;?xml'"> <!ENTITY pio "'&lt;?xml'"> <!ENTITY doc.date "10 February 1998"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY pic "'?>'"> <!ENTITY pic "'?>'"> <!ENTITY br "n"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY mdash "--"> <!ENTITY com "--"> <?xml version="1.0" encoding="iso-8859-1"?> <?xml version="1.0" encoding="iso-8859-1"?> <!ENTITY com "--"> <!ENTITY como "--"> <!ENTITY como "--"> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi <!ENTITY comc "--"> <!ENTITY comc "--"> "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ Publishing <!ENTITY hcro "&amp;#x"> <!--ArborText, Inc., 1988-2000, v.4002--> <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp "&#160;"> <!ENTITY nbsp "&#160;"> <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY draft.month "October"> <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> <!ENTITY draft.year "2000"> <!ENTITY versionOfXML "1.0"> <!ENTITY versionOfXML "1.0"> <!ENTITY pio "'&lt;?xml'"> <!ENTITY pio "'&lt;?xml'"> application <!ENTITY doc.date "10 February 1998"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY pic "'?>'"> <!ENTITY pic "'?>'"> <!ENTITY br "n"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY mdash "--"> <!ENTITY com "--"> <!ENTITY com "--"> <!ENTITY como "--"> <!ENTITY como "--"> <!ENTITY comc "--"> <!ENTITY comc "--"> <!ENTITY hcro "&amp;#x"> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp "&#160;"> <!ENTITY nbsp "&#160;"> weblogs.rss weblogs.html bloogz.com weblogs.com User desktop RSS Web reader browser © 2003-2004 Ontopia AS 27 http://www.ontopia.net/
  • 28. Web services What they are The promise of web services © 2003-2004 Ontopia AS 28 http://www.ontopia.net/
  • 29. What is a web service, anyway? • Basically any software service made available over http – must be intended to be invoked by another piece of software – line is somewhat blurry: is Google a web service? MapQuest? • Two schools of thought: – REST holds that http + XML has all that is needed – the SOAP camp wants special protocols and standards • In practice we see both – REST is good because it fits seamlessly into the existing web – SOAP is good because it has better tool support • Make your choice based on what is important for you © 2003-2004 Ontopia AS 29 http://www.ontopia.net/
  • 30. SOAP • Essentially a wrapper for XML messages • Consists of – a header (with routing information etc) – a body (which holds the message) • Very little is defined in terms of message structure • Effectively, SOAP encapsulates XML, and you must figure out how to deal with the XML yourself – this is not very different from how HTTP would encapsulate XML © 2003-2004 Ontopia AS 30 http://www.ontopia.net/
  • 31. Web services and architecture Web service <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN" "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/REC-xml"> <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> http <!ENTITY versionOfXML "1.0"> <!ENTITY pio "'&lt;?xml'"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations Annex to ISO 8879"> <!ENTITY pic "'?>'"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY com "--"> <!ENTITY como "--"> <!ENTITY comc "--"> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp "&#160;"> <!ENTITY magicents "<code>amp</code>, <code>lt</code>, <code>gt</code>, <code>apos</code>, <code>quot</code>"> <!ENTITY doc.audience "public review and discussion"> <!ENTITY doc.distribution "may be distributed freely, as long as all text and legal notices remain intact"> ]> <spec w3c-doctype="rec"> XML © 2003-2004 Ontopia AS 31 http://www.ontopia.net/
  • 32. The promise of web services • Connect legacy applications • Create services anyone can connect to and use • Integrate disparate applications across the enterprise • Publish your service in a web service marketplace – people can find it using UDDI and bind to it dynamically with WSDL – you will, of course, charge them for this © 2003-2004 Ontopia AS 32 http://www.ontopia.net/
  • 33. A word of caution • We've heard all this before • CORBA was widely touted as doing the same thing in the '90s – applications connecting to each other over the net – CORBA as the enterprise-wide “bus” connecting all applications – directory services and dynamic service binding – component brokers and online trading • CORBA did the first, but not the last three – political, economic, and legal issues intruded – information integration turns out to be difficult – dynamic service binding was harder than anyone thought • In short, exposing services on the net works – be skeptical about the rest © 2003-2004 Ontopia AS 33 http://www.ontopia.net/
  • 34. Another caution • Integrating applications is not really the issue – what is necessary is to integrate the information – XML is about information, but it's not really designed for integration • XML has no notion of identity – no way to say when two elements represent the same thing – nothing tells you what to do when two elements do represent the same thing • Knowledge technologies are about identity – they have rules for identity and merging – better suited for information integration – thus also for application integration © 2003-2004 Ontopia AS 34 http://www.ontopia.net/
  • 35. What web services are, second try • Web services are an idea more than anything else • In some cases new technology makes it easier to apply • The idea is what matters, however – seeing the possibilities and trying to make use of them – which way you do it always matters less than doing it © 2003-2004 Ontopia AS 35 http://www.ontopia.net/
  • 36. Web services? topicmaps.bond.edu.au <?xml version="1.0" encoding="iso-8859-1"?> <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ Publishing <!--ArborText, Inc., 1988-2000, v.4002--> <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY draft.month "October"> <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> <!ENTITY draft.year "2000"> <!ENTITY versionOfXML "1.0"> <!ENTITY versionOfXML "1.0"> <!ENTITY pio "'&lt;?xml'"> <!ENTITY pio "'&lt;?xml'"> application <!ENTITY doc.date "10 February 1998"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY pic "'?>'"> <!ENTITY pic "'?>'"> <!ENTITY br "n"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY mdash "--"> <!ENTITY com "--"> <!ENTITY com "--"> <!ENTITY como "--"> <!ENTITY como "--"> <!ENTITY comc "--"> <!ENTITY comc "--"> <!ENTITY hcro "&amp;#x"> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp "&#160;"> <!ENTITY nbsp "&#160;"> weblogs.rss weblogs.html bloogz.com weblogs.com User desktop RSS Web reader browser © 2003-2004 Ontopia AS 36 http://www.ontopia.net/
  • 37. Common XML challenges Import/export Important groups of tools Validation Using XML databases © 2003-2004 Ontopia AS 37 http://www.ontopia.net/
  • 38. Deserialization • That is, building an object structure from XML • Usually involves some level of validation as well • Several ways to do this – use SAX, which is low-level but fast – use DOM, which is high-level and awful – use XPath, which lets you extract information easily – use a data binding tool © 2003-2004 Ontopia AS 38 http://www.ontopia.net/
  • 39. SAX • Standard for event-based parser APIs – passes the document to the application piece by piece – somewhat like staring at a parade through a keyhole – very fast, consumes no memory at all – suitable for applications where • documents may be big • documents require heavy processing • De-facto standard created by self-appointed group – supported by pretty much every parser there is – effectively the foundation for all XML work in Java – less standardized in other languages © 2003-2004 Ontopia AS 39 http://www.ontopia.net/
  • 40. DOM • Presents the document as an object structure • W3C Recommendation – widely supported and widely derided – in most programming languages better alternatives are found – in Java JDOM and XOM are good alternatives • Downsides – this approach requires the entire document to be loaded into memory – using an API is awkward, whether tree-based or event-based © 2003-2004 Ontopia AS 40 http://www.ontopia.net/
  • 41. SAX vs DOM • Or, rather, event-based vs tree-based – most XML technologies use one of these two approaches – understanding the difference is important in order to choose correctly • Essentially the difference is this – event-based solutions require less resources – however, they make many common operations too hard to be practical – tree-based solutions are slower and use more memory – but there is no limit on what you can do • Which approach is the right one depends on the requirements © 2003-2004 Ontopia AS 41 http://www.ontopia.net/
  • 42. XPath • A simple query language for XML – remarkably simple to learn given its expressive power – graph-traversal semantics • Simplifies extracting information from XML enormously – probably the single most important XML specification – used in query languages, mapping tools, schema languages, ... • Much less powerful than SQL – can't return structured results, only a list of values – limited support for handling reference relationships – no support for aggregate functions © 2003-2004 Ontopia AS 42 http://www.ontopia.net/
  • 43. Data binding tools • Tools that simplify serialization and deserialization – automate as much as possible of those tasks – some generate the object model for you – others let you map the XML to your object model • Most such tools have limitations – no support for mixed content – no support for element order – ignore comments, processing instructions, and entities – limited support for references • When suitable they can simplify development considerably – some event-based, others tree-based • The biggest hurdle is picking the right one and learning it © 2003-2004 Ontopia AS 43 http://www.ontopia.net/
  • 44. Validation • Validation is to ensure the correctness of incoming data – that every <person> has a <birth-date> – that every <birth-date> is a valid date – that every <death-date> is later than the <birth-date> – that the <birth-date> is actually correct – ... • These three constraints can be grouped into – structural constraints – type constraints – “semantic” constraints – “existential” constraint • Schema languages can be used to define the first two – application logic must usually be used for the semantic ones – the last category is for human beings © 2003-2004 Ontopia AS 44 http://www.ontopia.net/
  • 45. Schema languages • DTDs – part of XML 1.0, but only supports structural constraints – serious problem: the document says which schema to use • XML Schema – has both structural and type constraints – W3C Recommendation, widely supported and widely criticized • RELAX-NG – has very strong structural and type constraints – ISO standard, growing support, and widely praised • Schematron – weak structural and type constraints, strong on semantic constraints – constraints specified with XPath – about to become an ISO standard © 2003-2004 Ontopia AS 45 http://www.ontopia.net/
  • 46. Serialization • The opposite of deserialization: writing XML from objects • Straightforward, but some pitfalls – remember to quote special XML characters everywhere – handling character encodings correctly – handling namespaces correctly • Validation usually part of testing, but otherwise not an issue – one assumes the object structure is already valid • Again several ways to do it – use simple print statements, and do all the above yourself – use a SAX2XML tool, which will handle the above for you – build a DOM instance, then write it out (slow and awkward) – use a data binding tool (has limitations) © 2003-2004 Ontopia AS 46 http://www.ontopia.net/
  • 47. Importing XML to an RDBMS • A form of deserialization, but with issues of its own • Typical issues are – how to represent mixed content, if allowed – dealing with referential integrity – data typing – recognizing null values – validation • Again, there are many ways to do this – just hack it in – having an XML-to-OO mapper and an OO-to-RDBMS mapper – using a data binding tool © 2003-2004 Ontopia AS 47 http://www.ontopia.net/
  • 48. Writing XML from an RDBMS • A special kind of serialization • Much easier than going the other way • Main problem is matching the desired output format • Several tools to do this – template-based approaches where SQL is embedded in the XML – extensions to SQL that allow XML element constructors in SELECT – some allow XSLT transformations of the initial output © 2003-2004 Ontopia AS 48 http://www.ontopia.net/
  • 49. XQuery • The query language for XML databases in the future • Embeds XPath inside a functional programming language • Much more powerful than XPath • Progress on XQuery is slow, but language highly regarded – XPath 2.0, which is used in XQuery, has usability problems with the data typing approach taken • Likely to become an important tool in the future © 2003-2004 Ontopia AS 49 http://www.ontopia.net/
  • 50. SQL/XML • ISO SC32 is working on adding XML support to SQL – this involves columns whose data type is XML – one assumes XPath expressions can be applied to these – probably also support for XML output • RDBMS vendors are committed to this • SQL/XML is likely to be a key building block in the future – simplifies XML storage in databases – does not, however, remove the impedance mismatch • SQL/XML may well become an XQuery killer © 2003-2004 Ontopia AS 50 http://www.ontopia.net/
  • 51. Wrapping up What XML means for developers Resources to learn more © 2003-2004 Ontopia AS 51 http://www.ontopia.net/
  • 52. XML and software development • The possibilities for interchange and integration are not new – XML makes them easier to achieve – XML makes us think of these possibilities in ways we didn't before • In practice, this means more work for developers – new lists of acronyms to learn and master – new kinds of tasks compared to earlier • XML makes life harder, but it's worth it © 2003-2004 Ontopia AS 52 http://www.ontopia.net/
  • 53. Where to learn more • http://www.xml.com • http://www.cafeconleche.org • The XML-DEV mailing list • http://www.w3.org/TR/ • “Definitive XML Application Development” by me, published by Prentice-Hall © 2003-2004 Ontopia AS 53 http://www.ontopia.net/