8. http://www.w3c.it/talks/2005/openCulture/slide7-0.html
Sunday 30 May 2010
9. http://en.wikipedia.org/wiki/List_of_XML_markup_languages
10. XML is not ...
• An extension of HTML
– XHTML is XML-compliant, and extensible
• Just for Web pages
– Useful wherever data are stored or exchanged
• Concerned with semantics
– XML does not define semantics, just syntax
• An innovative new technology
– A standard, building on existing technology
• Only hype
– Though it is partly that, too
11. XML is ...
• Endorsed by W3C and major companies
• Extensible
– No tag name limitations
– No language limitations
• Readable by humans (or at least by software developers)
– Can be processed with basic text tools
• Open standard
– No vendor lock-in (in theory...)
• Easy to implement
– Powerful, cheap (free), off-the-shelf XML tools
12.
• 1969: GML (Generalized Markup Language), the precursor of SGML
• SGML (Standard Generalized Markup Language)
– Meta-language: describes other languages
– Powerful, but rather complicated
– 1986: ISO standard
• 1992: HTML (HyperText Markup Language)
– Based on SGML
– Simple, but limited
• 1996: Start of the design of XML
– By the World Wide Web Consortium (W3C)
• 1998: Publication of XML 1.0
13. Design Goals
• Easy to use over the Internet
• Power of SGML
• Simplicity of HTML
• Human-legible
• Easy to create
• Compactness is not an issue
• “The ASCII of the Web”
14. XML Basics
<Person>
  <Name>
    <First>Thomas</First>
    <Last>Atkinson</Last>
  </Name>
  <Age>30</Age>
</Person>
• Self-defined, meaningful tags
• Separate data and its representation
15.
• Language for defining syntax
• Records and fields have explicit boundaries
– Parseable without knowing the structure (self-descriptive)
• Unicode support (UTF-8, UTF-16, ...)
• Web-aware
– DTDs, entities and schemas can be loaded through a URL
• Strictly parsed: no ambiguity (case sensitive!)
• Extensible: namespaces
16.
<?xml version="1.0" encoding="UTF-8"?>
<!-- XML declaration: XML follows -->
<!DOCTYPE addressbook SYSTEM
  "http://www/~koenh/ddml/addressbook.dtd">
<!-- document type declaration, pointing to an external DTD -->
<addressbook> <!-- root element -->
  <person first-name="John" family-name="Doe"
          employee-number="1234">
    <contact-info>
      <email address="Jdoe@home.com"/>
    </contact-info>
    <address street="Celestijnenlaan"
             number="200A"/>
  </person>
</addressbook>
24.
• Cf. HTML markup tags
<H1 align="center"> a Heading </H1>
• Element: the whole construct, from opening tag to closing tag
• Opening tag: <H1 align="center">, carrying the attribute align="center"
• Content: a Heading
• Closing tag: </H1>
• Major differences:
– Case sensitive
– Proper nesting: no <A> … <B> … </A> … </B>
– Unicode instead of ASCII
25. Vocabularies
• Agreed-upon sets of XML tags for a specific domain
• Examples
– Chemistry: Chemical Markup Language (CML)
– Business: ebXML, RosettaNet, BizTalk
– Mathematics: MathML
– Multimedia: Synchronized Multimedia Integration Language (SMIL)
– Etc.
26.
• Well-formed: follows XML syntax
– Proper tag and attribute names
– Tags properly closed
– Attribute values and text between tags do not contain
  '<' (escape it as &lt;)
• Valid: well-formed and conforming to a vocabulary (DTD)
– All elements and their attributes are declared in the DTD
– Attribute values follow the DTD type declarations:
  CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, enumerated
– Nesting and sequencing of elements follow the DTD
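The well-formedness rules can be tried out with the JDK's built-in parser. The following is a minimal sketch, not a definitive implementation: the class name `WellFormedCheck` and its method are hypothetical, and it checks well-formedness only, not validity against a DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML (no DTD validation is done). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the text: it is not well-formed.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O trouble, unrelated to the document itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));        // properly nested
        System.out.println(isWellFormed("<A><B></A></B>"));     // overlapping tags
        System.out.println(isWellFormed("<Name></name>"));      // case mismatch
        System.out.println(isWellFormed("<a>1 &lt; 2</a>"));    // escaped '<'
    }
}
```

A document that passes this check is well-formed; whether it is also valid depends on the DTD it declares.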
27. Elements
• XML’s container for
• Attributes
• Character data
• Other elements (“child” elements)
• Delimited by opening and closing tags
• Non-empty element:
<name>..</name>
• Empty element:
<name/>
• Form a simple hierarchical tree
• Root = “document element”
28. Attributes and Strings
• Attributes
– Name-value pairs: name="value"
– Only strings as value!
• Strings
– Enclosed by '...' or "..."
– A quote character inside the string: replace with &apos; or &quot;
• Character data
– Any text that is not markup
– '&', '<' and '>' are markup characters
– As literal text: replace with &amp;, &lt; and &gt;
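The replacements above can be sketched as a small escaping routine. This is an illustrative helper under stated assumptions: the class name `XmlEscaper` is hypothetical, and real applications normally let an XML library do the escaping when it serializes a document.

```java
// Hypothetical helper, for illustration only.
final class XmlEscaper {

    /** Replaces the five characters that have a predefined XML entity. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped text is safe both as character data and as an attribute value.
        System.out.println("<note text=\"" + escape("a < b & \"c\"") + "\"/>");
    }
}
```

Escaping all five characters everywhere is slightly more than XML strictly requires, but it keeps the routine usable for both attribute values and element content.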
29. Document structure
• Prolog (optional)
– XML declaration: <?xml version="1.0" encoding="UTF-8"?>
  version="number" (compulsory)
  encoding="character encoding" (optional)
– Document type declaration: <!DOCTYPE document_element ... >
• Body
– The document element
30. Another example
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BankAccounts ...>
<!-- This is an example XML document -->
<BankAccounts>
  <Account accountNr="123-456789-01" use="personal">
    <Owners>
      <Person ID="1258-a8d72-98">
        <Name>John Smith</Name></Person>
      <Person ID="5842-df5ef-e9">
        <Name>Claudia Scott</Name></Person>
    </Owners>
    <CreditCards><CreditCard number="12345"/></CreditCards>
    <Balance Currency="EUR">50000</Balance>
  </Account>
  ...
</BankAccounts>
31. Document Type Definition
<!-- EMPTY: no content, the element is used for its attributes only -->
<!ELEMENT address EMPTY>
<!-- CDATA: character data, any string -->
<!-- #REQUIRED: a value for that attribute must be present -->
<!-- NMTOKEN: name token, made of letters, digits, ., -, _ and : only -->
<!ATTLIST address city   CDATA   #REQUIRED
                  state  NMTOKEN #REQUIRED
                  number CDATA   #REQUIRED
                  street CDATA   #REQUIRED>
<!-- +: one or more -->
<!ELEMENT addressbook (person+)>
<!-- |: choice; *: zero or more -->
<!ELEMENT contact-info
          (home-phone|mobile-phone|email)*>
33. Document Type Definition
<!ELEMENT manager EMPTY>
<!-- IDREF: reference to the employee-number of a person -->
<!ATTLIST manager empnumber IDREF #REQUIRED>
<!-- ",": sequence; "?": zero or one -->
<!ELEMENT person (contact-info,address,
                  job-info?,manager?,misc-info?)>
<!-- #IMPLIED: can, but need not, be provided -->
<!-- ID: can be referred to by manager.empnumber -->
<!ATTLIST person first-name      CDATA #REQUIRED
                 middle-initial  CDATA #IMPLIED
                 employee-number ID    #REQUIRED
                 family-name     CDATA #REQUIRED>
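To see ID and IDREF checking in action, a validating parser can be pointed at a document that carries its DTD inline. The sketch below is an assumption-laden illustration: the class `DtdValidation`, the `staff` root element and the cut-down DTD are invented for the example, and only the `person`/`manager` declarations mirror the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the DTD is a simplified stand-in for the slide's DTD.
final class DtdValidation {

    static final String DTD =
          "<!DOCTYPE staff ["
        + "<!ELEMENT staff (person+)>"
        + "<!ELEMENT person (manager?)>"
        + "<!ATTLIST person employee-number ID #REQUIRED first-name CDATA #REQUIRED>"
        + "<!ELEMENT manager EMPTY>"
        + "<!ATTLIST manager empnumber IDREF #REQUIRED>"
        + "]>";

    // The manager reference e1 matches an existing employee-number.
    static final String VALID_DOC = DTD
        + "<staff><person employee-number='e1' first-name='John'/>"
        + "<person employee-number='e2' first-name='Ann'><manager empnumber='e1'/></person></staff>";

    // The manager reference e9 matches no ID in the document.
    static final String DANGLING_DOC = DTD
        + "<staff><person employee-number='e2' first-name='Ann'>"
        + "<manager empnumber='e9'/></person></staff>";

    /** Parses with DTD validation switched on and returns the validity errors found. */
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD named in the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Well-formedness errors and configuration problems end up here.
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(VALID_DOC));
        System.out.println(validate(DANGLING_DOC));
    }
}
```

Note that ID values must be XML names, so they cannot start with a digit; that is why the example uses e1 and e2.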
34. namespaces: problem
<widget type="gadget">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info>
    <head><title>Gadget</title></head>
    <body><h1>Gadget</h1>
      A gadget contains a big gizmo
    </body>
  </info>
</widget>
• Name collision! <head> is used both as a part of the widget and as the HTML document head
35. namespaces: approach
• A namespace is a collection of names, identified by a URI
reference, which are used in XML documents as
element types and attribute names
• Declared with xmlns:prefix="URI"
• The URI is used only as an identifier
– It does not need to point to anything
• The declaration applies to the element and to everything nested in it
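A namespace-aware parser keeps the two kinds of head element apart. The following sketch assumes a hypothetical namespace URI for the widget vocabulary (http://example.org/widget) and a hypothetical class name; the XHTML namespace URI is the real one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, for illustration only.
final class NamespaceDemo {

    static final String WIDGET_NS = "http://example.org/widget"; // invented identifier
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // The widget document with every element placed in one of the two namespaces.
    static final String DOC =
          "<w:widget xmlns:w='" + WIDGET_NS + "' xmlns:h='" + XHTML_NS + "' type='gadget'>"
        + "<w:head size='medium'/>"
        + "<w:info><h:head><h:title>Gadget</h:title></h:head></w:info>"
        + "</w:widget>";

    /** Counts the elements with local name "head" in the given namespace ("*" matches any). */
    static int countHeads(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(DOC)));
            return doc.getElementsByTagNameNS(namespaceUri, "head").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("widget heads: " + countHeads(WIDGET_NS));
        System.out.println("XHTML heads:  " + countHeads(XHTML_NS));
    }
}
```

The prefixes w and h are arbitrary; only the URIs they are bound to identify the vocabulary.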
39. Accessing XML documents
• Manual text file manipulation
• Cumbersome & Error-prone
• Parser
• Simplifies document manipulation
• Ensures proper grammar, well-formedness
• Abstracts content from grammar
• Accessed through standard API
• Document Object Model (DOM)
• Simple API for XML (SAX)
40.
• DOM parser
– Creates a DOM object tree
• SAX parser
– Generates events when elements are encountered
– One-pass translation
– No need to keep the whole document tree in memory
• Both can be validating or non-validating
• Many available (most are freeware or open source)
– IBM XML4J, Apache Xerces, Sun parser, Microsoft,
  DataChannel, Oracle, ...
41. DOM approach
http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP
42. DOM Node Tree
XML document:
<?xml version="1.0"?>
<!-- An example XML document -->
<BankAccounts>
  <Account accountNr="123-456789-01">
    <Owner ID="1258-a8d72-98">
      John Smith
    </Owner>
    <Balance Currency="EUR">
      50000
    </Balance>
  </Account>
  <Account ...>
  ...
</BankAccounts>
Node tree (Doc = document, Com = comment, El = element, Att = attribute):
Doc
  Com An example XML document
  El BankAccounts
    El Account
      Att accountNr = "123-456789-01"
      El Owner = "John Smith"
        Att ID = "1258-a8d72-98"
      El Balance = "50000"
        Att Currency = "EUR"
    El Account
      ...
43. parsing: DOM
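// Requires: import org.w3c.dom.Node; import org.w3c.dom.NodeList;
// Visits a node, then recursively all of its children (depth-first).
public void print(Node node) {
    ...
    NodeList nlist = node.getChildNodes();
    if (nlist != null) {
        int l = nlist.getLength();
        for (int i = 0; i < l; i++) {
            print(nlist.item(i));   // recurse into the i-th child
            ...
        }
        ...
    }
    ...
}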
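The recursive walk can be turned into a complete program along the following lines. This is one possible sketch, not the original slide code: the class `DomTreePrinter` is hypothetical, and the El / Att / Com labels are borrowed from the node tree slide, with Txt added for text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, for illustration only.
final class DomTreePrinter {

    /** Parses the XML text and returns an indented outline of its DOM tree. */
    static String outline(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) indent.append("  ");

        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                out.append(indent).append("El ").append(node.getNodeName()).append('\n');
                NamedNodeMap atts = node.getAttributes();
                for (int i = 0; i < atts.getLength(); i++) {
                    Node att = atts.item(i);
                    out.append(indent).append("  Att ").append(att.getNodeName())
                       .append(" = ").append(att.getNodeValue()).append('\n');
                }
                break;
            case Node.TEXT_NODE:
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) out.append(indent).append("Txt ").append(text).append('\n');
                break;
            case Node.COMMENT_NODE:
                out.append(indent).append("Com ").append(node.getNodeValue().trim()).append('\n');
                break;
            default:
                break; // the document node itself prints nothing
        }

        // Children of the document node stay at depth 0; all others go one level deeper.
        int childDepth = node.getNodeType() == Node.DOCUMENT_NODE ? depth : depth + 1;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), childDepth, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<Account accountNr='1'><Balance Currency='EUR'>50000</Balance></Account>"));
    }
}
```

Because the whole tree is in memory, the walk could just as well start at any node or visit nodes in any order.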
44. DOM Benefits & Drawbacks
• Benefits
– W3C Recommendation
– Language- and platform-independent
– Random access
– Intuitive
• Drawback
– Entire object tree in memory
45. Simple API for XML (SAX)
• Not an official standard
• Ad-hoc product by XML developers
• Primarily Java API
• Event-based mechanism
• Don’t call the parser, the parser calls you
• No object model in memory
• Programmer must keep state information
49. • Start and end of document
– startDocument()
– endDocument()
• Start and end of element
– startElement(namespace, name, qname, attlist)
– endElement(namespace, name, qname)
• Character data
– characters(char[] ch, int start, int length)
• Processing Instruction
– processingInstruction(target, data)
• No event for comments!
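The callbacks listed above can be observed with a handler that simply records each event. A minimal sketch, assuming a hypothetical class `SaxEventLog`; it uses the JDK's default SAX parser and ignores whitespace-only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, for illustration only.
final class SaxEventLog extends DefaultHandler {

    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("chars " + text);
    }

    /** Parses the XML text and returns the sequence of SAX events that were delivered. */
    static List<String> log(String xml) {
        try {
            SaxEventLog handler = new SaxEventLog();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The comment in the input produces no entry in the log.
        for (String event : log("<a><!-- a comment --><b>x</b></a>")) {
            System.out.println(event);
        }
    }
}
```

The parser drives the handler: the program never asks for the next element, it only reacts to what it is told.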
50. Another SAX example
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BankAccounts ...>
<!-- This is an example XML document -->
<BankAccounts>
  <Account accountNr="123-456789-01" use="personal">
    <Owners>
      <Person ID="1258-a8d72-98"><Name>John Smith</Name></Person>
      <Person ID="5842-df5ef-e9"><Name>Claudia Scott</Name></Person>
    </Owners>
    <CreditCards><CreditCard number="12345"/></CreditCards>
    <Balance Currency="EUR">50000</Balance>
  </Account>
  ...
</BankAccounts>
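51.
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Computes the average of all <Balance> values while the document streams by.
public class AvgBalanceCalculator extends DefaultHandler {
    private double total = 0.0;
    private int count = 0;
    private boolean isBalance = false;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String name, String qname, Attributes atts) {
        // qname is always filled in; name (the local name) may be empty
        // when the parser is not namespace-aware.
        if (qname.equals("Balance")) {
            isBalance = true;
            text.setLength(0);
        }
    }
    @Override
    public void characters(char[] ch, int start, int len) {
        // The parser may deliver one piece of text in several chunks, so accumulate.
        if (isBalance) text.append(ch, start, len);
    }
    @Override
    public void endElement(String uri, String name, String qname) {
        if (qname.equals("Balance")) {
            total += Double.parseDouble(text.toString().trim());
            count++;
            isBalance = false;
        }
    }
    @Override
    public void endDocument() {
        if (count != 0)
            System.out.println("Average balance is " + (total / count));
    }
}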
52. SAX Benefits & Drawbacks
• Benefits
• Suitable when
• parsing large documents
• constructing proprietary object structures
• only small subset of information is needed
• Simple and fast
• Drawbacks
• Read-only
• No random access
• Complex searches messy to program
53. Limitations of DTDs
• No typing of text elements and attributes
– All values are strings: no integers, reals, etc.
• An unordered set of subelements is hard to define
– Order is usually irrelevant in databases
• IDs and IDREFs are not typed
– The DNO attribute of an EMPLOYEE can contain a reference to another
  EMPLOYEE, which is meaningless
  e.g. <EMPLOYEE SSN="_888665555" SEX="M" DNO="_888665555">
– The DNO attribute should be constrained so that it can only refer to a
  DEPARTMENT
54. XML Schema
• Typing of values
– e.g. integer, string, etc.
– Also constraints on min/max values
– User-defined types
• Is itself specified in XML syntax
– A more standardized representation
• Is integrated with namespaces
• And further features
– List types, uniqueness constraints on keys,
  foreign-key constraints, inheritance, …
55. XSDL
• XML Schema Definition Language
• Documents with the suffix .xsd
56. XML Schema: example
XML schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  ....
  <xsd:element name="PWORKER" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="HOURS" type="xsd:float"/>
      </xsd:sequence>
      <xsd:attribute name="SSN" type="xsd:IDREF" use="required"/>
    </xsd:complexType>
  </xsd:element>
  ....
</xsd:schema>
XML instance
<PWORKER SSN="_123456789">
  <HOURS>7.5</HOURS>
</PWORKER>
57. XML Schema: simple types
– Built-in simple types
• string, integer, decimal, float, boolean, date, time, …
• <xsd:element name="gebdat" type="xsd:date" />
– User-defined simple types
• Defined with the simpleType element
• The restriction element names the base type on which the new type is built
• <xsd:simpleType name="salaryRange">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="25000" />
      <xsd:maxInclusive value="100000" />
    </xsd:restriction>
  </xsd:simpleType>
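The salaryRange type can be exercised with the JDK's schema validator. The sketch below makes two assumptions that are not in the slides: a hypothetical `salary` element is declared with that type so that there is something to validate, and the class name `SalaryRangeCheck` is invented.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example class, for illustration only.
final class SalaryRangeCheck {

    // The simple type from the slide, plus an invented <salary> element that uses it.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='salaryRange'>"
        + "<xsd:restriction base='xsd:integer'>"
        + "<xsd:minInclusive value='25000'/>"
        + "<xsd:maxInclusive value='100000'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:element name='salary' type='salaryRange'/>"
        + "</xsd:schema>";

    /** Returns true if <salary>value</salary> is valid against the schema above. */
    static boolean isValidSalary(String value) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            // With no error handler set, the validator throws on the first validity error.
            schema.newValidator().validate(
                    new StreamSource(new StringReader("<salary>" + value + "</salary>")));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidSalary("50000"));
        System.out.println(isValidSalary("10"));
        System.out.println(isValidSalary("abc"));
    }
}
```

Unlike a DTD, the schema rejects both a value of the wrong type and a number outside the allowed range.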
69. XML family of technologies
• XLink: hypertext
• XSL: Extensible Stylesheet Language
– XSLT: Transformations
– XSL-FO: Formatting Objects
• XML Schema: additional constraints on attribute types
• and more...
70. XML applications
• RDF: Resource Description Framework
– infra
• XHTML: eXtensible HTML, and HTML5
– XML-compliant HTML
• MathML
• SMIL: synchronized multimedia presentation
• Many others
– Chemical Markup Language, Vector Graphics Markup Language, Open Software
  Description Format, weather observation, astronomical data, financial data,
  electronic components, workflow, business cards, real estate, newspaper,
  classifieds, javadoc, human resources, advertising, architecture, …
71. XML Working Groups
• XML Coordination
• XML Core
• XSL (XSLT, XSL/FO) -> W3C architecture
• Efficient XML Interchange
• XML Processing Model
• XML Query (XQuery, XPath)
• XML Schema
• Service Modeling Language (SML)
72. More XPath Features
• The operator "|" implements union
– E.g. //EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)]
– gives the employees with either 0 or 1 dependents
• "//" can be used to skip multiple levels of nodes
– E.g. /COMPANY//FNAME
– finds any FNAME element anywhere under the /COMPANY element, regardless of the
  element in which it is contained
• A step in the path can go to
  parents, siblings, ancestors and descendants
  of the nodes generated by the previous step, not just to their children
– "//", described above, is a short form for specifying "all descendants"
– ".." specifies the parent
– E.g. /COMPANY//FNAME/../BDATE
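These expressions can be tried with the XPath engine that ships with the JDK. The sketch below runs them against a small invented COMPANY document; the employee data and the class name `XPathDemo` are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class with invented sample data.
final class XPathDemo {

    // Three employees with two, one and zero dependents respectively.
    static final String DOC =
          "<COMPANY>"
        + "<EMPLOYEE><FNAME>John</FNAME><BDATE>1965-01-09</BDATE><DEPENDENT/><DEPENDENT/></EMPLOYEE>"
        + "<EMPLOYEE><FNAME>Alicia</FNAME><BDATE>1968-01-19</BDATE><DEPENDENT/></EMPLOYEE>"
        + "<EMPLOYEE><FNAME>Ahmad</FNAME><BDATE>1969-03-29</BDATE></EMPLOYEE>"
        + "</COMPANY>";

    /** Evaluates the expression against DOC and returns the number of nodes selected. */
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(DOC)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Union: employees with exactly one dependent, or with none.
        System.out.println(count("//EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)]"));
        // Descendant step: every FNAME below COMPANY.
        System.out.println(count("/COMPANY//FNAME"));
        // Parent step: the BDATE siblings of those FNAME elements.
        System.out.println(count("/COMPANY//FNAME/../BDATE"));
    }
}
```

The javax.xml.xpath API implements XPath 1.0, which is enough for the union, descendant and parent steps shown here.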
73. XQuery
• Allows more general queries to be formulated than XPath
• General form: FLWOR expression
  for   < for-variable > in < in-expression >
  let   < let-variable > := < let-expression >
  [ where    < filter-expression > ]
  [ order by < order-specification > ]
  return < return-expression >
• Note: for and let can occur alone or together
74.
• Q1: first name and last name of all employees who earn more
than 70000
• for $x in doc("www.company.com/info.xml")
      //employee[employeeSalary > 70000]/employeeName
  return <res>{ $x/firstName, $x/lastName }</res>
• Alternative:
  for $x in doc("www.company.com/info.xml")/company/employee
  where $x/employeeSalary > 70000
  return <res>{ $x/employeeName/firstName,
                $x/employeeName/lastName }</res>
75.
• Q3: first name and last name of all employees who work more
than 20 hours on project number 5, together with that number of hours
• for $x in doc("www.company.com/info.xml")
      /company/project[projectNumber = 5]/projectWorker,
      $y in doc("www.company.com/info.xml")/company/employee
  where $x/hours > 20.0 and $y/ssn = $x/ssn
  return <res>{ $y/employeeName/firstName,
                $y/employeeName/lastName,
                $x/hours }</res>
76. The End...
Thank you!
Questions...?
78. NoSQL
• non-relational
• schema free
• distributed
• easy replication
• open source
• simple API
• horizontally scalable
• BASE (not ACID)
• “web scale”