10. http://www.w3c.it/talks/2005/openCulture/slide7-0.html
Thursday 31 March 2011
11. http://en.wikipedia.org/wiki/List_of_XML_markup_languages
12. XML is not ...
• Extension of HTML
  • XHTML is XML-compliant, and extensible
• Just for Web pages
  • Useful whenever data are stored or exchanged
• Concerned with semantics
  • XML does not define semantics, just syntax
• Innovative new technology
  • Standard, building on existing technology
• Only a hype
  • Though it is that as well
13. XML is ...
• Endorsed by W3C and major companies
• Extensible
  • No tag name limitations
  • No language limitations
• Readable by humans (or at least by software developers)
  • Can be processed with basic text tools
• Open standard
  • No vendor lock-in (in theory...)
• Easy to implement
  • Powerful, cheap (free), off-the-shelf XML tools
14. when was XML invented?
15. • 1969: GML, the precursor of SGML (Standard Generalized Markup Language)
  • Meta-language: describes other languages
  • Powerful, but rather complicated
  • 1986: SGML becomes an ISO standard
• 1992: HTML (HyperText Markup Language)
  • Based on SGML
  • Simple, but limited
• 1996: Start design of XML
  • By World Wide Web Consortium (W3C)
• 1998: Publication of XML 1.0
16. Design Goals
• Easy to use over the Internet
• Power of SGML
• Simplicity of HTML
• Human-legible
• Easy to create
• Compactness is not an issue
• “The ASCII of the Web”
18. XML Basics
<Person>
  <Name>
    <First>Thomas</First>
    <Last>Atkinson</Last>
  </Name>
  <Age>30</Age>
</Person>
• Self-defined, meaningful tags
• Separate data and its representation
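A minimal sketch of how a program might read the Person document above. It assumes Python's standard xml.etree.ElementTree module (the slides do not prescribe a language or parser), and the function name read_person is hypothetical.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """
<Person>
  <Name>
    <First>Thomas</First>
    <Last>Atkinson</Last>
  </Name>
  <Age>30</Age>
</Person>
"""

def read_person(xml_text):
    """Return the person's data as a plain dict, independent of any layout."""
    root = ET.fromstring(xml_text)           # root is the <Person> element
    return {
        "first": root.findtext("Name/First"),
        "last": root.findtext("Name/Last"),
        "age": int(root.findtext("Age")),    # element content is text; convert explicitly
    }

person = read_person(PERSON_XML)
print(person)
```

The tags carry the meaning of each value; how the person is eventually displayed is left entirely to the consuming program.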
19. • Language for defining syntax
• Records and fields have explicit boundaries
• parse-able without knowing structure (self-descriptive)
• Unicode support (UTF-8, UTF-16, ...)
• Web-aware
• DTD, ENTITY and Schema can be loaded through URL
• Strictly parsed: no ambiguity (case sensitive!)
• Extensible: namespaces
20.
<?xml version="1.0" encoding="UTF-8"?>
<!-- XML declaration: XML follows -->
<!DOCTYPE addressbook SYSTEM
  "http://www/~koenh/ddml/addressbook.dtd">
<!-- Document Type Declaration... -->
<!-- ExternalDTDPointer -->
<addressbook> <!-- root element -->
  <person first-name="John" family-name="Doe"
          employee-number="1234">
    <contact-info>
      <email address="Jdoe@home.com"/>
    </contact-info>
    <address street="Celestijnenlaan"
             number="200A"/>
  </person>
</addressbook>
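As an illustration of reading attribute values and empty elements from the address book above, here is a hedged sketch using Python's standard xml.etree.ElementTree. The DOCTYPE line is left out on purpose (an assumption made here so the sketch does not depend on the external DTD), and list_people is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Same content as the slide, minus the DOCTYPE line (assumption: no DTD needed here).
ADDRESSBOOK_XML = """<?xml version="1.0" encoding="UTF-8"?>
<addressbook>
  <person first-name="John" family-name="Doe" employee-number="1234">
    <contact-info>
      <email address="Jdoe@home.com"/>
    </contact-info>
    <address street="Celestijnenlaan" number="200A"/>
  </person>
</addressbook>
"""

def list_people(xml_bytes):
    """Collect name, employee number and e-mail address of every <person>."""
    root = ET.fromstring(xml_bytes)
    people = []
    for person in root.findall("person"):
        email = person.find("contact-info/email")   # empty element, data sits in attributes
        people.append({
            "name": person.get("first-name") + " " + person.get("family-name"),
            "employee_number": person.get("employee-number"),  # attribute values are strings
            "email": email.get("address") if email is not None else None,
        })
    return people

people = list_people(ADDRESSBOOK_XML.encode("utf-8"))
print(people)
```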
28. • Cf. HTML markup tags
<H1 align="center"> a Heading </H1>
  • element: <H1 align="center"> a Heading </H1>
  • opening tag: <H1 align="center">
  • attribute: align="center"
  • content: a Heading
  • closing tag: </H1>
• Major differences:
  • Case sensitive
  • Proper nesting: No <A> … <B> … </A> … </B>
  • Unicode instead of ASCII
29. Vocabularies
• Agreed-upon XML tag sets for a specific domain
• Examples
  • Chemistry: Chemical Markup Language (CML)
  • Business: ebXML, RosettaNet, BizTalk
  • Mathematics: MathML
  • Multimedia: Synchronized Multimedia Integration Language (SMIL)
  • Etc.
30. • well-formed: follows XML syntax
  • Proper tag and attribute names
  • Tags properly closed
  • Attributes and text between tags do not contain '<' (escape with &lt;)
• valid: well-formed and conforming to a vocabulary (DTD)
  • All elements and their attributes declared in DTD
  • Attribute values follow DTD type declarations
    • CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, enumerated
  • Nesting and sequencing of elements follows DTD
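The well-formedness rules above can be tried out with a small sketch. It assumes Python's standard xml.etree.ElementTree, which is a non-validating parser: it reports syntax violations but does not check a document against a DTD. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>text</b></a>"))   # tags properly nested and closed
print(is_well_formed("<a><b>text</a></b>"))   # improper nesting
print(is_well_formed("<a>1 < 2</a>"))         # raw '<' in character data
print(is_well_formed("<a>1 &lt; 2</a>"))      # '<' escaped as &lt;
```

Checking validity as well would need a validating parser plus the DTD, which this sketch deliberately leaves out.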
31. Elements
• XML’s container for
  • Attributes
  • Character data
  • Other elements (“child” elements)
• Delimited by opening and closing tags
  • Non-empty element: <name>..</name>
  • Empty element: <name/>
• Form a simple hierarchical tree
  • Root = “document element”
32. Attributes and Strings
• Attributes
  • Name-value pairs: name="value"
  • Only strings as value!
• Strings
  • Enclosed by '...' or "..."
  • A quote character inside the string → replace with &apos; or &quot;
• Character data
  • Any text that is not markup
  • ‘&’, ‘<’ and ‘>’ are markup
    → replace with &amp; &lt; and &gt;
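A short sketch of these replacements, assuming the escape, unescape and quoteattr helpers from Python's standard xml.sax.saxutils module. The sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

# Character data: '&', '<' and '>' are replaced by entity references.
raw_text = "AT&T: 1 < 2 > 0"
escaped_text = escape(raw_text)
print(escaped_text)

# Quotes are only replaced when asked for, via an extra mapping.
escaped_quotes = escape("'a'", {"'": "&apos;", '"': "&quot;"})
print(escaped_quotes)

# Attribute values: quoteattr picks a quote style and escapes where needed.
attr_value = quoteattr("5' 10\"")   # value contains both kinds of quote
print("<height value=" + attr_value + "/>")

# unescape reverses the three standard replacements.
round_trip = unescape(escaped_text)
```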
33. Document structure
• Prolog (optional)
  • XML declaration: <?xml version="1.0" encoding="UTF-8"?>
    • version="number" (compulsory)
    • encoding="character encoding" (optional)
  • Document type declaration
    • <!DOCTYPE document_element ... >
• Body
  – The document element
34. Another example
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BankAccounts ...>
<!-- This is an example XML document -->
<BankAccounts>
  <Account accountNr="123-456789-01" use="personal">
    <Owners>
      <Person ID="1258-a8d72-98">
        <Name>John Smith</Name>
      </Person>
      <Person ID="5842-df5ef-e9">
        <Name>Claudia Scott</Name>
      </Person>
    </Owners>
    <CreditCards><CreditCard number="12345"/></CreditCards>
    <Balance Currency="EUR">50000</Balance>
  </Account>
  ...
</BankAccounts>
35. namespaces: problem
<widget type="gadget">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info>
    <head><title>Gadget</title></head>
    <body><h1>Gadget</h1>
      A gadget contains a big gizmo
    </body>
  </info>
</widget>
• Name collision! <head> is used both as a widget part and as an HTML element
37. namespaces: approach
• A collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names
• xmlns:prefix="URI"
  • URI used only as identifier
  • does not need to point to anything
  • applies to all nested elements and attributes
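To make the approach concrete, the sketch below puts the two kinds of head element from the widget example into separate namespaces and looks them up with Python's standard xml.etree.ElementTree. The widget URI http://example.org/widget is a hypothetical identifier chosen for this illustration, and the prefixes w and h are arbitrary.

```python
import xml.etree.ElementTree as ET

WIDGET_NS = "http://example.org/widget"      # hypothetical URI, used only as an identifier
XHTML_NS = "http://www.w3.org/1999/xhtml"    # the XHTML namespace URI

DOC = f"""
<w:widget xmlns:w="{WIDGET_NS}" xmlns:h="{XHTML_NS}" type="gadget">
  <w:head size="medium"/>
  <w:info>
    <h:head><h:title>Gadget</h:title></h:head>
  </w:info>
</w:widget>
"""

root = ET.fromstring(DOC)
ns = {"w": WIDGET_NS, "h": XHTML_NS}         # prefix-to-URI map used for lookups

widget_head = root.find("w:head", ns)        # the widget part
html_head = root.find("w:info/h:head", ns)   # the HTML element

# ElementTree expands each prefix to its URI, so the two names no longer collide.
print(widget_head.tag)
print(html_head.tag)
```

The declarations on the root element are in scope for every nested element, which is why w: and h: can be used anywhere inside it.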
41. how would you process XML?
42. Accessing XML documents
• Manual text file manipulation
  • Cumbersome & error-prone
• Parser
  • Simplifies document manipulation
  • Ensures proper grammar, well-formedness
  • Abstracts content from grammar
• Accessed through standard API
  • Document Object Model (DOM)
  • Simple API for XML (SAX)
43. • DOM parser
  • creates DOM object tree
• SAX parser
  • generates events when elements are encountered
  • one-pass translation
  • no need to keep whole document tree in memory
• Both can be validating or non-validating
• Many available (most freeware, open source)
  • IBM XML4J, Apache Xerces, Sun parser, Microsoft, DataChannel, Oracle, ...
44. DOM approach
http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP
45. DOM Benefits & Drawbacks
• Benefits
• W3C Recommendation
• Language- and platform-independent
• Random access
• Intuitive
• Drawback
• Entire object tree in memory
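A small sketch of the DOM style of access, assuming Python's standard xml.dom.minidom as the DOM implementation (the slides point to the Java JAXP tutorial, so the language here is a substitution). The input is the Person document from the XML Basics slide.

```python
from xml.dom.minidom import parseString

# The whole document is parsed into an object tree held in memory.
doc = parseString(
    "<Person><Name><First>Thomas</First><Last>Atkinson</Last></Name>"
    "<Age>30</Age></Person>"
)

# Random access: jump straight to any node, in any order.
age = doc.getElementsByTagName("Age")[0]
last = doc.getElementsByTagName("Last")[0]
print(last.firstChild.data, age.firstChild.data)

# The tree can also be modified in place and serialized again.
age.firstChild.data = "31"
updated = doc.documentElement.toxml()
print(updated)
```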
46. Simple API for XML (SAX)
• Not an official standard
• Ad-hoc product by XML developers
• Primarily Java API
• Event-based mechanism
• Don’t call the parser, the parser calls you
• No object model in memory
• Programmer must keep state information
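The event-based style can be sketched with Python's standard xml.sax module (an assumption; the slide describes SAX as primarily a Java API). The NameCollector class is hypothetical. Note how the handler has to keep its own state, here a flag and a buffer, because no tree is built.

```python
import xml.sax

class NameCollector(xml.sax.ContentHandler):
    """Collects the text of every <Name> element as the parser reports events."""

    def __init__(self):
        super().__init__()
        self.in_name = False   # state: are we currently inside a <Name>?
        self.buffer = []       # text pieces seen since the last <Name> opened
        self.names = []

    def startElement(self, name, attrs):
        if name == "Name":
            self.in_name = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_name:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "Name":
            self.names.append("".join(self.buffer).strip())
            self.in_name = False

handler = NameCollector()
# The parser calls the handler methods above while it reads the input once.
xml.sax.parseString(
    b"<Owners><Person><Name>John Smith</Name></Person>"
    b"<Person><Name>Claudia Scott</Name></Person></Owners>",
    handler,
)
print(handler.names)
```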
48. SAX Benefits & Drawbacks
• Benefits
• Suitable when
• parsing large documents
• constructing proprietary object structures
• only small subset of information is needed
• Simple and fast
• Drawbacks
• Read-only
• No random access
• Complex searches messy to program
49. how to define valid instances?
50. XML Schema
• typing of values
  • e.g. integer, string, etc.
  • also restrictions on min/max values
  • user-defined types
• is specified in XML syntax
  • a more standardized representation
• is integrated with namespaces
• and other capabilities
  • list types, uniqueness constraints on keys, referential (foreign-key) constraints, inheritance, …
51. XSDL
• XML Schema Definition Language
• documents with the .xsd suffix
52. XML Schema: example
XML schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  ....
  <xsd:element name="PWORKER" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="HOURS" type="xsd:float"/>
      </xsd:sequence>
      <xsd:attribute name="SSN" type="xsd:IDREF" use="required"/>
    </xsd:complexType>
  </xsd:element>
  ....
</xsd:schema>
XML instance
<PWORKER SSN="_123456789">
  <HOURS>7.5</HOURS>
</PWORKER>
53. XML: simple types
– built-in simple types
  • string, integer, decimal, float, boolean, date, time, …
  • <xsd:element name="gebdat" type="xsd:date" />
– user-defined simple types
  • defined with the simpleType element
  • the restriction element gives the base type on which the new type builds
  • <xsd:simpleType name="salaryRange">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="25000" />
        <xsd:maxInclusive value="100000" />
      </xsd:restriction>
    </xsd:simpleType>
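What the salaryRange restriction means for a value can be sketched by hand. Python's standard library has no XML Schema validator, so the code below is not schema validation: it only reads the two facets out of the definition with xml.etree.ElementTree and applies them. The helper names load_integer_range and in_range are hypothetical, and Python's int() is used as an approximation of xsd:integer parsing.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = f"""
<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:simpleType name="salaryRange">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="25000"/>
      <xsd:maxInclusive value="100000"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
"""

def load_integer_range(schema_text, type_name):
    """Read the minInclusive/maxInclusive facets of a named simple type."""
    ns = {"xsd": XSD_NS}
    root = ET.fromstring(schema_text)
    restriction = root.find(
        f"xsd:simpleType[@name='{type_name}']/xsd:restriction", ns
    )
    low = int(restriction.find("xsd:minInclusive", ns).get("value"))
    high = int(restriction.find("xsd:maxInclusive", ns).get("value"))
    return low, high

def in_range(text, bounds):
    """Hand-rolled check: text must parse as an integer within the bounds."""
    low, high = bounds
    try:
        value = int(text)
    except ValueError:
        return False
    return low <= value <= high

bounds = load_integer_range(SCHEMA, "salaryRange")
print(bounds, in_range("30000", bounds), in_range("24999", bounds))
```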
66. XML family of technologies
• XLink: hypertext
• XSL: Extensible Stylesheet Language
  • XSLT: transformations
  • Formatting Objects
• XML Schema: additional constraints on attribute types
• and more...
67. XML applications
• RDF: Resource Description Framework
  • see below
• XHTML: eXtensible HTML, and HTML5
  • XML-compliant HTML
• MathML
• SMIL: synchronized multimedia presentation
• Many others
  • Chemical Markup Language, Vector Graphics Markup Language, Open Software Description Format, weather observation, astronomical data, financial data, electronic components, workflow, business cards, real estate, newspaper, classifieds, javadoc, human resource, advertising, architecture ...
68. More XPath Features
• Operator “|” used to implement union
  • E.g. //EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)]
  • gives employees with either 0 or 1 dependents
• “//” can be used to skip multiple levels of nodes
  • E.g. /COMPANY//FNAME
  • finds any FNAME element anywhere under the /COMPANY element, regardless of the element in which it is contained
• A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
  • “//”, described above, is a short form for specifying “all descendants”
  • “..” specifies the parent
  • E.g. /COMPANY//FNAME/../BDATE
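Some of these paths can be tried with Python's standard xml.etree.ElementTree, which implements only a subset of XPath: "//" and ".." work, but count() and the "|" union do not, so the 0-or-1-dependents query is written as an ordinary Python filter instead. The sample COMPANY data below is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample data; only the element names come from the slide.
COMPANY_XML = """
<COMPANY>
  <EMPLOYEE><FNAME>Ann</FNAME><BDATE>1965-01-09</BDATE>
    <DEPENDENT>D1</DEPENDENT><DEPENDENT>D2</DEPENDENT></EMPLOYEE>
  <EMPLOYEE><FNAME>Bob</FNAME><BDATE>1955-12-08</BDATE>
    <DEPENDENT>D3</DEPENDENT></EMPLOYEE>
  <EMPLOYEE><FNAME>Cem</FNAME><BDATE>1968-07-19</BDATE></EMPLOYEE>
</COMPANY>
"""

company = ET.fromstring(COMPANY_XML)   # the <COMPANY> element itself

# /COMPANY//FNAME : paths are relative to the element, hence the leading "."
fnames = [e.text for e in company.findall(".//FNAME")]

# /COMPANY//FNAME/../BDATE : step up to the parent, then down to BDATE
bdates = [e.text for e in company.findall(".//FNAME/../BDATE")]

# Employees with 0 or 1 dependents, done in Python because this XPath
# subset has neither count() nor the union operator.
few_dependents = [
    e.findtext("FNAME")
    for e in company.iter("EMPLOYEE")
    if len(e.findall("DEPENDENT")) <= 1
]

print(fnames, bdates, few_dependents)
```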
69. XQuery
• allows more general queries to be formulated than XPath
• general form: FLWOR expression
FOR <for-variable> IN <in-expression>
LET <let-variable> := <let-expression>
[ WHERE <filter-expression> ]
[ ORDER BY <order-specification> ]
RETURN <expression>
• note: FOR and LET can occur alone or together
• note: in actual XQuery the keywords are written in lower case (for, let, where, order by, return)
70. • Q1: first name and last name of all employees who earn more than 70000
• for $x in doc("www.company.com/info.xml")//employee[employeeSalary > 70000]/employeeName
  return <res>{ $x/firstName, $x/lastName }</res>
• alternative:
  for $x in doc("www.company.com/info.xml")/company/employee
  where $x/employeeSalary > 70000
  return <res>{ $x/employeeName/firstName, $x/employeeName/lastName }</res>
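For readers without an XQuery processor at hand, the alternative form of Q1 can be mimicked step by step in Python with the standard xml.etree.ElementTree module. This is only an analogy of the for / where / return clauses, not an XQuery implementation; the sample data and the function name high_earners are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample data; element names follow the query on the slide.
INFO_XML = """
<company>
  <employee>
    <employeeName><firstName>Ann</firstName><lastName>Peeters</lastName></employeeName>
    <employeeSalary>82000</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>Bob</firstName><lastName>Janssens</lastName></employeeName>
    <employeeSalary>45000</employeeSalary>
  </employee>
</company>
"""

def high_earners(xml_text, threshold=70000):
    """Return one <res> element per employee earning more than threshold."""
    company = ET.fromstring(xml_text)
    results = []
    for x in company.findall("employee"):                      # for $x in .../company/employee
        if float(x.findtext("employeeSalary")) > threshold:    # where $x/employeeSalary > 70000
            name = x.find("employeeName")
            res = ET.Element("res")                            # return <res>{ ... }</res>
            res.append(name.find("firstName"))
            res.append(name.find("lastName"))
            results.append(res)
    return results

serialized = [ET.tostring(r, encoding="unicode") for r in high_earners(INFO_XML)]
print(serialized)
```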
71. • Q3: first name and last name of all employees who work more than 20 hours on project number 5, together with that number of hours
• for $x in doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker,
      $y in doc("www.company.com/info.xml")/company/employee
  where $x/hours > 20.0 and $y/ssn = $x/ssn
  return <res>{ $y/employeeName/firstName, $y/employeeName/lastName, $x/hours }</res>
72. • XML
• NoSQL (with thanks to Steven Noels)
75. select fun, profit
from real_world
where relational=false;
76. NoSQL
• problems with existing relational approach for Amazon (Dynamo) and Google (BigTable)
  • flexibility, performance, scaling, cost
  • millions of users
  • application changes rolled out incrementally without downtime
• now more broadly applicable (velcro)
• Open source developments: Facebook, Yahoo! - Cassandra, Hadoop, MapReduce, Hive, Pig
88. no attempt to ACID
• Atomicity
• Consistency
• Isolation
• Durability
• BASE (Basically Available, Soft state, Eventual consistency): trade ACID off in favor of high availability