2. What is XML?
A markup language like HTML
Stands for eXtensible Markup Language
Designed to store data, not display
Tags not predefined – one must define own tags and
document structure
Uses DTD/Schema to describe data
XML document, as well as DTD or XML Schema are designed to
be self-descriptive
XML is a W3C Recommendation
3. Difference with HTML?
XML designed to structure, store and transport data
Focus on what data is
HTML designed to display data – focus on how data looks
i.e. HTML is about displaying information, while XML is about
describing info
HTML uses predefined tags such as <h1>, <p> and <div>, while in XML the
author must define both the tags and the document
structure
XML is just information wrapped in tags, not designed to DO
anything
5. XML is Free & Extensible
Most XML applications will work as expected even if new data is
added (or removed)
For example, a newer version of note.xml with added <date> and
<hour> elements, and a removed <heading>
<note>
<date>2015-09-01</date>
<hour>08:30</hour>
<to>Amit</to>
<from>Bobby</from>
<body>Meet me during this weekend!</body>
</note>
Older versions of applications will still work using new note.xml
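A minimal sketch of why an older application keeps working, assuming Python's standard xml.etree.ElementTree module. The read_note helper is hypothetical: it looks elements up by name, so the new <date> and <hour> elements are simply ignored and the missing <heading> falls back to a default.

```python
import xml.etree.ElementTree as ET

# The newer note.xml from the slide: <date> and <hour> added, <heading> removed.
NEW_NOTE = """<note>
  <date>2015-09-01</date>
  <hour>08:30</hour>
  <to>Amit</to>
  <from>Bobby</from>
  <body>Meet me during this weekend!</body>
</note>"""


def read_note(xml_text):
    """Hypothetical 'older application' that only knows to, from, heading, body."""
    note = ET.fromstring(xml_text)
    # findtext returns the default when the element is absent,
    # so a removed <heading> does not break the reader.
    return {
        "to": note.findtext("to", default=""),
        "from": note.findtext("from", default=""),
        "heading": note.findtext("heading", default=""),
        "body": note.findtext("body", default=""),
    }


summary = read_note(NEW_NOTE)
print(summary)
```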
6. Segregate data from presentation
With HTML, actual data is stored inside HTML along with
presentation / styling elements
With XML, data is stored in separate files, without any
information on how to display it
Same data can be displayed differently under different
scenarios
HTML can be used with XML to display data
One should not have to edit the HTML file when the XML data
changes
With a few lines of JavaScript code, one can read an XML file and
update the data content of any HTML page.
XML can also be stored inside HTML as data islands
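The slide mentions JavaScript; the same idea is sketched here in Python to keep one language across the examples. The two rendering functions are hypothetical and only illustrate that one XML source can feed different presentations without the data changing.

```python
import xml.etree.ElementTree as ET
from html import escape

NOTE_XML = (
    "<note><to>Amit</to><from>Bobby</from>"
    "<body>Meet me during this weekend!</body></note>"
)


def note_to_html(xml_text):
    """Render every child element of <note> as one HTML table row."""
    note = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in note
    )
    return f"<table>{rows}</table>"


def note_to_text(xml_text):
    """Render the same data as plain 'name: value' lines."""
    note = ET.fromstring(xml_text)
    return "\n".join(f"{child.tag}: {child.text or ''}" for child in note)


print(note_to_html(NOTE_XML))
print(note_to_text(NOTE_XML))
```

If the XML data changes, only NOTE_XML changes; neither rendering function needs editing.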
7. Uses of XML
Simplifies exchanging data among incompatible systems
Stores data in plain text format - a software & hardware
independent way of storing, transporting, and sharing data
Makes platform changes easier
One can upgrade to new operating systems, new applications, or
new browsers, without losing data
Simplifies data availability
With XML, data can be available to all kinds of "reading machines"
like people, computers, voice machines, news feeds etc.
Converting data to XML can reduce complexity of exchanging
data between incompatible systems , and create data that can
be read by many applications
8. Uses of XML
Can also be used to store data in database files
Generic applications can be written to store & retrieve data from
data stores and display the same
Other applications can access XML files as data sources, as if they
were accessing databases
Can be used to create new languages such as WML
(Wireless Markup Language), used with WAP to mark up internet applications
for handheld devices
XML Syntax easy to learn and use
9. Components of XML
An XML document is a plain-text file (UTF-8 by default), usually with a .xml extension
Main components
Elements
Attributes
Content
Comment
10. Elements
Basic building block of XML document
Each element represents a piece of data identified by tag(s)
Most tags in pair, a start tag at the beginning and an end tag placed
at the end of data
One can have a hierarchical structure by nesting elements
Elements can contain text, attributes, other elements, or a mix of these
Elements that enclose data between a start tag and an end tag are
container elements; the enclosed data is
called content
<Univ> Jadavpur </Univ> Here "Jadavpur" is the content and <Univ> is the container
Empty elements do not contain data/content – do not come in pairs
<tagname/> instead of <tagname></tagname> self-closing tags
Example, <br/>
11. Attributes
Provide additional information about elements
Each attribute has a name and value
Value could be number, string, URL
Attribute values must always be enclosed in quotation marks
Either single or double quotes may be used
<Univ location="kolkata"> Jadavpur </Univ>
The location attribute has the value "kolkata"
Generally metadata (data about data) should be stored as
attributes, and the data itself should be stored as elements.
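A short sketch of how a parser exposes the pieces described so far, assuming Python's standard xml.etree.ElementTree: the element name, its content, and its attribute are all separate properties.

```python
import xml.etree.ElementTree as ET

univ = ET.fromstring('<Univ location="kolkata"> Jadavpur </Univ>')

print(univ.tag)              # the element (container) name
print(univ.text.strip())     # the content, with surrounding spaces removed
print(univ.get("location"))  # the attribute value

# An empty element has no content and no children.
empty = ET.fromstring("<br/>")
print(empty.text, len(empty))
```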
12. Attributes
• If attribute value itself contains double quotes, single quotes used
<person name='George "Shotgun" Ziegler'>
• Or, character entities may be used
<person name="George &quot;Shotgun&quot; Ziegler">
• Some things to consider while defining attributes
• attributes cannot contain multiple values (elements can)
• attributes cannot contain tree structures (elements can)
• attributes are not easily expandable (for future changes)
<note day="10" month="01" year="2008"
to="Tove" from="Jani" heading="Reminder"
body="Meet me during this weekend!"> </note> poor design: all data is packed into attributes
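Returning to the quoting rule at the top of this slide, a small check (assuming Python's standard xml.etree.ElementTree) suggests that the two ways of writing a value containing double quotes produce the same attribute value once parsed.

```python
import xml.etree.ElementTree as ET

# Option 1: single quotes around a value that contains double quotes.
single_quoted = ET.fromstring("""<person name='George "Shotgun" Ziegler'/>""")

# Option 2: double quotes around the value, inner quotes written as &quot;.
entity_quoted = ET.fromstring(
    '<person name="George &quot;Shotgun&quot; Ziegler"/>'
)

print(single_quoted.get("name"))
print(single_quoted.get("name") == entity_quoted.get("name"))
```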
13. Comments
• To be ignored by XML processors, used to add useful notes
• Syntax for writing comments in XML is similar to that of HTML
<!-- This is a comment -->
• Two dashes in the middle of a comment are not allowed
<!-- This is a -- comment --> not allowed
<!-- This is a - - comment --> allowed
14. Entity References
• Some characters have a special meaning in XML
• For example, a "<" inside element content will generate an error
• Because, the parser interprets it as the start of a new element
<message> salary < 1000 </message> incorrect !
• "<" must be replaced with the entity reference "&lt;"
<message> salary &lt; 1000 </message>
15. Entity References
There are 5 pre-defined entity references in XML:
Entity reference   Character   Description
&lt;               <           less than
&gt;               >           greater than
&amp;              &           ampersand
&apos;             '           apostrophe
&quot;             "           quotation mark
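In practice the entity references are rarely typed by hand. A minimal sketch, assuming Python's standard xml.sax.saxutils helpers: escape handles &, < and > in element content, and quoteattr picks suitable quotes for an attribute value. The sample strings are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = "salary < 1000 & bonus > 0"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)

# The parser turns the references back into the original characters.
message = ET.fromstring(f"<message>{escaped}</message>")
print(message.text)

# quoteattr() returns the value already wrapped in suitable quotes.
attr = quoteattr('George "Shotgun" Ziegler')
person = ET.fromstring(f"<person name={attr}/>")
print(person.get("name"))
```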
16. Element vs. Attributes
• Same information may be represented as element or attribute
Date as attribute
<note date="2008-01-10">
<to>Amit</to>
<from>Bobby</from>
</note>
Date as element
<note>
<date>2008-01-10</date>
<to>Amit</to>
<from>Bobby</from>
</note>
Expanded date element
<note>
<date>
<year>2008</year>
<month>01</month>
<day>10</day>
</date>
<to>Amit</to>
<from>Bobby</from>
</note>
17. XML Tree
• XML documents are formed as trees of elements
• An XML tree starts at a root element and branches from the
root to child elements
• All elements can have sub elements (child elements)
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
• Siblings are children on the same level i.e. same parent node
• All elements can have text content and attributes
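The tree structure can be made visible with a short recursive walk. This is a sketch assuming Python's standard xml.etree.ElementTree; the outline helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TREE = "<root><child><subchild>text</subchild></child><child/></root>"


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(TREE)
print("\n".join(outline(root)))

# The two <child> elements share the same parent, so they are siblings.
siblings = [child.tag for child in root]
print(siblings)
```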
18. XML Syntax
XML documents must have a root element, which is the parent of
all other nodes
Optional prolog at the beginning
All XML Elements must have a closing tag
XML tags are case sensitive - Opening and closing tags must be
written with the same case
<Message>This is incorrect</message>
XML Elements must be properly nested
<b><i>This text is bold and italic</b></i> improper nesting
XML attributes must be in quotes
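The syntax rules above can be tried out with any conforming parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which raises ParseError on documents that are not well formed; the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "matching case": "<message>ok</message>",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<p><b><i>text</b></i></p>",
    "unquoted attribute": "<note date=2008-01-10/>",
    "missing closing tag": "<note><to>Amit</note>",
    "two root elements": "<a/><b/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'NOT well formed'}")
```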
19. Structure of Well Formed XML
Begins with a declaration that it is an XML file
Optional definition about the type of XML data and what DTD it
follows (prolog)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
• Attr 'version' indicates that the document conforms to
version 1.0 of the XML specification
• Attr 'encoding' specifies the character encoding used, here UTF-8
• Attr 'standalone' indicates whether the document is self-contained
(value yes) or depends on declarations in an external DTD (value no)
Content marked up using XML tags and comments
If syntactical rules followed - XML well formed
If adheres to DTD/Schema – XML valid
20. Example of Well Formed XML
<title>, <author>, <year>, and <price> have text content because they
contain text
<bookstore> and <book> have element contents, because they
contain elements
<book> has an attribute (category="children").
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
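Annotations for the example above:
Line 1 (<?xml ... ?>) is the prolog
<bookstore> is the root element; </bookstore> marks the end of the root element
<book> is a child element of the root element <bookstore>
<title>, <author>, <year> and <price> are the 4 child elements of the parent <book>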
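The distinction between element content and text content can be checked mechanically. A sketch assuming Python's standard xml.etree.ElementTree; content_kind is a hypothetical helper, and the document is passed as bytes so that the encoding declaration in the prolog is honoured.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = b"""<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""


def content_kind(element):
    """Classify an element by what it contains."""
    if len(element) > 0:  # has child elements
        return "element content"
    if element.text and element.text.strip():
        return "text content"
    return "empty"


root = ET.fromstring(BOOKSTORE)
kinds = {el.tag: content_kind(el) for el in root.iter()}
for tag, kind in kinds.items():
    print(f"<{tag}>: {kind}")

book = root.find("book")
print(book.get("category"), len(book))
```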
21. Valid XML Document
Well-formedness errors in an XML document stop XML applications from processing it
XML documents should be validated before they are used
A valid XML document must be well formed and also must conform
to a document type definition.
Two different document type definitions that can be used with XML:
• DTD - The original Document Type Definition
• XML Schema - An XML-based alternative to DTD
22. Document Type Definition (DTD)
Defines structure of the content of an XML – allows storing data
consistently
Defines the rules and the legal elements and attributes for an XML
document
Specifies elements that can be present, whether optional, their
attributes and arrangements with respect to each other
Allows users to create DTDs – gives a complete control over the
process of checking that the structure & contents of XML are OK
With a DTD, independent groups of people can agree on a standard
DTD for interchanging data.
An application can use a DTD to verify that XML data is valid
Elements that can be used in a particular XML document can be defined using
an internal or external DTD
23. Building Blocks of XML
From DTD point of view, following are the building blocks:
Elements
Attributes
Comment
PCDATA
CDATA
24. PCDATA & CDATA
PCDATA - Parsed Character Data
Character data - text found between the start tag and
the end tag of an XML element
Will be parsed by a parser – examined whether to be
treated as entities or mark-ups
Should not contain any &, <, or > characters; these need
to be represented by the &amp;, &lt; and &gt; entities
CDATA – Character Data
Text that will NOT be parsed by a parser
Tags inside the text will NOT be treated as markup and
entities will not be expanded
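The difference is easiest to see inside a document, where unparsed text is written as a CDATA section. This sketch assumes Python's standard xml.etree.ElementTree and illustrates the parsed versus unparsed idea in document content, not the CDATA attribute type used in DTD declarations.

```python
import xml.etree.ElementTree as ET

# Parsed character data: special characters must be written as entities.
parsed = ET.fromstring("<code>a &lt; b &amp;&amp; b &lt; c</code>")

# CDATA section: the same text written literally, left alone by the parser.
unparsed = ET.fromstring("<code><![CDATA[a < b && b < c]]></code>")

print(parsed.text)
print(unparsed.text)

# Inside CDATA, an entity reference is NOT expanded.
literal = ET.fromstring("<code><![CDATA[&lt;b&gt;]]></code>")
print(literal.text)
```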
25. Internal DTD
DTD included as part of the XML document
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE root_element [
....
....
]>
26. Internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Amit</to>
<from>Bobby</from>
<heading>Reminder</heading>
<body>Meet me during this weekend</body>
</note>
27. Interpretation of DTD
!DOCTYPE note defines root element of document as note
!ELEMENT note defines that the note element must contain four
elements: "to, from, heading, body"
!ELEMENT to defines the to element to be of type #PCDATA
!ELEMENT from defines the from element to be of type #PCDATA
!ELEMENT heading defines the heading element to be of type
#PCDATA
!ELEMENT body defines the body element to be of type #PCDATA
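Python's standard library does not validate against a DTD, so the following is only a rough sketch of one part of the idea: it pulls the <!ELEMENT ...> declarations out of the internal DTD with a regular expression and reports elements used in the document but never declared. The helper names are hypothetical, and the regex is a simplification that would not cope with every legal DTD.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Amit</to>
<from>Bobby</from>
<heading>Reminder</heading>
<body>Meet me during this weekend</body>
</note>"""

# Simplified pattern: element name, then everything up to the closing '>'.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+([^>]+)>")


def declared_elements(text):
    """Map each declared element name to its content model."""
    return {name: model.strip() for name, model in ELEMENT_DECL.findall(text)}


def undeclared_elements(text):
    """List element names used in the document but missing from the DTD."""
    used = {el.tag for el in ET.fromstring(text).iter()}
    return sorted(used - set(declared_elements(text)))


declared = declared_elements(DOC)
print(declared)
print(undeclared_elements(DOC))
```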
28. External DTD
DTD stored as separate file having the declarations
Can be applied across multiple XML documents
<!DOCTYPE root_element SYSTEM "dtd_file_location">
<!DOCTYPE root_element PUBLIC "public_identifier" "dtd_file_location">
PUBLIC – publicly available DTD, identified by a public identifier; the file location is also given
SYSTEM – private DTD, located only by its file location (URI); meant for
a single user or a group of users
<!DOCTYPE> definition within XML file contains a reference to
the DTD file
29. External DTD
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Amit</to>
<from>Bobby</from>
<heading>Reminder</heading>
<body>Meet me during this weekend</body>
</note>
note.dtd (external file, no <!DOCTYPE> wrapper):
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
30. XML Naming Rules
XML elements must follow certain naming rules
• Element names case-sensitive
• Names must start with a letter or underscore
• Names cannot start with the letters xml (or XML, or Xml, etc)
• Names can contain letters, digits, hyphens, underscores, and periods
• Names cannot contain spaces or tabs
Any name can be used, no words are reserved (except xml)
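The rules on this slide translate fairly directly into a pattern. A simplified sketch in Python: it is ASCII-only by assumption (real XML names also allow many non-ASCII letters, and colons for namespaces), and is_valid_name is a hypothetical helper.

```python
import re

# First character: letter or underscore; then letters, digits, '_', '.', '-'.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_name(name):
    """Check a candidate element name against the slide's naming rules."""
    if name.lower().startswith("xml"):  # xml, XML, Xml, ... are reserved
        return False
    return NAME_RE.fullmatch(name) is not None


for candidate in ["book_title", "first-name", "_id", "2nd_item", "xmlData", "first name"]:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```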
31. Naming Best Practices
In a valid XML document, only elements declared in the DTD may appear
Hence, as soon as a new element is added to the XML, it must be declared in
the DTD also
XML tags should be easy to remember
Follow naming rules and best naming practices
Short and simple names eg. <book_title>, not like <the_title_of_the_book>
Descriptive names, eg. <person>, <firstname> etc.
Consistent naming styles eg. All lowercase , Camel case etc.
Avoid colons (reserved for namespaces), periods and hyphens; semicolons are not allowed in names at all
32. DTD – Elements & Types
Declared with an ELEMENT declaration using the following structure
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
Empty elements like <br/>
<!ELEMENT element-name EMPTY> eg. <!ELEMENT br EMPTY>
Elements with parsed character data (container elements):
<!ELEMENT element-name (#PCDATA)> eg. <!ELEMENT from (#PCDATA)>
Elements declared with the category keyword ANY, can contain any
combination of parsable data declared elsewhere in DTD
(unrestricted elements)
<!ELEMENT element-name ANY> eg. <!ELEMENT note ANY>
33. DTD – Elements with Children (Sequences)
Elements with one or more children are declared with the name of
the children elements inside parentheses:
<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1,child2,...)>
Example:
<!ELEMENT note (to, from, heading, body)>
When children are declared in a sequence separated by commas, the
children must appear in the same sequence in the document
Individual declaration of child elements must follow
34. Declaration of Number of Occurrences
Only one occurrence of an element:
<!ELEMENT element-name (child-name)>
eg. <!ELEMENT note (message)>
Child element "message" must occur once, and only once inside the "note" element
Minimum one occurrence of an element:
<!ELEMENT element-name (child-name+)>
eg. <!ELEMENT note (message+)>
Element "message" must occur one or more times inside the "note" element
35. Declaration of Number of Occurrences
Zero or more occurrences of an element:
<!ELEMENT element-name (child-name*)>
eg. <!ELEMENT note (message*)>
Child element "message" can occur zero or more times inside the "note" element
Zero or one occurrence of an element:
<!ELEMENT element-name (child-name?)>
eg. <!ELEMENT note (message?)>
Element "message" can occur zero or one time inside the "note" element
36. Declaration of Either/Or and Mixed Content
Either/or Content Declaration:
<!ELEMENT note (to,from,header,(message|body))>
"note" element must contain a "to" element, a "from" element, a "header"
element, and either a "message" or a "body" element
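Sequences, the occurrence indicators (+, *, ?) and either/or groups behave much like a regular expression over the list of child element names. The sketch below, in Python, makes that analogy concrete. It is not how a real validating parser is implemented, the helper names are hypothetical, and #PCDATA and the EMPTY / ANY keywords are not handled.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def model_to_regex(model):
    """Translate a DTD content model into a regex over 'tag ' tokens."""
    compact = re.sub(r"\s+", "", model)
    # Each child name becomes a group matching that tag plus a trailing space.
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    # A comma means 'followed by', which is plain concatenation in a regex;
    # the operators ( ) | + * ? already mean the same thing in both notations.
    return re.compile(pattern.replace(",", ""))


def children_match(element, model):
    """Check the element's direct children against the content model."""
    child_tags = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_tags) is not None


MODEL = "(to,from,header,(message|body))"
good = ET.fromstring("<note><to/><from/><header/><body/></note>")
wrong_order = ET.fromstring("<note><from/><to/><header/><body/></note>")

print(children_match(good, MODEL))
print(children_match(wrong_order, MODEL))
print(children_match(ET.fromstring("<note/>"), "(message+)"))
print(children_match(ET.fromstring("<note/>"), "(message*)"))
```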
Mixed Content Declaration:
<!ELEMENT note (#PCDATA|to|from|header|message)*>
"note" element can contain zero or more occurrences of parsed character data,
"to", "from", "header", or "message" elements
37. DTD - Attributes
Declared with an ATTLIST declaration using the following structure
<!ATTLIST element-name attribute-name attribute-type attribute-value>
Eg. <!ATTLIST payment type CDATA "cheque">
for XML <payment type="cheque" />
Example 2
<!ELEMENT book (title, author)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title year_published CDATA #IMPLIED cover (paperback|hardcover) #IMPLIED>
<!ELEMENT author (#PCDATA)>
Attribute type could be:
CDATA – value is character data
(en1 | en2 | en3) - XML doc must choose one from an enumerated values
ID - value is a unique id; must be a valid XML name (start with a letter or underscore)
IDREF or IDREFS - id of another element, or a list of such ids
ENTITY or ENTITIES - value is an entity or a list of entities
38. DTD - Attributes
Enumerated attribute values
SYNTAX:
<!ATTLIST element-name attribute-name (en1|en2|..)
default-value>
DTD:
<!ATTLIST payment type (check|cash) "cash">
XML:
<payment type="check" />
or
<payment type="cash" />
(default value of type, if not defined is “cash”)
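Default values declared in an internal DTD are applied even by a non-validating parser. A small sketch, assuming Python's standard xml.etree.ElementTree (built on the expat parser): the default "cash" is supplied when the attribute is omitted. The same parser does not check the enumeration, so a value outside (check|cash) would pass through unchallenged.

```python
import xml.etree.ElementTree as ET

# Internal DTD with the enumerated attribute and its default value.
DTD = """<!DOCTYPE payment [
<!ELEMENT payment EMPTY>
<!ATTLIST payment type (check|cash) "cash">
]>
"""

explicit = ET.fromstring(DTD + '<payment type="check" />')
defaulted = ET.fromstring(DTD + "<payment />")

print(explicit.get("type"))   # value written in the document
print(defaulted.get("type"))  # value supplied from the DTD default
```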
39. DTD - Attributes
Attribute value could be:
"value" default value #REQUIRED must be included
#FIXED "value" fixed value #IMPLIED value optional
DTD: <!ELEMENT square EMPTY >
<!ATTLIST square width CDATA "0"> -- default value
Conforming element in XML doc: <square width="100" />
DTD <!ATTLIST person number CDATA #REQUIRED>
Valid XML:
<person number="5677" />
Invalid XML:
<person />
40. DTD - Attributes
Attribute value could be:
"value" default value #REQUIRED must be included
#FIXED "value" fixed value #IMPLIED value optional
DTD <!ATTLIST sender company CDATA #FIXED "Microsoft">
Valid XML <sender company="Microsoft" />
Invalid XML <sender company="IBM" />
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
DTD <!ATTLIST contact phone CDATA #IMPLIED>
Valid XML:
<contact phone="555-667788" /> and <contact />
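The three keywords can be illustrated with a hand-written check. This sketch assumes Python's standard xml.etree.ElementTree, which does not validate by itself; the RULES table is a hypothetical mirror of the ATTLIST declarations on these slides, not something read from a DTD.

```python
import xml.etree.ElementTree as ET

# (element, attribute) -> (keyword, fixed value if any); hand-copied from the slides.
RULES = {
    ("person", "number"): ("#REQUIRED", None),
    ("sender", "company"): ("#FIXED", "Microsoft"),
    ("contact", "phone"): ("#IMPLIED", None),
}


def attribute_errors(xml_text):
    """Return a list of rule violations for a single element."""
    element = ET.fromstring(xml_text)
    errors = []
    for (tag, attr), (kind, fixed) in RULES.items():
        if element.tag != tag:
            continue
        value = element.get(attr)
        if kind == "#REQUIRED" and value is None:
            errors.append(f"{tag}: missing required attribute {attr}")
        elif kind == "#FIXED" and value is not None and value != fixed:
            errors.append(f"{tag}: {attr} must be {fixed!r}")
        # "#IMPLIED" attributes are optional, so there is nothing to check.
    return errors


for sample in ['<person number="5677" />', "<person />",
               '<sender company="IBM" />', "<contact />"]:
    print(sample, "->", attribute_errors(sample) or "ok")
```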
43. When to use DTD/Schema
With a DTD, independent groups of people can agree to use a
standard DTD for interchanging data.
With a DTD, one can verify own data, as well as received from the
outside world
When not to use DTD/Schema
When experimenting with XML, or working with small XML files,
creating DTDs may be a waste of time
If developing applications, wait until the specification is stable
before you add a document definition; otherwise the
software might stop working because of validation errors.
In the real world, computer systems and databases store data in different formats, which makes exchanging data a challenge for developers
e.g. financial and B2B applications
Empty elements can contain attributes
XML documents that conform to the syntax rules above are said to be "Well Formed" XML documents.
White space is preserved in XML, unlike in HTML
The <!DOCTYPE> declaration marks the beginning of the DTD
ANY – may contain any elements declared elsewhere in the DTD