SGML (Standard Generalized Markup Language)
• It is an internationally agreed standard for data
• It is an international standard for the definition
of device-independent, system-independent
methods of representing texts in electronic form
• XML stands for eXtensible Markup Language
• XML is a markup language much like HTML
• A simplified version of SGML
• More flexible and adaptable than HTML
• XML was designed to describe data
• XML tags are not predefined. You must define your own tags
• XML uses a Document Type Definition (DTD) or
an XML Schema to describe the data
• XML is a W3C Recommendation.
The World Wide Web Consortium published the first XML 1.0
standard definition in 1998.
Difference between XML and HTML
The main difference between XML and HTML
– XML was designed to carry data. (XML is not
a replacement for HTML)
XML and HTML were designed with different goals:
– XML was designed to describe data and to
focus on what data is.
– HTML was designed to display data and to
focus on how data looks.
– HTML is about displaying information, while
XML is about describing information.
Why Is XML Important?
• Plain Text
– Easy to edit
– Useful for storing small amounts of data
– Possible to efficiently store large amounts of XML
data through an XML front end to a database
• Data Identification
– Tell you what kind of data you have
– Can be used in different ways by different applications
• Linkability -- XLink and XPointer
– Simple unidirectional hyperlinks
– Two-way links
– Multiple-target links
– “Expanding” links
• Easily Processed
– Regular and consistent notation
– Faster to access
– Easier to rearrange
• XML 1.0
– Defines the syntax of XML
• XPointer, XLink
– Define a standard way to represent links between resources
• XSL
– Defines the standard stylesheet language for XML
• XML declaration is the first statement
• All XML elements must have a closing tag
• XML tags are case sensitive
• All XML elements must be properly nested
• All XML documents must have a root tag
• Attribute values must always be quoted
• With XML, white space is preserved
• Comments in XML: <!-- This is a comment -->
• Certain characters are reserved for parsing
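The syntax rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample documents are hypothetical and are not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per syntax rule.
samples = {
    "closed and properly nested": "<note><to>Reader</to></note>",
    "missing closing tag": "<note><to>Reader</note>",
    "tag case mismatch": "<Note></note>",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute value": "<note id=1></note>",
    "more than one root element": "<a/><b/>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the first sample parses; each of the others breaks one rule from the list, so the parser rejects it.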
There are two types of XML documents
• "Well Formed" XML document
– Has correct XML syntax
• "Valid" XML document
– Is "well formed"
– Conforms to the rules of a DTD (Document Type Definition)
• XML DTD
– Defines the legal building blocks of an XML document
– Can be inline in XML or as an external reference
• XML Schema
– an XML based alternative to DTD, more powerful
– Support namespace and data types
• XML documents do not carry information about how to
display the data
• We can add display information to XML with
– CSS (Cascading Style Sheets)
– XSL (eXtensible Stylesheet Language) --- preferred
XML support in IE 5.0+
Internet Explorer 5.0 has the following XML support:
• Viewing of XML documents
• Full support for W3C DTD standards
• Binding XML data to HTML elements
• Transforming and displaying XML with XSL
• Displaying XML with CSS
• Access to the XML DOM (Document Object Model)
* Netscape 6.0 also has full XML support
• XML uses the concept of document type and
hence a DTD (Document Type Definition) to
describe the data
• XML with DTD is self descriptive
• XML separates data from display formats
• XML can be used as a format to exchange data
XML Syntax consists of
• XML Declaration
• XML Elements
• XML Attributes
• The first line of an XML document
should always consist of an XML
declaration defining the version of XML
Main Components of an XML Document
• Elements: <hello>
• Attributes: <item id="33905">
• Entities: &lt; (<)
• Advanced Components
– CData Sections
– Processing Instructions
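As an illustration of the entity component, the sketch below escapes reserved characters into entity references and lets the parser expand them again. It assumes Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree); the element name rule and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text containing characters that are reserved in XML.
raw = "price < 10 & stock > 0"

# escape() replaces &, < and > with the entity references &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)  # price &lt; 10 &amp; stock &gt; 0

# Once escaped, the text can be placed inside an element and parsed safely.
doc = f"<rule>{escaped}</rule>"
parsed = ET.fromstring(doc).text

# The parser expands the entity references back to the original characters.
print(parsed == raw)  # True
```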
• XML attributes are used to describe XML
elements or to provide additional information
• Attributes provide additional information that
is not part of the data.
• <Book no="99-2456" media="CD"></Book>
• XML elements can have attributes in
name/value pairs as in HTML.
• Attribute values must always be in quotes.
Either single or double quotes are valid,
though double quotes are most common.
• Attributes are always contained within
the start tag of an element.
Attributes Vs. Elements
Case 1 (Attributes)
<Book no="99-2356" type="CD">
Where elements score over attributes
• Elements can describe structure, but attributes cannot
• Attributes are more difficult to manipulate
by program code than elements
• Attribute values are difficult to validate
against a DTD
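To make the comparison concrete, here is a small sketch that reads the same value from an attribute and from a child element, assuming Python's standard xml.etree.ElementTree. The attribute form follows Case 1 on the slide; the element form is an assumed equivalent, since the slide's second case was not preserved.

```python
import xml.etree.ElementTree as ET

# Case 1: values stored as attributes (as on the slide).
as_attributes = ET.fromstring('<Book no="99-2356" type="CD"></Book>')

# Assumed element-based equivalent; the element names are illustrative only.
as_elements = ET.fromstring(
    "<Book><no>99-2356</no><type>CD</type></Book>"
)

# Attributes are read from the element's attribute mapping.
no_from_attr = as_attributes.get("no")

# Child elements are read by walking the tree.
no_from_elem = as_elements.findtext("no")

print(no_from_attr, no_from_elem)

# Only the element form can be extended with nested structure.
ET.SubElement(as_elements.find("type"), "format").text = "audio"
print(ET.tostring(as_elements, encoding="unicode"))
```

The last step shows the structural point: a child element can itself gain children, while an attribute value stays a flat string.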
• Its ability to describe data
• Its ability to structure data
• Separate display from structure
• Supported by industry
• Availability of tools
• The first line is the XML declaration. It is optional in
XML 1.0 but recommended; if present, it must come first.
• Every XML document has a root element. In our example,
the second line is the root element -
• The root element can contain child elements. In
our example, Product is the child element of
• Each element can contain sub-elements.
– <P_CODE>,<P_PRICE> are sub-elements.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- The root element name "book" is an assumption; the original root tag was lost -->
<book>
  <title>XML</title>
  <chapter>introduction to xml
    <para>Features of XML</para>
    <para>Elements must be enclosed in tags</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
How do you get the data?
Documents, stylesheets, and other data can all be expressed in
Any application can plug in via an API.
This model can work locally or over a network.
Parsing, tree-building, and access can shift between
• All modern browsers have a built-in XML parser.
• An XML parser converts an XML document into
an XML DOM object - which can then be
manipulated by program code.
• A DOM (Document Object Model) defines a
standard way for accessing and manipulating
XML documents.
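A minimal sketch of parsing into a DOM and then accessing and changing it, assuming Python's standard xml.dom.minidom module. The element names P_CODE and P_PRICE come from the earlier slide; the root name catalog, the element P_NAME, and all values are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical document; only P_CODE and P_PRICE appear in the slides.
source = (
    "<catalog><product>"
    "<P_CODE>A1</P_CODE><P_PRICE>9.50</P_PRICE>"
    "</product></catalog>"
)

# The parser converts the text into a DOM tree.
dom = parseString(source)
root = dom.documentElement

# Access: collect the text of every P_CODE element.
codes = [node.firstChild.data for node in dom.getElementsByTagName("P_CODE")]
print(root.tagName, codes)

# Manipulate: create a new element and attach it to the first product.
name = dom.createElement("P_NAME")
name.appendChild(dom.createTextNode("Widget"))
dom.getElementsByTagName("product")[0].appendChild(name)

print(root.toxml())
```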
• XML Namespaces provide a method to
avoid element name conflicts.
• This XML carries HTML table information:
• This XML carries information about a table
(a piece of furniture):
<name>African Coffee Table</name>
• If these XML fragments were added together, there
would be a name conflict.
• Both contain a <table> element, but the elements
have different content and meaning.
• An XML parser will not know how to handle these
differences.
Solving the Name Conflict Using a Prefix
• Name conflicts in XML can easily be avoided
using a name prefix.
• This XML carries information about an HTML
table, and a piece of furniture:
<f:name>African Coffee Table</f:name>
• In the example above, there will be no
conflict because the two <table> elements
have different names.
XML Namespaces - The xmlns Attribute
• When using prefixes in XML, a so-called
namespace for the prefix must be defined.
• The namespace is defined by the xmlns
attribute in the start tag of an element.
• The namespace declaration has the
following syntax. xmlns:prefix="URI".
• In the example above, the xmlns attribute in the
<table> tag gives the h: and f: prefixes a qualified
namespace.
• When a namespace is defined for an element,
all child elements with the same prefix are
associated with the same namespace.
• Namespaces can be declared in the elements
where they are used or in the XML root element:
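As a sketch of namespaces declared in the root element, the example below parses a document with two kinds of table element and looks each one up by namespace. It assumes Python's standard xml.etree.ElementTree; the namespace URIs, the root element name, and the cell content are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs: a namespace URI only needs to be unique, not resolvable.
HTML_NS = "http://example.com/html"
FURNITURE_NS = "http://example.com/furniture"

doc = f"""
<root xmlns:h="{HTML_NS}" xmlns:f="{FURNITURE_NS}">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name></f:table>
</root>
"""

root = ET.fromstring(doc)

# Map prefixes to URIs for lookups; these prefixes are local to the query.
ns = {"h": HTML_NS, "f": FURNITURE_NS}

html_cell = root.findtext("h:table/h:tr/h:td", namespaces=ns)
furniture_name = root.findtext("f:table/f:name", namespaces=ns)
print(html_cell, "|", furniture_name)

# Internally each tag is qualified by its namespace URI, so the two
# table elements no longer share a name.
tags = [child.tag for child in root]
print(tags)
```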
• Uniform Resource Identifier (URI)
• A Uniform Resource Identifier (URI) is a string
of characters which identifies an Internet
Resource.
• The most common URI is the Uniform
Resource Locator (URL) which identifies an
Internet domain address. Another, not so
common type of URI is the Uniform Resource
Name (URN).
PCDATA - Parsed Character Data
• XML parsers normally parse all the text in
an XML document.
• When an XML element is parsed, the text
between the XML tags is also parsed:
<message>This text is also parsed</message>
The parser does this because XML elements can
contain other elements, as in this example, where
the <name> element contains two other elements
(first and last): and the parser will break it up into
sub-elements like this:
Parsed Character Data (PCDATA) is a term used about text
data that will be parsed by the XML parser.
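The effect of parsing PCDATA can be sketched as follows, assuming Python's standard xml.etree.ElementTree. The message element comes from the slide; the values inside first and last, and the element t, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Plain text between tags is parsed and exposed as the element's text.
message = ET.fromstring("<message>This text is also parsed</message>")
print(message.text)

# Markup inside the text is recognized and split into sub-elements.
# The names used for first and last are illustrative only.
name = ET.fromstring("<name><first>Ada</first><last>Lovelace</last></name>")
children = [(child.tag, child.text) for child in name]
print(children)

# Entity references inside PCDATA are expanded by the parser.
expanded = ET.fromstring("<t>a &amp; b</t>").text
print(expanded)
```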
CDATA - (Unparsed) Character Data
• The term CDATA is used about text data that should not
be parsed by the XML parser.
• Characters like "<" and "&" are illegal as literal text in XML elements.
• "<" will generate an error because the parser interprets it
as the start of a new element.
• "&" will generate an error because the parser interprets it
as the start of a character entity.
• Some text, such as script code, contains many "<" or
"&" characters. To avoid errors, script code can be
defined as CDATA.
• Everything inside a CDATA section is ignored by
the parser.
• A CDATA section starts with "<![CDATA[" and
ends with "]]>":
if (a < b && a < 0) then
In this example, everything inside
the CDATA section is ignored by the
parser.
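A short sketch of the difference a CDATA section makes, assuming Python's standard xml.etree.ElementTree. The script line is the one from the slide; the enclosing script element is an assumption.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "<" and "&" are taken literally.
with_cdata = "<script><![CDATA[if (a < b && a < 0) then]]></script>"
code = ET.fromstring(with_cdata).text
print(code)

# The same text without CDATA is rejected, because the parser
# treats "<" and "&" as the start of markup.
without_cdata = "<script>if (a < b && a < 0) then</script>"
try:
    ET.fromstring(without_cdata)
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)
```

The CDATA markers themselves are not part of the resulting text; only the characters between them are returned.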
• XML is a self-descriptive language
• XML is a powerful language for describing
structured data for web applications
• XML is currently applied in many fields
• Many vendors already support or will support XML
• XML Documents can be validated through the
use of DTD and XSD documents
• XML impacts B2B data exchanges, legacy
system integration, web page development,
database system integration.