SGML (Standard Generalized
Markup Language)
• It is an internationally agreed standard for data
representation.
• It is an international standard for the definition
of device independent, system independent
methods of representing texts in electronic
form.
Introduction
• XML stands for eXtensible Markup Language
• XML is a markup language much like HTML
• A simplified subset of SGML
• More flexible and adaptable than HTML
• XML was designed to describe data
• XML tags are not predefined. You must define your
own tags
• XML uses a Document Type Definition (DTD) or
an XML Schema to describe the data
• XML is a W3C Recommendation.
World Wide Web Consortium published the first XML 1.0
standard definition in 1998.
Difference between XML and HTML
The main difference between XML and HTML:
– XML was designed to carry data. (XML is not
a replacement for HTML.)
XML and HTML were designed with different goals:
– XML was designed to describe data and to
focus on what data is.
HTML was designed to display data and to
focus on how data looks.
– HTML is about displaying information, while
XML is about describing information.
Why Is XML Important?
• Plain Text
– Easy to edit
– Useful for storing small amounts of data
– Possible to efficiently store large amounts of XML
data through an XML front end to a database
• Data Identification
– Tell you what kind of data you have
– Can be used in different ways by different
applications
• Linkability -- XLink and XPointer
– Simple unidirectional hyperlinks
– Two-way links
– Multiple-target links
– “Expanding” links
• Easily Processed
– Regular and consistent notation
• Hierarchical
– Faster to access
– Easier to rearrange
XML Specifications
• XML 1.0
Defines the syntax of XML
• XPointer, XLink
Defines a standard way to represent links between resources
• XSL
Defines the standard stylesheet language for XML
XML Syntax
• The XML declaration, when present, must be the first statement
• All XML elements must have a closing tag
• XML tags are case sensitive
• All XML elements must be properly nested
• All XML documents must have exactly one root element
• Attribute values must always be quoted
• With XML, white space is preserved
• Comments in XML: <!-- This is a comment -->
• Certain characters (such as < and &) are reserved for markup
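The syntax rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Single root, properly nested, quoted attribute value.
good = '<note id="1"><to>XXX</to><from>YYY</from></note>'
# Improperly nested: <note> is closed before <to>.
bad_nesting = '<note><to>XXX</note></to>'
# Tags are case sensitive: <To> does not match </to>.
bad_case = '<note><To>XXX</to></note>'
# Attribute value is not quoted.
bad_attr = '<note id=1><to>XXX</to></note>'
# Two top-level elements, so there is no single root.
two_roots = '<to>XXX</to><from>YYY</from>'

for sample in (good, bad_nesting, bad_case, bad_attr, two_roots):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted; each of the others breaks exactly one rule from the list.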
XML Validation
There are two types of XML documents
• "Well Formed" XML document
– correct XML syntax
• "Valid" XML document
– “well formed”
– Conforms to the rules of a DTD (Document Type
Definition)
• XML DTD
– defines the legal building blocks of an XML
document
– Can be inline in XML or as an external reference
• XML Schema
– an XML based alternative to DTD, more powerful
– Support namespace and data types
Displaying XML
• XML documents do not carry information about how to
display the data
• We can add display information to XML with
– CSS (Cascading Style Sheets)
– XSL (eXtensible Stylesheet Language), the preferred option
XML support in IE 5.0+
Internet Explorer 5.0 has the following XML
support:
• Viewing of XML documents
• Full support for W3C DTD standards
• Binding XML data to HTML elements
• Transforming and displaying XML with XSL
• Displaying XML with CSS
• Access to the XML DOM (Document Object Model)
*Netscape 6.0 also has full XML support
XML features
• XML uses the concept of document type and
hence a DTD (Document Type Definition) to
describe data
• XML with DTD is self descriptive
• XML separates data from display formats
• XML can be used as a format to exchange data
XML Syntax consists of
• XML Declaration
• XML Elements
• XML Attributes
• The first line of an XML document
should always consist of an XML
declaration defining the version of XML
Main Components of an XML
Document
• Elements: <hello>
• Attributes: <item id="33905">
• Entities: &lt; (<)
• Advanced Components
– CData Sections
– Processing Instructions
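As a rough illustration of entities, the sketch below uses Python's standard xml.sax.saxutils helpers to replace reserved characters with entity references and then parses the result back. The sample string and the <code> element name are assumptions made for the example.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = "if a < b & c > d"

# escape() replaces &, < and > with the entity references &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)

# Once escaped, the text can be placed safely inside an element.
xml_text = f"<code>{escaped}</code>"
parsed = ET.fromstring(xml_text).text

# The parser turns the entity references back into the original characters.
print(parsed)
print(unescape(escaped) == raw)
```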
XML Attributes
• XML attributes are used to describe XML
elements or to provide additional information
about elements.
• Attributes provide additional information that
is not part of the data.
Ex:
• <Book no="99-2456" media="CD"></Book>
• XML elements can have attributes in
name/value pairs as in HTML.
• Attribute values must always be quoted.
Either single or double quotes are valid,
though double quotes are most
common.
• Attributes are always contained within
the start tag of an element.
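A short sketch of reading attributes as name/value pairs, assuming Python's standard xml.etree.ElementTree. The Book element follows the example above; the isbn attribute is a hypothetical name used only to show a missing attribute.

```python
import xml.etree.ElementTree as ET

# Both quoting styles are accepted: double quotes for "no", single for 'media'.
book = ET.fromstring("""<Book no="99-2456" media='CD'></Book>""")

# Attributes are exposed as a dictionary of name/value pairs.
print(book.attrib)

# get() reads one attribute and can fall back to a default if it is absent.
print(book.get("no"))
print(book.get("isbn", "unknown"))
```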
Attributes Vs. Elements
Case 1 ( Attributes)
<Book no="99-2356" type="CD">
<author>
<firstname>XXX</firstname>
<lastname>YYY</lastname>
</author>
</Book>
Where elements score over attributes
• Elements can describe structure; attributes
cannot
• Attributes are more difficult to manipulate
by program code than elements
• Attribute values are difficult to validate
against a DTD
XML strengths
• Its ability to describe data
• Its ability to structure data
• Separate display from structure
• Supported by industry
• Availability of tools
An example of XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>XXX</to>
<from>YYY</from>
<heading>XML</heading>
<body> Extensible Markup Language </body>
</note>
Cont.,
• The first line is the XML declaration. It is
optional in XML 1.0 but recommended.
• Every XML document has a root element. In our
example, the second line is the root element,
<note>
• The root element can contain child elements. In
our example, <to>, <from>, <heading> and <body>
are child elements of <note>
• Each element can in turn contain sub-elements.
– For instance, in a product list, <P_CODE> and
<P_PRICE> could be sub-elements of a <Product> element.
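The note document shown earlier can be walked from its root element down to its children. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The note example, passed as bytes so the encoding declaration is honoured.
NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>XXX</to>
<from>YYY</from>
<heading>XML</heading>
<body> Extensible Markup Language </body>
</note>"""

root = ET.fromstring(NOTE)

# The root element and the tags of its direct children, in document order.
print(root.tag)
children = [child.tag for child in root]
print(children)

# Text content of individual child elements.
print(root.findtext("to"))
body = root.find("body").text.strip()
print(body)
```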
Example
<?xml version="1.0" encoding= "ISO-8859-1" ?>
<book>
<title> XML </title>
<chapter> introduction to xml
<para>Markup languages</para>
<para>Features of XML</para>
</chapter>
<chapter>XML syntax
<para>Elements must be enclosed in tags</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
How do you get the data?
[Figure: XML data, checked against a DTD/Schema, goes through a parser that builds an information structure (tree + links) exposed via the DOM interface]
• Documents, stylesheets, and other data can all be expressed in
XML.
• Any application can plug in via an API called the "Document
Object Model" (DOM).
• This model can work locally or over a network.
• Parsing, tree-building, and access can shift between
client and server.
XML Parser
• All modern browsers have a built-in XML parser.
• An XML parser converts an XML document into
an XML DOM object - which can then be
manipulated with JavaScript.
XML DOM
• A DOM (Document Object Model) defines a
standard way for accessing and manipulating
XML documents.
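The slides describe DOM access from JavaScript in a browser. As a rough equivalent, the sketch below uses Python's standard xml.dom.minidom, which implements the same W3C DOM method names. The document is a shortened version of the book example, and the added paragraph text is illustrative.

```python
from xml.dom.minidom import parseString

BOOK = """<book>
<title>XML</title>
<chapter>Introduction to XML
<para>Markup languages</para>
<para>Features of XML</para>
</chapter>
</book>"""

# The parser converts the XML text into a DOM document object.
dom = parseString(BOOK)
print(dom.documentElement.tagName)

# Access: collect the text of every <para> element.
texts = [p.firstChild.data for p in dom.getElementsByTagName("para")]
print(texts)

# Manipulation: create a new <para> and append it to the chapter.
chapter = dom.getElementsByTagName("chapter")[0]
new_para = dom.createElement("para")
new_para.appendChild(dom.createTextNode("XML tags are not predefined"))
chapter.appendChild(new_para)

para_count = len(dom.getElementsByTagName("para"))
print(para_count)
```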
XML Namespaces
• XML Namespaces provide a method to
avoid element name conflicts.
• This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
• This XML carries information about a table
(a piece of furniture):
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
•If these XML fragments were added together, there
would be a name conflict.
•Both contain a <table>element, but the elements
have different content and meaning.
An XML parser will not know how to handle these
differences.
Solving the Name Conflict Using a Prefix
• Name conflicts in XML can easily be avoided
using a name prefix.
• This XML carries information about an HTML
table, and a piece of furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
• In the example above, there will be no
conflict because the two <table> elements
have different names.
XML Namespaces - The xmlns Attribute
• When using prefixes in XML, a so-called
namespace for the prefix must be defined.
• The namespace is defined by the xmlns
attribute in the start tag of an element.
• The namespace declaration has the
following syntax. xmlns:prefix="URI".
• The xmlns attribute in the start tag of each
<table> element gives the h: and f: prefixes a qualified
namespace.
• When a namespace is defined for an element,
all child elements with the same prefix are
associated with the same namespace.
• Namespaces can be declared in the elements
where they are used or in the XML root element.
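A sketch of the two-table document with both prefixes declared in the root element, parsed with Python's standard xml.etree.ElementTree. The <root> wrapper and both namespace URIs are assumptions chosen for illustration; any unique URI would serve.

```python
import xml.etree.ElementTree as ET

# Both prefixes are declared once, on the root element (URIs are illustrative).
DOC = """<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="https://example.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

# Map prefixes to URIs so that searches can use the short prefix form.
NS = {"h": "http://www.w3.org/TR/html4/", "f": "https://example.com/furniture"}

root = ET.fromstring(DOC)
html_table = root.find("h:table", NS)
furniture = root.find("f:table", NS)

# Internally each tag is qualified by its namespace URI, so the two
# <table> elements no longer share a name.
print(html_table.tag)
print(furniture.tag)

cells = [td.text for td in html_table.findall("h:tr/h:td", NS)]
print(cells)
print(furniture.find("f:name", NS).text)
```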
URI
• Uniform Resource Identifier (URI)
• A Uniform Resource Identifier (URI) is a string
of characters which identifies an Internet
Resource.
• The most common URI is the Uniform
Resource Locator (URL), which identifies a
resource by its Internet address. Another, less
common type of URI is the Uniform Resource
Name (URN).
PCDATA - Parsed Character Data
• XML parsers normally parse all the text in
an XML document.
• When an XML element is parsed, the text
between the XML tags is also parsed:
<message>This text is also parsed</message>
<name><first>Bill</first><last>Gates</last></name>
The parser does this because XML elements can
contain other elements, as in this example, where
the <name> element contains two other elements
(first and last): and the parser will break it up into
sub-elements like this:
<name>
<first>Bill</first>
<last>Gates</last>
</name>
Parsed Character Data (PCDATA) is a term used about text
data that will be parsed by the XML parser.
CDATA - (Unparsed) Character Data
• The term CDATA is used about text data that should not
be parsed by the XML parser.
• Characters like "<" and "&" are illegal in element content unless escaped.
• "<" will generate an error because the parser interprets it
as the start of a new element.
• "&" will generate an error because the parser interprets it
as the start of an entity reference.
• Some text, like JavaScript code, contains a lot of "<" or
"&" characters. To avoid errors script code can be
defined as CDATA.
• Everything inside a CDATA section is treated as plain
character data and is not interpreted as markup by the parser.
• A CDATA section starts with "<![CDATA[" and
ends with "]]>":
<script>
<![CDATA[
function matchwo(a,b)
{
if (a < b && a < 0)
{
return 1;
}
else
{
return 0;
}
}
]]>
</script>
In this example, everything inside
the CDATA section is passed through
by the parser as literal text
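The effect of a CDATA section can be observed with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the one-line script body is a shortened, illustrative version of the function above.

```python
import xml.etree.ElementTree as ET

with_cdata = "<script><![CDATA[if (a < b && a < 0) { return 1; }]]></script>"
without_cdata = "<script>if (a < b && a < 0) { return 1; }</script>"

# Inside CDATA, "<" and "&" are kept as literal text.
text = ET.fromstring(with_cdata).text
print(text)

# The same content without CDATA is rejected as not well formed.
try:
    ET.fromstring(without_cdata)
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)
```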
Conclusion
• XML is a self-descriptive language
• XML is a powerful language for describing
structured data for web applications
• XML is currently applied in many fields
• Many vendors already support or will support
XML
• XML Documents can be validated through the
use of DTD and XSD documents
• XML impacts B2B data exchanges, legacy
system integration, web page development,
database system integration.