2. SGML (Standard Generalized Markup Language)
• It is an internationally agreed standard (ISO 8879) for data
representation.
• It is an international standard for the definition
of device-independent, system-independent
methods of representing texts in electronic
form.
3. Introduction
• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML
• A simplified subset of SGML
• More flexible and adaptable than HTML
• XML was designed to describe data
4. • XML tags are not predefined. You must define your
own tags
• XML uses a Document Type Definition (DTD) or
an XML Schema to describe the data
• XML is a W3C Recommendation.
The World Wide Web Consortium (W3C) published the first XML 1.0
standard definition in 1998.
5. Difference between XML and HTML
The main difference between XML and HTML:
– XML was designed to carry data. (XML is not
a replacement for HTML.)
XML and HTML were designed with different goals:
– XML was designed to describe data and to
focus on what data is.
– HTML was designed to display data and to
focus on how data looks.
– HTML is about displaying information, while
XML is about describing information.
6. Why Is XML Important?
• Plain Text
– Easy to edit
– Useful for storing small amounts of data
– Possible to efficiently store large amounts of XML
data through an XML front end to a database
• Data Identification
– Tells you what kind of data you have
– Can be used in different ways by different
applications
7. Why is XML important?
• Linkability -- XLink and XPointer
– Simple unidirectional hyperlinks
– Two-way links
– Multiple-target links
– “Expanding” links
• Easily Processed
– Regular and consistent notation
• Hierarchical
– Faster to access
– Easier to rearrange
8. XML Specifications
• XML 1.0
Defines the syntax of XML
• XPointer, XLink
Define a standard way to represent links between resources
• XSL
Defines the standard stylesheet language for XML
9. XML Syntax
• The XML declaration, if present, must be the first statement
• All XML elements must have a closing tag (an empty element
may use the self-closing form, e.g. <br/>)
• XML tags are case sensitive
• All XML elements must be properly nested
• All XML documents must have exactly one root element
• Attribute values must always be quoted
• With XML, white space is preserved
• Comments in XML: <!-- This is a comment -->
• Certain characters are reserved for markup: "<" and "&" must
be written as &lt; and &amp; in text
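To see several of these rules enforced, here is a minimal sketch using Python's standard xml.etree.ElementTree parser (a tool choice the slides do not make), which rejects any document that is not well formed. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Each sample breaks (or respects) one of the syntax rules listed above
samples = {
    "ok": '<?xml version="1.0"?><note><to>XXX</to></note>',
    "missing closing tag": "<note><to>XXX</note>",
    "case mismatch": "<Note>XXX</note>",
    "bad nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<item id=33905></item>",
    "two root elements": "<a/><b/>",
    "unescaped <": "<expr>a < b</expr>",
    "escaped <": "<expr>a &lt; b</expr>",
}

for label, text in samples.items():
    print(f"{label:20} -> {is_well_formed(text)}")
```

Only the "ok" and "escaped <" samples are accepted; every other sample raises a ParseError inside `is_well_formed`.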
10. XML Validation
There are two types of XML documents
• "Well Formed" XML document
– Has correct XML syntax
• "Valid" XML document
– Is "well formed"
– Conforms to the rules of a DTD (Document Type
Definition)
• XML DTD
– defines the legal building blocks of an XML
document
– Can be inline in XML or as an external reference
• XML Schema
– An XML-based alternative to DTD, more powerful
– Supports namespaces and data types
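Below is a sketch of an inline DTD for the note example used later in these slides, embedded in a Python string. One assumption should be stated up front: Python's standard ElementTree parser reads the DTD but does not validate against it. Both documents below therefore parse, even though the second one is not valid. Checking validity requires a validating parser (for example, lxml's DTD support), which is not shown here.

```python
import xml.etree.ElementTree as ET

# A note document with an inline (internal) DTD declaring its legal building blocks
NOTE_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>XXX</to><from>YYY</from><heading>XML</heading><body>Extensible Markup Language</body></note>"""

# Well formed but NOT valid: the DTD requires <from> and <heading>, which are removed here
NOTE_INVALID = NOTE_WITH_DTD.replace("<from>YYY</from><heading>XML</heading>", "")

for doc in (NOTE_WITH_DTD, NOTE_INVALID):
    root = ET.fromstring(doc)  # succeeds for both: only well-formedness is checked
    print(root.tag, [child.tag for child in root])
```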
11. Displaying XML
• XML documents do not carry information about how to
display the data
• We can add display information to XML with
– CSS (Cascading Style Sheets)
– XSL (eXtensible Stylesheet Language) --- preferred
12. XML support in IE 5.0+
Internet Explorer 5.0 has the following XML
support:
• Viewing of XML documents
• Full support for W3C DTD standards
• Binding XML data to HTML elements
• Transforming and displaying XML with XSL
• Displaying XML with CSS
• Access to the XML DOM (Document Object Model)
*Netscape 6.0 also has full XML support
13. XML features
• XML uses the concept of document type and
hence a DTD (Document Type Definition) to
describe data
• XML with DTD is self-descriptive
• XML separates data from display formats
• XML can be used as a format to exchange data
14. XML Syntax consists of
• XML Declaration
• XML Elements
• XML Attributes
• The first line of an XML document
should always consist of an XML
declaration defining the version of XML
(and, optionally, the character encoding)
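To show the declaration, elements, and attributes together, this sketch builds a small document with Python's standard ElementTree module (our choice of tool) and serializes it with an XML declaration. The `xml_declaration` parameter of `ET.tostring` requires Python 3.8 or later. The element and attribute names are illustrative.

```python
import xml.etree.ElementTree as ET

# Root element with one attribute, plus two child elements (illustrative names)
note = ET.Element("note", {"id": "1"})
ET.SubElement(note, "to").text = "XXX"
ET.SubElement(note, "from").text = "YYY"

# xml_declaration=True (Python 3.8+) writes the declaration as the first line,
# naming both the XML version and the chosen character encoding
data = ET.tostring(note, encoding="ISO-8859-1", xml_declaration=True)
print(data.decode("ISO-8859-1"))
```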
16. Main Components of an XML Document
• Elements: <hello>
• Attributes: <item id="33905">
• Entities: &lt; (<)
• Advanced Components
– CDATA Sections
– Processing Instructions
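XML predefines five entities: &lt;, &gt;, &amp;, &quot; and &apos;. The minimal sketch below uses Python's standard ElementTree (an assumed tool) to show the parser replacing entity references with the characters they stand for. When the element is written back out, the serializer escapes those characters again.

```python
import xml.etree.ElementTree as ET

# Element, attribute and entity references in one small document
item = ET.fromstring('<item id="33905">3 &lt; 5 &amp;&amp; 5 &gt; 3</item>')
print(item.tag, item.get("id"), item.text)  # item 33905 3 < 5 && 5 > 3

# Serializing goes the other way: special characters are escaped again
print(ET.tostring(item, encoding="unicode"))
```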
17. XML Attributes
• XML attributes are used to describe XML
elements or to provide additional information
about elements.
• Attributes provide additional information that
is not part of the data.
Ex:
• <Book no="99-2456" media="CD"></Book>
18. XML Attributes
• XML elements can have attributes in
name/value pairs as in HTML.
• Attributes must always be in quotes.
Either single or double quotes are valid,
though double quotes are most
common.
• Attributes are always contained within
the start tag of an element.
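This minimal sketch assumes Python's ElementTree. It shows that both quoting styles are accepted and that the parser exposes attributes as name/value pairs, much like a dictionary.

```python
import xml.etree.ElementTree as ET

# Double quotes on one attribute, single quotes on the other: both are legal
book = ET.fromstring("""<Book no="99-2456" media='CD'></Book>""")

print(book.attrib)           # all name/value pairs from the start tag
print(book.get("media"))     # value of a single attribute
print(book.get("price"))     # None, because the attribute is absent
```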
19. Attributes Vs. Elements
Case 1 ( Attributes)
<Book no="99-2356" type="CD">
<author>
<firstname>XXX</firstname>
<lastname>YYY</lastname>
</author>
</Book>
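Parsing the Case 1 document (assuming Python's ElementTree) shows the practical difference. Attributes are read as flat strings from the start tag. The author's name, by contrast, has to be reached by following a path through nested elements.

```python
import xml.etree.ElementTree as ET

BOOK = """<Book no="99-2356" type="CD">
  <author>
    <firstname>XXX</firstname>
    <lastname>YYY</lastname>
  </author>
</Book>"""

book = ET.fromstring(BOOK)
# Attributes: flat name/value strings on the start tag
print(book.get("no"), book.get("type"))
# Elements: can nest, so they are reached by a path through the tree
print(book.findtext("author/firstname"), book.findtext("author/lastname"))
```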
21. Where elements score over attributes
• Elements can describe structure; attributes
cannot
• Attributes are more difficult to manipulate
by program code than elements
• Attribute values are difficult to validate
against a DTD
22. XML strengths
• Its ability to describe data
• Its ability to structure data
• Its separation of display from structure
• Its support from industry
• Availability of tools
24. An example of XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>XXX</to>
<from>YYY</from>
<heading>XML</heading>
<body> Extensible Markup Language </body>
</note>
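Here is one way to read this document programmatically, sketched with Python's standard ElementTree (an assumed tool choice). Passing bytes lets the parser honour the ISO-8859-1 encoding named in the declaration. Note also that the spaces around the <body> text survive parsing.

```python
import xml.etree.ElementTree as ET

NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>XXX</to>
<from>YYY</from>
<heading>XML</heading>
<body> Extensible Markup Language </body>
</note>"""

root = ET.fromstring(NOTE)  # bytes input, so the declared encoding is used
print(root.tag)
for child in root:
    print(child.tag, repr(child.text))
```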
26. Cont.
• The first line is the XML declaration. It is
recommended rather than strictly mandatory in XML 1.0,
but when present it must come first.
• Every XML document has exactly one root element. In our example,
the second line is the root element,
<ProductList>
• The root element can contain child elements. In
our example, Product is the child element of
ProductList
• Each element can contain sub-elements.
– <P_CODE> and <P_PRICE> are sub-elements.
27. Example
<?xml version="1.0" encoding= "ISO-8859-1" ?>
<book>
<title> XML </title>
<chapter> introduction to xml
<para>Markup languages</para>
<para>Features of XML</para>
</chapter>
<chapter>XML syntax
<para>Elements must be enclosed in tags</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
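Because XML is hierarchical, a short recursive function can print this document as an outline. This sketch assumes Python's ElementTree. The declaration line is left out of the string for brevity, and only each element's leading text is shown.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
<title> XML </title>
<chapter> introduction to xml
<para>Markup languages</para>
<para>Features of XML</para>
</chapter>
<chapter>XML syntax
<para>Elements must be enclosed in tags</para>
<para>Elements must be properly nested</para>
</chapter>
</book>"""

def outline(element, depth=0):
    """Return one line per element, indented by depth, with its leading text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(BOOK))))
```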
29. How do you get the data?
Figure: XML data, together with its DTD/Schema, is read by a parser, which builds an information structure (tree + links); applications access that structure through the DOM interface.
Documents, stylesheets, and other data can all be expressed in XML.
Any application can plug in via an API called the "Document Object Model" (DOM).
This model can work locally or over a network: parsing, tree-building, and access can shift between client and server.
30. XML Parser
• All modern browsers have a built-in XML parser.
• An XML parser converts an XML document into
an XML DOM object, which can then be
manipulated with JavaScript.
XML DOM
• A DOM (Document Object Model) defines a
standard way for accessing and manipulating
XML documents.
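The slides describe manipulating the DOM with JavaScript in a browser. As a browser-free sketch, Python's standard xml.dom.minidom module (our choice, not named in the slides) implements the same W3C DOM interfaces, so calls like getElementsByTagName and createElement use the same names. The document content is illustrative.

```python
from xml.dom import minidom

# Parse a small note into a DOM Document
doc = minidom.parseString("<note><to>XXX</to><from>YYY</from></note>")

# Read: find the first <to> element and print its text node
to_el = doc.getElementsByTagName("to")[0]
print(to_el.firstChild.data)

# Modify: change the text, then append a new <heading> element to the root
to_el.firstChild.data = "ZZZ"
heading = doc.createElement("heading")
heading.appendChild(doc.createTextNode("XML"))
doc.documentElement.appendChild(heading)

print(doc.documentElement.toxml())
```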
31. XML Namespaces
• XML Namespaces provide a method to
avoid element name conflicts.
• This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
32. • This XML carries information about a table
(a piece of furniture):
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
• If these XML fragments were added together, there
would be a name conflict.
• Both contain a <table> element, but the elements
have different content and meaning.
An XML parser will not know how to handle these
differences.
33. Solving the Name Conflict Using a Prefix
• Name conflicts in XML can easily be avoided
using a name prefix.
• This XML carries information about an HTML
table, and a piece of furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
34. • In the example above, there will be no
conflict because the two <table> elements
have different names.
35. XML Namespaces - The xmlns Attribute
• When using prefixes in XML, a so-called
namespace for the prefix must be defined.
• The namespace is defined by the xmlns
attribute in the start tag of an element.
• The namespace declaration has the
following syntax: xmlns:prefix="URI"
37. • In the example above, the xmlns attributes in the
<table> tags give the h: and f: prefixes a qualified
namespace.
• When a namespace is defined for an element,
all child elements with the same prefix are
associated with the same namespace.
• Namespaces can be declared in the elements
where they are used or in the XML root element.
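Below is a minimal sketch of the root-element variant, assuming Python's ElementTree. The namespace URIs (https://example.com/html and https://example.com/furniture) are hypothetical placeholders. ElementTree expands every prefixed name into "{namespace-URI}localname", so the two <table> elements no longer collide.

```python
import xml.etree.ElementTree as ET

# Both prefixes are declared once, on the root element (URIs are hypothetical)
DOC = """<root xmlns:h="https://example.com/html"
      xmlns:f="https://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name><f:width>80</f:width><f:length>120</f:length></f:table>
</root>"""

root = ET.fromstring(DOC)
# Each prefixed tag is expanded to "{namespace-URI}localname"
print([child.tag for child in root])

# Searches map prefixes back to URIs through a dictionary
ns = {"h": "https://example.com/html", "f": "https://example.com/furniture"}
print(root.find("f:table/f:name", ns).text)
print([td.text for td in root.findall("h:table/h:tr/h:td", ns)])
```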
38. URI
• Uniform Resource Identifier (URI)
• A Uniform Resource Identifier (URI) is a string
of characters which identifies a resource, typically an
Internet resource.
• The most common URI is the Uniform
Resource Locator (URL), which identifies a resource by its
network location (for example, a web address). Another, less
common type of URI is the Uniform Resource
Name (URN), which names a resource without giving its location.
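The sketch below uses Python's urllib.parse (an assumed tool; any URI parser would do) to show the structural difference between the two kinds of URI. The URL carries a network location, while the URN only names its resource. The ISBN URN is an illustrative example.

```python
from urllib.parse import urlparse

url = urlparse("https://www.w3.org/TR/xml/")
urn = urlparse("urn:isbn:0451450523")

# A URL names a scheme plus a network location (host) where the resource lives
print(url.scheme, url.netloc, url.path)         # https www.w3.org /TR/xml/
# A URN has a scheme but no network location: it names, it does not locate
print(urn.scheme, repr(urn.netloc), urn.path)   # urn '' isbn:0451450523
```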
39. PCDATA - Parsed Character Data
• XML parsers normally parse all the text in
an XML document.
• When an XML element is parsed, the text
between the XML tags is also parsed:
<message>This text is also parsed</message>
40. <name><first>Bill</first><last>Gates</last></name>
The parser does this because XML elements can
contain other elements. In this example, the <name> element
contains two other elements (first and last), and the parser
will break it up into sub-elements like this:
<name>
<first>Bill</first>
<last>Gates</last>
</name>
Parsed Character Data (PCDATA) is the term for text
data that will be parsed by the XML parser.
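This minimal sketch, assuming Python's ElementTree, shows the parser doing exactly this: the <name> element is split into its <first> and <last> children. Markup that appears inside running text is parsed into an element as well.

```python
import xml.etree.ElementTree as ET

name = ET.fromstring("<name><first>Bill</first><last>Gates</last></name>")
print([(child.tag, child.text) for child in name])

# Markup inside text is parsed too: <b> becomes a child element,
# and the text after it is stored as that child's "tail"
msg = ET.fromstring("<message>This text is <b>also</b> parsed</message>")
bold = msg.find("b")
print(repr(msg.text), repr(bold.text), repr(bold.tail))
```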
41. CDATA - (Unparsed) Character Data
• The term CDATA is used for text data that should not
be parsed by the XML parser.
• Characters like "<" and "&" are illegal in the text of XML elements.
• "<" will generate an error because the parser interprets it
as the start of a new element.
• "&" will generate an error because the parser interprets it
as the start of a character entity.
• Some text, like JavaScript code, contains a lot of "<" or
"&" characters. To avoid errors, script code can be
defined as CDATA.
42. • Everything inside a CDATA section is ignored by
the parser as markup: it is passed to the application as plain text.
• A CDATA section starts with "<![CDATA[" and
ends with "]]>":
<script>
<![CDATA[
function matchwo(a,b)
{
  if (a < b && a < 0)
  {
    return 1;
  }
  else
  {
    return 0;
  }
}
]]>
</script>
In this example, everything inside
the CDATA section, including the "<" and "&&"
operators, is ignored by the parser.
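Here is a minimal sketch, assuming Python's ElementTree. After parsing, the CDATA markers are gone and the script text, including "<" and "&&", arrives as ordinary characters. When the tree is written back out, ElementTree does not recreate the CDATA section; it escapes the special characters instead.

```python
import xml.etree.ElementTree as ET

SCRIPT = """<script><![CDATA[
function matchwo(a, b) {
  if (a < b && a < 0) { return 1; }
  return 0;
}
]]></script>"""

script = ET.fromstring(SCRIPT)
# The CDATA markers are gone; "<" and "&&" arrive as plain characters
print(script.text)

# On output, ElementTree escapes the characters instead of writing a CDATA section
print(ET.tostring(script, encoding="unicode"))
```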
43. Conclusion
• XML is a self-descriptive language
• XML is a powerful language to describe
structured data for web applications
• XML is currently applied in many fields
• Many vendors already support or will support
XML
• XML documents can be validated through the
use of DTDs and XML Schema (XSD) documents
• XML impacts B2B data exchanges, legacy
system integration, web page development
and database system integration.