Introduction to XML

INTRODUCTION TO XML
XML document structure – Well-formed and
valid documents – Namespaces – DTD – XML
Schema – X-Files
Prepared By : Prabu.U

XML is not…
A replacement for HTML
(but HTML can be generated from XML)
A presentation format
(but XML can be converted into one)
A programming language
(but it can be used with almost any language)
A network transfer protocol
(but XML may be transferred over a network)
A database
(but XML may be stored in a database)

XML by Example
<article>
<author>Gerhard Weikum</author>
<title>The Web in 10 Years</title>
</article>
 Easy to understand for human users
 Very expressive (semantics along with the data)
 Well structured, easy to read and write from
programs
This looks nice, but…

XML by Example
<t108>
<x87>Gerhard Weikum</x87>
<g10>The Web in 10 Years</g10>
</t108>
 Hard to understand for human users
 Not expressive (no semantics along with the data)
 Well structured, easy to read and write from
programs
… this is XML, too:

XML by Example
<data>
ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983
</data>
 Impossible to understand for human users
 Not expressive (no semantics along with the data)
 Unstructured, read and write only with special
programs
… and what about this XML document:
The actual benefit of using XML highly depends on the
design of the application.

Possible Advantages of Using XML
 Truly Portable Data
 Easily readable by human users
 Very expressive (semantics near data)
 Very flexible and customizable (no finite tag set)
 Easy to use from programs (libs available)
 Easy to convert into other representations
(XML transformation languages)
 Many additional standards and tools
 Widely used and supported

XML DOCUMENT STRUCTURE
 XML Declaration
 Document Type Declaration
 Elements data
 Attributes data
 XML Content.

XML Document Structure
 The major portions of an XML document
include the following:
 The XML declaration
 The Document Type Declaration
 The element data
 The attribute data
 The character data or XML content

Components of XML Declaration
Component Description
<?xml Begins the XML declaration, which is written in processing-instruction
syntax.
version="xxx" Describes the version of XML used in the document (for example,
version 1.0 of the W3C specification). The only later version is 1.1.
This pseudo-attribute is required and must come first.
encoding="xxx" Indicates the character encoding that the document uses. If it is
omitted (and there is no byte order mark), processors assume "UTF-8".
It can be set to any encoding the XML processor recognizes, such as
"UTF-16" or "ISO-8859-1". When present, it must follow version.
standalone="xxx" Declares whether the document depends on external markup
declarations (such as an external DTD subset) that affect its content.
It can be set to "yes" or "no"; "no" is the default. When present, it
must come last.

Valid XML Declaration
<?xml version="1.0" standalone="yes"?>
<?xml version="1.0" standalone="no"?>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
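To see what a parser actually reads from these declarations, here is a minimal sketch using Python's standard `xml.parsers.expat` module (the choice of Python and the `read_xml_declaration` helper are assumptions for illustration). expat reports the declaration through `XmlDeclHandler`, giving `standalone` as 1 for "yes", 0 for "no" and -1 when it is omitted. The last part shows that curly quotes, which often creep in when XML is pasted from a word processor, make the declaration malformed.

```python
import xml.parsers.expat

def read_xml_declaration(data: bytes):
    """Return (version, encoding, standalone) from the XML declaration, or None."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once, when the <?xml ...?> declaration at the top is parsed.
    parser.XmlDeclHandler = lambda version, encoding, standalone: found.append(
        (version, encoding, standalone)
    )
    parser.Parse(data, True)  # True: this is the whole document
    return found[0] if found else None

print(read_xml_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><root/>'))
# ('1.0', 'UTF-8', 0)

# Curly quotes (U+201D) are not valid attribute delimiters, so expat rejects them.
try:
    read_xml_declaration('<?xml version=\u201d1.0\u201d?><root/>'.encode("utf-8"))
except xml.parsers.expat.ExpatError as err:
    print("rejected:", err)
```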

Document Type Declaration
 The Document Type Declaration (DOCTYPE) names the root element of
the XML content and provides a means to check the document's
validity, either by including a Document Type Definition (DTD) or by
specifying a link to one.
 A DOCTYPE can identify the constraints on the validity of the
document by making a reference to an external DTD subset and/or
include the DTD internally within the document by means of an
internal DTD subset.

Document Type Declaration
 The general forms of Document Type Declarations are as follows:
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>

Components of Document Type Declaration
Component Description
< The start of the XML tag (in this case, the beginning of the
Document Type Declaration).
!DOCTYPE The beginning of the Document Type Declaration.
NAME Specifies the name of the document type being defined.
This must comply with XML naming rules.
SYSTEM Specifies that a system identifier follows, naming the external
DTD subset to be read and processed.
"file" The system identifier: the file name or URI of the external DTD subset.
[ Starts an internal DTD subset.
] Ends the internal DTD subset.
> The end of the XML tag (in this case, the end of the
Document Type Declaration).
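A minimal sketch, assuming Python's standard `xml.parsers.expat` module, of what a parser learns from the Document Type Declaration: the NAME, the SYSTEM identifier, and whether an internal subset in [ ] is present. The `read_doctype` helper and the contactlist documents are illustrative assumptions; in this configuration expat does not fetch the external file, so contactlist.dtd need not exist.

```python
import xml.parsers.expat

def read_doctype(data: bytes) -> dict:
    """Report what a Document Type Declaration names and points to."""
    info = {}

    def on_start_doctype(name, system_id, public_id, has_internal_subset):
        info.update(name=name, system_id=system_id, public_id=public_id,
                    internal_subset=bool(has_internal_subset))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_start_doctype
    parser.Parse(data, True)
    return info

external_only = b'<!DOCTYPE contactlist SYSTEM "contactlist.dtd"><contactlist/>'
both = b'''<!DOCTYPE contactlist SYSTEM "contactlist.dtd" [
  <!ELEMENT phone (#PCDATA)>
]>
<contactlist/>'''

print(read_doctype(external_only))
print(read_doctype(both)["internal_subset"])  # True
```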

XML Elements
 XML elements are either a matched pair of XML tags or single XML
tags that are “self-closing.”
 Matching XML tags have the same name, except that the ending tag
is prefixed with a forward slash.
 For example, shirt element begins with <shirt> and ends with
</shirt>.
 Everything between these tags is additional XML text that has either
been defined by a DTD or can exist by virtue of the document
merely being well formed.

XML Attributes
 Within elements, additional information can be communicated to
XML processors that modify the nature of the encapsulated content.
 For example, we may have specified a <price> element, but how do
we know what currency this applies to?
 Although it’s possible to create a <currency> subtag, another more
viable approach is to use an attribute.

XML Attributes Examples
 Attributes are name/value pairs contained within the start element
that can specify text strings that modify the context of the element.
 An example of possible attributes for shirt is shown below.
<price currency="USD">…</price>
<on_sale start_date="10-15-2001"/>
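In Python's standard `xml.etree.ElementTree`, attributes come back as a dictionary on the element. A minimal sketch with the two elements above; the price value 19.95 is a hypothetical stand-in for the elided content.

```python
import xml.etree.ElementTree as ET

# "19.95" is a hypothetical price; the slide leaves the element content elided.
price = ET.fromstring('<price currency="USD">19.95</price>')
on_sale = ET.fromstring('<on_sale start_date="10-15-2001"/>')

print(price.get("currency"), price.text)  # USD 19.95
print(on_sale.attrib)                      # {'start_date': '10-15-2001'}
print(price.get("discount", "none"))       # none (attribute not present)
```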

Significant Features of Attributes
 One of the significant features of attributes is that the content
described by them can follow strict rules as to their value.
 Attributes can be required, optional, or contain a fixed value.
 Required or optional attributes can either contain freeform text or
contain one of a set list of enumerated values.
 Fixed attributes, if present, must contain a specific value.

Entity References
 Entity references are delimited by an ampersand at the beginning
and a semicolon at the ending. The content contained between the
delimiters is the entity that will be replaced.
 For example, the &lt; entity inserts the less-than sign (<) into a
document.
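A minimal sketch with Python's standard `xml.etree.ElementTree`: the parser replaces entity references with the characters they stand for, and the serializer escapes those characters again on output. The <expr> element is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# The parser replaces each entity reference with the character it stands for.
expr = ET.fromstring("<expr>a &lt; b &amp;&amp; c &gt; d</expr>")
print(expr.text)  # a < b && c > d

# Going the other way, the serializer re-escapes markup characters in text.
print(ET.tostring(expr, encoding="unicode"))
# <expr>a &lt; b &amp;&amp; c &gt; d</expr>
```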

Comments
 Comments are quite simple to include in a document.
 The character sequence <!-- begins a comment, and the character
sequence --> ends the comment.
 Comments can be placed anywhere in a document and are not
considered to be part of the textual content of an XML document.
 Example:

<animal>Elephant</animal>
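As a quick check that comments are not part of the textual content, Python's standard `xml.etree.ElementTree` drops them by default when it builds the tree. A minimal sketch; the comment text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Comments are allowed before and after the root element.
doc = "<!-- inventory note --><animal>Elephant</animal><!-- end of list -->"
root = ET.fromstring(doc)

print(root.tag, root.text)  # animal Elephant
print(len(root))            # 0: the comments did not become child nodes
```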

Processing Instructions
 Processing instructions (PIs) are similar to comments in that they
are not a textual part of an XML document, but they provide
information to applications about how the content should be
processed.
 Unlike comments, PIs must be passed along to the application by the
XML processor.
 Processing instructions have the following form:
<?instruction options?>

Processing Instructions
 The instruction name, called the PI target, is a special identifier that
the processing application is intended to understand.
 Example of a Processing Instruction
<?send-message "process complete"?>
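A minimal sketch using Python's standard `xml.parsers.expat`, which hands each PI to the application as a (target, data) pair. The <job> wrapper element and the `collect_pis` helper are hypothetical. Note that the XML declaration, although it looks like a PI, is not reported this way.

```python
import xml.parsers.expat

def collect_pis(data: bytes):
    """Return (target, data) pairs for every processing instruction."""
    pis = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = lambda target, pi_data: pis.append((target, pi_data))
    parser.Parse(data, True)
    return pis

print(collect_pis(b'<job><?send-message "process complete"?></job>'))
# [('send-message', '"process complete"')]

# The XML declaration goes to XmlDeclHandler instead, so no PI is reported here.
print(collect_pis(b'<?xml version="1.0"?><job/>'))  # []
```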

XML Content Model
 A content model defines how far, if at all, a document can take
advantage of the extensibility features of XML.
 At the very least, the model provides an indication of the intent of
the document creator as to the explicit extensibility of the document,
because users can extend a document using an internal DTD subset
if they are so inclined.
 However, by doing so, the users are “overriding” the content model
as intended by the document creator.

Open Content Model
 An “open” content model enables a user to add additional elements
and attributes to a document without them having to be explicitly
declared in a DTD or schema.
 In an open content model, users can take full advantage of the
extensibility of XML without having to make changes to a DTD. As
expected, the use of a DTD precludes an open content model.
 In fact, you cannot have an open content model when using a DTD,
except if a user chooses to override the DTD by using an internal
DTD subset.
 However, new schema formats, such as XML Schema, provide this
mechanism.

Closed Content Model
 A “closed” content model restricts elements and attributes to only
those that are specified in the DTD or schema.
 By definition, a DTD is a closed content model because it describes
what may exclusively appear in the content of the element.
 In a closed model, the XML document creator maintains strict
control of specifically which elements and attributes as well as the
order that markup may appear in a given compliant document.
 Closed models are helpful when you’re enforcing strict document
exchange and provide a means to guarantee that incoming data
complies with data requirements.

Mixed Content Model
 A “mixed” content model enables individual elements to allow an
arbitrary mixture of text and additional elements.
 These mixed elements are useful when freeform fields, with possible
XML or other markup data are to be included.
 This allows the majority of the document to remain closed while
portions of the document are noted as extensible.
 Mixed models represent a good compromise that can allow for
strictness while providing a limited means for extensibility.

Handling Whitespaces in XML
 Whitespace is the term used for character spaces, tabs, linefeeds, and
carriage returns in documents.
 Issues around the handling of these seemingly “invisible” characters
are important for many reasons.
 It is hard to tell whether whitespace should be ignored or passed “as
is” to the application.

Significance of Whitespaces
 XML processors can determine whether whitespace is significant by
knowing the content model of the XML document.
 Basically, in a mixed content model, whitespace is significant
because the application cannot know whether or not the
whitespace will be used in processing, but in an open or closed
model, it is not.
 However, the rule for XML processors is that they must pass all
characters that are not markup intact to the application.
 Validating processors also inform applications about the significance
of the various whitespace characters.

Rules of XML Structure
 All XML elements must have a closing tag (or use the self-closing
form, such as <tag/>)
 XML tags are case sensitive
 All XML elements must have proper nesting
 All XML documents must contain a single root element
 Attribute values must be quoted
 Attributes may only appear once in the same start tag
 Attribute values cannot contain references to external entities
 All entities except amp, lt, gt, apos, and quot must be declared
before they are used

Well-Formed XML?
• No, CHILD2 and CHILD3 do not nest properly
<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>

Well-Formed XML?
• No, there are two root elements
<PARENT>
</PARENT>
<PARENT>
<CHILD1>This is another element 1</CHILD1>
</PARENT>

Well-Formed XML?
• Yes
<PARENT>
<CHILD2/>
<CHILD3></CHILD3>
</PARENT>
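The three verdicts above can be checked mechanically, because a conforming parser must reject a document that is not well formed. A minimal sketch assuming Python's standard `xml.etree.ElementTree`, whose `fromstring` raises `ParseError` on such input; `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

bad_nesting = """<?xml version="1.0"?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>"""

two_roots = """<PARENT>
</PARENT>
<PARENT>
<CHILD1>This is another element 1</CHILD1>
</PARENT>"""

good = """<PARENT>
<CHILD2/>
<CHILD3></CHILD3>
</PARENT>"""

for name, text in [("bad_nesting", bad_nesting), ("two_roots", two_roots), ("good", good)]:
    print(name, is_well_formed(text))
# bad_nesting False
# two_roots False
# good True
```

The same helper also catches other rule violations from the list above, such as a repeated attribute in one start tag or a capitalized Version in the XML declaration.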

NAMESPACES
 In XML the namespace is defined by using an attribute that is called
xmlns.
 The main rules for XML namespace definition are:
 The value of the attribute is a URI that identifies the namespace. It
serves only as a unique name; the parser does not need to find
anything at that location.
 The namespace attribute can appear in the start-tag of an element, in
which case the declaration applies to that element and all of its
children (unless a child declares a different namespace).
 The namespace can appear in the root element, in which case the
whole document is within the scope of the defined namespace.

NAMESPACES
 Within an XML document, namespaces can be declared using one of
two methods: a default declaration or an explicit declaration.
 Which method to use is completely open and left up to you; either
way will suffice.

Default Declaration
 A default namespace declaration specifies a namespace to use for the
current element and all of its descendants that do not have a
namespace prefix.
 For instance, in the following XML document, a default declaration
is defined on the root <Customer> element by using the xmlns
attribute without attaching a prefix to it:

Default Declaration
<Customer xmlns="http://www.eps-software.com/po">
<Name>Travis Vandersypen</Name>
<Order>
<Product>
<Name>Hot Dog Buns</Name>
</Product>
</Order>
</Customer>

Explicit Declaration
 Sometimes, however, it may be necessary and more readable to
explicitly declare an element’s namespace.
 This is accomplished much the same way in which a default
namespace is declared, except a prefix is associated with the xmlns
attribute.

Explicit Declaration
<po:Customer xmlns:po="http://www.eps-software.com/po">
<po:Name>Travis Vandersypen</po:Name>
<po:Order>
<po:Product>
<po:Name>Hot Dog Buns</po:Name>
</po:Product>
</po:Order>
</po:Customer>
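Both declarations above put the elements in the same namespace; only the spelling differs. A minimal sketch in Python's standard `xml.etree.ElementTree`, which replaces every prefix with the namespace URI in `{uri}name` form, shows that the two documents produce identical element names. `expanded_names` is a hypothetical helper, and the prefix `p` used when searching is chosen by the reader, not taken from the document.

```python
import xml.etree.ElementTree as ET

PO = "http://www.eps-software.com/po"

default_doc = """<Customer xmlns="http://www.eps-software.com/po">
<Name>Travis Vandersypen</Name>
<Order>
<Product>
<Name>Hot Dog Buns</Name>
</Product>
</Order>
</Customer>"""

explicit_doc = """<po:Customer xmlns:po="http://www.eps-software.com/po">
<po:Name>Travis Vandersypen</po:Name>
<po:Order>
<po:Product>
<po:Name>Hot Dog Buns</po:Name>
</po:Product>
</po:Order>
</po:Customer>"""

def expanded_names(text: str) -> list:
    # ElementTree writes each name as "{namespace-URI}local-name".
    return [element.tag for element in ET.fromstring(text).iter()]

print(expanded_names(default_doc) == expanded_names(explicit_doc))  # True
print(expanded_names(default_doc)[0])  # {http://www.eps-software.com/po}Customer

# Searching uses a prefix map chosen by the reader, not the document's prefix.
root = ET.fromstring(default_doc)
print(root.find("p:Order/p:Product/p:Name", {"p": PO}).text)  # Hot Dog Buns
```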

Identifying the Scope of Namespaces
 By default, all child elements within a parent element, unless
indicated otherwise by referencing another namespace, appear
within the parent’s namespace.
 This allows all child elements to “inherit” their parent element’s
namespace.
 However, this “inherited” namespace can be overridden by
declaring a new namespace on a particular child element.

Identifying the Scope of Namespaces
<Customer xmlns="http://www.eps-software.com/customer">
<Name>Travis Vandersypen</Name>
<Order xmlns="http://www.eps-software.com/order">
<Product>
<Name>Hot Dog Buns</Name>
</Product>
</Order>
</Customer>
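Parsing the document above with Python's standard `xml.etree.ElementTree` (a minimal sketch) shows where the inner default declaration takes over: <Order> and everything inside it belong to the order namespace, while <Customer> and the first <Name> stay in the customer namespace.

```python
import xml.etree.ElementTree as ET

doc = """<Customer xmlns="http://www.eps-software.com/customer">
<Name>Travis Vandersypen</Name>
<Order xmlns="http://www.eps-software.com/order">
<Product>
<Name>Hot Dog Buns</Name>
</Product>
</Order>
</Customer>"""

# iter() walks the tree in document order, root first.
for element in ET.fromstring(doc).iter():
    print(element.tag)
# {http://www.eps-software.com/customer}Customer
# {http://www.eps-software.com/customer}Name
# {http://www.eps-software.com/order}Order
# {http://www.eps-software.com/order}Product
# {http://www.eps-software.com/order}Name
```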

Structure of Document Type Definition
 The Document Type Declaration
 DTD Elements
 DTD Element Rules
 Content Rules
 The ANY Rule
 The EMPTY Rule
 The #PCDATA Rule
 Structure Rules
 The “Element Only” Rule
 The “Mixed” Rule
 Element Symbols

Structure of Document Type Definition
 DTD Attributes
 Attribute Types
 DTD Entities
 Predefined Entities
 External Entities
 Non-Text External Entities and Notations
 Parameter Entities
 More DTD Directives
 The IGNORE Keyword
 The INCLUDE Keyword
 Comments Within a DTD
 DTD Drawbacks and Alternatives

The Document Type Declaration
 In order to reference a DTD from an XML document, a
Document Type Declaration must be included in the XML
document.
 There may be at most one Document Type Declaration per XML
document, and it must appear before the root element. The syntax is
as follows:
<!DOCTYPE rootelement
SYSTEM | PUBLIC DTDlocation [ internalDTDelements ]
>

 The exclamation mark (!) is used to signify the beginning of the
declaration.
 DOCTYPE is the keyword used to denote this as a Document
Type Declaration.
 rootelement is the name of the root element or document
element of the XML document.

 SYSTEM and PUBLIC are keywords used to designate that the
DTD is contained in an external document. Although the use of
these keywords is optional, to reference an external DTD you
would have to use one or the other.
 internalDTDelements are internal DTD declarations. These
declarations will always be placed within opening ([) and
closing (]) brackets.

DTD Elements
 Each element in the DTD should be defined with the following
syntax:
<!ELEMENT elementname rule >
 ELEMENT is the tag name that specifies that this is an element
definition.
 elementname is the name of the element.
 rule is the definition to which the element’s data content must
conform.

DTD Elements
contactlist.dtd
<!ELEMENT contactlist (fullname, address, phone, email) >
<!ELEMENT fullname (#PCDATA)>
<!ELEMENT address (addressline1, addressline2)>
<!ELEMENT addressline1 (#PCDATA)>
<!ELEMENT addressline2 (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>

DTD Elements
The below XML document is a valid document because it follows the
rules laid down in contactlist.dtd.
contactlist.xml
<?xml version="1.0"?>
<!DOCTYPE contactlist SYSTEM "contactlist.dtd">
<contactlist>
<fullname>Bobby Soninlaw</fullname>
<address>
<addressline1>101 South Street</addressline1>
<addressline2>Apartment #2</addressline2>
</address>
<phone>(405) 555-1234</phone>
<email>bs@mail.com</email>
</contactlist>

DTD Element Rules
 All data contained in an element must follow a set rule.
 As stated previously, the rule is the definition to which the
element’s data content must conform.
 There are two basic types of rules that elements must fall
into.
 The first type of rule deals with content.
 The second type of rule deals with structure.

Content Rules
 The content rules for elements deal with the actual data
that defined elements may contain.
 These rules include the ANY rule, the EMPTY rule, and
the #PCDATA rule.

ANY Rule
 An element may be defined using the ANY rule. The
element may contain other elements and/or normal character
data. An element using the ANY rule would appear as
follows:
<!ELEMENT elementname ANY>
 The drawback to this rule is that it is so wide open that it
defeats the purpose of validation.
 Against a DTD that defines all its elements using the ANY rule, any
well-formed document that uses only the declared elements is valid.

EMPTY Rule
 This rule is the exact opposite of the ANY rule. An
element that is defined with this rule will contain no data.
 However, an element with the EMPTY rule could still
contain attributes (more on attributes in a bit).
 The following element is an example of the EMPTY rule:
<!ELEMENT elementname EMPTY>

EMPTY Rule
 This concept is seen a lot in HTML. There are many tags, such
as the break tag (<br />) and the horizontal rule tag (<hr />), that
follow this rule.
 Neither one of these tags contains any data, but both are very
important in HTML documents.
 The best example of an empty tag used in HTML is the image
tag (<img>).
 Even though the image tag does not contain any data, it does
have attributes that describe the location and display of an
image for a Web browser.

#PCDATA Rule
 The #PCDATA rule indicates that parsed character data will
be contained in the element.
 Parsed character data is text that any XML parser accessing the
document reads and interprets, so entity references in it (such as
&lt;) are expanded. An element declared as (#PCDATA) alone may
not contain child elements.
 The following element demonstrates the #PCDATA rule:
<!ELEMENT elementname (#PCDATA)>

#PCDATA Rule
 Within an element using the #PCDATA rule, a CDATA section
(<![CDATA[ ... ]]>) can be used to prevent the character data from
being parsed.
CDATA
<sample>
<data>
<![CDATA[<tag>This will not be parsed</tag>]]>
</data>
</sample>
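A minimal sketch with Python's standard `xml.etree.ElementTree`: the CDATA section arrives as plain text, no <tag> child element is created, and when the tree is written out again the markup characters are escaped rather than wrapped in a new CDATA section.

```python
import xml.etree.ElementTree as ET

data = ET.fromstring("<data><![CDATA[<tag>This will not be parsed</tag>]]></data>")

print(data.text)  # <tag>This will not be parsed</tag>
print(len(data))  # 0: no <tag> element was created

# ElementTree does not keep the CDATA wrapper; it escapes the text instead.
print(ET.tostring(data, encoding="unicode"))
# <data>&lt;tag&gt;This will not be parsed&lt;/tag&gt;</data>
```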

Structure Rules
 Whereas the content rules deal with the actual content of
the data contained in defined elements, structure rules deal
with how that data may be organized.
 There are two types of structure rules.
 The first is the “element only” rule.
 The second rule is the “mixed” rule.

Element Only Rule
 The “element only” rule specifies that only elements may
appear as children of the current element.
 The child element sequences should be separated by
commas and listed in the order they should appear.
 If there are to be options for which elements will appear,
the listed elements should be separated by the pipe symbol
(|).

Element Only Rule
 The following element definition demonstrates the
“element only” rule:
<!ELEMENT elementname (element1, element2, element3)>
 The following element will have a single child element:
either element1 or element2.
<!ELEMENT elementname (element1 | element2)>

Mixed Rule
 The “mixed” rule is used to help define elements that may
have both character data (#PCDATA) and child elements in
the data they contain.
 In a mixed declaration, #PCDATA must come first, the names are
separated by the pipe symbol (|) inside parentheses, and the group
must be followed by an asterisk (*) whenever child elements are
listed. Sequences separated by commas are not allowed.

Mixed Rule
 The following element is an example of the “mixed” rule:
<!ELEMENT elementname
(#PCDATA | childelement1 | childelement2)*>
 The asterisk symbol used in these examples indicates that
an item may occur zero or more times.
<!ELEMENT Son (#PCDATA | Name | Age)*>
 This declares an element, Son, which may contain character
data, child elements, or both.

Mixed Rule
 A man might have a son, but he might not.
 If there is no son, then normal character data (such as
“N/A”) could be used to describe this condition.
<Son>
N/A
</Son>
<Son>
Adopted Son
<Name>Bobby</Name>
<Age>12</Age>
</Son>
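A minimal sketch with Python's standard `xml.etree.ElementTree` showing how the two Son elements look after parsing: in mixed content, the text before the first child is the element's `text`, and the child elements appear alongside it.

```python
import xml.etree.ElementTree as ET

no_son = ET.fromstring("<Son>\nN/A\n</Son>")
adopted = ET.fromstring("<Son>\nAdopted Son\n<Name>Bobby</Name>\n<Age>12</Age>\n</Son>")

print(no_son.text.strip(), len(no_son))                # N/A 0
print(adopted.text.strip())                            # Adopted Son
print([(child.tag, child.text) for child in adopted])  # [('Name', 'Bobby'), ('Age', '12')]
```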

Element Symbols
 In addition to the normal rules that apply to element
definitions, element symbols can be used to control the
occurrence of data.
Symbol Definition
Asterisk (*) The data will appear zero or more times (0, 1, 2, …).
Here’s an example:
<!ELEMENT children (name*)>
Comma (,) Provides separation of elements in a sequence. Here’s an
example:
<!ELEMENT address (street, city, state, zip)>
In this example, the element address will have four child
elements: street, city, state, and zip.

Symbol Definition
Parentheses [( )] The parentheses are used to contain the rule for an
element.
Parentheses may also be used to group a sequence,
subsequence, or a set of alternatives in a rule.
<!ELEMENT address (street, city, (state |province), zip)>
Pipe (|) Separates choices in a set of options. Here’s an example:
<!ELEMENT dessert (cake | pie)>
The element dessert will have one child element: either
cake or pie.
Plus sign (+) Signifies that the data must appear one or more times (1,
2, 3, …).
<!ELEMENT appliances (refrigerator+)>
The appliances element will have one or more
refrigerator child elements.

Symbol Definition
Question mark (?) Data will appear either zero times or one time in the
element.
<!ELEMENT employment (company?)>
The element employment will have either zero occurrences
or one occurrence of the child element company.
No symbol When no symbol is used (other than parentheses), this
signifies that the data must appear once in the XML file.
<!ELEMENT contact (name)>
The element contact will have one child element: name.
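Because a content model built from these symbols is essentially a regular expression over child element names, it can be checked by translating it into one. A minimal sketch in Python, assuming element-only models (it rejects #PCDATA) with properly grouped choices and sequences; `content_model_to_regex` and `children_match` are hypothetical helpers, not part of any XML library.

```python
import re

# One token: an element name, or one of the DTD symbols ( ) , | ? * +
_TOKEN = re.compile(r"\s*([A-Za-z_][\w.\-]*|[(),|?*+])")

def content_model_to_regex(model: str) -> str:
    """Translate an element-only DTD content model into a Python regex.

    Each child name becomes the literal "name " so that a list of children
    can be matched as one space-separated string.
    """
    parts, pos, model = [], 0, model.strip()
    while pos < len(model):
        match = _TOKEN.match(model, pos)
        if match is None:
            raise ValueError(f"unsupported content model text: {model[pos:]!r}")
        token, pos = match.group(1), match.end()
        if token == "(":
            parts.append("(?:")
        elif token == ")":
            parts.append(")")
        elif token == ",":
            continue  # a sequence is plain concatenation
        elif token in {"|", "?", "*", "+"}:
            parts.append(token)  # same meaning in a regex as in the DTD
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return "".join(parts)

def children_match(model: str, children: list) -> bool:
    """True if the list of child element names satisfies the content model."""
    pattern = content_model_to_regex(model)
    return re.fullmatch(pattern, "".join(name + " " for name in children)) is not None

print(children_match("(street, city, (state | province), zip)",
                     ["street", "city", "province", "zip"]))  # True
print(children_match("(refrigerator+)", []))                  # False
print(children_match("(company?)", []))                       # True
```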

Limited Use of Symbols
<!ELEMENT contactlist (contact) >
<!ELEMENT contact (name, age, sex, address, city, state, zip, children) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT age (#PCDATA) >
<!ELEMENT sex (#PCDATA) >
<!ELEMENT address (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ELEMENT children (child) >
<!ELEMENT child (childname, childage, childsex) >
<!ELEMENT childname (#PCDATA) >
<!ELEMENT childage (#PCDATA) >
<!ELEMENT childsex (#PCDATA) >

Broader Use of Symbols
<!ELEMENT contactlist (contact+) >
<!ELEMENT contact (name, age?, sex, address?, city?, state?, zip?, children?) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT age (#PCDATA) >
<!ELEMENT sex (#PCDATA) >
<!ELEMENT address (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ELEMENT children (child*) >
<!ELEMENT child (childname, childage?, childsex) >
<!ELEMENT childname (#PCDATA) >
<!ELEMENT childage (#PCDATA) >
<!ELEMENT childsex (#PCDATA) >

DTD Attributes
XML attributes are name/value pairs that are used as
metadata to describe XML elements.
XML attributes are very similar to HTML attributes.
In HTML, src is an attribute of the img tag, as shown in the
following example:
<img src="images/imagename.gif" width="10" height="20">

DTD Attributes
<image src="images/" width="10" height="20">
imagename.gif
</image>
 src, width, and height are presented as attributes of the XML
element image.
 This is very similar to the way that these attributes are used in
HTML.
 The only difference is that the src attribute merely contains the
relative path of the image’s directory and not the actual name of
the image file.

Attribute list declarations in a DTD
 Attribute list declarations in a DTD will have the following
syntax:
<!ATTLIST elementname attributename type defaultbehavior
defaultvalue>
 ATTLIST is the tag name that specifies that this definition will
be for an attribute list.
 elementname is the name of the element that the attribute will
be attached to.

Attribute list declarations in a DTD
 attributename is the actual name of the attribute.
 type indicates which of the 10 valid kinds of attributes this
attribute definition will be.
 defaultbehavior dictates whether the attribute will be required,
optional, or fixed in value. This setting determines how a
validating parser should relate to this attribute.
 defaultvalue is the value of the attribute if no value is explicitly
set.

ATTLIST Declaration
<!ATTLIST name
sex CDATA #REQUIRED
age CDATA #IMPLIED
race CDATA #IMPLIED >
 The three attributes are character data (CDATA).
 Only one of the attributes, sex, is required (#REQUIRED).
 The other two attributes, age and race, are optional
(#IMPLIED)

ATTLIST Declaration
 An XML element using the attribute list declared here would
appear as follows:
<name sex="male" age="30" race="Caucasian">Michael Qualls</name>
 The name element contains the value “Michael Qualls”. It also
has three attributes: sex, age, and race.
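A minimal sketch using Python's standard `xml.parsers.expat`, which reports each attribute of an ATTLIST declaration separately through `AttlistDeclHandler` (element name, attribute name, type, default value, and whether the attribute is required). expat does not validate, so this only shows what was declared. Wrapping the declaration in an internal subset and the `attribute_declarations` helper are assumptions for illustration.

```python
import xml.parsers.expat

DOC = b"""<!DOCTYPE name [
<!ELEMENT name (#PCDATA)>
<!ATTLIST name
  sex CDATA #REQUIRED
  age CDATA #IMPLIED
  race CDATA #IMPLIED >
]>
<name sex="male" age="30" race="Caucasian">Michael Qualls</name>"""

def attribute_declarations(data: bytes) -> dict:
    decls = {}

    def on_attlist(elname, attname, att_type, default, required):
        # default is None for #REQUIRED and #IMPLIED attributes.
        decls[attname] = (elname, att_type, default, bool(required))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(data, True)
    return decls

for name, info in attribute_declarations(DOC).items():
    print(name, info)
# sex ('name', 'CDATA', None, True)
# age ('name', 'CDATA', None, False)
# race ('name', 'CDATA', None, False)
```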

DTD Entities
 Entities in DTDs are storage units. They can also be considered
placeholders.
 Entities are special markups that contain content for insertion
into the XML document.
 An entity’s content could be well-formed XML, normal text,
binary data, a database record, and so on.
 The main purpose of an entity is to hold content, and there is
virtually no limit on the type of content an entity can hold.

DTD Entities
 The general syntax of an entity is as follows:
<!ENTITY entityname [SYSTEM | PUBLIC] entitycontent>
 ENTITY is the tag name that specifies that this definition will be for an entity.
 entityname is the name by which the entity will be referred in the XML
document.
 entitycontent is the actual contents of the entity—the data for which the entity
is serving as a placeholder.
 SYSTEM and PUBLIC are optional keywords. Either one can be added to the
definition of an entity to indicate that the entity refers to external content.

Using Internal Entities
<!DOCTYPE library [
<!ENTITY cpy "Copyright 2000">
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,copyright)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
]>

Using Internal Entities
<library>
<book>
<title>How to Win Friends</title>
<author>Joe Charisma</author>
<copyright>&cpy;</copyright>
</book>
<book>
<title>Make Money Fast</title>
<author>Jimmy QuickBuck</author>
<copyright>&cpy;</copyright>
</book>
</library>
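A minimal sketch using Python's standard `xml.etree.ElementTree`, whose underlying expat parser expands internal entities declared in the internal DTD subset: every &cpy; reference comes back as the text Copyright 2000.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<!DOCTYPE library [
<!ENTITY cpy "Copyright 2000">
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,copyright)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
]>
<library>
<book>
<title>How to Win Friends</title>
<author>Joe Charisma</author>
<copyright>&cpy;</copyright>
</book>
<book>
<title>Make Money Fast</title>
<author>Jimmy QuickBuck</author>
<copyright>&cpy;</copyright>
</book>
</library>"""

root = ET.fromstring(LIBRARY)
print([element.text for element in root.iter("copyright")])
# ['Copyright 2000', 'Copyright 2000']
```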

Predefined Entities
Entity Content
&amp; &
&lt; <
&gt; >
&quot; "
&apos; '

Using Predefined Entities
<icecream>
<flavor>Cherry Garcia</flavor>
<vendor>Ben &amp; Jerry’s</vendor>
</icecream>
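A minimal sketch with Python's standard `xml.etree.ElementTree`: the escaped form parses to the intended text, the serializer escapes the ampersand again, and a bare ampersand is rejected as not well formed. The example uses &apos; with a straight apostrophe as an assumption for illustration.

```python
import xml.etree.ElementTree as ET

vendor = ET.fromstring("<vendor>Ben &amp; Jerry&apos;s</vendor>")
print(vendor.text)                              # Ben & Jerry's
print(ET.tostring(vendor, encoding="unicode"))  # <vendor>Ben &amp; Jerry's</vendor>

try:
    ET.fromstring("<vendor>Ben & Jerry's</vendor>")  # bare ampersand
except ET.ParseError as err:
    print("not well-formed:", err)
```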

External Entities
External entities are used to reference external
content.
External entities get their content by referencing
it via a URL placed in the entitycontent portion of
the entity declaration.

Using External Entities
<!DOCTYPE employees [
<!ENTITY bob SYSTEM "http://srvr/emps/bob.xml">
<!ENTITY nancy SYSTEM "http://srvr/emps/nancy.xml">
<!ELEMENT employees (clerk+)>
<!ELEMENT clerk (#PCDATA)>
]>
<employees>
<clerk>&bob;</clerk>
<clerk>&nancy;</clerk>
</employees>

Non-Text External Entities and Notations
<!NOTATION notationname [SYSTEM | PUBLIC ] dataformat>
<!ENTITY myimage SYSTEM "myimage.gif" NDATA gif>
The NDATA keyword marks the entity as unparsed: the parser does not
read its content, but passes the entity's name, system identifier, and
notation to the application, which decides how to handle the data.

Using External Non-Text Entities
<!NOTATION gif SYSTEM "image/gif" >
<!ENTITY employeephoto SYSTEM
"images/employees/MichaelQ.gif" NDATA gif >
<!ELEMENT employee (name, sex, title, years) >
<!ATTLIST employee pic ENTITY #IMPLIED >
…
<employee pic="employeephoto">
…
</employee>
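A minimal sketch using Python's standard `xml.parsers.expat`, showing that a non-validating parser records the notation and the unparsed entity without reading the GIF file, and that the pic attribute still holds only the entity name. The concrete document wrapping these declarations and the `read_unparsed_entities` helper are assumptions; the application itself must look up the entity to find the file.

```python
import xml.parsers.expat

DOC = b"""<!DOCTYPE employee [
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY employeephoto SYSTEM "images/employees/MichaelQ.gif" NDATA gif>
<!ELEMENT employee (name, sex, title, years)>
<!ATTLIST employee pic ENTITY #IMPLIED>
]>
<employee pic="employeephoto"/>"""

def read_unparsed_entities(data: bytes):
    notations, entities, attrs = {}, {}, {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        # value is None for external entities; notation is set only for NDATA ones.
        entities[name] = (system_id, notation)

    def on_start(tag, attributes):
        attrs[tag] = attributes

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return notations, entities, attrs

notations, entities, attrs = read_unparsed_entities(DOC)
print(notations)  # {'gif': 'image/gif'}
print(entities)   # {'employeephoto': ('images/employees/MichaelQ.gif', 'gif')}
print(attrs)      # {'employee': {'pic': 'employeephoto'}}
```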

Parameter Entities
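 Parameter entities can be useful when you have to use a
lot of repetitive or lengthy text in a DTD. They are declared with a
percent sign and can be referenced only within the DTD. A
parameter-entity reference inside a markup declaration, as in
%pc; below, is allowed only in the external DTD subset.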
<!ENTITY % entityname entitycontent>

Using Parameter Entities
<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>

More DTD Directives
The IGNORE Keyword
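<![ IGNORE [
This is the part of the DTD ignored
]]>
<![ IGNORE [
<!ELEMENT Employee (#PCDATA)>
]]>
<!ELEMENT Employee (Name, Address, Phone) >
Conditional sections such as IGNORE and INCLUDE may appear only in
the external DTD subset, and only between declarations, never inside one.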

More DTD Directives
The INCLUDE Keyword
<![ INCLUDE [
This is the part of the DTD included
]]>
Comments Within a DTD

<!ELEMENT rootelement (element1, element2)>
<!ELEMENT element1 (#PCDATA)>

Drawbacks of DTD
 First and foremost, DTDs are composed of non-XML
syntax. Given that one of the central tenets of XML is that
it be totally extensible, it may not seem to make a lot of
sense that this is the case for DTDs.
 There can only be a single DTD per document. It is true
that there can be internal and external subsets of DTDs,
but there can only be a single DTD referenced.
 DTDs are not object oriented. There is no inheritance in
DTDs. DTDs do not support namespaces very well.

Drawbacks of DTD
 For a namespace to be used, the entire namespace must be
defined within the DTD.
 DTDs have weak data typing and no support for the XML
DOM.
 Finally, and possibly most important from a security
standpoint, is the ability of the internal DTD subset to
override the external DTD subset.

XML Schema
The XML Schema Definition Language is an
XML language for describing and constraining
the content of XML documents.
XML Schema is a W3C recommendation.
XML Schema defines what it means for an XML
document to be valid.

Usage of XML Schemas
It supports data types.
It uses XML syntax.
It secures data communication: the sender and receiver can agree on
the exact structure and data types of the content being exchanged.
XML Schemas are extensible.
When using XML Schemas most of the errors can be
taken care of by the validating software.

XML Schema Definition
Elements that can appear in a document.
Attributes that can appear in a document.
The elements that are child elements.
The order of child elements.
The number of child elements.
The criteria whether an element is empty or can
include text.
Data types for elements and attributes.
Default and fixed values for elements and
attributes.

XML Schema Data Types
XSD String: String data types are used for values that
contain character strings.
XSD Date: Date and time data types are used for
values that contain date and time.
XSD Numeric: Numeric data types are used for
numeric values.
XSD Misc: Other miscellaneous data types like
boolean, base64Binary, hexBinary, float, double etc.

Types of Indicators
i) Order indicators
a) All Indicator
b) Choice Indicator
c) Sequence Indicator
ii) Occurrence Indicators
iii) Group Indicators

Order indicators
a) All Indicator
The <all> indicator specifies, by default, that the child
elements can appear in any order and that each child
element must occur once and only once.
<xs:element name="book">
<xs:complexType>
<xs:all>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
</xs:all>
</xs:complexType>
</xs:element>

Order indicators
b) Choice Indicator
The <choice> indicator specifies that either one child
element or another can occur.
<xs:element name="person">
<xs:complexType>
<xs:choice>
<xs:element name="employee" type="employee"/>
<xs:element name="member" type="member"/>
</xs:choice>
</xs:complexType>
</xs:element>

Order indicators
c) Sequence Indicator
The <sequence> indicator specifies that the child
elements must appear in a specific order.
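<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>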

Occurrence indicators
 Occurrence indicators are used to define how often an element can
occur. For Order and group indicators, the default value for maxOccurs
and minOccurs is 1.
 The maxOccurs attribute specifies the maximum number of times an
element can occur.
 The minOccurs attribute specifies the minimum number of times an
element can occur.
<xs:element name="icecream">
<xs:complexType>
<xs:sequence>
<xs:element name="vendor" type="xs:string" maxOccurs="2"
minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>

Group indicators
Group indicators are used to define related sets of
elements. Element groups are defined with group
declaration.
<xs:group name="persongroup">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="birthday" type="xs:string"/>
</xs:sequence>
</xs:group>

companyschema.xsd
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="employees">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="Name" type="xs:string" />
<xs:element name="Salary" type="xs:unsignedShort" />
<xs:element name="ProId" type="xs:unsignedByte" />
<xs:element name="ProName" type="xs:string" />
<xs:element name="SBUName" type="xs:string" />
<xs:element name="SBULoc" type="xs:string" />

</xs:sequence>
<xs:attribute name="code" type="xs:unsignedByte" use="required" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

companyschema.xml
<?xml version="1.0" encoding="utf-8"?>
<employees>
<employee code="001">
<Name>Arun</Name>
<Salary>20000</Salary>
<ProId>101</ProId>
<ProName>Mainframe</ProName>
<SBUName>Programmer</SBUName>
<SBULoc>Chennai</SBULoc>
</employee>
</employees>
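Python's standard library has no XML Schema validator, so the following is not a substitute for one. It is a minimal sketch, with a hand-copied list of fields (an assumption, not read from the .xsd), that checks only the value ranges of the two built-in types companyschema.xsd uses for numbers: xs:unsignedByte (0 to 255) and xs:unsignedShort (0 to 65535). It highlights a design consequence of the schema: a Salary above 65535 would fail validation.

```python
import re
import xml.etree.ElementTree as ET

# Value spaces of the two XML Schema built-in types used in companyschema.xsd.
RANGES = {"xs:unsignedByte": (0, 255), "xs:unsignedShort": (0, 65535)}

# Hypothetical field list copied by hand from companyschema.xsd (not parsed from it).
CHECKS = [("@code", "xs:unsignedByte"), ("Salary", "xs:unsignedShort"),
          ("ProId", "xs:unsignedByte")]

def range_problems(employee):
    """Report values outside the declared unsigned integer types (a rough check)."""
    problems = []
    for field, xsd_type in CHECKS:
        raw = employee.get(field[1:]) if field.startswith("@") else employee.findtext(field)
        low, high = RANGES[xsd_type]
        text = (raw or "").strip()
        if not re.fullmatch(r"\+?[0-9]+", text) or not low <= int(text) <= high:
            problems.append(f"{field}={raw!r} is not a valid {xsd_type}")
    return problems

DOC = """<employees>
<employee code="001">
<Name>Arun</Name>
<Salary>20000</Salary>
<ProId>101</ProId>
<ProName>Mainframe</ProName>
<SBUName>Programmer</SBUName>
<SBULoc>Chennai</SBULoc>
</employee>
</employees>"""

employee = ET.fromstring(DOC).find("employee")
print(range_problems(employee))  # []

employee.find("Salary").text = "70000"  # larger than xs:unsignedShort allows
print(range_problems(employee))
# ["Salary='70000' is not a valid xs:unsignedShort"]
```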
