1. INTRODUCTION TO XML
XML document structure – Well-formed and
valid documents – Namespaces – DTD – XML
Schema – X-Files
Prepared By : Prabu.U
2. XML is not…
A replacement for HTML
(but HTML can be generated from XML)
A presentation format
(but XML can be converted into one)
A programming language
(but it can be used with almost any language)
A network transfer protocol
(but XML may be transferred over a network)
A database
(but XML may be stored into a database)
3. XML by Example
<article>
<author>Gerhard Weikum</author>
<title>The Web in 10 Years</title>
</article>
Easy to understand for human users
Very expressive (semantics along with the data)
Well structured, easy to read and write from
programs
This looks nice, but…
4. XML by Example
<t108>
<x87>Gerhard Weikum</x87>
<g10>The Web in 10 Years</g10>
</t108>
Hard to understand for human users
Not expressive (no semantics along with the data)
Well structured, easy to read and write from
programs
… this is XML, too:
5. XML by Example
<data>
ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983
</data>
Impossible to understand for human users
Not expressive (no semantics along with the data)
Unstructured, read and write only with special
programs
… and what about this XML document:
The actual benefit of using XML highly depends on the
design of the application.
6. Possible Advantages of Using XML
Truly Portable Data
Easily readable by human users
Very expressive (semantics near data)
Very flexible and customizable (no finite tag set)
Easy to use from programs (libs available)
Easy to convert into other representations
(XML transformation languages)
Many additional standards and tools
Widely used and supported
7. XML DOCUMENT STRUCTURE
XML Declaration
Document Type Declaration
Elements data
Attributes data
XML Content.
8. XML Document Structure
The major portions of an XML document
include the following:
The XML declaration
The Document Type Declaration
The element data
The attribute data
The character data or XML content
9. Components of XML Declaration
Component Description
<?xml Starts the processing instruction (in this case, the XML declaration).
version="xxx" Describes the specific version of XML being used in the document (in this case, version 1.0 of the W3C specification). The W3C has also published version 1.1. The attribute name must be written in lowercase.
standalone="xxx" Indicates whether the document depends on external markup declarations (such as an external DTD subset) that affect its content. It can be set to "yes" (no such declarations affect the document) or "no" (they may).
encoding="xxx" Indicates the character encoding that the document uses. If it is omitted, the document must be encoded in UTF-8 or UTF-16 (UTF-16 being recognized by its byte-order mark). It can be set to any other value that the XML processor recognizes and supports, such as "ISO-8859-1".
10. Valid XML Declaration
<?xml version="1.0" standalone="yes"?>
<?xml version="1.0" standalone="no"?>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
11. Document Type Declaration
The Document Type Declaration (DOCTYPE) gives a name to the
XML content and provides a means to check the document's
validity, either by including a Document Type Definition (DTD)
or by specifying a link to one.
A DOCTYPE can identify the constraints on the validity of the
document by making a reference to an external DTD subset and/or
include the DTD internally within the document by means of an
internal DTD subset.
12. Document Type Declaration
Document Type Declarations take one of the following general forms:
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>
13. Components of Document Type Declaration
Component Description
< The start of the XML tag (in this case, the beginning of the
Document Type Declaration).
!DOCTYPE The beginning of the Document Type Declaration.
NAME Specifies the name of the document type being defined.
This must comply with XML naming rules.
SYSTEM Specifies that an external DTD, identified by the system
identifier that follows, will be read and processed.
"file" The system identifier: the name or URI of the external DTD file to be processed.
[ Starts an internal DTD subset.
] Ends the internal DTD subset.
> The end of the XML tag (in this case, the end of the
Document Type Declaration).
14. XML Elements
XML elements are either a matched pair of XML tags or single XML
tags that are “self-closing.”
A matched pair consists of a start tag and an end tag with the same
name, except that the end tag is prefixed with a forward slash.
For example, the shirt element begins with <shirt> and ends with
</shirt>.
Everything between these tags is additional XML text that has either
been defined by a DTD or can exist by virtue of the document
merely being well formed.
15. XML Attributes
Within an element's start tag, additional information that modifies the
nature of the enclosed content can be communicated to XML processors.
For example, we may have specified a <price> element, but how do
we know what currency this applies to?
Although it's possible to create a <currency> child element, a more
practical approach is to use an attribute.
16. XML Attributes Examples
Attributes are name/value pairs contained within an element's start tag
that can specify text strings that modify the context of the element.
Examples of possible attributes for elements describing a shirt are shown below.
<price currency="USD">…</price>
<on_sale start_date="10-15-2001"/>
17. Significant Features of Attributes
One of the significant features of attributes is that the content
described by them can follow strict rules as to their value.
Attributes can be required, optional, or contain a fixed value.
Required or optional attributes can either contain freeform text or
contain one of a set list of enumerated values.
Fixed attributes, if present, must contain a specific value.
18. Entity References
Entity references are delimited by an ampersand at the beginning
and a semicolon at the end. The name between the delimiters
identifies the entity whose content will be substituted.
For example, the &lt; entity reference inserts the less-than sign (<) into a
document.
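As a quick check of the substitution described above, the following sketch uses Python's standard xml.etree.ElementTree module as an example parser (any conforming XML parser should behave the same way). The element name expr is made up for the illustration; the five references used are the ones every XML parser predefines.

```python
import xml.etree.ElementTree as ET

# Text containing the five predefined entity references (element name is hypothetical).
doc = "<expr>a &lt; b &amp;&amp; c &gt; d, &quot;x&quot; or &apos;y&apos;</expr>"

root = ET.fromstring(doc)
# The parser has already replaced each reference with the character it stands for.
print(root.text)  # a < b && c > d, "x" or 'y'
```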
19. Comments
Comments are quite simple to include in a document.
The character sequence <!-- begins a comment and --> ends the
comment.
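Comments can be placed almost anywhere in a document outside other markup
(though not before the XML declaration) and are not considered to be part
of the textual content of an XML document. The string -- may not appear
inside a comment.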
Example:
<!-- The below element talks about an Elephant I once owned... -->
<animal>Elephant</animal>
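The following sketch, again using Python's xml.etree.ElementTree as an assumed example parser, shows the comment from the example being dropped: by default ElementTree does not keep comments in the tree, so only the <animal> element and its text remain. The <zoo> root element is hypothetical and only makes the fragment a complete document.

```python
import xml.etree.ElementTree as ET

# The example above, wrapped in a hypothetical <zoo> root element.
doc = """<zoo>
<!-- The below element talks about an Elephant I once owned... -->
<animal>Elephant</animal>
</zoo>"""

root = ET.fromstring(doc)
# By default ElementTree discards comments, so only <animal> is a child.
print([child.tag for child in root])      # ['animal']
# The comment text does not appear in the document's textual content.
print("".join(root.itertext()).strip())   # Elephant
```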
20. Processing Instructions
Processing instructions (PIs) are similar to comments in that they
are not part of an XML document's textual content, but they provide
information to applications about how the content should be
processed.
Unlike comments, PIs must be passed along to the application by XML processors.
Processing instructions have the following form:
<?instruction options?>
21. Processing Instructions
The instruction name, called the PI target, is a special identifier that
the processing application is intended to understand.
Example of a Processing Instruction
<?send-message "process complete"?>
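To see a PI reach the application, the sketch below parses the example with Python's xml.dom.minidom, which keeps processing instructions as nodes. The empty <status/> root element is a hypothetical addition so that the input is a complete document.

```python
from xml.dom import minidom

# The PI from the slide, followed by a hypothetical empty root element <status/>.
doc = minidom.parseString('<?send-message "process complete"?><status/>')

# minidom keeps PIs as nodes; collect the ones that sit before the root element.
pis = [node for node in doc.childNodes
       if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]

for pi in pis:
    # target is the PI name; data is the rest of the instruction, passed through as-is.
    print(pi.target, "->", pi.data)  # send-message -> "process complete"
```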
22. XML Content Model
A content model provides a framework that determines whether, and to
what extent, the extensibility features of XML can be taken advantage of.
At the very least, the model provides an indication of the intent of
the document creator as to the explicit extensibility of the document,
because users can extend a document using an internal DTD subset
if they are so inclined.
However, by doing so, the users are “overriding” the content model
as intended by the document creator.
23. Open Content Model
An “open” content model enables a user to add additional elements
and attributes to a document without them having to be explicitly
declared in a DTD or schema.
In an open content model, users can take full advantage of the
extensibility of XML without having to make changes to a DTD.
The use of a DTD precludes an open content model: you cannot have one
when using a DTD, except if a user chooses to override the DTD by
using an internal DTD subset.
However, new schema formats, such as XML Schema, provide this
mechanism.
24. Closed Content Model
A “closed” content model restricts elements and attributes to only
those that are specified in the DTD or schema.
By definition, a DTD is a closed content model because it describes
what may exclusively appear in the content of the element.
In a closed model, the XML document creator maintains strict
control of specifically which elements and attributes as well as the
order that markup may appear in a given compliant document.
Closed models are helpful when you’re enforcing strict document
exchange and provide a means to guarantee that incoming data
complies with data requirements.
25. Mixed Content Model
A “mixed” content model enables individual elements to allow an
arbitrary mixture of text and additional elements.
These mixed elements are useful when freeform fields, possibly
containing XML or other markup data, are to be included.
This allows the majority of the document to remain closed while
portions of the document are noted as extensible.
Mixed models represent a good compromise that can allow for
strictness while providing a limited means for extensibility.
26. Handling Whitespaces in XML
Whitespace is the term used for character spaces, tabs, linefeeds, and
carriage returns in documents.
Issues around the handling of these seemingly "invisible" characters
are important because it is hard to tell whether whitespace should be
ignored or passed "as is" to the application.
27. Significance of Whitespaces
One way XML processors can determine whether whitespace is significant
is by knowing the content model of the XML document.
Basically, in a mixed content model, whitespace is significant
because the processor cannot know whether the application will use
it, but in an open or closed model, it is not.
However, the rule for XML processors is that they must pass all
characters that are not markup intact to the application.
Validating processors also inform applications about the significance
of the various whitespace characters.
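A non-validating parser cannot tell which whitespace is significant, so it simply hands everything over. The sketch below uses Python's xml.etree.ElementTree (an assumed example parser) on a made-up <person> record; the indentation between elements and the spaces inside <city> both reach the application unchanged.

```python
import xml.etree.ElementTree as ET

# A hypothetical record with indentation between elements.
doc = "<person>\n  <name>Ann</name>\n  <city> Chennai </city>\n</person>"

root = ET.fromstring(doc)
# Whitespace between elements is kept in .text/.tail ...
print(repr(root.text))               # '\n  '
# ... and whitespace inside character data is kept as well.
print(repr(root.find("city").text))  # ' Chennai '
```

Deciding whether the indentation can be ignored is left to the application.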
28. Rules of XML Structure
All XML elements must have a closing tag (or use the self-closing form, such as <CHILD2/>)
XML tags are case sensitive
All XML elements must be properly nested
All XML documents must contain a single root element
Attribute values must be quoted
An attribute may appear only once in the same start tag
Attribute values cannot contain references to external entities
All entities except amp, lt, gt, apos, and quot must be declared
before they are used
29. Well-Formed XML?
• No, CHILD2 and CHILD3 do not nest properly
<?xml version="1.0"?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>
30. Well-Formed XML?
• No, there are two root elements
<?xml version="1.0"?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
<CHILD1>This is another element 1</CHILD1>
</PARENT>
31. Well-Formed XML?
• Yes
<?xml version="1.0"?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2/>
<CHILD3></CHILD3>
</PARENT>
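The three examples can be checked mechanically. The sketch below assumes Python's xml.etree.ElementTree as a non-validating parser and treats a document as well formed exactly when the parser accepts it; is_well_formed is a hypothetical helper name, not a library function. The markup is the same as in the examples, slightly compacted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

bad_nesting = """<?xml version="1.0"?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>"""

two_roots = """<?xml version="1.0"?>
<PARENT><CHILD1>This is element 1</CHILD1></PARENT>
<PARENT><CHILD1>This is another element 1</CHILD1></PARENT>"""

good = """<?xml version="1.0"?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2/>
<CHILD3></CHILD3>
</PARENT>"""

for name, doc in [("bad_nesting", bad_nesting), ("two_roots", two_roots), ("good", good)]:
    print(name, is_well_formed(doc))
# bad_nesting False
# two_roots False
# good True
```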
32. NAMESPACES
In XML a namespace is declared by using an attribute called
xmlns (or xmlns:prefix).
The main rules for XML namespace definition are:
The value of the attribute is a URI that identifies the namespace. It
serves only as a unique name: it does not need to point to a real
resource, and parsers do not fetch it.
The namespace attribute can appear in the start tag of any element,
in which case the namespace is in scope for that element and its
descendants; with a default declaration, unprefixed descendants are
associated with it automatically.
The namespace can be declared on the root element, in which case it
is in scope for the whole document.
33. NAMESPACES
Within an XML document, namespaces can be declared using one of
two methods: a default declaration or an explicit declaration.
Which method to use is completely open and left up to you; either
way will suffice.
34. Default Declaration
A default namespace declaration specifies a namespace to use for the
current element and all of its descendant elements that do not have a
namespace prefix associated with them (it does not apply to unprefixed
attributes).
For instance, a default declaration for the <Customer> element is made
by placing the xmlns attribute on that element without specifying or
attaching a prefix to the namespace:
<Customer xmlns="http://www.eps-software.com/customer">
36. Explicit Declaration
Sometimes, however, it may be necessary and more readable to
explicitly declare an element’s namespace.
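This is accomplished much the same way in which a default
namespace is declared, except that a prefix is associated with the xmlns
attribute (xmlns:prefix="URI") and then attached to each element name
that belongs to the namespace (prefix:ElementName).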
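Whichever declaration method is used, a namespace-aware parser sees the same expanded names. The sketch below assumes Python's xml.etree.ElementTree and reuses the customer namespace URI from the Customer example in this section; the prefix cust is a hypothetical choice, since any prefix bound to the same URI gives the same result.

```python
import xml.etree.ElementTree as ET

NS = "http://www.eps-software.com/customer"  # URI taken from the Customer example

# Default declaration: no prefix, so unprefixed elements belong to NS.
default_doc = f'<Customer xmlns="{NS}"><Name>Travis Vandersypen</Name></Customer>'

# Explicit declaration: the hypothetical prefix "cust" is bound to the same URI
# and must be written on every element that belongs to the namespace.
explicit_doc = (f'<cust:Customer xmlns:cust="{NS}">'
                '<cust:Name>Travis Vandersypen</cust:Name></cust:Customer>')

results = []
for doc in (default_doc, explicit_doc):
    root = ET.fromstring(doc)
    # ElementTree reports names as {namespace-URI}local-name; the prefix is gone.
    results.append((root.tag, root[0].tag))

print(results[0] == results[1])  # True
```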
38. Identifying the Scope of Namespaces
By default, all child elements within a parent element appear within
the parent's default namespace, unless they reference another
namespace.
This allows all child elements to "inherit" their parent element's
namespace.
However, this "inherited" namespace can be overridden by
specifying a new namespace on a particular child element.
39. Identifying the Scope of Namespaces
<Customer xmlns="http://www.eps-software.com/customer">
<Name>Travis Vandersypen</Name>
<Order xmlns="http://www.eps-software.com/order">
<Product>
<Name>Hot Dog Buns</Name>
</Product>
</Order>
</Customer>
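Parsing this document shows the override at work. In the sketch below (Python's xml.etree.ElementTree, assumed as the example parser), each element is reported with its namespace URI in braces: the outer Name inherits the customer namespace, while Order, Product and the inner Name pick up the order namespace.

```python
import xml.etree.ElementTree as ET

doc = """<Customer xmlns="http://www.eps-software.com/customer">
  <Name>Travis Vandersypen</Name>
  <Order xmlns="http://www.eps-software.com/order">
    <Product>
      <Name>Hot Dog Buns</Name>
    </Product>
  </Order>
</Customer>"""

root = ET.fromstring(doc)

# Every tag is reported as {namespace-URI}local-name, in document order.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)

# Searching needs the namespace too; "o" is an arbitrary prefix chosen for the lookup.
ns = {"o": "http://www.eps-software.com/order"}
print(root.find("o:Order/o:Product/o:Name", ns).text)  # Hot Dog Buns
```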
40. Structure of Document Type Definition
The Document Type Declaration
DTD Elements
DTD Element Rules
Content Rules
The ANY Rule
The EMPTY Rule
The #PCDATA Rule
Structure Rules
The “Element Only” Rule
The “Mixed” Rule
Element Symbols
41. Structure of Document Type Definition
DTD Attributes
Attribute Types
DTD Entities
Predefined Entities
External Entities
Non-Text External Entities and Notations
Parameter Entities
More DTD Directives
The IGNORE Keyword
The INCLUDE Keyword
Comments Within a DTD
DTD Drawbacks and Alternatives
42. The Document Type Declaration
In order to reference a DTD from an XML document, a
Document Type Declaration must be included in the XML
document.
There may be at most one Document Type Declaration per XML
document, and it must appear before the root element. The syntax is as follows:
<!DOCTYPE rootelement
SYSTEM | PUBLIC DTDlocation [ internalDTDelements ]
>
43. The Document Type Declaration
The exclamation mark (!) is used to signify the beginning of the
declaration.
DOCTYPE is the keyword used to denote this as a Document
Type Declaration.
rootelement is the name of the root element or document
element of the XML document.
44. The Document Type Declaration
SYSTEM and PUBLIC are keywords used to designate that the
DTD is contained in an external document. Although the use of
these keywords is optional, to reference an external DTD you
would have to use one or the other.
internalDTDelements are internal DTD declarations. These
declarations will always be placed within opening ([) and
closing (]) brackets.
45. DTD Elements
Each element in the DTD should be defined with the following
syntax:
<!ELEMENT elementname rule >
ELEMENT is the tag name that specifies that this is an element
definition.
elementname is the name of the element.
rule is the definition to which the element’s data content must
conform.
47. DTD Elements
The XML document below is valid because it follows the
rules laid down in contactlist.dtd.
contactlist.xml
<?xml version="1.0"?>
<!DOCTYPE contactlist SYSTEM "contactlist.dtd">
<contactlist>
<fullname>Bobby Soninlaw</fullname>
<address>
<addressline1>101 South Street</addressline1>
<addressline2>Apartment #2</addressline2>
</address>
<phone>(405) 555-1234</phone>
<email>bs@mail.com</email>
</contactlist>
48. DTD Element Rules
All data contained in an element must follow a set rule.
As stated previously, the rule is the definition to which the
element’s data content must conform.
There are two basic types of rules that elements must fall
into.
The first type of rule deals with content.
The second type of rule deals with structure.
49. Content Rules
The content rules for elements deal with the actual data
that defined elements may contain.
These rules include the ANY rule, the EMPTY rule, and
the #PCDATA rule.
50. ANY Rule
An element may be defined using the ANY rule. The
element may contain other elements and/or normal character
data. An element using the ANY rule would appear as
follows:
<!ELEMENT elementname ANY>
The drawback to this rule is that it is so wide open that it
defeats the purpose of validation.
A document whose DTD defines all its elements using the ANY rule will
always be valid as long as it is well formed and uses only declared element names.
51. EMPTY Rule
This rule is the exact opposite of the ANY rule. An
element that is defined with this rule will contain no data.
However, an element with the EMPTY rule could still
contain attributes (more on attributes in a bit).
The following element is an example of the EMPTY rule:
<!ELEMENT elementname EMPTY>
52. EMPTY Rule
This concept is seen a lot in HTML. There are many tags, such
as the line break tag (<br />) and the horizontal rule tag (<hr />), that
follow this rule.
Neither one of these tags contains any data, but both are very
important in HTML documents.
The best example of an empty tag used in HTML is the image
tag (<img>).
Even though the image tag does not contain any data, it does
have attributes that describe the location and display of an
image for a Web browser.
53. #PCDATA Rule
The #PCDATA rule indicates that parsed character data will
be contained in the element.
Parsed character data is text that will be scanned by any XML
parser accessing the document, so markup inside it, such as entity
references, is interpreted; an element declared as (#PCDATA) may hold
text but no child elements.
The following element demonstrates the #PCDATA rule:
<!ELEMENT elementname (#PCDATA)>
54. #PCDATA Rule
In an element using the #PCDATA rule, a CDATA section
(<![CDATA[ ... ]]>) can be used to prevent the enclosed character
data from being parsed.
CDATA
<sample>
<data>
<![CDATA[<tag>This will not be parsed</tag>]]>
</data>
</sample>
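The effect of the CDATA section can be observed with Python's xml.etree.ElementTree (an assumed example parser): the <tag> markup inside the section comes back as ordinary text, and no child element is created.

```python
import xml.etree.ElementTree as ET

doc = """<sample>
<data>
<![CDATA[<tag>This will not be parsed</tag>]]>
</data>
</sample>"""

data = ET.fromstring(doc).find("data")
# The CDATA content is delivered as plain text: <data> has no child elements.
print(len(data))          # 0
print(data.text.strip())  # <tag>This will not be parsed</tag>
```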
55. Structure Rules
Whereas the content rules deal with the actual content of
the data contained in defined elements, structure rules deal
with how that data may be organized.
There are two types of structure rules.
The first is the “element only” rule.
The second rule is the “mixed” rule.
56. Element Only Rule
The “element only” rule specifies that only elements may
appear as children of the current element.
The child element sequences should be separated by
commas and listed in the order they should appear.
If there are to be options for which elements will appear,
the listed elements should be separated by the pipe symbol
(|).
57. Element Only Rule
The following element definition demonstrates the
“element only” rule:
<!ELEMENT elementname (element1, element2, element3)>
The element defined here must contain element1, element2 and
element3, in that order.
The element defined below will have a single child element:
either element1 or element2.
<!ELEMENT elementname (element1 | element2)>
58. Mixed Rule
The “mixed” rule is used to help define elements that may
have both character data (#PCDATA) and child elements in
the data they contain.
The options are enclosed by parentheses, with #PCDATA listed first
and the choices separated by the pipe symbol (|). Sequential lists
separated by commas are not allowed in a mixed rule, and when child
elements are listed the group must be followed by an asterisk (*).
59. Mixed Rule
The following element is an example of the “mixed” rule:
<!ELEMENT elementname
(#PCDATA | childelement1 | childelement2)*>
The asterisk symbol used in these examples indicates that
an item may occur zero or more times.
<!ELEMENT Son (#PCDATA | Name | Age)*>
This definition defines an element, Son, for which there
may be character data, elements, or both.
60. Mixed Rule
A man might have a son, but he might not.
If there is no son, then normal character data (such as
“N/A”) could be used to describe this condition.
<Son>
N/A
</Son>
<Son>
Adopted Son
<Name>Bobby</Name>
<Age>12</Age>
</Son>
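The two <Son> instances can be parsed to see how mixed content reaches an application. The sketch uses Python's xml.etree.ElementTree, which is non-validating, so it does not check the Son declaration; it only shows that text and child elements are kept side by side.

```python
import xml.etree.ElementTree as ET

with_child = ET.fromstring("<Son>\nAdopted Son\n<Name>Bobby</Name>\n<Age>12</Age>\n</Son>")
text_only = ET.fromstring("<Son>\nN/A\n</Son>")

# Text before the first child lands in .text; child elements are list items.
print(with_child.text.strip(), [(c.tag, c.text) for c in with_child])
# Adopted Son [('Name', 'Bobby'), ('Age', '12')]
print(text_only.text.strip(), len(text_only))
# N/A 0
```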
61. Element Symbols
In addition to the normal rules that apply to element
definitions, element symbols can be used to control the
occurrence of data.
Symbol Definition
Asterisk (*) The data will appear zero or more times (0, 1, 2, …).
Here’s an example:
<!ELEMENT children (name*)>
Comma (,) Provides separation of elements in a sequence. Here’s an
example:
<!ELEMENT address (street, city, state, zip)>
In this example, the element address will have four child
elements: street, city, state, and zip.
62. Symbol Definition
Parentheses [( )] The parentheses are used to contain the rule for an
element.
Parentheses may also be used to group a sequence,
subsequence, or a set of alternatives in a rule.
Here’s an example:
<!ELEMENT address (street, city, (state |province), zip)>
Pipe (|) Separates choices in a set of options. Here’s an example:
<!ELEMENT dessert (cake | pie)>
The element dessert will have one child element: either
cake or pie.
Plus sign (+) Signifies that the data must appear one or more times (1,
2, 3, …).
Here’s an example:
<!ELEMENT appliances (refrigerator+)>
The appliances element will have one or more
refrigerator child elements.
63. Symbol Definition
Question mark (?) Data will appear either zero times or one time in the
element.
Here’s an example:
<!ELEMENT employment (company?)>
The element employment will have either zero occurrences
or one occurrence of the child element company.
No symbol When no symbol is used (other than parentheses), this
signifies that the data must appear once in the XML file.
Here’s an example:
<!ELEMENT contact (name)>
The element contact will have one child element: name.
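Because these symbols combine like regular-expression operators, an element-only content model can be checked by translating it into a regular expression over the sequence of child element names. The sketch below is a simplified, hypothetical helper written in Python (content_model_to_regex and children_match are made-up names); it ignores #PCDATA, ANY and EMPTY, and it is not how any particular validating parser is implemented.

```python
import re

def content_model_to_regex(model: str) -> str:
    """Translate an element-only DTD content model such as
    "(street, city, (state | province), zip)" into a regular expression
    over a space-terminated list of child element names.
    Simplified sketch: #PCDATA, ANY and EMPTY are not handled."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == "(":
            parts.append("(?:")            # non-capturing group
        elif token == ",":
            continue                       # a comma just means "followed by"
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)            # same meaning in both notations
        else:
            parts.append(f"(?:{re.escape(token)} )")  # one child element name
    return "".join(parts)

def children_match(model: str, children: list[str]) -> bool:
    """True if the child element names, in order, satisfy the content model."""
    names = "".join(f"{name} " for name in children)
    return re.fullmatch(content_model_to_regex(model), names) is not None

print(children_match("(cake | pie)", ["pie"]))          # True
print(children_match("(cake | pie)", ["cake", "pie"]))  # False
print(children_match("(refrigerator+)", []))            # False
```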
64. Limited Use of Symbols
<!ELEMENT contactlist (contact) >
<!ELEMENT contact (name, age, sex, address, city, state, zip, children) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT age (#PCDATA) >
<!ELEMENT sex (#PCDATA) >
<!ELEMENT address (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ELEMENT children (child) >
<!ELEMENT child (childname, childage, childsex) >
<!ELEMENT childname (#PCDATA) >
<!ELEMENT childage (#PCDATA) >
<!ELEMENT childsex (#PCDATA) >
65. Broader Use of Symbols
<!ELEMENT contactlist (contact+) >
<!ELEMENT contact (name, age?, sex, address?, city?, state?, zip?, children?) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT age (#PCDATA) >
<!ELEMENT sex (#PCDATA) >
<!ELEMENT address (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ELEMENT children (child*) >
<!ELEMENT child (childname, childage?, childsex) >
<!ELEMENT childname (#PCDATA) >
<!ELEMENT childage (#PCDATA) >
<!ELEMENT childsex (#PCDATA) >
66. DTD Attributes
XML attributes are name/value pairs that are used as
metadata to describe XML elements.
XML attributes are very similar to HTML attributes.
In HTML, src is an attribute of the img tag, as shown in the
following example:
<img src="images/imagename.gif" width="10" height="20">
67. DTD Attributes
<image src="images/" width="10" height="20">
imagename.gif
</image>
src, width, and height are presented as attributes of the XML
element image.
This is very similar to the way that these attributes are used in
HTML.
The only difference is that the src attribute merely contains the
relative path of the image’s directory and not the actual name of
the image file.
68. Attribute list declarations in a DTD
Attribute list declarations in a DTD will have the following
syntax:
<!ATTLIST elementname attributename type defaultbehavior
defaultvalue>
ATTLIST is the tag name that specifies that this definition will
be for an attribute list.
elementname is the name of the element that the attribute will
be attached to.
69. Attribute list declarations in a DTD
attributename is the actual name of the attribute.
type indicates which of the 10 valid kinds of attributes this
attribute definition will be.
defaultbehavior dictates whether the attribute will be required
(#REQUIRED), optional (#IMPLIED), or fixed in value (#FIXED). This
setting determines how a validating parser should treat this attribute.
defaultvalue is the value of the attribute if no value is explicitly
set.
70. ATTLIST Declaration
<!ATTLIST name
sex CDATA #REQUIRED
age CDATA #IMPLIED
race CDATA #IMPLIED >
The three attributes are character data (CDATA).
Only one of the attributes, sex, is required (#REQUIRED).
The other two attributes, age and race, are optional
(#IMPLIED).
71. ATTLIST Declaration
An XML element using the attribute list declared here would
appear as follows:
<name sex="male" age="30" race="Caucasian">Michael
Qualls</name>
The name element contains the value "Michael Qualls". It also
has three attributes describing Michael Qualls: sex, age, and race.
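The #REQUIRED/#IMPLIED distinction is something a validating parser enforces. Python's standard library has no DTD validation, so the sketch below models just the #REQUIRED check by hand: the ATTLIST is copied into a dict (a hypothetical representation) and missing_required is a made-up helper. xml.etree.ElementTree is used only to read the attributes.

```python
import xml.etree.ElementTree as ET

# The ATTLIST declaration above, written as a plain dict (hypothetical representation).
NAME_ATTLIST = {"sex": "#REQUIRED", "age": "#IMPLIED", "race": "#IMPLIED"}

def missing_required(elem: ET.Element, attlist: dict[str, str]) -> list[str]:
    """Names of #REQUIRED attributes absent from elem (one check a validating parser makes)."""
    return [name for name, behavior in attlist.items()
            if behavior == "#REQUIRED" and name not in elem.attrib]

ok = ET.fromstring('<name sex="male" age="30" race="Caucasian">Michael Qualls</name>')
bad = ET.fromstring('<name age="30">Michael Qualls</name>')

print(ok.attrib)                           # {'sex': 'male', 'age': '30', 'race': 'Caucasian'}
print(missing_required(ok, NAME_ATTLIST))  # []
print(missing_required(bad, NAME_ATTLIST)) # ['sex']
```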
72. DTD Entities
Entities in DTDs are storage units. They can also be considered
placeholders.
Entities are special markups that contain content for insertion
into the XML document.
An entity’s content could be well-formed XML, normal text,
binary data, a database record, and so on.
The main purpose of an entity is to hold content, and there is
virtually no limit on the type of content an entity can hold.
73. DTD Entities
The general syntax of an entity is as follows:
<!ENTITY entityname [SYSTEM | PUBLIC] entitycontent>
ENTITY is the tag name that specifies that this definition will be for an entity.
entityname is the name by which the entity will be referred in the XML
document.
entitycontent is the actual contents of the entity—the data for which the entity
is serving as a placeholder.
SYSTEM and PUBLIC are optional keywords. Either one can be added to the
definition of an entity to indicate that the entity refers to external content.
74. Using Internal Entities
<?xml version="1.0"?>
<!DOCTYPE library [
<!ENTITY cpy "Copyright 2000">
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,copyright)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
]>
75. Using Internal Entities
<library>
<book>
<title>How to Win Friends</title>
<author>Joe Charisma</author>
<copyright>&cpy;</copyright>
</book>
<book>
<title>Make Money Fast</title>
<author>Jimmy QuickBuck</author>
<copyright>&cpy;</copyright>
</book>
</library>
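A parser expands the &cpy; references using the internal DTD subset. The sketch below feeds the full library document to Python's xml.etree.ElementTree (an assumed example parser; it is non-validating, so the ELEMENT declarations are read but not enforced) and prints the expanded copyright text of each book. The markup is compacted onto fewer lines but otherwise unchanged.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE library [
<!ENTITY cpy "Copyright 2000">
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,copyright)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
]>
<library>
<book><title>How to Win Friends</title><author>Joe Charisma</author><copyright>&cpy;</copyright></book>
<book><title>Make Money Fast</title><author>Jimmy QuickBuck</author><copyright>&cpy;</copyright></book>
</library>"""

root = ET.fromstring(doc)
# Every &cpy; reference has been replaced by the entity's replacement text.
for book in root.findall("book"):
    print(book.findtext("title"), "|", book.findtext("copyright"))
# How to Win Friends | Copyright 2000
# Make Money Fast | Copyright 2000
```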
78. External Entities
External entities are used to reference external
content.
External entities get their content by referencing
it via a URI (the system identifier) placed in the
entitycontent portion of the entity declaration.
79. Using External Entities
<?xml version="1.0"?>
<!DOCTYPE employees [
<!ENTITY bob SYSTEM "http://srvr/emps/bob.xml">
<!ENTITY nancy SYSTEM "http://srvr/emps/nancy.xml">
<!ELEMENT employees (clerk+)>
<!ELEMENT clerk (#PCDATA)>
]>
<employees>
<clerk>&bob;</clerk>
<clerk>&nancy;</clerk>
</employees>
80. Non-Text External Entities and Notations
<!NOTATION notationname [SYSTEM | PUBLIC ] dataformat>
<!ENTITY myimage SYSTEM "myimage.gif" NDATA gif>
The NDATA keyword marks the entity as unparsed: the parser does not
read its content, and instead passes the entity's name, system
identifier and notation on to the application.
81. Using External Non-Text Entities
<!NOTATION gif SYSTEM "image/gif" >
<!ENTITY employeephoto SYSTEM
"images/employees/MichaelQ.gif" NDATA gif >
<!ELEMENT employee (name, sex, title, years) >
<!ATTLIST employee pic ENTITY #IMPLIED >
…
<employee pic="employeephoto">
…
</employee>
82. Parameter Entities
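Parameter entities can be useful when you have to use a
lot of repetitive or lengthy text in a DTD. They are declared with a
percent sign and referenced as %entityname;.
<!ENTITY % entityname entitycontent>
References to parameter entities may appear inside other declarations
only in an external DTD subset; in the internal subset they may appear
only between declarations.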
83. Using Parameter Entities
<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>
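Conceptually, each %pc; reference is replaced by the entity's text before the declarations are interpreted. The sketch below imitates that substitution with regular expressions in Python; expand_parameter_entities is a hypothetical helper that handles only simple double-quoted declarations like the one above, not a real DTD parser.

```python
import re

def expand_parameter_entities(dtd: str) -> str:
    """Textually substitute %name; references with the values declared by
    <!ENTITY % name "value"> (a simplified sketch, not a DTD parser)."""
    decl = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
    values = dict(decl.findall(dtd))   # {"pc": "(#PCDATA)"}
    body = decl.sub("", dtd)           # drop the declarations themselves
    return re.sub(r"%([\w.-]+);", lambda m: values[m.group(1)], body)

dtd = '''<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>'''

print(expand_parameter_entities(dtd).strip())
# <!ELEMENT name (#PCDATA)>
# <!ELEMENT age (#PCDATA)>
# <!ELEMENT weight (#PCDATA)>
```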
84. More DTD Directives
The IGNORE Keyword
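<![IGNORE[
This is the part of the DTD ignored
]]>
<![IGNORE[
<!ELEMENT Employee (#PCDATA)>
]]>
<!ELEMENT Employee (Name, Address, Phone)>
Conditional sections such as IGNORE and INCLUDE are allowed only in an external DTD subset.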
85. More DTD Directives
The INCLUDE Keyword
<![INCLUDE[
<!-- This is the part of the DTD included -->
]]>
Comments Within a DTD
<!-- This is a comment -->
<!ELEMENT rootelement (element1, element2)>
<!ELEMENT element1 (#PCDATA)>
86. Drawbacks of DTD
First and foremost, DTDs are composed of non-XML
syntax. Given that one of the central tenets of XML is that
it be totally extensible, it may not seem to make a lot of
sense that this is the case for DTDs.
There can only be a single DTD per document. It is true
that there can be internal and external subsets of DTDs,
but only a single DTD can be referenced.
DTDs are not object oriented: there is no inheritance in
DTDs. DTDs do not support namespaces very well.
87. Drawbacks of DTD
For a namespace to be used, the entire namespace must be
defined within the DTD.
DTDs have weak data typing and no support for the XML
DOM.
Finally, and possibly most important from a security
standpoint, is the ability of the internal DTD subset to
override the external DTD subset.
88. XML Schema
The XML Schema Definition Language is an
XML language for describing and constraining
the content of XML documents.
XML Schema is a W3C recommendation.
XML Schema defines what it means for an XML
document to be valid.
89. Usage of XML Schemas
It supports data types.
It uses XML syntax.
It secures data communication: sender and receiver can share the same description of the data.
XML Schemas are extensible.
When using XML Schemas most of the errors can be
taken care of by the validating software.
90. XML Schema Definition
Elements that can appear in a document.
Attributes that can appear in a document.
The elements that are child elements.
The order of child elements.
The number of child elements.
The criteria whether an element is empty or can
include text.
Data types for elements and attributes.
Default and fixed values for elements and
attributes.
91. XML Schema Data Types
XSD String: String data types are used for values that
contain character strings.
XSD Date: Date and time data types are used for
values that contain date and time.
XSD Numeric: Numeric data types are used for
numeric values.
XSD Misc: Other miscellaneous data types like
boolean, base64Binary, hexBinary, float, double etc.
92. Types of Indicators
i) Order indicators
a) All Indicator
b) Choice Indicator
c) Sequence Indicator
ii) Occurrence Indicators
iii) Group Indicators
93. Order indicators
a) All Indicator
The <all> indicator specifies, by default, that the child
elements can appear in any order and that each child
element must occur once and only once.
<xs:element name="book">
<xs:complexType>
<xs:all>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
</xs:all>
</xs:complexType>
</xs:element>
94. Order indicators
b) Choice Indicator
The <choice> indicator specifies that either one child
element or another can occur.
<xs:element name="person">
<xs:complexType>
<xs:choice>
<xs:element name="employee" type="employee"/>
<xs:element name="member" type="member"/>
</xs:choice>
</xs:complexType>
</xs:element>
95. Order indicators
c) Sequence Indicator
The <sequence> indicator specifies that the child
elements must appear in a specific order.
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
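The difference between the three order indicators can be modelled on a flat list of child element names. The Python functions below are hypothetical and assume the default minOccurs and maxOccurs of 1; they illustrate the rules described above, they are not an XML Schema validator.

```python
# Hypothetical helpers modelling the three order indicators for a flat list of
# child element names, assuming minOccurs = maxOccurs = 1 for every child.
# A real XML Schema validator also checks types, nesting and namespaces.

def matches_all(children: list[str], declared: list[str]) -> bool:
    """<xs:all>: each declared child exactly once, in any order."""
    return sorted(children) == sorted(declared)

def matches_choice(children: list[str], declared: list[str]) -> bool:
    """<xs:choice>: exactly one of the declared children."""
    return len(children) == 1 and children[0] in declared

def matches_sequence(children: list[str], declared: list[str]) -> bool:
    """<xs:sequence>: each declared child once, in the declared order."""
    return children == declared

book = ["title", "author"]
print(matches_all(["author", "title"], book))              # True
print(matches_sequence(["author", "title"], book))         # False
print(matches_choice(["member"], ["employee", "member"]))  # True
```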
96. Occurrence indicators
Occurrence indicators are used to define how often an element can
occur. For all order and group indicators, the default value for maxOccurs
and minOccurs is 1.
The maxOccurs attribute specifies the maximum number of times an
element can occur (maxOccurs="unbounded" removes the upper limit).
The minOccurs attribute specifies the minimum number of times an
element can occur.
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
<xs:element name="vendor" type="xs:string" maxOccurs="2"
minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
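With minOccurs="0" and maxOccurs="2", a book may list zero, one or two vendors. The hypothetical Python helper below checks a child count against such bounds, using xml.etree.ElementTree only to count children in a made-up <book> instance; None stands in for maxOccurs="unbounded".

```python
import xml.etree.ElementTree as ET
from typing import Optional

def occurrences_ok(count: int, min_occurs: int = 1,
                   max_occurs: Optional[int] = 1) -> bool:
    """Check a child count against minOccurs/maxOccurs.
    Both default to 1 as in XML Schema; None models maxOccurs="unbounded"."""
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs

# A hypothetical <book> instance with three vendors.
book = ET.fromstring("<book><title>T</title><author>A</author>"
                     "<vendor>V1</vendor><vendor>V2</vendor><vendor>V3</vendor></book>")

vendors = len(book.findall("vendor"))
print(occurrences_ok(vendors, min_occurs=0, max_occurs=2))  # False: 3 > maxOccurs="2"
print(occurrences_ok(len(book.findall("title"))))           # True: exactly one title
```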
97. Group indicators
Group indicators are used to define related sets of
elements. Element groups are defined with group
declaration.
<xs:group name="persongroup">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="birthday" type="xs:string"/>
</xs:sequence>
</xs:group>