2. • Element donutBox can contain zero or one jelly
element, followed by zero or more lemon
elements, followed by either one or more creme or
sugar elements or exactly one glazed element
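The rule above corresponds to the content model `(jelly?, lemon*, ((creme | sugar)+ | glazed))`. That model is reconstructed here from the prose, because the declaration itself is not shown. A validating parser checks a sequence of child elements against a model like this much as a regular expression matches a string. The minimal sketch below encodes each child name as a token and uses Python's `re`; `matches_donut_box` is a hypothetical helper, not a library API.

```python
import re

# Content model reconstructed from the description above:
#   <!ELEMENT donutBox (jelly?, lemon*, ((creme | sugar)+ | glazed))>
# Each child name is written as "<name>" so that the DTD operators
# (?, *, +, | and sequence) map directly onto regex operators.
DONUT_BOX_MODEL = re.compile(
    r"(?:<jelly>)?(?:<lemon>)*(?:(?:<creme>|<sugar>)+|<glazed>)"
)

def matches_donut_box(children):
    """Return True if the list of child element names fits the content model."""
    encoded = "".join(f"<{name}>" for name in children)
    return DONUT_BOX_MODEL.fullmatch(encoded) is not None

print(matches_donut_box(["jelly", "lemon", "lemon", "creme", "sugar"]))  # True
print(matches_donut_box(["lemon", "glazed"]))                            # True
print(matches_donut_box(["jelly", "jelly", "glazed"]))                   # False: at most one jelly
print(matches_donut_box(["creme", "glazed"]))                            # False: creme/sugar OR glazed
```

A real parser compiles the model into an automaton in a similar way. It also reports which child broke the rule, which this sketch does not do.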
3. EMPTY, Mixed Content and ANY
• Element declarations must also specify the
type of content each element may contain
• Content specification types for describing content
other than a plain sequence of child elements:
• EMPTY
– declares empty elements.
– empty elements do not contain character data or
child elements
<!ELEMENT oven EMPTY>
Markup - <oven />
4. • Mixed Content
– may contain any combination of character data (#PCDATA) and the listed child elements, in any order
<!ELEMENT myMessage ( #PCDATA | message )*>
<myMessage>Here is some text, some
<message>other text</message>and
<message>even more text</message>.
</myMessage>
• ANY
– can contain any content, including PCDATA, declared elements,
or a combination of elements and PCDATA
– an ANY element may also be empty
5. Attribute Declarations
• Elements may have attributes
• An attribute declaration specifies an attribute list for an
element by using the ATTLIST keyword.
• For an element, attribute names must be unique
• It is not necessary to declare all the attributes of an
element at one place
• XML allows us to put them in multiple declaration lists that
are merged by the XML parser as it reads in the DTD.
Syntax:
<!ATTLIST element_name attname1 atttype attdesc1
attname2 atttype attdesc2
>
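To see the merging in practice, the sketch below uses Python's standard `xml.parsers.expat` module. Its `AttlistDeclHandler` is called once for every declared attribute, across all ATTLIST declarations. `collect_attlists` is a hypothetical helper. It keeps only the first definition of an attribute, because the XML specification states that when the same attribute is declared more than once, the first declaration is binding.

```python
from xml.parsers import expat

# Two ATTLIST declarations for the same element; "id" is declared twice.
DOC = """<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
  <!ATTLIST message id CDATA #REQUIRED>
  <!ATTLIST message importance (high | medium | low) "medium"
                    id CDATA #IMPLIED>
]>
<message id="445">Welcome to XML!</message>"""

def collect_attlists(xml_text):
    """Merge all ATTLIST declarations into {element: {attribute: (type, default, required)}}."""
    merged = {}

    def on_attlist(elname, attname, att_type, default, required):
        # expat calls this once per declared attribute; keeping only the first
        # definition of each attribute mirrors the "first declaration is binding" rule.
        merged.setdefault(elname, {}).setdefault(attname, (att_type, default, bool(required)))

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return merged

print(collect_attlists(DOC)["message"]["id"])  # ('CDATA', None, True)
```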
6. 1. the declaration starts with the string <!ATTLIST
2. the name of the element to which the attributes belong
3. some number of attribute declarations
4. each attribute declaration consists of an attribute name,
5. the attribute's data type or enumerated values, and
6. a description of the attribute's behavior
7. the terminating delimiter >
7. • Example 1:
<!ELEMENT x EMPTY>
<!ATTLIST x y CDATA #REQUIRED>
– The ELEMENT declaration declares an EMPTY element x; the
attribute declaration specifies that y is an attribute of x.
– CDATA indicates that y can contain any character text.
– #REQUIRED specifies that the attribute must be provided for
element x.
XML document:
<x y="something"/>
8. • Example 2:
<!ELEMENT myMessage ( message )>
<!ELEMENT message ( #PCDATA )>
<!ATTLIST message id CDATA #REQUIRED>
• XML Document:
<myMessage>
<message id = "445">
Welcome to XML!
</message>
</myMessage>
9. • Example 3:
<!ATTLIST memo
id ID #REQUIRED (1)
security (high | low) "high" (2)
keywords NMTOKENS #IMPLIED (3)
>
(1) The first attribute, id, is type ID. #REQUIRED means that this
attribute must be specified by the author.
(2) The second attribute, security, can take either of two values,
high or low, and the default is high.
(3) The third attribute, keywords, takes a value of type
NMTOKENS; #IMPLIED means that this attribute is optional
and has no default value.
10. An attribute declaration does the following three things:
• It gives the attribute a name.
• It specifies the data type of the attribute, or a
list of permissible values it can take.
• It describes the behavior of the attribute:
whether there is a default value, or if the
author is required to supply a value.
13. Attribute Declarations: Attribute Behavior
• Default value assigned:
– If the user doesn't supply a value for this attribute, the
XML processor uses the default value supplied in the
DTD.
– Specifying a default value in the DTD is often a good idea
when one value is most common
<!ATTLIST message importance (high | medium | low) "medium">
– When the user omits the importance attribute in the
<message> element, the XML processor uses the default
value.
– (high | medium | low) - enumerated value list
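Python's `xml.etree.ElementTree` is built on expat. Expat is non-validating, but it reads attribute declarations in the internal DTD subset and supplies their default values. The minimal sketch below relies on that behavior: the second message omits `importance`, and the parsed element reports `medium`. `importance_levels` is a hypothetical helper. Defaults declared only in an external DTD file would not be applied, because expat does not load external DTDs by default.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE messages [
  <!ELEMENT messages (message*)>
  <!ELEMENT message (#PCDATA)>
  <!ATTLIST message importance (high | medium | low) "medium">
]>
<messages>
  <message importance="high">Server is down</message>
  <message>Lunch at noon</message>
</messages>"""

def importance_levels(xml_text):
    """Return (text, importance) for each <message>, as seen after parsing."""
    root = ET.fromstring(xml_text)
    return [(m.text, m.get("importance")) for m in root.iter("message")]

for text, level in importance_levels(DOC):
    # The second message gets "medium" from the DTD default.
    print(f"{text}: {level}")
```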
14. Contd..
• Attribute is optional (#IMPLIED)
– declares an attribute to be optional; the XML
processor does not assign any default value
– if the author omits it, the attribute is simply absent
– the #IMPLIED keyword says that there is no
default value and that use of the attribute is optional
<!ATTLIST reservation frills (aisle-seat|meal-included|pillow) #IMPLIED
>
15. Contd..
• User must supply a value (#REQUIRED)
– if there is no good default value, and the attribute
can't be blank, you should declare it to be
#REQUIRED.
– leaving out the attribute results in a validity error
reported by a validating parser
<!ATTLIST book isbn CDATA #REQUIRED>
16. Contd..
• Value already set, user cannot change it (#FIXED)
– when an attribute must always have the same value,
use the keyword #FIXED
– in addition to the keyword #FIXED, we must
supply the fixed value.
<!ATTLIST car color (beige | white | black | red | blue | silver) #FIXED
"black">
<!ATTLIST address zip CDATA #FIXED "248007">
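A validating parser enforces the four behaviors above, but a non-validating parser such as expat does not check #REQUIRED or #FIXED. The sketch below illustrates the rules by hand. It assumes a hypothetical declaration table for a single element that combines the attributes from the examples above. `DECLS` and `apply_attribute_rules` are illustrative names, not a library API.

```python
# Hypothetical declaration table for one element, combining the attribute
# behaviors shown above: name -> (keyword, default value)
DECLS = {
    "isbn": ("#REQUIRED", None),
    "frills": ("#IMPLIED", None),
    "color": ("#FIXED", "black"),
    "importance": (None, "medium"),  # a plain default value, no keyword
}

def apply_attribute_rules(attrs, decls):
    """Return (attributes after defaulting, list of validity errors)."""
    result = dict(attrs)
    errors = []
    for name, (keyword, default) in decls.items():
        given = attrs.get(name)
        if keyword == "#REQUIRED" and given is None:
            errors.append(f"missing required attribute {name!r}")
        elif keyword == "#FIXED" and given is not None and given != default:
            errors.append(f"attribute {name!r} must be {default!r}")
        elif given is None and default is not None:
            # #FIXED values and plain defaults are supplied by the processor
            result[name] = default
        # #IMPLIED with no value: the attribute simply stays absent
    return result, errors

print(apply_attribute_rules({"isbn": "12345"}, DECLS))
print(apply_attribute_rules({"color": "red"}, DECLS))
```

The first call succeeds and gains `color="black"` and `importance="medium"`. The second call reports both the missing `isbn` and the wrong fixed `color`.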
17. Attribute Declarations: Attribute Types
• Attribute types fall into three kinds
– Strings {do not impose any constraint on attribute
values} (CDATA)
– Tokenized attributes {impose constraints on attribute
values} (ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS)
– Enumerated attributes {most restrictive. Can take
only one of the values listed in the attribute
declaration} (enumerated value lists and NOTATION)
18. • CDATA (Character Data)
– loosest of the attribute types
– any character data can be used, including character references
and references to internal general entities
– other markup, such as elements or processing instructions, cannot appear in
attribute values for this or any other attribute type
<!ATTLIST circle radius CDATA "12 inches">
Some Examples are:
dimensions="35x12x9 mm"
company="O'Reilly &amp; Associates"
text=" 5 + 7 = 3 * 4 "
– If the attribute value includes quotes, use either a character
reference or entity reference (&quot; or &apos;) for the quotes inside the
value, or the other kind of quotes around the value
– Any whitespace character in the value is converted into a
space character
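The normalization rule can be observed with Python's `xml.etree.ElementTree`, because every conforming parser applies it. In the minimal sketch below, a literal newline and a literal tab each become one space; a CDATA value is not collapsed, so two spaces remain. A character reference such as `&#9;` is not normalized, and `&amp;` and `&quot;` yield the literal characters. `attribute_value` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def attribute_value(xml_text, name):
    """Parse a one-element document and return the normalized attribute value."""
    return ET.fromstring(xml_text).get(name)

# A literal newline and tab inside the value each become one space (no collapsing for CDATA).
print(repr(attribute_value('<note text="line one\n\tline two"/>', "text")))
# A character reference such as &#9; is not normalized, so the tab survives.
print(repr(attribute_value('<note text="a&#9;b"/>', "text")))
# & must be written as &amp; and a double quote inside "..." as &quot;.
print(attribute_value('<co company="O\'Reilly &amp; Associates"/>', "company"))
print(attribute_value('<q text="say &quot;hi&quot;"/>', "text"))
```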
19. • NMTOKEN(name token)
– May contain alphanumeric and/or ideographic characters and
the punctuation marks _, :, -, and .
– Any allowed character can be the first character
– XML Name, by contrast: only letters, ideographs, _ and : can be
the first character; after that, a Name can contain numbers,
letters and the same punctuation
<!ATTLIST part number NMTOKEN #REQUIRED>
Some examples are:
skin="reptilian"
file="README.txt"
version="v3.4-b"
– Leading and trailing whitespace in the value is removed by the XML processor; the token itself cannot contain whitespace.
20. • NMTOKENS (name token list)
– a sequence of one or more NMTOKENs separated
by spaces
– XML processor removes leading and trailing
whitespace, and collapses other runs of whitespace into a
single space character
<!ATTLIST article keywords NMTOKENS #IMPLIED>
Some examples are:
name="Greg Travis"
format="thin blue border"
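The difference between Name, NMTOKEN and NMTOKENS can be sketched with regular expressions. To keep the sketch short, the patterns below are ASCII-only approximations. The real XML 1.0 productions also allow non-ASCII letters, ideographs and combining characters. Values are first normalized the way a parser normalizes non-CDATA attributes. All helper names are hypothetical.

```python
import re

# ASCII-only approximations of the XML 1.0 productions; the real ones also
# allow non-ASCII letters, ideographs, combining characters and more.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def normalize_tokenized(value):
    """Non-CDATA normalization: drop leading/trailing spaces, collapse inner runs."""
    return " ".join(value.split())

def is_name(value):
    return NAME.fullmatch(normalize_tokenized(value)) is not None

def is_nmtoken(value):
    return NMTOKEN.fullmatch(normalize_tokenized(value)) is not None

def is_nmtokens(value):
    # One or more NMTOKENs separated by single spaces after normalization.
    tokens = normalize_tokenized(value).split(" ")
    return all(NMTOKEN.fullmatch(token) for token in tokens)

print(is_nmtoken("v3.4-b"), is_nmtoken("123"), is_name("123"))  # True True False
print(is_nmtoken("Greg Travis"), is_nmtokens("Greg Travis"))    # False True
```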
21. • ID (unique identifier)
– value must be XML Name
– a special type of attribute that gives an element a
label guaranteed to be unique in the document.
– No two elements are allowed to have the same ID
attribute value
– normalized like NMTOKEN, but the value must also be an XML Name
<!ATTLIST record id ID #REQUIRED>
Some examples are:
id="ISBN-12456-98-123"
label="JAVABOOK.CHAPTER.INTRO"
22. • IDREF (identifier reference)
– similar to the unique identifier type
– it refers to the ID of another element
– If there is no element with the specified ID value, a
validating parser reports an error
– used for internal links such as cross-references
<!ATTLIST employee hod IDREF #REQUIRED>
• IDREFS (identifier reference list)
– refers to one or more ID values at once
– A space-separated list of IDREF values, following the
same pattern as NMTOKENS
– Each IDREF in the list must match an ID attribute value
in the document, or a validating parser reports an error.
<!ATTLIST bookset refs IDREFS #REQUIRED>
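The ID and IDREF rules amount to two passes over the document. The first pass collects IDs and rejects duplicates. The second checks that every reference resolves, because references may point forward. `xml.etree.ElementTree` does not read attribute types from a DTD, so the minimal sketch below assumes a hypothetical table stating which attribute names are ID, IDREF and IDREFS. `check_id_references` is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical type table: ElementTree does not read attribute types from
# the DTD, so we state which attributes are ID, IDREF and IDREFS.
ID_ATTRS = {"id"}
IDREF_ATTRS = {"hod"}
IDREFS_ATTRS = {"refs"}

def check_id_references(xml_text):
    """Return a list of ID/IDREF validity errors found in the document."""
    elements = list(ET.fromstring(xml_text).iter())
    errors, ids = [], set()
    # Pass 1: collect IDs; each value may appear only once in the document.
    for el in elements:
        for name, value in el.attrib.items():
            if name in ID_ATTRS:
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
    # Pass 2: every IDREF/IDREFS token must match an ID (forward references allowed).
    for el in elements:
        for name, value in el.attrib.items():
            if name in IDREF_ATTRS:
                refs = [value]
            elif name in IDREFS_ATTRS:
                refs = value.split()
            else:
                continue
            errors.extend(f"no element has ID {ref!r}" for ref in refs if ref not in ids)
    return errors

DOC = """<library>
  <book id="b1"/>
  <book id="b2"/>
  <employee id="e1" hod="e2"/>
  <employee id="e2" hod="e2"/>
  <bookset refs="b1 b2 b9"/>
</library>"""
print(check_id_references(DOC))  # ["no element has ID 'b9'"]
```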
23. • ENTITY (entity name)
– accepts the name of an unparsed entity as a value
– use it after declaring the unparsed entity (and its notation) in the DTD
<!ATTLIST bulletlist icon ENTITY #IMPLIED>
<!NOTATION png SYSTEM "image/png">
<!ENTITY bluedot SYSTEM "icons/bluedot.png" NDATA png>
Example:
<bulletlist icon="bluedot">
• ENTITIES (entity name list)
– a space-separated list of unparsed entity names
<!ATTLIST album filelist ENTITIES #REQUIRED>
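The ENTITY and ENTITIES rule can be checked with Python's `xml.parsers.expat`. Its `EntityDeclHandler` receives a non-`None` notation name exactly when a declaration carries NDATA, so every unparsed entity can be recorded. Each ENTITY-typed attribute value is then compared against those entities. The `entity_attrs` mapping and `check_entity_attributes` are hypothetical, and so are the `png` notation and the `bulletlist` document. Splitting each value on whitespace covers both ENTITY and ENTITIES.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE bulletlist [
  <!ELEMENT bulletlist EMPTY>
  <!ATTLIST bulletlist icon ENTITY #IMPLIED>
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY bluedot SYSTEM "icons/bluedot.png" NDATA png>
]>
<bulletlist icon="bluedot"/>"""

def check_entity_attributes(xml_text, entity_attrs):
    """Report ENTITY/ENTITIES values that do not name a declared unparsed entity."""
    unparsed = {}  # entity name -> (system id, notation name)
    errors = []

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        if notation is not None:  # NDATA present: this is an unparsed entity
            unparsed[name] = (system_id, notation)

    def on_start(tag, attrs):
        for attr in entity_attrs.get(tag, ()):
            for name in attrs.get(attr, "").split():  # ENTITIES holds several names
                if name not in unparsed:
                    errors.append(f"{tag}/@{attr}: {name!r} is not a declared unparsed entity")

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return unparsed, errors

print(check_entity_attributes(DOC, {"bulletlist": ["icon"]}))
```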
24. • Enumerated value list
– a list of keywords that we define
– an attribute with an enumerated value list is useful when
there is a small set of possible values
– instead of declaring this type of attribute with a keyword in
the type field, we specify a list of values in parentheses,
separated by |.
<!ATTLIST part instock ( true | false ) #IMPLIED>
<!ATTLIST schedule day ( mon | tue | wed | thu | fri | sat | sun
) #REQUIRED >
– An attribute can have only one of the values in the list at a
time:
<schedule day="fri">
– If we want to declare a default value, it has to be one of the
choices in the list
<!ATTLIST shape type (circle | square | triangle) "square">
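The two rules above can be sketched in Python: a value must be exactly one listed token, and a default must itself come from the list. The regular expression is a simplifying assumption. It handles only single-attribute declarations with an enumerated list followed by #REQUIRED, #IMPLIED or a quoted default; #FIXED and multi-attribute ATTLISTs are out of scope. Both helpers are hypothetical.

```python
import re

# Matches single-attribute enumerated declarations such as
#   <!ATTLIST shape type (circle | square | triangle) "square">
ENUM_DECL = re.compile(
    r'<!ATTLIST\s+([\w.-]+)\s+([\w.-]+)\s*\(([^)]*)\)\s*(#REQUIRED|#IMPLIED|"[^"]*")\s*>'
)

def parse_enumerated(decl):
    """Return (element, attribute, allowed values, default) for one declaration."""
    match = ENUM_DECL.fullmatch(decl.strip())
    if match is None:
        raise ValueError("not a single enumerated attribute declaration")
    element, attribute, values, default = match.groups()
    allowed = tuple(v.strip() for v in values.split("|"))
    if default.startswith('"'):
        default = default.strip('"')
        # A declared default must be one of the listed choices.
        if default not in allowed:
            raise ValueError(f"default {default!r} is not one of {allowed}")
    return element, attribute, allowed, default

def is_valid_value(value, allowed):
    # Tokenized values are normalized first; the result must match exactly one choice.
    return " ".join(value.split()) in allowed

print(parse_enumerated('<!ATTLIST shape type (circle | square | triangle) "square">'))
```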
25. • NOTATION (notation list)
– The declaration lists one or more notation names in
parentheses, like an enumerated value list
– The attribute's value must be exactly one of those
names, and each listed name must be declared as a
notation
– notation types are instructions for how to process
formatted or non-XML data
– E.g.: a notation can be defined to preserve leading
and trailing whitespace in an element where they
would ordinarily be removed by the parser
26. Notations and Unparsed Data
• XML is designed primarily to be a container for
textual information
• Not ideal for storing binary data such as image
bitmaps or compressed text, but it isn't totally
incompatible.
• If XML could handle only text, it would be too
limited to be of practical value.
• XML provides a mechanism for coexisting with non-
XML data, called a notation.
27. Contd..
• A notation is a special label to tell the XML
processor what kind of data it's looking at.
• Use of Notations
– labeling non-textual data
– labeling textual data that has a specific format, such as a date
• Notation Declarations:
– Syntax for declaring a notation type is
<!NOTATION name identifier>
– name is the name of a notation type
– identifier is an external identifier that has some meaning
to the XML processor
– meaning is dependent on application
28. Notation External Identifiers
Example | Meaning
SYSTEM "application/x-troff" | The MIME type for troff-encoded text
SYSTEM "ISO 8601:1988" | An international standard number for date formats (e.g., 1994-02-03)
SYSTEM "http://www.w3.org/TR/NOTE-datetime" | A URL to a technical document on the Internet about date formats
PUBLIC "-//ORA//NON-SGML Preferred Date Format//EN" "http://www.oreilly.com/xml/dates.html" | A formal public identifier for an online resource
SYSTEM "/usr/local/bin/xv" | A software program on the local system that should be called to process unparsed data
• All of these examples are possible
• The important thing is that the identifier is unique and conveys to the XML processor
enough information to process the data
• The notation declaration creates a label, which we use in conjunction with the
declaration of an attribute or an unparsed external entity
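A parser passes notation declarations to the application, which decides what the identifiers mean. The minimal sketch below uses the `NotationDeclHandler` of Python's `xml.parsers.expat` to collect three of the identifiers from the table. The notation names `troff`, `date` and `xv` are hypothetical, because the table gives only the identifiers. `collect_notations` is also a hypothetical helper.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE doc [
  <!NOTATION troff SYSTEM "application/x-troff">
  <!NOTATION date PUBLIC "-//ORA//NON-SGML Preferred Date Format//EN"
                         "http://www.oreilly.com/xml/dates.html">
  <!NOTATION xv SYSTEM "/usr/local/bin/xv">
  <!ELEMENT doc EMPTY>
]>
<doc/>"""

def collect_notations(xml_text):
    """Return {notation name: (public id, system id)} as reported by expat."""
    notations = {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = (public_id, system_id)

    parser = expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.Parse(xml_text, True)
    return notations

for name, (public_id, system_id) in collect_notations(DOC).items():
    print(name, public_id, system_id)
```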
29. Unparsed Entities
• External general entities import XML data from
other files
• An unparsed entity is another kind of entity that
imports non-XML data
• Its declaration is similar to an external general entity,
except the keyword NDATA and a notation type
follow the system or public identifier
<!ENTITY song SYSTEM "jingle_bells.wav" NDATA audio-wav>
31. <?xml version="1.0"?>
<!DOCTYPE doc [
<!ELEMENT doc ANY>
<!NOTATION jpeg SYSTEM "image/jpeg">
<!ENTITY bob SYSTEM "pictures/bob.jpeg" NDATA jpeg>
]>
<doc>
&bob;
</doc>
• Note: Do not reference an unparsed entity directly in the
content of an XML document.
• This example document is not well-formed, because &bob; names an unparsed entity
– Instead pass the entity to an element through an ENTITY-type attribute
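This well-formedness rule is enforced even by a non-validating parser. In the sketch below, Python's `xml.etree.ElementTree` (via expat) raises `ParseError` for the `&bob;` reference. It accepts the same entity when it is passed by name through an ENTITY attribute. The `photo` element and its `src` attribute are hypothetical names used only for the corrected version. `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The document above: &bob; names an unparsed (NDATA) entity.
BAD = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc ANY>
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY bob SYSTEM "pictures/bob.jpeg" NDATA jpeg>
]>
<doc>&bob;</doc>"""

# Corrected form: the entity is passed by name through an ENTITY attribute.
GOOD = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (photo)>
  <!ELEMENT photo EMPTY>
  <!ATTLIST photo src ENTITY #REQUIRED>
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY bob SYSTEM "pictures/bob.jpeg" NDATA jpeg>
]>
<doc><photo src="bob"/></doc>"""

def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(BAD), is_well_formed(GOOD))  # False True
```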
32. Labeling element formats with notations
• Notations can help specify how character data should be interpreted
<!DOCTYPE doc [
<!ELEMENT doc (title, listing+)>
<!ELEMENT title (#PCDATA)*>
<!ELEMENT listing (#PCDATA)*>
<!ATTLIST listing format NOTATION (scheme-lisp | ansi-c) #REQUIRED>
<!NOTATION scheme-lisp SYSTEM "IEEE 1178-1990">
<!NOTATION ansi-c SYSTEM "ISO/IEC 9899:1999">
]>
<doc>
<title>Factorial Function</title>
<listing format="scheme-lisp">
(define fact (lambda (n) (if (= n 1) 1 (* n (fact (- n 1))))))
</listing>
<listing format="ansi-c">
int fact( int n ) {
if( n == 1 ) return 1;
return n * fact( n - 1 );
}
</listing>
</doc>
33. ENTITY Declarations
• General entity
– A simple substitution for parsed text.
– <!ENTITY abc "The ABC Group">
– Reference the entity as &abc;
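General entity substitution happens during parsing, so the application only ever sees the replacement text. The minimal sketch below shows this with Python's `xml.etree.ElementTree`, which expands internal entities declared in the document's internal subset. The `note` element and `expanded_text` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY abc "The ABC Group">
]>
<note>Welcome to &abc;!</note>"""

def expanded_text(xml_text):
    # The parser replaces &abc; with its replacement text while parsing.
    return ET.fromstring(xml_text).text

print(expanded_text(DOC))  # Welcome to The ABC Group!
```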
34. • External general entity
– contain text from an external source
1. <!ENTITY man PUBLIC "-//Acme Gadgets//TEXT Manual 23//EN"
"http://www.acme-gadgets.com/manuals/prod23.htm">
2. <!ENTITY man SYSTEM "/pub/docs/manuals/prod23.htm">
– Reference the entity as &man;
• Nonparsed external entity
– contain non-XML data from an external source
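1. <!ENTITY logo PUBLIC "-//Acme Gadgets//NON-XML Logo//EN"
"http://www.acme-gadgets.com/images/logo.gif" NDATA gif>
2. <!ENTITY logo SYSTEM "images/logo.gif" NDATA gif>
– Refer to the entity by passing logo as the value of an ENTITY or
ENTITIES attribute, never as &logo;
– The notation gif must also be declared with <!NOTATION>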
35. • Parameter entity
– A simple substitution for DTD text
<!ENTITY % paratext "(#PCDATA | emph | acronym)*">
– Reference the entity as %paratext;
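– a parameter entity reference can appear only in the DTD,
never inside the content of an XML document
– Can be referenced in both the internal and the external subset; in the
internal subset, however, a reference may appear only between markup
declarations, not inside one (so the example below belongs in an external DTD)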
<!ENTITY % content "para | note | warning">
<!ENTITY % id.att "id ID #REQUIRED">
<!ELEMENT chapter (title, epigraph, (%content;)+)>
<!ATTLIST chapter %id.att;>
<!ELEMENT appendix (title, (%content;)+)>
<!ATTLIST appendix %id.att;>
– shows how parameter entities simplify the design and
maintenance of a DTD
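Parameter entities work as textual substitution inside the DTD, and the example above can be expanded by hand. The Python sketch below is a simplification under stated assumptions. It handles only double-quoted internal parameter entities and does not expand references nested inside entity values. It also omits the extra space a real parser adds on each side of a parameter entity's replacement text. `expand_parameter_entities` is a hypothetical helper.

```python
import re

# Internal parameter entity declarations: <!ENTITY % name "replacement text">
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
# Parameter entity references: %name;
PE_REF = re.compile(r"%([\w.-]+);")

def expand_parameter_entities(dtd):
    """Textually expand %name; references using the internal PEs declared in dtd."""
    entities = {}

    def record(match):
        entities.setdefault(match.group(1), match.group(2))  # first declaration wins
        return ""  # drop the declaration itself from the output

    body = PE_DECL.sub(record, dtd)
    return PE_REF.sub(lambda m: entities[m.group(1)], body)

DTD = """<!ENTITY % content "para | note | warning">
<!ENTITY % id.att "id ID #REQUIRED">
<!ELEMENT chapter (title, epigraph, (%content;)+)>
<!ATTLIST chapter %id.att;>
<!ELEMENT appendix (title, (%content;)+)>
<!ATTLIST appendix %id.att;>"""

print(expand_parameter_entities(DTD).strip())
```

Changing the content list in one place (the `content` entity) updates both chapter and appendix, which is the maintenance benefit described above.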
36. • External parameter entity
– containing a DTD or part of a DTD from an external
source
– used to import parts of a DTD that reside in different
files; this technique is called modularization
<!ENTITY % inline-elements SYSTEM "inlines.mod">
<!ENTITY % ISOamsa PUBLIC "ISO 8879:1986//ENTITIES Added Math
Symbols: Arrow Relations//EN//XML"
"/usr/local/sgml/isoents/isoamsa.ent">
%inline-elements;
%ISOamsa;
40. Conditional Sections
• Special form of markup used in a DTD that marks a
region of text for inclusion or exclusion in the DTD;
conditional sections may appear only in the external subset
• If we know a piece of DTD may someday be an
unwanted option
– we can make it a conditional section and let the end user
decide whether to keep it in or not
• Similar to CDATA marked sections
– Use square bracket delimiters
– CDATA keyword is replaced with either INCLUDE or IGNORE.
• Syntax
<![switch[DTD text]]>
where switch is like an on/off switch, activating the DTD text if its
value is INCLUDE, or marking it inactive if it is set to IGNORE.
41. • Example 1:
<![INCLUDE[
<!-- these declarations will be included -->
<!ELEMENT foo (bar, caz, bub?)>
<!ATTLIST foo crud CDATA #IMPLIED>
]]>
<![IGNORE[
<!-- these declarations will be ignored -->
<!ELEMENT blah (#PCDATA)*>
<!ELEMENT glop (flub | zuc)>
]]>
• Example 2:
<!ENTITY % optional.stuff "INCLUDE">
<![%optional.stuff;[
<!-- these declarations may or may not be included -->
<!ELEMENT foo (bar, caz, bub?)>
<!ATTLIST foo crud CDATA #IMPLIED>
]]>
42. • The technique is especially powerful when you declare the switch entity
inside the document's internal subset
• Example 3:
<![%use-disclaimer;[
<!ENTITY disclaimer "This is Beta software. We can't promise it is
free of bugs.">
]]>
<!ENTITY disclaimer "">
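• The actual value of the entity depends on whether use-disclaimer
has been set to INCLUDE: when an entity is declared more than once, the
first declaration is binding, so the empty declaration takes effect only
when the conditional section is ignored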
• XML Doc
<!DOCTYPE manual SYSTEM "manual.dtd" [
<!ENTITY % use-disclaimer "IGNORE">
]>
<manual>
<title>User Guide for Techno-Wuzzy</title>
&disclaimer;
...
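The switching logic of Example 3 can be simulated in Python. The sketch resolves conditional sections against a hypothetical table of parameter-entity switches, then looks up the first, binding declaration of `disclaimer`. It assumes conditional sections are not nested. The regex approach and both helper names are illustrative, not how a real parser is implemented.

```python
import re

# A conditional section whose keyword is either literal (INCLUDE/IGNORE)
# or a parameter entity reference such as %use-disclaimer;
COND_SECT = re.compile(r"<!\[\s*(?:%([\w.-]+);|(INCLUDE|IGNORE))\s*\[(.*?)\]\]>", re.S)

def resolve_conditional_sections(dtd, switches):
    """Keep INCLUDE sections, drop IGNORE sections (non-nested sections only)."""
    def choose(match):
        pe_name, literal, body = match.groups()
        keyword = switches[pe_name] if pe_name else literal
        return body if keyword == "INCLUDE" else ""
    return COND_SECT.sub(choose, dtd)

def first_entity_value(dtd, name):
    # When an entity is declared more than once, the first declaration is binding.
    match = re.search(rf'<!ENTITY\s+{re.escape(name)}\s+"([^"]*)"', dtd)
    return match.group(1) if match else None

MANUAL_DTD = """<![%use-disclaimer;[
<!ENTITY disclaimer "This is Beta software. We can't promise it is free of bugs.">
]]>
<!ENTITY disclaimer "">"""

for setting in ("IGNORE", "INCLUDE"):
    resolved = resolve_conditional_sections(MANUAL_DTD, {"use-disclaimer": setting})
    print(setting, repr(first_entity_value(resolved, "disclaimer")))
```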
44. DTD vs XSD
• DTD has a simple syntax for content definition
• DTD has limitations for complex uses of XML: it cannot express
data types such as integers or dates, has no namespace support,
and is not itself written in XML syntax
• W3C recommended "XML Schema" as a schema
definition language to replace DTD.
• XML schema, commonly known as an XML
Schema Definition (XSD), describes what a given
XML document can contain.
45. XML - XSD
• The XML schema defines
– the shape or structure of the XML document,
– rules for data content
– semantics such as
• what fields an element can contain,
• which sub elements it can contain and
• how many items can be present.
– the type and values that can be placed in each
element or attribute.
– XML data constraints (facets) include rules such
as min and max length.