SlideShare une entreprise Scribd logo
1  sur  34
DTD
(Document Type Definition)
     Imposing Structure on
       XML Documents
     (W3Schools on DTDs)

                             1
Motivation
• A DTD adds syntactical requirements in
  addition to the well-formed requirement
• It helps in eliminating errors when
  creating or editing XML documents
• It clarifies the intended semantics
• It simplifies the processing of XML
  documents

                                            2
An Example
• In an address book, where can a phone
  number appear?
  – Under <person>, under <name> or under
    both?
• If we have to check for all possibilities,
  processing takes longer and it may not
  be clear to whom a phone belongs



                                               3
Document Type Definitions

• Document Type Definitions (DTDs)
  impose structure on XML documents
• There is some relationship between a
  DTD and a schema, but it is not close –
  hence the need for additional “typing”
  systems (XML schemas)
• The DTD is a syntactic specification

                                            4
Example: An Address Book
<person>
   <name> Homer Simpson </name>         Exactly one name
   <greet> Dr. H. Simpson </greet>     At most one greeting
   <addr>1234 Springwater Road </addr>    As many address
                                          lines as needed
   <addr> Springfield USA, 98765 </addr>
                                          (in order)
   <tel> (321) 786 2543 </tel>
                                   Mixed telephones
   <fax> (321) 786 2544 </fax>     and faxes
   <tel> (321) 786 2544 </tel>
                                                 As many
   <email> homer@math.springfield.edu </email>
                                                 as needed
</person>
                                                             5
Specifying the Structure


• name          to specify a name element
• greet?        to specify an optional
                (0 or 1) greet elements
• name, greet? to specify a name followed by
               an optional greet



                                               6
Specifying the Structure
              (cont’d)

• addr*       to specify 0 or more address
              lines
• tel | fax   a tel or a fax element
• (tel | fax)* 0 or more repeats of tel or fax
• email*      0 or more email elements



                                                 7
Specifying the Structure
            (cont’d)
• So the whole structure of a person entry
  is specified by

     name, greet?, addr*, (tel | fax)*, email*


• This is known as a regular expression


                                                 8
Element Type Definition
• for each element type E, a declaration of the form:
•            <!ELEMENT E P>
•      where P is a regular expression, i.e.,
• P ::= EMPTY | ANY | #PCDATA | E’ |
•                    P1, P2 | P1 | P2 | P? | P+ | P*
   –   E’: element type
   –   P1 , P2: concatenation
   –   P1 | P2: disjunction
   –   P?: optional
   –   P+: one or more occurrences
   –   P*: the Kleene closure


                                                   9
Summary of Regular Expressions

• A         The tag (i.e., element) A occurs
• e1,e2     The expression e1 followed by
            e2
•   e*      0 or more occurrences of e
•   e?      Optional: 0 or 1 occurrences
•   e+      1 or more occurrences
•   e1 | e2 either e1 or e2
•   (e)     grouping
                                           10
The Definition of an Element Consists of
     Exactly One of the Following
 • A regular expression (as defined
   earlier)
 • EMPTY means that the element has no
   content
 • ANY means that content can be any
   mixture of PCDATA and elements
   defined in the DTD
 • Mixed content which is defined as
   described on the next slide
 • (#PCDATA)
                                         11
The Definition of Mixed Content
• Mixed content is described by a
  repeatable OR group
     (#PCDATA | element-name | …)*
  – Inside the group, no regular expressions –
    just element names
  – #PCDATA must be first followed by 0 or
    more element names, separated by |
  – The group can be repeated 0 or more
    times
                                                 12
An Address-Book XML Document
      with an Internal DTD
<?xml version="1.0" encoding="UTF-8"?>
                                              The name of
<!DOCTYPE addressbook [
                                               the DTD is
  <!ELEMENT addressbook (person*)>
                                              addressbook
  <!ELEMENT person
     (name, greet?, address*, (fax | tel)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>                 The syntax
  <!ELEMENT address       (#PCDATA)> of a DTD is
  <!ELEMENT tel       (#PCDATA)>             not XML
  <!ELEMENT fax       (#PCDATA)>             syntax
  <!ELEMENT email (#PCDATA)>
]>    “Internal” means that the DTD and the
       XML Document are in the same file
                                       13
The Rest of the
Address-Book XML Document

<addressbook>
 <person>
     <name> Jeff Cohen </name>
     <greet> Dr. Cohen </greet>
     <email> jc@penny.com </email>
 </person>
</addressbook>


                                     14
Regular Expressions
• Each regular expression determines a
corresponding finite-state automaton
• Let’s start with a simpler example: A double
                                       circle
           name, addr*, email          denotes an
                   addr                accepting
                                       state

           name            email


 This suggests a simple parsing program
                                             15
Another Example
name,address*,(tel | fax)*,email*
     address          tel           email
                tel
 name                       email
               fax

                      fax       email




                                            16
Some Things are Hard to Specify
Each employee element should contain name,
age and ssn elements in some order

<!ELEMENT employee
  ( (name, age, ssn) | (age, ssn, name) |
    (ssn, name, age) | ...
  )>

Suppose that there were many more fields!

                                             17
Some Things are Hard to Specify
           (cont’d)

<!ELEMENT employee
  ( (name, age, ssn) | (age, ssn, name) |
    (ssn, name, age) | ...
  )>
        There are n! different
        orders of n elements
Suppose there were many more fields!

       It is not even polynomial
                                            18
Specifying Attributes in the DTD
<!ELEMENT height (#PCDATA)>
<!ATTLIST height
   dimension CDATA #REQUIRED
   accuracy CDATA #IMPLIED >


The dimension attribute is required
The accuracy attribute is optional

CDATA is the “type” of the attribute – it means
“character data,” and may take any literal string
as a value
                                                    19
The Format of an Attribute Definition

• <!ATTLIST element-name attr-name
  attr-type default-value>
• The default value is given inside quotes
• attribute types:
  – CDATA
  – ID, IDREF, IDREFS
  –…

                                         20
Summary of Attribute
           Default Values
• #REQUIRED means that the attribute must
  by included in the element
• #IMPLIED
• #FIXED “value”
  – The given value (inside quotes) is the only
    possible one
• “value”
  – The default value of the attribute if none is given


                                                          21
Recursive DTDs
<DOCTYPE genealogy [
   <!ELEMENT genealogy (person*)>     Each person
   <!ELEMENT person (                 should have
             name,                    a father and a
             dateOfBirth,             mother. This
             person,      -- mother   leads to either
             person )> -- father      infinite data or
   ...                                a person that
]>                                    is a descendent
What is the problem with this?        of herself.
A parser does not notice it!
                                                  22
Recursive DTDs (cont’d)
<DOCTYPE genealogy [
   <!ELEMENT genealogy (person*)>     If a person only
   <!ELEMENT person (                 has a father,
             name,                    how can you
             dateOfBirth,             tell that he has
             person?,     -- mother   a father and
             person? )> -- father     does not have
   ...                                a mother?
]>
What is now the problem with this?

                                                  23
Using ID and IDREF Attributes
 <!DOCTYPE family [
 <!ELEMENT family (person)*>
 <!ELEMENT person (name)>
 <!ELEMENT name (#PCDATA)>
 <!ATTLIST person
           id       ID   #REQUIRED
           mother IDREF #IMPLIED
           father IDREF #IMPLIED
           children IDREFS #IMPLIED>
]>

                                       24
IDs and IDREFs
• ID attribute: unique within the entire document.
   – An element can have at most one ID attribute.
   – No default (fixed default) value is allowed.
      • #required: a value must be provided
      • #implied: a value is optional
• IDREF attribute: its value must be some other
  element’s ID value in the document.
• IDREFS attribute: its value is a set, each element of
  the set is the ID value of some other element in the
  document.
  <person id=“898” father=“332” mother=“336”
       children=“982 984 986”>


                                                          25
Some Conforming Data
<family>
    <person id=“lisa” mother=“marge” father=“homer”>
        <name> Lisa Simpson </name>
    </person>
    <person id=“bart” mother=“marge” father=“homer”>
        <name> Bart Simpson </name>
    </person>
    <person id=“marge” children=“bart lisa”>
        <name> Marge Simpson </name>
    </person>
    <person id=“homer” children=“bart lisa”>
        <name> Homer Simpson </name>
    </person>
</family>

                                                       26
ID References do not Have Types
• The attributes mother and father are
  references to IDs of other elements
• However, those are not necessarily
  person elements!
• The mother attribute is not necessarily a
  reference to a female person



                                          27
An Alternative Specification
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE family [
   <!ELEMENT family (person)*>
   <!ELEMENT person (name, mother?, father?, children?)>
   <!ATTLIST person id ID #REQUIRED>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT mother EMPTY>
   <!ATTLIST mother idref IDREF #REQUIRED>
   <!ELEMENT father EMPTY>
   <!ATTLIST father idref IDREF #REQUIRED>
   <!ELEMENT children EMPTY>
   <!ATTLIST children idrefs IDREFS #REQUIRED>
]>

                                                           28
The Revised Data
<family>                                <person id="bart">
  <person id="marge">                     <name> Bart
    <name> Marge                        Simpson </name>
     Simpson </name>                      <mother idref="marge"/>
    <children idrefs="bart lisa"/>        <father idref="homer"/>
  </person>                             </person>
  <person id="homer">                   <person id="lisa">
    <name> Homer                          <name> Lisa
     Simpson </name>                       Simpson </name>
    <children idrefs="bart lisa"/>        <mother idref="marge"/>
  </person>                               <father idref="homer"/>
                                        </person>
                                     </family>
                                                              29
Consistency of ID and IDREF
         Attribute Values
• If an attribute is declared as ID
   – The associated value must be distinct, i.e., different
     elements (in the given document) must have
     different values for the ID attribute (no confusion)
      • Even if the two elements have different element names
• If an attribute is declared as IDREF
   – The associated value must exist as the value of
     some ID attribute (no dangling “pointers”)
• Similarly for all the values of an IDREFS
  attribute
• ID, IDREF and IDREFS attributes are not typed
                                                                30
Adding a DTD to the Document
• A DTD can be internal
  – The DTD is part of the document file
• or external
  – The DTD and the document are on
    separate files
  – An external DTD may reside
    • In the local file system
       (where the document is)
    • In a remote file system
                                           31
Connecting a Document with its DTD
• An internal DTD:
 <?xml version="1.0"?>
 <!DOCTYPE db [<!ELEMENT ...> … ]>
 <db> ... </db>
• A DTD from the local file system:
 <!DOCTYPE db SYSTEM "schema.dtd">
• A DTD from a remote file system:
 <!DOCTYPE db SYSTEM
 "http://www.schemaauthority.com/schema.dtd">
                                            32
Well-Formed XML Documents
• An XML document (with or without a DTD) is
  well-formed if
  – Tags are syntactically correct
  – Every tag has an end tag
  – Tags are properly nested
                                 An XML document
                                 must be well formed
  – There is a root tag
  – A start tag does not have two occurrences of the
    same attribute


                                                       33
Valid Documents

• A well-formed XML document isvalid if
  it conforms to its DTD, that is,
  – The document conforms to the regular-
    expression grammar,
  – The types of attributes are correct, and
  – The constraints on references are satisfied

                                               34

Contenu connexe

En vedette (19)

Room 6 so far..
Room 6 so far..Room 6 so far..
Room 6 so far..
 
Flatstanley2
Flatstanley2Flatstanley2
Flatstanley2
 
Flatstanley2
Flatstanley2Flatstanley2
Flatstanley2
 
Digitalliteracy
DigitalliteracyDigitalliteracy
Digitalliteracy
 
Persuasiveadverts
PersuasiveadvertsPersuasiveadverts
Persuasiveadverts
 
Victoria janice
Victoria janiceVictoria janice
Victoria janice
 
Angelina,selena,inquiry
Angelina,selena,inquiryAngelina,selena,inquiry
Angelina,selena,inquiry
 
Guadencio
GuadencioGuadencio
Guadencio
 
Ethan and yul ry
Ethan and yul ryEthan and yul ry
Ethan and yul ry
 
Ben and jason
Ben and jasonBen and jason
Ben and jason
 
Problem makers
Problem makersProblem makers
Problem makers
 
Denver
DenverDenver
Denver
 
Flatstanley2
Flatstanley2Flatstanley2
Flatstanley2
 
Bond cars 2
Bond cars 2Bond cars 2
Bond cars 2
 
Handwriting
HandwritingHandwriting
Handwriting
 
Legend of the 7 whales
Legend of the 7 whalesLegend of the 7 whales
Legend of the 7 whales
 
Buzz
BuzzBuzz
Buzz
 
Stormbreaker start here
Stormbreaker start hereStormbreaker start here
Stormbreaker start here
 
Aussie rules
Aussie rulesAussie rules
Aussie rules
 

Similaire à Dtd

Similaire à Dtd (20)

Unit iv xml
Unit iv xmlUnit iv xml
Unit iv xml
 
XML's validation - DTD
XML's validation - DTDXML's validation - DTD
XML's validation - DTD
 
it8074-soa-uniti-.pdf
it8074-soa-uniti-.pdfit8074-soa-uniti-.pdf
it8074-soa-uniti-.pdf
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
DTD
DTDDTD
DTD
 
2-DTD.ppt
2-DTD.ppt2-DTD.ppt
2-DTD.ppt
 
Xml dtd
Xml dtdXml dtd
Xml dtd
 
XML DTD DOCUMENT TYPE DEFINITION
XML DTD DOCUMENT TYPE DEFINITIONXML DTD DOCUMENT TYPE DEFINITION
XML DTD DOCUMENT TYPE DEFINITION
 
DTD1.pptx
DTD1.pptxDTD1.pptx
DTD1.pptx
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
It8074 soa-unit i
It8074 soa-unit iIt8074 soa-unit i
It8074 soa-unit i
 
uptu web technology unit 2 Xml2
uptu web technology unit 2 Xml2uptu web technology unit 2 Xml2
uptu web technology unit 2 Xml2
 
Xml2
Xml2Xml2
Xml2
 
1st lecture.ppt
1st lecture.ppt1st lecture.ppt
1st lecture.ppt
 
Xml2
Xml2Xml2
Xml2
 
XML
XMLXML
XML
 
Dtd
DtdDtd
Dtd
 
Xml basics concepts
Xml basics conceptsXml basics concepts
Xml basics concepts
 

Dernier

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Dernier (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Dtd

  • 1. DTD (Document Type Definition) Imposing Structure on XML Documents (W3Schools on DTDs) 1
  • 2. Motivation • A DTD adds syntactical requirements in addition to the well-formed requirement • It helps in eliminating errors when creating or editing XML documents • It clarifies the intended semantics • It simplifies the processing of XML documents 2
  • 3. An Example • In an address book, where can a phone number appear? – Under <person>, under <name> or under both? • If we have to check for all possibilities, processing takes longer and it may not be clear to whom a phone belongs 3
  • 4. Document Type Definitions • Document Type Definitions (DTDs) impose structure on XML documents • There is some relationship between a DTD and a schema, but it is not close – hence the need for additional “typing” systems (XML schemas) • The DTD is a syntactic specification 4
  • 5. Example: An Address Book <person> <name> Homer Simpson </name> Exactly one name <greet> Dr. H. Simpson </greet> At most one greeting <addr>1234 Springwater Road </addr> As many address lines as needed <addr> Springfield USA, 98765 </addr> (in order) <tel> (321) 786 2543 </tel> Mixed telephones <fax> (321) 786 2544 </fax> and faxes <tel> (321) 786 2544 </tel> As many <email> homer@math.springfield.edu </email> as needed </person> 5
  • 6. Specifying the Structure • name to specify a name element • greet? to specify an optional (0 or 1) greet elements • name, greet? to specify a name followed by an optional greet 6
  • 7. Specifying the Structure (cont’d) • addr* to specify 0 or more address lines • tel | fax a tel or a fax element • (tel | fax)* 0 or more repeats of tel or fax • email* 0 or more email elements 7
  • 8. Specifying the Structure (cont’d) • So the whole structure of a person entry is specified by name, greet?, addr*, (tel | fax)*, email* • This is known as a regular expression 8
  • 9. Element Type Definition • for each element type E, a declaration of the form: • <!ELEMENT E P> • where P is a regular expression, i.e., • P ::= EMPTY | ANY | #PCDATA | E’ | • P1, P2 | P1 | P2 | P? | P+ | P* – E’: element type – P1 , P2: concatenation – P1 | P2: disjunction – P?: optional – P+: one or more occurrences – P*: the Kleene closure 9
  • 10. Summary of Regular Expressions • A The tag (i.e., element) A occurs • e1,e2 The expression e1 followed by e2 • e* 0 or more occurrences of e • e? Optional: 0 or 1 occurrences • e+ 1 or more occurrences • e1 | e2 either e1 or e2 • (e) grouping 10
  • 11. The Definition of an Element Consists of Exactly One of the Following • A regular expression (as defined earlier) • EMPTY means that the element has no content • ANY means that content can be any mixture of PCDATA and elements defined in the DTD • Mixed content which is defined as described on the next slide • (#PCDATA) 11
  • 12. The Definition of Mixed Content • Mixed content is described by a repeatable OR group (#PCDATA | element-name | …)* – Inside the group, no regular expressions – just element names – #PCDATA must be first followed by 0 or more element names, separated by | – The group can be repeated 0 or more times 12
  • 13. An Address-Book XML Document with an Internal DTD <?xml version="1.0" encoding="UTF-8"?> The name of <!DOCTYPE addressbook [ the DTD is <!ELEMENT addressbook (person*)> addressbook <!ELEMENT person (name, greet?, address*, (fax | tel)*, email*)> <!ELEMENT name (#PCDATA)> <!ELEMENT greet (#PCDATA)> The syntax <!ELEMENT address (#PCDATA)> of a DTD is <!ELEMENT tel (#PCDATA)> not XML <!ELEMENT fax (#PCDATA)> syntax <!ELEMENT email (#PCDATA)> ]> “Internal” means that the DTD and the XML Document are in the same file 13
  • 14. The Rest of the Address-Book XML Document <addressbook> <person> <name> Jeff Cohen </name> <greet> Dr. Cohen </greet> <email> jc@penny.com </email> </person> </addressbook> 14
  • 15. Regular Expressions • Each regular expression determines a corresponding finite-state automaton • Let’s start with a simpler example: A double circle name, addr*, email denotes an addr accepting state name email This suggests a simple parsing program 15
  • 16. Another Example name,address*,(tel | fax)*,email* address tel email tel name email fax fax email 16
  • 17. Some Things are Hard to Specify Each employee element should contain name, age and ssn elements in some order <!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | ... )> Suppose that there were many more fields! 17
  • 18. Some Things are Hard to Specify (cont’d) <!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | ... )> There are n! different orders of n elements Suppose there were many more fields! It is not even polynomial 18
  • 19. Specifying Attributes in the DTD <!ELEMENT height (#PCDATA)> <!ATTLIST height dimension CDATA #REQUIRED accuracy CDATA #IMPLIED > The dimension attribute is required The accuracy attribute is optional CDATA is the “type” of the attribute – it means “character data,” and may take any literal string as a value 19
  • 20. The Format of an Attribute Definition • <!ATTLIST element-name attr-name attr-type default-value> • The default value is given inside quotes • attribute types: – CDATA – ID, IDREF, IDREFS –… 20
  • 21. Summary of Attribute Default Values • #REQUIRED means that the attribute must by included in the element • #IMPLIED • #FIXED “value” – The given value (inside quotes) is the only possible one • “value” – The default value of the attribute if none is given 21
  • 22. Recursive DTDs <DOCTYPE genealogy [ <!ELEMENT genealogy (person*)> Each person <!ELEMENT person ( should have name, a father and a dateOfBirth, mother. This person, -- mother leads to either person )> -- father infinite data or ... a person that ]> is a descendent What is the problem with this? of herself. A parser does not notice it! 22
  • 23. Recursive DTDs (cont’d) <DOCTYPE genealogy [ <!ELEMENT genealogy (person*)> If a person only <!ELEMENT person ( has a father, name, how can you dateOfBirth, tell that he has person?, -- mother a father and person? )> -- father does not have ... a mother? ]> What is now the problem with this? 23
  • 24. Using ID and IDREF Attributes <!DOCTYPE family [ <!ELEMENT family (person)*> <!ELEMENT person (name)> <!ELEMENT name (#PCDATA)> <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED> ]> 24
  • 25. IDs and IDREFs • ID attribute: unique within the entire document. – An element can have at most one ID attribute. – No default (fixed default) value is allowed. • #required: a value must be provided • #implied: a value is optional • IDREF attribute: its value must be some other element’s ID value in the document. • IDREFS attribute: its value is a set, each element of the set is the ID value of some other element in the document. <person id=“898” father=“332” mother=“336” children=“982 984 986”> 25
  • 26. Some Conforming Data <family> <person id=“lisa” mother=“marge” father=“homer”> <name> Lisa Simpson </name> </person> <person id=“bart” mother=“marge” father=“homer”> <name> Bart Simpson </name> </person> <person id=“marge” children=“bart lisa”> <name> Marge Simpson </name> </person> <person id=“homer” children=“bart lisa”> <name> Homer Simpson </name> </person> </family> 26
  • 27. ID References do not Have Types • The attributes mother and father are references to IDs of other elements • However, those are not necessarily person elements! • The mother attribute is not necessarily a reference to a female person 27
  • 28. An Alternative Specification <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE family [ <!ELEMENT family (person)*> <!ELEMENT person (name, mother?, father?, children?)> <!ATTLIST person id ID #REQUIRED> <!ELEMENT name (#PCDATA)> <!ELEMENT mother EMPTY> <!ATTLIST mother idref IDREF #REQUIRED> <!ELEMENT father EMPTY> <!ATTLIST father idref IDREF #REQUIRED> <!ELEMENT children EMPTY> <!ATTLIST children idrefs IDREFS #REQUIRED> ]> 28
  • 29. The Revised Data <family> <person id="bart"> <person id="marge"> <name> Bart <name> Marge Simpson </name> Simpson </name> <mother idref="marge"/> <children idrefs="bart lisa"/> <father idref="homer"/> </person> </person> <person id="homer"> <person id="lisa"> <name> Homer <name> Lisa Simpson </name> Simpson </name> <children idrefs="bart lisa"/> <mother idref="marge"/> </person> <father idref="homer"/> </person> </family> 29
  • 30. Consistency of ID and IDREF Attribute Values • If an attribute is declared as ID – The associated value must be distinct, i.e., different elements (in the given document) must have different values for the ID attribute (no confusion) • Even if the two elements have different element names • If an attribute is declared as IDREF – The associated value must exist as the value of some ID attribute (no dangling “pointers”) • Similarly for all the values of an IDREFS attribute • ID, IDREF and IDREFS attributes are not typed 30
  • 31. Adding a DTD to the Document • A DTD can be internal – The DTD is part of the document file • or external – The DTD and the document are on separate files – An external DTD may reside • In the local file system (where the document is) • In a remote file system 31
  • 32. Connecting a Document with its DTD • An internal DTD: <?xml version="1.0"?> <!DOCTYPE db [<!ELEMENT ...> … ]> <db> ... </db> • A DTD from the local file system: <!DOCTYPE db SYSTEM "schema.dtd"> • A DTD from a remote file system: <!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd"> 32
  • 33. Well-Formed XML Documents • An XML document (with or without a DTD) is well-formed if – Tags are syntactically correct – Every tag has an end tag – Tags are properly nested An XML document must be well formed – There is a root tag – A start tag does not have two occurrences of the same attribute 33
  • 34. Valid Documents • A well-formed XML document isvalid if it conforms to its DTD, that is, – The document conforms to the regular- expression grammar, – The types of attributes are correct, and – The constraints on references are satisfied 34