1 xml fundamentals

  1. XML Extensible Markup Language Prepared By, Dr.K.G.Saranya Assistant Professor (S.Gr), Department of CSE, PSG College of Technology, Coimbatore-4.
  2. SGML (Standard Generalized Markup Language) • It is an internationally agreed standard for data representation. • It is an international standard for the definition of device independent, system independent methods of representing texts in electronic form.
  3. Introduction • XML stands for EXtensible Markup Language • XML is a markup language much like HTML • A simplified version of SGML • More flexible and adaptable than HTML • XML was designed to describe data
  4. • XML tags are not predefined; you must define your own tags • XML uses a Document Type Definition (DTD) or an XML Schema to describe the data • XML is a W3C Recommendation. The World Wide Web Consortium published the first XML 1.0 standard definition in 1998.
  5. Difference between XML and HTML The main difference between XML and HTML – XML was designed to carry data. (XML is not a replacement for HTML) XML and HTML were designed with different goals: – XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. – HTML is about displaying information, while XML is about describing information.
  6. Why Is XML Important? • Plain Text – Easy to edit – Useful for storing small amounts of data – Possible to efficiently store large amounts of XML data through an XML front end to a database • Data Identification – Tell you what kind of data you have – Can be used in different ways by different applications
  7. Why is XML important? • Linkability -- XLink and XPointer – Simple unidirectional hyperlinks – Two-way links – Multiple-target links – “Expanding” links • Easily Processed – Regular and consistent notation • Hierarchical – Faster to access – Easier to rearrange
  8. XML Specifications • XML 1.0 Defines the syntax of XML • XPointer, XLink Defines a standard way to represent links between resources • XSL Defines the standard stylesheet language for XML
  9. XML Syntax • XML declaration is the first statement • All XML elements must have a closing tag • XML tags are case sensitive • All XML elements must be properly nested • All XML documents must have a root tag • Attribute values must always be quoted • With XML, white space is preserved • Comments in XML: <!-- This is a comment --> • Certain characters are reserved for parsing
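The syntax rules on this slide can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample strings are illustrative and not part of the original slides. It only tests well-formedness, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, each breaking (or obeying) one rule from the slide.
samples = {
    "properly nested": "<note><to>XXX</to></note>",
    "missing closing tag": "<note><to>XXX</note>",
    "case mismatch": "<Note></note>",
    "unquoted attribute": "<note id=1></note>",
    "two root elements": "<a></a><b></b>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample should be accepted; the others each violate one of the rules listed above.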
  10. XML Validation There are two types of XML documents • "Well Formed" XML document: correct XML syntax • "Valid" XML document – “well formed” – Conforms to the rules of a DTD (Document Type Definition) • XML DTD – defines the legal building blocks of an XML document – Can be inline in XML or as an external reference • XML Schema – an XML based alternative to DTD, more powerful – Supports namespaces and data types
  11. Displaying XML • XML documents do not carry information about how to display the data • We can add display information to XML with – CSS (Cascading Style Sheets) – XSL (eXtensible Stylesheet Language) --- preferred
  12. XML support in IE 5.0+ Internet Explorer 5.0 has the following XML support: • Viewing of XML documents • Full support for W3C DTD standards • Binding XML data to HTML elements • Transforming and displaying XML with XSL • Displaying XML with CSS • Access to the XML DOM (Document Object Model) *Netscape 6.0 also has full XML support
  13. XML features • XML uses the concept of document type and hence a DTD (Document Type Definition) to describe data • XML with DTD is self descriptive • XML separates data from display formats • XML can be used as a format to exchange data
  14. XML Syntax consists of • XML Declaration • XML Elements • XML Attributes • The first line of an XML document should always consist of an XML declaration defining the version of XML
  15. General Structure <root> <child> <subchild>…….</subchild> </child> </root>
  16. Main Components of an XML Document • Elements: <hello> • Attributes: <item id="33905"> • Entities: &lt; (<) • Advanced Components – CData Sections – Processing Instructions
  17. XML Attributes • XML attributes are used to describe XML elements or to provide additional information about elements. • Attributes provide additional information that is not part of the data. Ex: • <Book no="99-2456" media="CD"></Book>
  18. XML Attributes • XML elements can have attributes in name/value pairs as in HTML. • Attributes must always be in quotes. Either single or double quotes are valid, though double quotes are most common. • Attributes are always contained within the start tag of an element.
  19. Attributes Vs. Elements Case 1 (Attributes) <Book no="99-2356" type="CD"> <author> <firstname>XXX</firstname> <lastname>YYY</lastname> </author> </Book>
  20. Case 2 (Elements) • <Book> • <no>99-2356</no> • <type>CD</type> • <author> • <firstname>XXX</firstname> • <lastname>YYY</lastname> • </author> • </Book>
  21. Where elements score over attributes • Elements can describe structure; attributes cannot • Attributes are more difficult to manipulate by program code than elements • Attribute values are difficult to validate against a DTD
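To make the two cases concrete, here is a small sketch, assuming Python's standard `xml.etree.ElementTree`, that reads the same book number from the attribute form (Case 1) and from the element form (Case 2). The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Case 1: the book number and type are stored as attributes.
case1 = (
    '<Book no="99-2356" type="CD">'
    "<author><firstname>XXX</firstname><lastname>YYY</lastname></author>"
    "</Book>"
)

# Case 2: the same information is stored as child elements.
case2 = (
    "<Book><no>99-2356</no><type>CD</type>"
    "<author><firstname>XXX</firstname><lastname>YYY</lastname></author>"
    "</Book>"
)

book1 = ET.fromstring(case1)
book2 = ET.fromstring(case2)

# Attributes are read with get(); child elements are reached by path.
no_from_attribute = book1.get("no")
no_from_element = book2.findtext("no")
first_name = book2.findtext("author/firstname")

print(no_from_attribute, no_from_element, first_name)
```

Both forms carry the same value. The element form can be extended later with nested structure, which an attribute value cannot hold.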
  22. XML strengths • Its ability to describe data • Its ability to structure data • Separate display from structure • Supported by industry • Availability of tools
  23. XML applications • B2B • EDI • Journal publishing • Database development
  24. An example of XML <?xml version="1.0" encoding="ISO-8859-1"?> <note> <to>XXX</to> <from>YYY</from> <heading>XML</heading> <body> Extensible Markup Language </body> </note>
  25. Contents of the ProductList.xml Document
  26. • The first line is the XML declaration. XML 1.0 does not strictly require it, but it is recommended, and when present it must come first in the document. • Every XML document has a root element. In our example, the second line is the root element, <ProductList>. • The root element can contain child elements. In our example, Product is the child element of ProductList. • Each element can contain sub-elements. – <P_CODE>, <P_PRICE> are sub-elements.
  27. Example <?xml version="1.0" encoding= "ISO-8859-1" ?> <book> <title> XML </title> <chapter> introduction to xml <para>Markup languages</para> <para>Features of XML</para> </chapter> <chapter>XML syntax <para>Elements must be enclosed in tags</para> <para>Elements must be properly nested</para> </chapter> </book>
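The book example above can be loaded and walked as a tree. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the document is passed as bytes so that the parser honours the `encoding` declaration, and the `outline` structure is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET

doc = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
  <title>XML</title>
  <chapter>introduction to xml
    <para>Markup languages</para>
    <para>Features of XML</para>
  </chapter>
  <chapter>XML syntax
    <para>Elements must be enclosed in tags</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""

root = ET.fromstring(doc)

# Collect each chapter's leading text and the text of its <para> children.
outline = []
for chapter in root.findall("chapter"):
    heading = (chapter.text or "").strip()
    paras = [para.text for para in chapter.findall("para")]
    outline.append((heading, paras))

for heading, paras in outline:
    print(heading)
    for text in paras:
        print("  -", text)
```

Each chapter here mixes plain text with child elements, so the heading comes from the text that precedes the first `<para>`.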
  28. XML Architecture
  29. How do you get the data? • XML data passes through a parser, which builds an information structure (tree + links) • Documents, stylesheets, and other data can all be expressed in XML • DOM interface: any application can plug in via an API called the “Document Object Model” • DTD/Schema • This model can work locally or over a network; parsing, tree-building, and access can shift between client and server
  30. XML Parser • All modern browsers have a built-in XML parser. • An XML parser converts an XML document into an XML DOM object, which can then be manipulated with JavaScript. XML DOM • A DOM (Document Object Model) defines a standard way of accessing and manipulating XML documents.
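The slide describes DOM access from JavaScript in a browser. The same parse-then-manipulate idea can be sketched outside the browser with Python's standard `xml.dom.minidom`, used here as a stand-in rather than the browser API. The note content is adapted from the earlier example and the replacement value "ZZZ" is made up.

```python
from xml.dom.minidom import parseString

# Parse the text into a DOM document object.
dom = parseString(
    "<note><to>XXX</to><from>YYY</from><heading>XML</heading></note>"
)

# Look up an element by tag name and read its text node.
to_node = dom.getElementsByTagName("to")[0]
original = to_node.firstChild.data

# Manipulate the tree in place, then serialise the root element again.
to_node.firstChild.data = "ZZZ"
updated = dom.documentElement.toxml()

print(original)
print(updated)
```

`getElementsByTagName` and `firstChild` are part of the W3C DOM, so the same names appear in the browser's JavaScript interface.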
  31. XML Namespaces • XML Namespaces provide a method to avoid element name conflicts. • This XML carries HTML table information: <table> <tr> <td>Apples</td> <td>Bananas</td> </tr> </table>
  32. • This XML carries information about a table (a piece of furniture): <table> <name>African Coffee Table</name> <width>80</width> <length>120</length> </table> • If these XML fragments were added together, there would be a name conflict. • Both contain a <table> element, but the elements have different content and meaning. An XML parser will not know how to handle these differences.
  33. Solving the Name Conflict Using a Prefix • Name conflicts in XML can easily be avoided using a name prefix. • This XML carries information about an HTML table, and a piece of furniture: <h:table> <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr> </h:table> <f:table> <f:name>African Coffee Table</f:name> <f:width>80</f:width> <f:length>120</f:length> </f:table>
  34. • In the example above, there will be no conflict because the two <table> elements have different names.
  35. XML Namespaces - The xmlns Attribute • When using prefixes in XML, a so-called namespace for the prefix must be defined. • The namespace is defined by the xmlns attribute in the start tag of an element. • The namespace declaration has the following syntax. xmlns:prefix="URI".
  36. XML Namespaces - The xmlns Attribute <root> <h:table xmlns:h="http://www.w3.org/TR/html4/"> <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr> </h:table> <f:table xmlns:f="http://www.w3schools.com/furniture"> <f:name>African Coffee Table</f:name> <f:width>80</f:width> <f:length>120</f:length> </f:table> </root>
  37. • In the example above, the xmlns attributes in the <h:table> and <f:table> start tags bind the h: and f: prefixes to qualified namespaces. • When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace. • Namespaces can be declared in the elements where they are used or in the XML root element.
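A namespace-aware parser tells the two `table` elements apart by namespace URI, not by prefix. The sketch below assumes Python's standard `xml.etree.ElementTree` and reuses the document from the namespace slides. The `ns` dictionary is a local prefix map chosen for the queries; its keys do not have to match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

doc = """<root>
  <h:table xmlns:h="http://www.w3.org/TR/html4/">
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table xmlns:f="http://www.w3schools.com/furniture">
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

# Prefix-to-URI map used only by the queries below.
ns = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "http://www.w3schools.com/furniture",
}

root = ET.fromstring(doc)

# ElementTree stores each tag as "{namespace-uri}localname".
tables = [child.tag for child in root]
cells = [td.text for td in root.findall("h:table/h:tr/h:td", ns)]
furniture_name = root.findtext("f:table/f:name", namespaces=ns)

print(tables)
print(cells, furniture_name)
```

The expanded tag names show that the two tables no longer clash: the local name is the same but the namespace URI differs.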
  38. URI • Uniform Resource Identifier (URI) • A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet resource. • The most common kind of URI is the Uniform Resource Locator (URL), which identifies a resource by its network address. Another, less common kind of URI is the Uniform Resource Name (URN).
  39. PCDATA - Parsed Character Data • XML parsers normally parse all the text in an XML document. • When an XML element is parsed, the text between the XML tags is also parsed: <message>This text is also parsed</message>
  40. <name><first>Bill</first><last>Gates</last></name> • The parser parses the text between tags because XML elements can contain other elements, as in this example, where the <name> element contains two other elements (first and last). • The parser will break it up into sub-elements like this: <name> <first>Bill</first> <last>Gates</last> </name> • Parsed Character Data (PCDATA) is the term for text data that will be parsed by the XML parser.
  41. CDATA - (Unparsed) Character Data • The term CDATA is used for text data that should not be parsed by the XML parser. • Characters like "<" and "&" are illegal as literal text inside XML elements. • "<" will generate an error because the parser interprets it as the start of a new element. • "&" will generate an error because the parser interprets it as the start of a character entity. • Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors, script code can be defined as CDATA.
  42. • Everything inside a CDATA section is treated as plain character data; the parser does not interpret it as markup. • A CDATA section starts with "<![CDATA[" and ends with "]]>": <script> <![CDATA[ function matchwo(a,b) { if (a < b && a < 0) { return 1; } else { return 0; } } ]]> </script> In this example, everything inside the CDATA section is ignored by the parser
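The effect of CDATA can be compared with entity escaping. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils.escape`. The one-line script string is a shortened, hypothetical stand-in for the function on the slide.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

code = "if (a < b && a < 0) { return 1; }"

# Raw "<" and "&" inside element content are rejected by the parser.
try:
    ET.fromstring(f"<script>{code}</script>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# Option 1: replace the reserved characters with entities (&lt; and &amp;).
escaped_doc = f"<script>{escape(code)}</script>"

# Option 2: wrap the text in a CDATA section and leave it untouched.
cdata_doc = f"<script><![CDATA[{code}]]></script>"

from_escaped = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text

print(raw_ok)
print(from_escaped == code, from_cdata == code)
```

Both options give the application the same text. CDATA is simply more readable when the content has many reserved characters.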
  43. Conclusion • XML is a self-descriptive language • XML is a powerful language for describing structured data for web applications • XML is currently applied in many fields • Many vendors already support or will support XML • XML documents can be validated through the use of DTD and XSD documents • XML impacts B2B data exchanges, legacy system integration, web page development, and database system integration.