UQI810TM
Project module
2003
M R Shaheedullah
Student No. 02975062
An XML Approach Using Forms and Style
Sheets to Generate Materials Test Reports
Abstract
An application that may benefit from the advantages of XML is the creation of
materials test reports. An XML schema known as MatML already exists for
materials properties. Two form-based data entry methods were compared: an
HTML v4 with Perl v5.6 system, and the Microsoft InfoPath software package.
The resulting XML documents from these systems were formatted to produce
reports using style sheets.
A feasibility study explored the techniques and highlighted a number of issues,
in particular the requirement to use a combination of eXtensible Stylesheet Language
Transformations (XSLT) and Cascading Style Sheets (CSS) to format the XML
document. A framework for the review of the MatML schema was developed,
focusing on its effectiveness and efficiency. The review concluded
that MatML provides a logical description of the materials domain and the code
is relatively efficient.
The two systems of creating XML documents were found to have contrasting
characteristics. The HTML/Perl system offers good flexibility for manipulating
data, at the cost of significant development time. It can be built with software
available at no cost, and implementation on multiple platforms and use over the
internet are also feasible. In contrast, a form was developed very simply in
Microsoft InfoPath using its graphical design mode.
The XML based approach to report generation has benefits in that the XML
document can be transformed into a formatted document with a relatively
simple style sheet, and the file format can potentially facilitate data interchange.
Table of Contents
1 INTRODUCTION
2 RESEARCH AND PROJECT STRATEGY
3 LITERATURE SURVEY
4 REVIEW OF SOFTWARE TOOLS
5 FEASIBILITY STUDY
6 REVIEW OF MATML
7 PROJECT DESCRIPTION
8 FORM CREATION FOR MATML EXAMPLE 2 DATA, USING HTML
9 FORM CREATION FOR MATML EXAMPLE 2 DATA, USING MICROSOFT INFOPATH
10 REPORT GENERATION USING XSLT AND CSS
11 ASSESSMENT OF PROJECT DATA
12 DISCUSSION
13 SUMMARY OF FINDINGS
14 REVIEW
15 REFERENCES
APPENDIX A: ORIGINAL MATML EXAMPLE 2 XML DOCUMENT
APPENDIX B: HTML V4 CODE TO PRODUCE WEB-FORM FOR MATML EXAMPLE 2
APPENDIX C: PERL V5.0.6 SCRIPT TO TRANSFORM HTML FORM ELEMENTS INTO
MATML DOCUMENT
APPENDIX D: XML DOCUMENT BASED ON MATML EXAMPLE 2, PRODUCED BY
HTML/PERL
APPENDIX E: XML DOCUMENT BASED ON MATML EXAMPLE 2, PRODUCED BY
MICROSOFT INFOPATH
APPENDIX F: XSLT FILE TO TRANSFORM XML DOCUMENTS BASED ON MATML
EXAMPLE 2 TO A ONE-PAGE REPORT
Table of Figures
Figure 1 Schematic map of the dissertation
Figure 2 Hierarchy of entities for basketball statistics
Figure 3 Web-form for basketball data
Figure 4 Microsoft InfoPath form for basketball data, bballinfopathform.xml (from template bball.xsn)
Figure 5 Output of bball.xml with CSS style sheet (basket.css), in Microsoft Internet Explorer v6.0
Figure 6 Output of XML document bball.xml with XSLT style sheet, basket.xsl, in Microsoft Internet Explorer v6.0
Figure 7 First level hierarchy of MatML
Figure 8 Partially expanded hierarchy of MatML, ‘BulkDetails’ element
Figure 9 Partially expanded hierarchy of MatML
Figure 10 A schematic diagram for the domain of materials property data
Figure 11 XSV v1.4 software validation of MatML schema
Figure 12 Index of links for HTML form
Figure 13 Material and test details for HTML form
Figure 14 PropertyData element, HTML form
Figure 15 Screenshot of Microsoft InfoPath in design mode
Figure 16 Material and test details, InfoPath form
Figure 17 PropertyData element, InfoPath form
Figure 18 PropertyDetails and ParameterDetails elements
Figure 19 HTML/Perl generated XML document, formatted with XSLT/CSS, as displayed in Microsoft Internet Explorer v6.0, print view
Figure 20 HTML/Perl generated XML document formatted with XSLT/CSS, as displayed in Netscape Navigator v7.02, partial print view
Figure 21 Microsoft InfoPath generated XML document formatted with XSLT/CSS, as displayed in Microsoft Internet Explorer v6.0, print view
Figure 22 Microsoft InfoPath generated XML document with XSLT style sheet, as displayed in Netscape Navigator v7.02, partial print view
Figure 23 Output of the results from XSV, for the HTML/Perl generated XML document
Figure 24 Output of the results from XSV, for the Microsoft InfoPath XML document
Figure 25 Output of the results from XSV, for the modified Microsoft InfoPath XML document
Figure 26 Word-processed materials test report
Figure 27 Schematic diagram to show possible routes to report generation
1 Introduction
1.1 What is XML?
XML (XML activity, World Wide Web Consortium, 2000) is an abbreviation for
eXtensible Markup Language. This markup language facilitates a user-
definable, hierarchical data structure. The device independence and
customisability of XML give it the potential to profoundly affect data formats and
usage. In particular, XML can be used to create entire user-defined markup
languages, complete with a schema to define constraints. Database
management systems, web based information systems, and any application
that uses a hierarchical data structure may be able to utilise XML. The nature
of XML is discussed in more detail in the Literature Survey, section 3.
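As a minimal illustration of a user-defined, hierarchical structure, the sketch below builds a small XML document with Python's standard-library `xml.etree.ElementTree` module. Python is used here purely for illustration, and the element and attribute names (`Material`, `Property`, `unit`) are hypothetical; they are not taken from MatML.

```python
import xml.etree.ElementTree as ET


def build_test_record(material, properties):
    """Build a user-defined XML tree: one Material element with a Property child per result.

    `properties` maps a property name to a (value, unit) pair. All element and
    attribute names are hypothetical, chosen only for illustration.
    """
    root = ET.Element("Material", attrib={"name": material})
    for name, (value, unit) in properties.items():
        prop = ET.SubElement(root, "Property", attrib={"name": name, "unit": unit})
        prop.text = str(value)
    return root


record = build_test_record(
    "Steel grade X",
    {"TensileStrength": (520, "MPa"), "Elongation": (22, "%")},
)
print(ET.tostring(record, encoding="unicode"))
```

The tag names describe the data rather than its appearance; how the record is displayed is left to a separate formatting step.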
1.2 What are materials test reports?
In the manufacturing industry, the raw material used to make components must
often meet certain property requirements. These requirements may include
mechanical, chemical or micro-structural properties. The nature of the
properties required depends on the application. For example, motor engine
parts may need to be fatigue resistant, whereas a pipeline may require corrosion
resistance. Hence, when raw material is produced, test reports can be
prepared that record selected properties of the material.
1.3 How can XML be used with material test reports?
Methods for creating a test report could include manually preparing the
document (using data extracted from a separate database), or using a
software database package with a report generator. The use of XML
documents has a number of benefits, described in section 3.2. In particular, a
schema can be defined to represent a specific domain. Such a schema has
been developed for materials properties, MatML (Begley and Sturrock, 2000).
MatML is discussed in section 3 and reviewed in section 6.
A convenient method of placing data in an XML document would be to use
forms. XML documents can be formatted using a technology termed style
sheets (Style activity, World Wide Web Consortium, 2003). Forms and style
sheets could therefore potentially be used to create materials test reports from
raw data, using XML documents.
1.4 What technologies could be used to create materials test
reports from XML documents?
The proposed method of storing materials test report data in an XML file and
subsequently producing a report consists of the following steps:
1. create a form to enter data into an XML document
2. apply formatting to the XML document for display as a report in a web
browser.
1.4.1 Creating forms
Forms are a convenient method of entering data for end-users. Form element
types include text boxes, option lists and check boxes. Database software
packages often use forms for data entry. An example is Microsoft Access 2000
(Microsoft, 1999), which has a form design component. Microsoft Access 2000
does not, however, support the creation of XML documents.
XForms (XForms activity, World Wide Web Consortium, 2003), discussed in
section 3.6, is a proposed specification for a form-based technology for XML.
However, the specification is not yet finalised and hence cannot yet be fully
implemented in software.
HTML (HTML activity, World Wide Web Consortium, 1999) is commonly used
to provide web-based forms, via a web browser. A number of scripting
technologies are available to manipulate the form data, including Perl (The Perl
Foundation, 2003). The combination of HTML and Perl can therefore potentially
create a form-based data entry system for XML documents. The
development time for an HTML/Perl system is expected to be less than that for
a language such as Java or C++, as the web browser would provide most of
the interface and form elements via relatively simple HTML code.
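The HTML/Perl route can be sketched as follows: the browser submits the form fields as URL-encoded name/value pairs, and a server-side script turns them into XML elements. The sketch uses Python's standard library in place of Perl, purely for illustration; the field names and the `TestReport` root element are hypothetical, and each field name is assumed to be a valid XML element name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def form_to_xml(query_string, root_tag="TestReport"):
    """Turn a URL-encoded form submission into a flat XML document string.

    Assumes every form field name is a valid XML element name (hypothetical form).
    """
    fields = parse_qs(query_string, keep_blank_values=True)
    root = ET.Element(root_tag)
    for name, values in fields.items():
        for value in values:  # a field can be submitted more than once
            ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# The kind of string a browser might send for a three-field form (hypothetical fields).
print(form_to_xml("Material=Steel+grade+X&TestTemperature=20&Unit=C"))
```

Building the document with an XML library, rather than printing strings, means that characters such as `&` and `<` in user input are escaped automatically; a script that writes XML by string concatenation has to handle this itself.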
Microsoft InfoPath 2003 beta 2 edition (Microsoft (b), 2003) is a software
package, in beta test state at the time of writing, which provides a graphical
interface for XML-based form design. A short description of its capabilities is
given in section 4.1.
As HTML with Perl is a widely available development system, and Microsoft
have vast influence in the business software market, a comparison of these
two methods of creating forms for XML documents would be a very useful
exercise.
1.4.2 Formatting the XML file to produce report
The final step, formatting the XML document to produce a report, is to be
performed by applying style sheets to the XML document. Scripting languages
such as Perl could be used to read and format the XML document, but the XML
data would first need to be parsed. This is achievable but adds complexity to the
formatting process. Style sheet technologies can recognise XML components
directly, and are specifically designed to apply formatting to markup documents
such as those written in HTML and XML. Two style sheet languages that
are supported by later versions of the Microsoft and Netscape web browsers are
Cascading Style Sheets (CSS) and eXtensible Stylesheet Language
Transformations (XSLT), which is part of the eXtensible Stylesheet Language
(XSL). The specifications of both these languages are defined by the World
Wide Web Consortium: the specification of CSS is given in CSS activity, World
Wide Web Consortium, 1998, and that for XSLT in XSL activity, World
Wide Web Consortium, 1999. The style sheet languages are discussed in
section 3.7.
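For comparison, the script-based alternative mentioned above can be sketched in Python (standing in for Perl, as an illustration only): the document must first be parsed, after which every layout decision is written into the program. The `Material`/`Property` structure is hypothetical and is not MatML.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; element names are for illustration only.
SAMPLE = """<Material name="Steel grade X">
  <Property name="TensileStrength" unit="MPa">520</Property>
  <Property name="Elongation" unit="%">22</Property>
</Material>"""


def text_report(xml_text):
    """Parse the XML document, then lay it out as fixed-width report lines."""
    root = ET.fromstring(xml_text)
    lines = [f"Material: {root.get('name')}"]
    for prop in root.iter("Property"):
        # Column widths are hard-coded: the layout lives inside the program.
        lines.append(f"{prop.get('name'):<20}{(prop.text or ''):>8} {prop.get('unit')}")
    return "\n".join(lines)


print(text_report(SAMPLE))
```

Any change to the layout means editing and re-testing the program, whereas a style sheet keeps the presentation rules separate from both the data and the processing code.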
1.5 Proposed title for research:
An XML Approach Using Forms and Style Sheets to Generate Materials Test
Reports.
1.6 Research questions:
1. How does the use of HTML and Perl for data entry and XML document
production compare with the use of Microsoft InfoPath for the same
purpose?
2. Are style sheets capable of generating a formatted report from a suitable
MatML v3.0 data file?
3. What are the benefits and limitations of using XML documents and style
sheets to generate materials test reports?
4. How can a review be performed on an XML Schema such as MatML?
5. Is it possible to produce a valid MatML v3.0 data file using HTML and
Perl?
1.7 Research aims:
1. Develop a framework for review of the MatML schema.
2. Compare the benefits and problems of two approaches to data entry for
XML document creation:
I. use of HTML forms and Perl script
II. use of Microsoft InfoPath
Comparison criteria will include the following:
• development time
• cost of software
• flexibility in handling of data
• suitability for commercial application
• feasibility of use on multiple platforms
3. Assess the benefits and problems of using XML documents with style
sheets to generate reports.
This assessment will be conducted in a more general context, with
criteria that include:
• ease of use
• use of XSLT for formatting
• use of the XML file format
• commercial implementation issues.
2 Research and project strategy
2.1 Paradigms and methodology
Hussey & Hussey (1997)¹ describe two research paradigms, the positivist and
the phenomenological. These authors report that the positivistic paradigm, as
used by social scientists, is similar to the approach used by natural scientists.
This approach is said by the same authors to be objective and scientific, and to
assume that laws provide the basis for explanation. The phenomenological
approach is said to be qualitative, humanistic and subjectivist; it is asserted that
the researcher has an effect on, and cannot be separated from, the research (in
social science).
The concepts in Hussey and Hussey (1997) are described mainly in the context
of business and other socially based research (see the definitions of the
positivist and phenomenological paradigms above). An attempt will be made
to map these concepts onto this technological dissertation proposal.
A number of research concepts (in research associated with human systems)
are described by Hussey and Hussey (1997), including validity, reliability and
triangulation. Reliability could be portrayed as a measure of the repeatability
of research findings. Validity is stated by Hussey & Hussey (1997) to be the
extent to which the research findings accurately represent what is really
happening in a situation. This could be described as the effectiveness of the
research method, with respect to the stated aims of the researcher.
¹ Hussey and Hussey (1997) is the recommended text for the research methods module,
UQQ803HM, at UWE.
Triangulation is the use of different research methods in the same investigation.
Where appropriate (perhaps in the study of human systems), use of
triangulation could reduce bias effects.
This investigation is naturalistic and hence objective in nature. It could be said
that there is an element of data triangulation in the multiple comparison
techniques for examining the XML documents generated by the two form
systems described in section 1.4.1. Validity for this objective system could be expected
to be high. Good reliability would result in the findings of the research being
applicable to multiple projects in form-based XML document production and
formatting. There are no particular ethical issues envisaged for the
investigation proposed here.
2.2 Plan
2.2.1 Feasibility study
An initial feasibility study will be performed. This will be essentially qualitative,
the purpose being to explore approaches, highlight problems and investigate
the achievability of the project aims.
2.2.2 Review of MatML v3.0
An analytical approach will be used to develop and implement a framework for the
review of MatML v3.0.
2.2.3 Project development
On completion of the feasibility study, the development of the application
system will begin. This will involve:
1. production of a web-based interface in HTML, for entering data
2. development of Perl code that will write data into an XML file
3. development of a Microsoft InfoPath application, for entering data
4. development of a style sheet system that will transform the XML document
into a report format.
2.2.4 Assessment of data
The assessment of the documents produced in the project will involve:
a. comparing the project XML documents with the source XML
document
b. testing the XML documents with a validation tool
c. comparing the formatted, transformed XML document with a
word-processed report based on the same data.
A schematic of the intended map of the dissertation is shown in Figure 1.
Figure 1 Schematic map of the dissertation
[Diagram boxes: Literature Survey; Feasibility Study; Research questions; Objectives/Preparation; Review of MatML; Form creation: HTML/Perl, Microsoft InfoPath; Comparison; Report generation: Style sheet; Analyse XML approach to report creation. Key: activity, objective, process, analytical.]
3 Literature Survey
3.1 Introduction to markup languages
XML (XML activity, World Wide Web Consortium, 2000) was created as a
markup technology. A markup is information embedded with text that
describes the document structure. SGML (Standard Generalized Markup
Language) is a sophisticated international markup language, developed as an
early standard for structuring documents stored on a computer. SGML is now
defined by an international standard (International Organisation for
Standardisation, 1986). SGML can be used to create other markup languages,
by using a specification known as a DTD (document type definition).
Hypertext Markup Language (HTML activity, World Wide Web Consortium,
1999), abbreviated to HTML, is derived from SGML and can be described as
an application of SGML, defined by an SGML DTD. HTML is a much simpler
language than SGML that is accessible to end-users. HTML provides many
features to format information for display, but cannot create user-defined markups.
XML is a more recent, simplified subset of SGML. XML is a data-centric markup
technology that has the ability to create user-defined markup tags. The
purpose of these tags is to organise hierarchical data. The data can then be
displayed or processed.
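Because the tags name the data, a program can select values by their meaning rather than by their position on a page. The Python sketch below, offered as an illustration only, uses ElementTree's limited XPath support (tag paths and `[@attribute='value']` predicates); the document structure is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined tags: the names describe the data, not its appearance.
DOC = """<TestReport>
  <Material>Steel grade X</Material>
  <Results>
    <Property name="TensileStrength" unit="MPa">520</Property>
    <Property name="Hardness" unit="HV">160</Property>
  </Results>
</TestReport>"""

root = ET.fromstring(DOC)
# Select a value by tag and attribute, independently of any display format.
hardness = root.find("./Results/Property[@name='Hardness']")
print(root.findtext("Material"), hardness.text, hardness.get("unit"))
```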
Constraints can be applied to XML data with validation rule systems. One such
system is the document type definition, DTD (XML activity, World Wide Web
Consortium, 1998). A DTD for XML is a specific system for applying constraints
to XML elements. A more sophisticated constraint system is provided by XML Schema
(XML schema activity, World Wide Web Consortium, 2003). User-defined
schemas, such as MatML, can be created using XML Schema. Validation
systems are discussed in section 3.3.
The body that approves generally accepted markup languages for the internet
is currently the World Wide Web Consortium (1994), also known by the
abbreviated name W3C. The World Wide Web Consortium is an international
industry consortium that creates specifications for open standards on the web.
3.2 Benefits of XML
The World Wide Web Consortium (XML activity, 2003) state a number of
benefits of XML, including:
• enables internationalised media-independent electronic publishing
• provides underpinnings of the Semantic Web, enabling a new level of
interoperability and information exchange
• encourages industries to define platform-independent protocols for the
exchange of data, including electronic commerce
• allows people to display information the way they want it, under style
sheet control.
Many authors, including Barillot and Achard (2000), Lie and Saarela (1999) and
Ceri et al (2000), focus on the ability of XML to add semantics to content. This
is a key attribute of XML. The user-defined, descriptive tags, the hierarchical data
structure and the ability to add constraints and validation rules with a DTD (or
XML Schema) could be considered the foundation for the semantic nature
of XML. The structure and meaning provided by the semantics of XML are
intended to be machine readable, and from this follow other advantages
attributed to XML, for example its use as a language for data interchange.
Begley and Sturrock (2000) identify three principal features that XML brings to
the web:
• extensibility: users can identify their own tags and attributes in their
documents
• structure: users can define their own DTD
• validation: users can test the conformance of their documents to the
structure defined by the DTD.
These points are considered an apt summary of the advantages provided by
XML for users intending to create hierarchical data structures. It should be
noted that whilst XML is often discussed with respect to web-based information
systems, the XML language is not exclusively for use on the web. An
information system that can handle a hierarchical data structure is likely to be
usable with XML. Barillot and Achard (2000) claim that XML is a powerful
language for data interconnection, and that its operating system independence
makes it a universal hub between databases.
3.3 DTD and schema for XML
A DTD can be produced to define rules for the XML document. These include,
for example, what child elements a parent element must have, whether an
element is able to have child elements and whether an element must consist
of plain text.
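The kind of structural rule described above can be illustrated with a toy checker. In a DTD, a declaration such as `<!ELEMENT Results (Property+)>` states that a `Results` element must contain one or more `Property` elements. The Python sketch below enforces only the "required child" part of such rules and ignores element order, occurrence counts and text content, so it is a simplified stand-in for a DTD-validating parser, not a replacement for one. The element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for DTD content rules (hypothetical element names):
# each parent lists the child elements it must contain at least once.
REQUIRED_CHILDREN = {
    "TestReport": ["Material", "Results"],
    "Results": ["Property"],
}


def missing_children(root):
    """Return (parent, child) pairs where a required child element is absent."""
    problems = []
    for elem in root.iter():  # visits the root and every descendant
        for child in REQUIRED_CHILDREN.get(elem.tag, []):
            if elem.find(child) is None:
                problems.append((elem.tag, child))
    return problems


doc = ET.fromstring("<TestReport><Material>Steel</Material><Results/></TestReport>")
print(missing_children(doc))  # [('Results', 'Property')]
```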
The World Wide Web Consortium (XML activity, 2003) comment that
automated processing of XML documents requires more rigorous and
comprehensive facilities than are available with the DTD mechanism of
declaring constraints on the use of markup.
Elliot (2003) and Needleman (2001) call attention to some specific limitations
of DTDs. DTDs are said to enforce basic structural rules on documents, but
not provide fine control over the format and data types of elements (Elliot, 2003,
p254). Needleman (2001) refers in particular to the limited support of DTDs for data
types.
XML Schema is a more recent language designed for users to validate and
constrain XML applications. XML applications based on XML Schema will be
denoted by ‘schema’. The World Wide Web Consortium produced a
requirements document in 1999, (XML schema activity, World Wide Web
Consortium, 1999), which states that XML Schema will address specific goals
beyond DTD functionality, including integration of structural schemas with
primitive data types, and inheritance. The XML Schema language was
delivered by the World Wide Web Consortium as a formal recommendation in
2001 (XML schema activity, World Wide Web Consortium (a), (b), 2001).
Needleman (2001) reports that XML Schema does add support for many types
of data, including Boolean, integer and date, as well as user-defined types.
XML schemas are themselves written in XML notation.
Use of XML Schema has a particular advantage with respect to data validation.
Often, applications use complex code to validate data documents. Needleman
(2001) comments that an XML aware parser would be able to perform data
validation prior to the XML data file reaching the application that had to process
it.
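To illustrate what checking built-in data types involves, the Python sketch below tests values against simplified lexical forms of three XML Schema types: `xs:boolean` (`true`, `false`, `1`, `0`), `xs:integer` (an optional sign followed by decimal digits) and `xs:date` (`YYYY-MM-DD` with an optional time zone). The simplifications are assumptions made for brevity: negative or five-digit years, whitespace normalisation and time-zone range limits are not handled, and this is not a schema-aware parser.

```python
import re
from datetime import date

_BOOLEAN = re.compile(r"true|false|1|0")
_INTEGER = re.compile(r"[+-]?[0-9]+")
_DATE = re.compile(r"([0-9]{4}-[0-9]{2}-[0-9]{2})(Z|[+-][0-9]{2}:[0-9]{2})?")


def conforms(value, type_name):
    """Check a string against a simplified lexical form of an XML Schema type."""
    if type_name == "xs:boolean":
        return _BOOLEAN.fullmatch(value) is not None
    if type_name == "xs:integer":
        return _INTEGER.fullmatch(value) is not None
    if type_name == "xs:date":
        match = _DATE.fullmatch(value)
        if match is None:
            return False
        try:
            date.fromisoformat(match.group(1))  # rejects impossible dates such as 2003-02-30
        except ValueError:
            return False
        return True
    raise ValueError(f"unsupported type: {type_name}")


for value, type_name in [("true", "xs:boolean"), ("4.2", "xs:integer"), ("2003-02-30", "xs:date")]:
    print(type_name, repr(value), conforms(value, type_name))
```

Checks of this kind are what a schema-aware parser can perform before the data reaches the application, rather than each application repeating them in its own code.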
3.4 Applications of XML
XML is considered to be a format for data interchange. However, a number of
commentators, including Ceri et al (2000) and Barillot and Achard (2000), point
out that XML provides syntax but does not directly provide semantic
interoperability. These commentators relate that the semantic descriptions
necessary for data interchange are defined for specific domains, for which
there is a ‘community agreement’. The nature of this agreement is not defined,
but it could be expected to be a formal or informal accord between major bodies
such as large corporations, government departments and standards institutes.
MathML is a markup language that is supported by the World Wide Web
Consortium, which has issued a recommendation for it (Math activity, World
Wide Web Consortium, 2001). In general, a successful markup language will
require widespread acceptance by those that may use it. One markup language
developed recently is ThermoML, which is described by Frenkel et al (2003)
as an XML-based approach for storage and exchange of thermophysical and
thermochemical property data. According to Frenkel et al, a complete schema
has been developed, and institutions such as the Thermodynamics Research
Center at the National Institute of Standards and Technology (NIST, USA),
along with the Journal of Chemical and Engineering Data, have been involved.
NIST is also reported to be involved in the development of MatML, a language
for describing materials property data (Begley and Sturrock, 2000). According
to Begley and Sturrock, the next steps for MatML would be acceptance testing
and dissemination through various standards and professional organisations.
XML can also be used with database management systems (DBMS).
Seligman (2001) states that XML and database technology complement rather
than compete with each other. Seligman points out that DBMS provide specific
facilities including highly tuned query and transaction processing, recovery,
indexes, integrity and triggers. These facilities go beyond the structure of the
data. Many DBMS now provide import and export facilities to an XML data
format.
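A generic version of such an export can be sketched with Python's built-in `sqlite3` module standing in for a DBMS: each row becomes a `row` element whose children are named after the columns. This illustrates the idea only and is not the export facility of any particular product; the table and column names are hypothetical, column names are assumed to be valid XML element names, and the table name is assumed to come from trusted code because it is interpolated into the SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Export every row of a table as a <row> element whose children are the columns."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for record in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (specimen TEXT, tensile_mpa INTEGER)")
conn.executemany("INSERT INTO tests VALUES (?, ?)", [("A1", 520), ("A2", 515)])
print(ET.tostring(table_to_xml(conn, "tests"), encoding="unicode"))
```

The export captures only the data; the query, transaction and integrity facilities described above remain in the DBMS.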
3.5 Is XML self describing?
Ceri et al (2000) state that XML is self-describing, as semantic annotations
provide information about the content of the document. Simeon (2003),
however, asserts that XML is not self-describing, due to the dependence of the
internal format (corresponding to an external XML description) on the XML
Schema (or DTD). Nambiar et al (2002) point out that SGML parsers require
stricter adherence to the DTD, and report this as a shortcoming for everyday
use. The use of a DTD or schema with XML is optional. Whilst an XML data
file may not be fully self-describing in isolation, a unit consisting of an XML data
file with a DTD or XML Schema would be self-describing. Further, according
to the World Wide Web Consortium (XML schema activity, World Wide Web
Consortium, 1999), XML schemas document their own meaning, usage and
function, implying that they are self-describing.
3.6 Forms for XML
XForms is a specification under development by the World Wide Web
Consortium (XForms activity, World Wide Web Consortium, 2003). XForms is
intended to be the successor to HTML forms (W3Schools (a)), and as such,
implementation in future web browsers can be expected.
The aim of XForms is to separate the purpose of a form from its presentation.
The form would consist of components that describe the functions of the form
and control its appearance (XForms activity, World Wide Web Consortium,
2003). XForms operates on the basis of XML instance data, represented as
form controls (O’Reilly, 2003). XForms also provides some basic functions,
such as ‘sum’, and string functions (W3Schools (b)).
The current web browsers Netscape Navigator v7.02 and Microsoft Internet
Explorer v6.0 do not support XForms.
Khare (2000) discusses the aim of separating data, logic and presentation, and
in this respect asks the question:
“Can Xform Transform the Web?”
Khare points out that XForms needs to account for various technologies,
including XML Schema, style sheets and multimedia. According to Khare, if
web authors are required to learn all these associated technologies in order
to use XForms, the adoption of XForms will be hindered.
3.7 Style sheets
The World Wide Web Consortium promotes the use of style sheets to format
web-based documents (Style activity, World Wide Web Consortium, 2003).
Style sheet technology is part of the movement to separate web content
from display, has practical advantages in use, and is the primary method
available to end users for formatting XML data for display in a web browser.
A style sheet is a set of syntactical rules that applies visual style to web page
elements. Prior to style sheets, custom visual attributes such as typefaces,
colour and font size were defined in HTML tags for every instance of the
element. Style sheets, using a style sheet language, enable definitions for the
visual attributes to be separated from the web content. A single style sheet
can be applied to multiple documents.
A style sheet language that is supported by Microsoft Internet Explorer v6.0
and Netscape Navigator v7.02 is the CSS (Cascading Style Sheets) language (CSS
activity, World Wide Web Consortium, 1998). The term cascading refers to the
order of precedence among style sheets, from the designer's (highest) to the
user's, then to the browser's. CSS can apply formatting to HTML and XML
documents. A second style sheet language is XSL (eXtensible Stylesheet
Language), as defined in XSL activity, World Wide Web Consortium, 1999. This
currently has less browser support than CSS, but has the additional functionality
of being able to transform XML documents. This latter functionality is known as
XSLT (eXtensible Stylesheet Language Transformations) and is a part of XSL.
For example, an XML document could be transformed into an HTML document
using XSLT. Both CSS and XSL can be used to format XML data.
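XSLT's processing model, in which a template rule is selected for each element and `xsl:apply-templates` continues with that element's children, can be approximated in Python for illustration. The sketch below is an analogue of that model, not XSLT itself (Python's standard library has no XSLT processor); the element names and the HTML layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def material_rule(elem, apply):
    # Roughly corresponds to <xsl:template match="Material">
    return f"<h1>{escape(elem.get('name', ''))}</h1><table>{apply(elem)}</table>"


def property_rule(elem, apply):
    # Roughly corresponds to <xsl:template match="Property">
    value = f"{elem.text or ''} {elem.get('unit', '')}".strip()
    return f"<tr><td>{escape(elem.get('name', ''))}</td><td>{escape(value)}</td></tr>"


RULES = {"Material": material_rule, "Property": property_rule}


def default_rule(elem, apply):
    # Unmatched elements: just process their children
    # (XSLT's built-in rule would also copy their text).
    return apply(elem)


def apply_templates(elem):
    """Process each child with the rule registered for its tag, like xsl:apply-templates."""
    return "".join(RULES.get(child.tag, default_rule)(child, apply_templates) for child in elem)


def transform(xml_text):
    root = ET.fromstring(xml_text)
    body = RULES.get(root.tag, default_rule)(root, apply_templates)
    return f"<html><body>{body}</body></html>"


print(transform('<Material name="Steel grade X"><Property name="Elongation" unit="%">22</Property></Material>'))
```

As with a style sheet, the same set of rules can be applied unchanged to any number of documents that share the structure.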
Thomas (1999) describes features in the CSS specification that are expected
to be of practical use to web developers. These include paged media,
intended for printable web pages that are not designed to scroll, and
control of page positioning. The latter feature may become significant, as,
according to Thomas, many web designers use tables for position control, a
purpose for which tables were not designed.
Some commentators, including Marden and Munson (1999), report that CSS
technology has limitations; in particular, these authors comment that CSS does
not support arbitrary mathematical expressions for values. The authors note
that XSL does accept arbitrary mathematical expressions for values, but state
that XSL notation is verbose (it uses XML syntax) and that its usage requires a
steep learning curve. Bos (2001), however, states that XML is verbose by
design, reporting the advantage that XML data can be human readable if the
need arises (although in general this should not be necessary). It should be
noted that Marden and Munson advocate the use of another style sheet
technology, ‘Proteus and PSL’.
Lie and Saarela (1999) state that the use of style sheets is complementary to
structured documents. The separation of content and presentation is said by
these authors to be a requirement for device independence, with all
device-specific information held in the style sheet. Achieving device
independence would be a major advantage of using style sheets.
Style sheet technologies are under development via the World Wide Web
Consortium, and some current recommendations are in place, such as CSS,
but further refinements and features are possible.
4 Review of software tools
The purpose of this review is to give an overview of the relevant features of the software tools used.
4.1 Microsoft InfoPath
In 2003 Microsoft Corporation released a beta test version of its integrated applications suite, Microsoft Office 2003 beta 2 edition (Microsoft Corp. (b), 2003), which Microsoft reported as supporting the XML format. A new component of the Office suite is Microsoft InfoPath (Microsoft Corp. (a), 2003), whose purpose is to create forms that output their data as XML documents. InfoPath generates a form from a user-defined template; the native file format of that form is XML. InfoPath can create templates from a number of sources, including XML Schema and XML documents.
InfoPath uses a ‘drag and drop’ graphical user interface (GUI) to create a form
template. Components of the source document can be dragged to the design
screen, where a form field is automatically created for each component.
InfoPath is able to map the form elements to the original XML structure. Design
features include the optional use of tables, graphics and text labels.
Components can be specified as repeating where appropriate.
Once the basic template is generated, repeating sections can be replicated as required. The template can be used to produce any number of forms. Data entered into the forms are saved in their native file format, an XML structure based on the original source used to design the template.
4.2 Web browsers
Two web browsers were used: Microsoft Internet Explorer v6.0 (Microsoft Corp., 2001) and Netscape Navigator v7.02 (Netscape Communications Corporation, 2003).
4.3 HTML, CSS and XSLT
A web programming environment that is available at no cost is HTML Kit build
292 (Chami.com, 2002). This software provides assistance with the parameters of HTML v4 tags, but does not provide a ‘WYSIWYG’ (what you see is what you get) editable
view of the document. HTML Kit has the capability of accepting ‘plug-ins’ from
third-party developers that provide extra functionality, for example, viewing an
XML document as a tree.
The Microsoft FrontPage component (Microsoft Corp. (c), 2003) of the
Microsoft Office 2003 beta edition is a web design package that does include
a ‘WYSIWYG’ editable view of the document, as well as a code view. The ‘WYSIWYG’ view is particularly useful for designing sophisticated tables and adding form elements.
Both of the above packages accept CSS rules in HTML pages.
Using XSLT requires an XSLT processor, a software module that applies the transformation to a document. HTML Kit can use a plug-in that performs this task (Glass, 2001), and Microsoft FrontPage can process XSLT transformations.
4.4 Perl
The use of Perl scripts with web forms requires a Perl interpreter and a web server on the computer. A Perl system was implemented on a stand-alone computer using IndigoPerl build 08 from IndigoStar Software (2001). This includes a Perl interpreter (Perl v5.6.1) and a pre-configured web server (Apache v1.3.2).
Editing of Perl scripts can benefit from a visual integrated development environment (IDE). One such system is Optiperl by Xarka Software (2003), which includes the following features:
• interactive debugger for Perl
• internal server
• reading of form variable values from an HTML file
• built-in preview facility for output.
These features can significantly shorten program development time.
4.5 Software for validating an XML document and user-defined XML schema
The user-defined schema should conform to the requirements of the XML Schema specification (XML schema activity (a), (b), World Wide Web Consortium, 2001). This conformance can be checked by validating the schema with suitable software tools. Many of these tools are commercial software packages, for example XML Spy v5.0 (Altova GmbH, 2002). A tool that is available without cost is the web-based XSV system v2.5-2 (Thompson and Tobin (a), 2003), which is used via forms on the internet. Its authors describe XSV as being in a beta test state, so it is not a final product at the time of writing, but it can be used as an indication of schema conformance. Versions of XSV were also made available for use on a stand-alone computer; command-line XSV version 1.4 (Thompson and Tobin (a), 2003), dated 9/07/2003, was used in the later stages of this investigation.
The Sun MSV system ver2 (Sun Microsystems, 2003) is stated by Sun Microsystems to support a subset of XML Schema Part 1 (XML schema activity (a), World Wide Web Consortium, 2001), hence not all schema errors may be
recognised. Sun’s MSV is however available at no cost and can be used on a
stand-alone computer. The MSV system could therefore be used for
debugging during schema development.
Another method of validating a schema or XML document is to import it into Microsoft InfoPath. In use, Microsoft InfoPath was observed to perform validation when XML documents and XML schemas are imported: a schema is checked as it is imported. An XML document may contain an association with a schema. When such a document is imported into Microsoft InfoPath, the software first checks the named schema; if the schema generates no errors, the document is then checked for conformance to the schema.
XML schemas, and documents that link to a schema, contain a namespace declaration: a uniform resource identifier (URI) that has been assigned by the W3C to XML Schema. The validation tool processes elements on the basis of this declaration (White et al, 2001). Some documents (including MatML examples) include an obsolete declaration, ‘http://www.w3.org/2000/10/XMLSchema-instance’. Current validation tools require the up-to-date declaration, ‘http://www.w3.org/2001/XMLSchema-instance’ (or ‘http://www.w3.org/2001/XMLSchema’ in a schema document).
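Because an obsolete declaration causes current tools to reject a document, it can be worth checking which XML Schema instance namespace a file declares before validating it. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, and the sketch only reports which of the two URIs quoted above is declared. It does not validate anything.

```python
import io
import xml.etree.ElementTree as ET

# XML Schema instance namespaces: the 2001 Recommendation and the obsolete
# October 2000 draft found in some older documents.
CURRENT_XSI = "http://www.w3.org/2001/XMLSchema-instance"
OBSOLETE_XSI = "http://www.w3.org/2000/10/XMLSchema-instance"


def declared_namespaces(xml_text):
    """Return the (prefix, uri) pairs declared anywhere in the document."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events report each xmlns declaration as it is parsed
    return [ns for _event, ns in ET.iterparse(source, events=("start-ns",))]


def xsi_namespace_status(xml_text):
    """Return 'obsolete', 'current' or 'none' for the xsi namespace used."""
    uris = {uri for _prefix, uri in declared_namespaces(xml_text)}
    if OBSOLETE_XSI in uris:
        return "obsolete"
    if CURRENT_XSI in uris:
        return "current"
    return "none"
```

For the bball.xml document in Listing 3, xsi_namespace_status returns "current".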
4.6 Comparing two documents
Beyond Compare 2 (Scooter Software, 2003) is a utility that can compare two
text files and highlight any differences.
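Beyond Compare 2 is an interactive tool. The same line-by-line comparison can be sketched in Python with the standard difflib module, for example to confirm exactly what a manual edit changed in an XML file. This is an illustrative stand-in and does not describe how Beyond Compare works. The function name and default file labels are hypothetical.

```python
import difflib


def compare_documents(old_text, new_text, old_name="a.xml", new_name="b.xml"):
    """Return the differences between two text files as a unified diff.

    Lines starting with '-' occur only in the old file and lines starting
    with '+' only in the new one; identical files give an empty string.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)
```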
5 Feasibility study
5.1 Introduction
A feasibility study was performed in relation to the project and selected
research aims. The objectives of the feasibility study were to:
1. explore possible approaches to achieving the aims
2. investigate whether the aims are achievable
3. highlight practical problems and limitations.
The particular questions that the feasibility study could answer are listed below against the relevant project and research aims:
Review the MatML XML schema for materials test data.
1. How are data organised into a hierarchy suitable for XML?
2. What is the nature of the XML Schema declarations that implement this
hierarchy?
Use HTML and Perl to produce a valid XML data file.
1. What form design can be used to accept user data?
2. How can Perl manipulate the form data to create a valid XML document?
Use style sheets to generate formatted reports from XML documents.
1. Which style sheet technology or combination of technologies is better at
producing a report from the XML document?
2. Are there any particular difficulties in formatting a MatML file?
A method was specified for a program of work that would assist in answering these questions. The program uses HTML web pages and Perl scripts produced by Shaheedullah (2003), including an HTML form for entering basketball player statistics. The statistics are written to a file on the server by a Perl script. Another Perl script reads the data from disk and then formats them for display in a web browser. These web pages and Perl scripts were used as the base material for the feasibility study. The steps in the work program are as follows:
• organise the data structure into a hierarchy
• create an XML schema for the data structure
• utilise the existing HTML form to collect data and send it to a Perl script
• adapt the Perl script to write the data directly to an XML file
• use a validation tool to test conformance of the XML file to the schema
• use CSS to format and display the XML file
• use XSLT and CSS to format and display the XML file.
5.2 An XML schema for the basketball data
An XML schema is composed of elements and attributes. Data types of both
of these can be selected from predefined types, or user-defined types. The
basic types used in XML Schema are described, following;
5.2.1 XML Schema data types
The two kinds of type definition in XML Schema are simpleType and complexType, from which specific elements are derived.
SimpleType
These are based on built-in data types such as string, integer, decimal, float, date and time. Further simpleTypes can be derived, including:
1. customised versions of basic data types, e.g. with maximum and minimum values set
2. enumerated – a set of possible values
3. list – multiple values for a given element, separated by whitespace
4. patterned – values constrained by a regular-expression pattern, e.g. for a telephone number.
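To make the four kinds of derived simpleType concrete, the functions below imitate in Python what each constraint checks. This is a sketch under stated assumptions and does not show how a schema processor works internally. The function names and the telephone pattern in the example are hypothetical. The validation tools in section 4.5 apply the full XML Schema rules, including the XML Schema regular-expression dialect.

```python
import re

XML_WHITESPACE = " \t\n\r"
INTEGER = re.compile(r"[+-]?[0-9]+")  # lexical form of xsd:integer


def restricted_integer(value, min_inclusive, max_inclusive):
    """1. Customised type: xsd:integer restricted by minInclusive/maxInclusive."""
    text = value.strip(XML_WHITESPACE)  # integers have whiteSpace="collapse"
    if not INTEGER.fullmatch(text):
        return False
    return min_inclusive <= int(text) <= max_inclusive


def enumerated(value, allowed):
    """2. Enumeration: the value must be one of a fixed set of strings."""
    return value in allowed


def integer_list(value):
    """3. List type: whitespace-separated items, each an integer here."""
    return all(INTEGER.fullmatch(item) for item in value.split())


def patterned(value, pattern):
    """4. Pattern facet: the pattern must match the whole value.

    Python's re syntax only approximates XML Schema regular expressions.
    """
    return re.fullmatch(pattern, value) is not None
```

For example, restricted_integer("35", 0, 9999) is True, while restricted_integer("-3", 0, 9999) and restricted_integer("3r", 0, 9999) are False. The call patterned("555-0123", r"[0-9]{3}-[0-9]{4}") is True.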
ComplexType
ComplexTypes define an element’s child elements and attributes. The four basic kinds of complexType are listed below:
1. empty elements: have attributes only, with no text value or child elements
2. element-only: can have child elements but no text
3. text-only and mixed elements: can have just a text value (simpleContent), or both text and child elements (mixed content)
4. sequences and choices: a specific sequence of elements can be defined, or a choice among alternative elements can be given.
5.2.2 XML schema definition
An XML schema was produced for the basketball data structure. This involved selecting entities, creating a hierarchical structure and defining constraints. The entities selected for use in the schema are as follows:
• name (first and last)
• age
• height (feet and inches)
• points
• steals
• rebounds
• blocks.
Some additional entities were specified to give a logical hierarchy, based on a player who has personal elements and score elements:
• person - consists of name, age, height
• scores - consists of points, steals, rebounds, blocks
• player - consists of person and scores.
In drawing a hierarchy, a judgement needs to be made as to whether related entities will be described by a parent/child (element/element) relationship or by an element/attribute relationship. In this example, ‘first’ and ‘last’ were treated as child elements of ‘name’, whereas the units ‘feet’ and ‘inches’ were assigned as attributes of the ‘height’ element.
The selected entities were organised into a hierarchical structure of elements and attributes, shown in Figure 2.
Figure 2 Hierarchy of entities for basketball statistics
player
  person
    name
      first
      last
    age
    height  :feet  :inches
  scores
    points
    steals
    rebounds
    blocks
(Key: plain names are elements; names prefixed with ‘:’ are attributes.)
5.2.3 Constraints
XML Schema allows its entities to be constrained by data type. For example, ‘points’ needs to be an integer, and this can be enforced by a declaration in XML Schema. XML Schema data types can also be customised to provide user-defined types; for example, minimum and maximum values could be defined for the integer value of ‘points’. A simple set of constraints was defined for the entities in Figure 2; these are shown in Table 1.
Table 1 Data Dictionary for basketball statistics hierarchy
Element | XML Schema data type | Constraint
player | complexType | Element only
person | complexType | Element only
name | complexType | Element only
first | simpleType | String
last | simpleType | String
age | complexType | Integer
height | complexType | Empty
feet (attribute of height) | simpleType | Integer
inches (attribute of height) | simpleType | Integer
scores | complexType | Element only
points | simpleType | Customised integer, min 0, max 9999
blocks | simpleType | Customised integer, min 0, max 9999
rebounds | simpleType | Customised integer, min 0, max 9999
steals | simpleType | Customised integer, min 0, max 9999
5.2.4 Schema implementation
A simple model of the domain now exists, consisting of a hierarchy and a set of constraints.
An XML schema was developed to implement the model. The final schema is
shown in Listing 1 below.
Listing 1 Basket.xsd
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Root element: player details followed by the scores -->
  <xsd:element name="basketball">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="player" type="person" />
        <xsd:element name="playerscores" type="scores" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Personal details of the player -->
  <xsd:complexType name="person">
    <xsd:sequence>
      <xsd:element name="playerheight" type="height" />
      <xsd:element name="playername" type="names" />
      <xsd:element name="playerage" type="age" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="names">
    <xsd:sequence>
      <xsd:element name="firstname" type="xsd:string" />
      <xsd:element name="lastname" type="xsd:string" />
    </xsd:sequence>
  </xsd:complexType>
  <!-- Text-only element holding an int (simpleContent) -->
  <xsd:complexType name="age">
    <xsd:simpleContent>
      <xsd:extension base="xsd:int" />
    </xsd:simpleContent>
  </xsd:complexType>
  <!-- Customised integer: 0 to 9999 inclusive -->
  <xsd:simpleType name="scoreval">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0" />
      <xsd:maxInclusive value="9999" />
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:complexType name="scores">
    <xsd:sequence>
      <xsd:element name="points" type="scoreval" />
      <xsd:element name="blocks" type="scoreval" />
      <xsd:element name="rebounds" type="scoreval" />
      <xsd:element name="steals" type="scoreval" />
    </xsd:sequence>
  </xsd:complexType>
  <!-- Height: feet and inches held as required attributes -->
  <xsd:complexType name="height">
    <xsd:complexContent>
      <xsd:extension base="xsd:anyType">
        <xsd:attribute name="feet" type="xsd:integer" use="required" />
        <xsd:attribute name="inches" type="xsd:integer" use="required" />
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
5.2.5 Schema validation
A validation test for the schema was performed using the web-based XSV tool v2.5-2. The test reported no validity problems with the basketball schema file.
5.3 Form creation using HTML
The original web form for the basketball data was coded in HTML 4.0 (Shaheedullah, 2003) and is contained in a single table. The code was revised to use cascading style sheets (CSS). Form elements of two types are used for data input: text fields and list boxes. Text fields accept alphanumeric input. List boxes are used for the height data, which are entered as feet and inches. Splitting the value into these units allows the data to be constrained to a range of 4 to 7 for ‘feet’ and 1 to 11 for ‘inches’, presented as selectable options within the list boxes.
The web form created with HTML and CSS is shown in Figure 3 below.
Figure 3 Web-form for basketball data
On submission of the form, the data are passed to a Perl script via the HTTP ‘post’ method. The Perl script parses the data into variables, matching each form element name with the value entered. These variables are combined with additional characters and strings to generate XML elements, which are then written to disk as an XML document.
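The parsing step can be illustrated with a minimal Python sketch. A browser sends a form submitted with the ‘post’ method as an application/x-www-form-urlencoded body, consisting of name=value pairs joined by ‘&’. The standard urllib.parse.parse_qs function decodes such a body. The function parse_form_body is hypothetical. It returns a dictionary comparable to the %FORM hash used in Listing 2, but it is not the project script's code.

```python
from urllib.parse import parse_qs


def parse_form_body(body):
    """Decode an application/x-www-form-urlencoded body into {name: value}.

    '+' and %XX escapes are decoded; blank fields are kept as empty strings,
    and if a field name repeats only its first value is kept.
    """
    fields = parse_qs(body, keep_blank_values=True)
    return {name: values[0] for name, values in fields.items()}
```

For example, the body fname=Joe&lname=Blogss&age=24 gives {'fname': 'Joe', 'lname': 'Blogss', 'age': '24'}.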
The required XML document format was established in advance by generating an instance of the basketball XML schema. Software packages are available to do this, including XML Spy v5.0 (Altova GmbH, 2002).
The segment of the Perl script that writes the file is shown in Listing 2 below.
Listing 2 Perl script segment, writeball.cgi
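#write text fields
# Segment only: %FORM holds the parsed form fields and OUTF is a
# filehandle opened for writing earlier in writeball.cgi.
# Form values are written unescaped, so '&' or '<' in the data
# would make the XML malformed.
# XML declaration and root element with its schema reference
print OUTF "<?xml version=\"1.0\"?>";
print OUTF "\n";
print OUTF "<basketball";
print OUTF " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"";
print OUTF " xsi:noNamespaceSchemaLocation=\"basket.xsd\">";
print OUTF "\n";
print OUTF "<player>\n";
# Height: feet and inches are written as attributes
print OUTF "<playerheight ", "feet=", "\"$FORM{'feet'}\" ",
    "inches=", "\"$FORM{'inches'}\" />\n";
# Name: first and last names are child elements of playername
print OUTF "<playername>\n";
print OUTF "<firstname>", "$FORM{'fname'}", "</firstname>";
print OUTF "<lastname>", "$FORM{'lname'}", "</lastname>";
print OUTF "</playername>\n";
#
print OUTF "<playerage>", "$FORM{'age'}", "</playerage>\n";
#
print OUTF "</player>\n";
# Scores, in the order required by the "scores" sequence in basket.xsd
print OUTF "<playerscores>";
print OUTF "<points>", "$FORM{'points'}", "</points>\n";
print OUTF "<blocks>", "$FORM{'block'}", "</blocks>\n";
print OUTF "<rebounds>", "$FORM{'rebound'}", "</rebounds>\n";
print OUTF "<steals>", "$FORM{'steal'}", "</steals>\n";
print OUTF "</playerscores>";
print OUTF "</basketball>\n";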
The XML document that is produced from the web form and Perl script is shown
in Listing 3, following.
Listing 3 XML document, bball.xml
<?xml version="1.0"?>
<basketball xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="basket.xsd">
<player>
<playerheight feet="6" inches="4" />
<playername>
<firstname>Joe</firstname><lastname>Blogss</lastname></playername>
<playerage>24</playerage>
</player>
<playerscores><points>35</points>
<blocks>4</blocks>
<rebounds>3</rebounds>
<steals>2</steals>
</playerscores></basketball>
The Perl script writes the markup (tag and attribute names) as fixed strings, and the content is taken from the form’s variables. The Perl script does not validate the form values; validation is carried out instead by an XML validation tool.
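The same document can also be produced with an XML library instead of string concatenation. The Python sketch below builds the structure of Listing 3 with the standard xml.etree.ElementTree module. It is a hypothetical counterpart to Listing 2, not the project's code. It takes a dictionary keyed by the form field names used in Listing 2 (fname, lname, age, feet, inches, points, block, rebound, steal). Compared with writing tags as fixed strings, this approach escapes characters such as ‘&’ or ‘<’ in the data automatically, so the output is always well-formed. Validity against the schema must still be checked separately.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)  # serialise the namespace with the usual prefix

# (element name, form field name) for the scores, in the order required
# by the "scores" xsd:sequence in basket.xsd
SCORE_FIELDS = [("points", "points"), ("blocks", "block"),
                ("rebounds", "rebound"), ("steals", "steal")]


def build_bball_xml(form):
    """Return the basketball document (as in Listing 3) for a dict of form fields."""
    root = ET.Element("basketball",
                      {f"{{{XSI}}}noNamespaceSchemaLocation": "basket.xsd"})
    player = ET.SubElement(root, "player")
    # Height units are attributes, as declared in the "height" type
    ET.SubElement(player, "playerheight", feet=form["feet"], inches=form["inches"])
    name = ET.SubElement(player, "playername")
    ET.SubElement(name, "firstname").text = form["fname"]
    ET.SubElement(name, "lastname").text = form["lname"]
    ET.SubElement(player, "playerage").text = form["age"]
    scores = ET.SubElement(root, "playerscores")
    for element, field in SCORE_FIELDS:
        ET.SubElement(scores, element).text = form[field]
    # Text and attribute values are escaped during serialisation
    return ET.tostring(root, encoding="unicode")
```

When saving to a file, ElementTree's write method (with xml_declaration=True) can add the XML declaration.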
5.4 Form creation using Microsoft InfoPath
Microsoft InfoPath can use an XML schema or XML document as a source for
form creation. An attempt was made to use the basketball schema file,
basket.xsd in Listing 1, as a source to create a form for data entry. Microsoft
InfoPath gave the following error message:
“Derived type and the base type must have the same content type.
Base type : '{http://www.w3.org/2001/XMLSchema}anyType'
Derived type : 'height'”
The alternative to using the schema as a source was to use the XML document written by the HTML/Perl system, bball.xml (Listing 3), but attempting this resulted in the same error message. The XML document in Listing 3 was therefore manually edited to remove the reference to basket.xsd. The edited bball.xml file was then successfully imported into Microsoft InfoPath as a source for form creation. A form template (bball.xsn) was created simply by dragging the elements from the data source window into the design window, and a form was instantiated from the template. The form (bballinfopathform.xml) is shown in Figure 4.
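The manual edit described above, removing the schema reference, could also be scripted. The Python sketch below is hypothetical and was not part of the study. It assumes that the reference is the xsi:noNamespaceSchemaLocation attribute on the root element, as in Listing 3. The standard parser discards the XML declaration, comments and processing instructions, so this approach suits only simple data files like bball.xml.

```python
import xml.etree.ElementTree as ET

SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}noNamespaceSchemaLocation"


def strip_schema_reference(xml_text):
    """Return the document with the root's xsi:noNamespaceSchemaLocation removed."""
    root = ET.fromstring(xml_text)
    root.attrib.pop(SCHEMA_LOCATION, None)  # no error if it is absent
    # With no xsi attribute left, no xmlns:xsi declaration is written either
    return ET.tostring(root, encoding="unicode")
```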
Figure 4 Microsoft InfoPath form for basketball data,
bballinfopathform.xml (from template bball.xsn)
5.5 Validation of form-generated XML document
5.5.1 HTML/Perl system
There are a number of validation tools for XML schemas, as stated in section 4.5. The Sun MSV tool was used for debugging the design of the basketball schema and data file. The XML document generated by the Perl script in Listing 2 was successfully validated against the schema in Listing 1. The XML document was then deliberately modified to introduce errors, and the output of the validation tool was observed; the results are shown in Table 2.
Table 2 Sun MSV validation tool output for bball.xml
Change to XML document | Validation tool output
None | the document is valid.
Age = 3r | Error at line:7, column:26 of file:///c:/perl5/htdocs/bball.xml / "3r" does not satisfy the "int" type / the document is NOT valid.
Points = -3 | Error at line:9, column:34 of file:///c:/perl5/htdocs/bball.xml / the value is out of the range (minInclusive specifies 0). / the document is NOT valid.
Age = “” (empty) | validating c:\perl5\htdocs\bball.xml / Error at line:7, column:24 of file:///c:/perl5/htdocs/bball.xml / Content of element "playerage" is incomplete / the document is NOT valid.
Firstname = “” (empty) | the document is valid.
Final validation of the XML documents that were produced by HTML/Perl was
performed by the XSV tool, described in section 4.5. The bball.xml document
passed the validation test, with no errors reported.
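The kinds of check that MSV reported in Table 2 can be approximated by hand, which may help in reading its messages. The Python function below is a hypothetical sketch and is not part of MSV or XSV. It tests only the Table 1 constraints exercised in Table 2: playerage as an xsd:int, and each score as an integer from 0 to 9999. It does not check element order, required elements or attributes, all of which a schema validator also enforces.

```python
import re
import xml.etree.ElementTree as ET

INTEGER = re.compile(r"[+-]?[0-9]+")   # lexical form of an XML Schema integer
INT_MIN, INT_MAX = -2**31, 2**31 - 1   # value range of xsd:int
SCORES = ("points", "blocks", "rebounds", "steals")


def check_bball(xml_text):
    """Return a list of constraint problems in a basketball document ([] if none)."""
    root = ET.fromstring(xml_text)
    problems = []

    # playerage: xsd:int (an empty element is not a valid int)
    age = (root.findtext("player/playerage") or "").strip()
    if not age:
        problems.append("playerage: missing or empty")
    elif not INTEGER.fullmatch(age) or not INT_MIN <= int(age) <= INT_MAX:
        problems.append(f"playerage: {age!r} is not an int")

    # scores: customised integer type "scoreval", 0 to 9999 inclusive
    for tag in SCORES:
        text = (root.findtext(f"playerscores/{tag}") or "").strip()
        if not INTEGER.fullmatch(text):
            problems.append(f"{tag}: {text!r} is not an integer")
        elif not 0 <= int(text) <= 9999:
            problems.append(f"{tag}: {text} is out of range 0..9999")
    return problems
```

Applied to Listing 3, check_bball returns an empty list. With the age changed to 3r it returns ["playerage: '3r' is not an int"]. As with MSV, an empty firstname is not reported, because xsd:string accepts an empty value.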
5.5.2 Microsoft InfoPath system
As stated in section 5.4, Microsoft InfoPath did not accept the basket.xsd schema, so documents could not be validated against the basketball schema using this software. However, validation of the Microsoft InfoPath XML document against the basketball schema was attempted with the command-line XSV tool. An error was reported concerning a language attribute. The attribute was removed from the document by manual editing, and the modified document passed the validation test.
To observe the schema validation capabilities of Microsoft InfoPath, a copy of the basketball schema was made, excluding the height element. This schema was accepted by Microsoft InfoPath, and a simple form was prepared for validation test purposes. The changes to the XML document listed in Table 2 were applied, and the results for Microsoft InfoPath are shown in Table 3.
Table 3 Microsoft InfoPath output for form based on modified schema (excluding height element)
Change to XML document | Output
None | None
Age = 3r | This form includes errors. Do you still want to save it?
Points = -3 | This form includes errors. Do you still want to save it?
Age = “” (empty) | This form includes errors. Do you still want to save it?
Firstname = “” (empty) | None
Every non-conformance to the schema produced the same error message. This indicates that, when an attempt is made to save the form, Microsoft InfoPath checks the form contents against the schema used to design it.
5.6 Using CSS to format the XML document
XML documents can be formatted with CSS for display in a web browser. This
technique was evaluated for formatting the basketball XML document,
bball.xml. There are limitations to using CSS directly with XML files. In practice these include the lack of a method to display information within tags, such as attributes, and no capability to add text content (CSS2 generated content can add text, but it is not supported by Microsoft Internet Explorer v6.0). Hence the content of elements appears without descriptors, unless extra elements containing such descriptors are added to the original XML document. A simple CSS style sheet was created (Listing 4), and the XML document was edited to associate it with the style sheet. The output on loading the edited XML document into Microsoft Internet Explorer v6.0 is shown in Figure 5.
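The association is made with an xml-stylesheet processing instruction placed before the root element, as defined in the W3C recommendation on associating style sheets with XML documents. A minimal Python sketch of that edit follows. The function attach_stylesheet is hypothetical, and it assumes that the document either starts with an XML declaration or has none. The type "text/css" suits a CSS file such as basket.css. Browsers commonly use "text/xsl" for an XSLT file.

```python
def attach_stylesheet(xml_text, href, sheet_type="text/css"):
    """Insert an xml-stylesheet processing instruction into a document.

    The instruction goes directly after the XML declaration if there is
    one, otherwise at the very start of the document.
    """
    pi = f'<?xml-stylesheet type="{sheet_type}" href="{href}"?>'
    if xml_text.startswith("<?xml "):
        end = xml_text.index("?>") + 2  # end of the XML declaration
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text
```

For example, attach_stylesheet('<?xml version="1.0"?>\n<basketball/>', "basket.css") returns the declaration, the xml-stylesheet instruction and the root element, each on its own line.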
Listing 4 CSS Style sheet for bball.xml, basket.css
/* Root and player elements are displayed as centred blocks */
basketball {display:block;}
playername {display:block; font:bold 14pt Courier, serif; text-align:center;}
playerage {display:block; font:italic 12pt Courier, serif; text-align:center;}
playerheight {display:block; font:italic 12pt Courier, serif; text-align:center;}
playerscores {font-family:Times, serif; font-size:12pt; font-weight:bold; text-indent:45%;}
/* Each score on its own line */
points {display:block;}
rebounds {display:block;}
blocks {display:block;}
steals {display:block;}
Figure 5 Output of bball.xml with CSS style sheet (basket.css), in
Microsoft Internet Explorer v6.0
As the player height data are held in attributes, they are not displayed with CSS. Further, element content is printed without descriptors. CSS can, however, still be used to style XML documents that have been transformed using XSLT.
5.7 Using XSLT and CSS to format XML file
Modern browsers such as Microsoft Internet Explorer v6.0 and Netscape Navigator v7.02 support an XML-related technology, XSLT (XSL Transformations, part of the Extensible Stylesheet Language). XSLT can perform transformations on XML files. One use is to transform an XML document into HTML/CSS syntax suitable for display. Using XSLT in conjunction with HTML and CSS has a number of advantages over using CSS to format an XML file directly, including:
• attributes of XML tags are accessible
• text can be added as markup tags or literal text
• search operations on tags are possible
• transformation operations include
  o sorting
  o conditional statements
  o functions on numeric data.
An XSLT style sheet was prepared for the basketball XML file, see Listing 5.
Listing 5 XSLT Style sheet for bball.xml, basket.xsl
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Produce HTML output for display in a web browser -->
  <xsl:output method="html"/>
  <!-- Root: wrap the transformed content in an HTML page -->
  <xsl:template match="/">
    <html>
      <head><title>Basketball</title></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="player">
    <h3 style="text-align:center">Player Details</h3>
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Literal text ("Name:") is added before the data; xsl:text keeps
       the space between first and last names -->
  <xsl:template match="playername">
    <div style="display:block; font:bold 14pt Courier, serif; text-align:center;">
      Name:
      <xsl:value-of select="firstname"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="lastname"/>
      <p/>
    </div>
  </xsl:template>
  <xsl:template match="playerage">
    <div style="display:block; font:italic 12pt Courier, serif; text-align:center;">
      Age:
      <xsl:value-of select="."/>
    </div>
  </xsl:template>
  <!-- Attribute values are selected with the @ prefix -->
  <xsl:template match="playerheight">
    <div style="display:block; font:italic 12pt Courier, serif; text-align:center;">
      <xsl:value-of select="@feet"/> feet
      <xsl:value-of select="@inches"/> inches tall
      <p/>
    </div>
  </xsl:template>
  <!-- Scores are laid out as a two-column table -->
  <xsl:template match="playerscores">
    <h3 style="text-align:center">Scores</h3>
    <div style="text-align:center;">
      <table style="font-family:Times, serif; font-size:12pt; font-weight:bold;">
        <tr><td>points</td><td><xsl:value-of select="./points"/></td></tr>
        <tr><td>blocks</td><td><xsl:value-of select="./blocks"/></td></tr>
        <tr><td>rebounds</td><td><xsl:value-of select="./rebounds"/></td></tr>
        <tr><td>steals</td><td><xsl:value-of select="./steals"/></td></tr>
      </table>
    </div>
  </xsl:template>
</xsl:stylesheet>
The XML document was edited to associate it with the XSLT file. Opening the XML file in Microsoft Internet Explorer v6.0 produces the output shown in Figure 6.
Figure 6 Output of XML document bball.xml with XSLT style sheet,
basket.xsl, in Microsoft Internet Explorer v6.0
Figure 6 shows that XSLT can add text (‘literals’ in XSLT terminology) to
describe the data. Attributes of XML tags can be accessed, resulting in display
of height data. The ability to include markup tags allowed a table to be created
to display the scores data.
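If no XSLT processor is available, the transformation performed by basket.xsl can be mirrored in Python with ElementTree. This is a stand-in written for illustration, not XSLT and not the project's code. The function bball_to_html is hypothetical. It reproduces the main points observed in Figure 6: literal text, access to attributes and a table of scores. It also offers optional sorting of the scores, one of the XSLT operations listed in section 5.7.

```python
import xml.etree.ElementTree as ET
from html import escape


def bball_to_html(xml_text, sort_scores=False):
    """Build an HTML page like basket.xsl's output from a basketball document.

    If sort_scores is True the score rows are ordered by value, highest
    first (assumes every score is a valid integer).
    """
    root = ET.fromstring(xml_text)
    height = root.find("player/playerheight")
    first = root.findtext("player/playername/firstname", "")
    last = root.findtext("player/playername/lastname", "")
    age = root.findtext("player/playerage", "")

    scores = list(root.find("playerscores"))
    if sort_scores:
        scores.sort(key=lambda e: int(e.text), reverse=True)
    # One table row per score: literal label, then the element's value
    rows = "".join(
        f"<tr><td>{e.tag}</td><td>{escape(e.text or '')}</td></tr>" for e in scores
    )

    return (
        "<html><head><title>Basketball</title></head><body>"
        '<h3 style="text-align:center">Player Details</h3>'
        f"<div>Name: {escape(first)} {escape(last)}</div>"
        f"<div>Age: {escape(age)}</div>"
        # Attribute values can be read, unlike with CSS alone
        f"<div>{escape(height.get('feet', ''))} feet "
        f"{escape(height.get('inches', ''))} inches tall</div>"
        '<h3 style="text-align:center">Scores</h3>'
        f"<table>{rows}</table></body></html>"
    )
```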
5.8 Discussion
5.8.1 Schema design issues
The schema developed for the basketball data shows how entities can be
organised into a hierarchical structure. The schema was ‘hand-coded’ in a text editor, and there are a number of code structures that can achieve a similar result. For a larger schema, a visual representation of the schema would be useful; some software packages, e.g. XML Spy v5.0 (Altova GmbH, 2002), provide customised graphical views of XML schemas.
The schema creates definitions of all elements. A root element, ‘basketball’, is specified as the root of the hierarchy. The definitions are then used, under new element names, at the appropriate points in the hierarchy to create the schema.
There are two alternative methods of coding a hierarchy. One is to define each element at its point of entry in the hierarchy. This has the advantage that the full hierarchy is presented without the need to cross-reference uses to definitions, but the disadvantage that an element used in several places must be defined separately at each of them. The other is to create a named definition for each element and refer to it at each point of use, as in Listing 1.
A number of the element definitions include constraints on the value. For a
fuller implementation, further constraints could be added; for example, ‘feet’ could be limited to the range 4 to 7, and ‘inches’ to the range 1 to 11, matching the options offered in the HTML form.
The constraints implemented illustrate the data typing capability of XML
Schema, and provide a basis for XML document validation.
Microsoft InfoPath did not accept the basketball schema, whereas the web-based XSV tool validated it and reported no conformance problems. This suggests that the two tools interpret XML Schema differently, an issue to be considered when choosing tools.
5.8.2 Form for data entry and document generation
The HTML form created for data entry included fields for all elements of the
basketball schema. Appropriate form fields can limit data entry to valid options, such as the list boxes used for ‘feet’ and ‘inches’. Tables assist in structuring the display of the labels and fields. The HTML form provides all the fields necessary for an instance of the relatively simple basketball schema, in a concise tabular layout.
The Perl script that accepts the form data performs straightforward procedures: parsing the data, then combining the form variables with appropriate text to produce an XML document.
Schema features that could potentially make the HTML/Perl system
significantly more complex include the use of repeatable elements, and
possibly, use of identification attributes.
Using the XML file produced by the HTML/Perl system as the source for the Microsoft InfoPath template is considered reasonable for the small basketball schema in the feasibility study. However, for the main project, an original source should be used.
The Microsoft InfoPath template created from the XML document successfully
produced a form, similar to the HTML form. Once a template has been
produced, any number of forms can be created, with the basketball XML
structure as the native file format.
5.8.3 Validation of the XML document
The XML document was validated successfully, and the Sun Microsystems MSV validation tool generates useful messages. However, for a user entering data in the form, this is a less convenient method of data entry validation than checking ‘inline’ (as part of form processing) by a Perl program. Validation of XML files against the XML schema will nevertheless be retained, as this is considered to be one of the strengths of using XML.
5.8.4 Display of XML documents
The use of CSS to format XML documents directly has limitations. Some of these, such as the inability to add text, could be circumvented by designing the XML schema with CSS display in mind. For example, no data would be held in attributes, and titles and any necessary descriptive text would be included as element content. Using XSLT and HTML/CSS to transform a data-centric XML file is considered a superior solution. XSLT code is more complex than CSS style sheets but simpler than scripting languages such as Perl. A disadvantage of the XSLT style sheet in Listing 5 (in comparison with CSS) is that formatting styles are applied inline, rather than being placed in the header or in a separate style sheet document. This can be partially overcome by placing styles on block elements, such as the <div> or <table> tags; alternatively, because the transformation outputs HTML, it could emit a <style> element or a link to an external CSS file.
5.9 Conclusions
1. A visual representation of the full hierarchy of a schema, or of a large part of it, should be produced.
2. For full comprehension of a schema, the coding techniques used should
be understood.
3. HTML/Perl and Microsoft InfoPath can both be used to produce forms
for data entry that will generate a valid XML document.
4. The Microsoft InfoPath XML document requires a modification to
validate against the basketball schema with the XSV tool.
5. A combination of CSS and XSLT style sheet technologies is more
appropriate for formatting XML documents than CSS alone.
6 Review of MatML
The advantages of using XML could be gained by a user developing their own schema, or alternatively by using an existing schema. A simple XML schema for a specific purpose, such as the basketball data system described in section 5.2, can be developed readily. However, an XML schema for the domain of the application may already exist. A review of that schema would help the user decide between using it and developing a new one, and a framework for such a review is therefore required.
An existing schema for materials test data, MatML, is mentioned in section 3.4. The MatML v3.0 schema is available for download from the MatML website (NIST (b), 2003). A framework for the review of an XML schema such as MatML could consist of criteria in the form of questions. These criteria would stem from the characteristics of XML, such as its hierarchical nature and human readability, and from how well the schema can be applied to its domain.
In software engineering texts such as Pressman (2000), discussions of evaluation techniques for software systems often use the concepts of verification and validation. Verification is concerned with software testing and quality assurance, whereas validation is concerned with meeting user requirements. The development of the schema would be expected to have included full cycles of verification and validation. A parallel could be drawn between validation and effectiveness (performing the appropriate task), and between verification and efficiency (performing the task correctly). For the purposes of this review, which is from the perspective of a user, a number of issues are presented that are based on the above concepts.
The following criteria could be used to review the MatML schema:
Effectiveness
1. How human-readable is the schema?
a. Are meaningful names used for elements and attributes?
b. Can a user plot the full hierarchy?
2. Is the hierarchy logical with respect to the stated domain?
3. Does the schema provide all the necessary components for the
application?
Efficiency
4. Is the code efficient?
a. Does the schema avoid unnecessary repetition?
b. Are the most appropriate coding techniques used?
5. Do validation tools accept the schema?
These criteria are considered below.
6.1 How human-readable is the schema?
The schema uses meaningful names for attributes and elements. For example,
‘BulkDetails’ relates to the condition of bulk material before it is processed
and made into components. Elements such as ‘ProcessingDetails’ and
‘DimensionalDetails’ are self-explanatory. Attributes are given meaningful
names, other than those used for identification reference purposes (see
section 6.4). The use of attributes is consistent; for example, a data format is
always described as an attribute. The documentation gives a description of
each element and attribute.
The full element hierarchy of MatML v3.0 is held within the schema definition,
but the definition, as written, does not display the full tree. Many element
definitions are instanced in the hierarchy using the XML Schema attribute ‘ref’.
Where ‘ref’ is used, only the name of the referenced element appears at that
point; its child elements are shown only in its own, separate definition.
For each element, the documentation lists its child elements and attributes.
The schema was studied with a view to plotting the full hierarchy. Figure 7
shows only the first level below ‘Material’, which includes ‘BulkDetails’ and
‘ComponentDetails’, the two elements that will contain materials test data. The
full hierarchy of each of these would contain many instances of repeated element
groups, such as ‘Concentration’, which may obscure the view of the schema.
Partial hierarchies excluding some such elements are shown in Figure 8 and
Figure 9. These fuller hierarchies provide a more human-readable view of the
MatML v3.0 schema.
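The effect of ‘ref’ on readability can be illustrated with a short script that replaces each reference by the global declaration it points to and prints an indented tree, in the manner of Figures 7 to 9. The following is a minimal Python sketch, assuming unprefixed ‘ref’ values and no target namespace; the schema fragment is a hypothetical, heavily reduced stand-in for matml.xsd, not an extract from it, and element types given through a named complexType are not followed.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, much-reduced schema in the style of MatML v3.0: 'Material'
# refers to global declarations, so its grandchildren are not visible in place.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Material">
    <xsd:complexType><xsd:sequence>
      <xsd:element ref="BulkDetails"/>
      <xsd:element ref="Metadata" minOccurs="0"/>
    </xsd:sequence></xsd:complexType>
  </xsd:element>
  <xsd:element name="BulkDetails">
    <xsd:complexType><xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element ref="Notes" minOccurs="0"/>
    </xsd:sequence></xsd:complexType>
  </xsd:element>
  <xsd:element name="Metadata">
    <xsd:complexType><xsd:sequence>
      <xsd:element ref="Notes" minOccurs="0"/>
    </xsd:sequence></xsd:complexType>
  </xsd:element>
  <xsd:element name="Notes" type="xsd:string"/>
</xsd:schema>"""


def child_decls(node):
    """Yield the element declarations in node's content model, descending
    through complexType/sequence/choice but not into the child elements."""
    for child in node:
        if child.tag == XS + "element":
            yield child
        else:
            yield from child_decls(child)


def expand(decl, global_decls, depth=0, ancestors=()):
    """Return indented lines for decl, replacing each ref by its global declaration."""
    ref = decl.get("ref")
    name = ref or decl.get("name")
    target = global_decls.get(ref, decl) if ref else decl
    lines = ["  " * depth + name]
    if name in ancestors:  # stop on recursive definitions such as ComponentDetails
        return lines
    for child in child_decls(target):
        lines += expand(child, global_decls, depth + 1, ancestors + (name,))
    return lines


root = ET.fromstring(SCHEMA)
global_decls = {e.get("name"): e for e in root.findall(XS + "element")}
lines = expand(global_decls["Material"], global_decls)
print("\n".join(lines))
```

Applied to a full schema, a script of this kind would also reproduce the repeated groups, such as ‘Concentration’, that Figures 8 and 9 deliberately omit.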
Figure 7 First level hierarchy of MatML
Material
    BulkDetails
    ComponentDetails
    Metadata
    Graphs
    Glossary
Figure 8 Partially expanded hierarchy of MatML v3.0, ‘BulkDetails’
element
BulkDetails
Name
Class
Subclass
Specification
Source
Form
ProcessingDetails
Name
ParameterValue
Result
Notes
Geometry
Shape
Dimensions
Orientation
Notes
Characterization
Formulae
ChemicalComposition
Compound
Element
Symbol
Concentration
Notes
Concentration
OR Element
Symbol
Concentration
Notes
Concentration
Notes
PhaseComposition
Name
Concentration
PropertyData
Data
Qualifier
Uncertainty
ParameterValue
Notes
Notes
DimensionalDetails
Name
Value
Units
Qualifier
Notes
PropertyData
Data
Qualifier
Uncertainty
ParameterValue
Notes
Notes
Figure 9 Partially expanded hierarchy of MatML v3.0
Material
    ComponentDetails
        (as BulkDetails, Figure 8) +
        AssociationDetails
            Associate
            Relationship
        ComponentDetails
    Metadata
        DataSourceDetails
            Name
            Notes
        PropertyDetails
            Name
            Units
            Notes
        MeasurementTechniqueDetails
            Name
            Notes
        SpecimenDetails
            Name
            Notes
        ParameterDetails
            Name
            Units
            Notes
    Graphs
        Graph
    Glossary
        Term
            Name
            Definition
            Abbreviation
            Synonym
            Notes
6.2 Is the hierarchy logical with respect to the stated domain?
The fuller hierarchies show the relationships between the elements in the
schema. A schematic diagram, Figure 10, gives a simple representation of the
domain and is used to assess the schema against it. This assessment is
largely qualitative and is mainly concerned with the primary materials-related
elements.
Figure 10 A schematic diagram for the domain of materials property data
[Diagram: bulk material → material processing → formed material → material
processing → component; characterisation and mechanical tests can be applied
at each stage]
Figure 10 indicates that tests and characterisation can be performed on the
material at any stage of processing. Mechanical tests include tensile strength
and fatigue strength tests, and characterisation concerns chemical and
micro-structural properties. MatML v3.0 includes a ‘PropertyData’ element to
contain mechanical test data, and a ‘Characterization’ element to contain
characterisation data. The ‘Characterization’ element handles the chemical
and micro-structural properties with its ‘ChemicalComposition’ and
‘PhaseComposition’ child elements respectively (see Figure 8). These child
elements include the technical details, such as chemical element names and
concentrations, needed to describe those properties.
The ‘Characterization’ element is a child of ‘BulkDetails’, which can be used to
describe the bulk material, and also of ‘ComponentDetails’, which can be used to
describe formed material and components. This satisfies the requirement to
accept test and characterisation data at all stages of processing. The primary
materials-related elements are therefore considered to map logically to the
domain.
6.3 Does the schema provide all the necessary components for the application?
The MatML website (NIST (a), 2003) includes a number of examples that
demonstrate the provision of the components necessary for materials test
reports. One of the examples, MatML example 2 (Begley and Kaufman, 2003),
relates to mechanical properties of an aluminium alloy. The source of the data
used in MatML example 2 (Kaufman, 1999) was obtained to determine whether the
MatML implementation included all the components in that source. Table 4 lists
the components from the source, together with the MatML elements that can
contain the information.
Table 4 Required data for a materials test report, with corresponding
MatML elements
Components in data source    MatML example 2 elements
Materials specification      BulkDetails/Specification
Material reference           Metadata/ParameterDetails
Product form                 BulkDetails/Form
Product heat treatment       BulkDetails/ProcessingDetails
Test specification           Metadata/MeasurementTechniqueDetails
Specimen details             Metadata/SpecimenDetails
Test conditions              Metadata/ParameterDetails
Test results                 BulkDetails (or ComponentDetails)/PropertyData
Test units                   Metadata/PropertyDetails/Units
Notes                        Metadata/DataSourceDetails/Notes
Term description             Glossary/Term/Definition
6.4 Is the code efficient?
The root element is ‘MatML_Doc’, which contains ‘Material’. The next level
includes five elements: ‘BulkDetails’, ‘ComponentDetails’, ‘Metadata’, ‘Graphs’
and ‘Glossary’. These elements have child elements, and so on. The elements are
defined in one of two ways:
• locally, within the parent elements where they are used
• as separate, globally declared elements (placed below ‘Material’ in the
schema), referenced with ‘ref’.
The former method has the advantage that the full hierarchy is visible, and the
disadvantage that the definition of the child element cannot be reused. A
separately defined element can be used multiple times, which results in a more
modular schema. Separately defined elements can also be instanced with a
new name (as performed for the basketball schema in section 5.2), but this
method is not used in MatML v3.0.
An attribute, ‘format’, in each of the elements ‘ParameterValue’, ‘Data’ and
‘Value’ repeats the same type definition. This type definition could instead be
created once, as a separate custom data type.
Identification attributes are used, defined as ‘xsd:ID’ and referenced with
‘xsd:IDREF’. These attributes are generally used to associate data with related
metadata, which contain elements and other attributes with meaningful names.
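The ID/IDREF association amounts to a look-up from a reference value to the element carrying that identifier. The Python sketch below builds that look-up for a hypothetical instance fragment modelled on MatML example 2, follows it to find the units of each ‘PropertyData’ element, and reports dangling references. The attribute names ‘id’, ‘property’ and ‘source’, and the choice of which attributes are treated as IDREFs, are assumptions for illustration rather than values checked against matml.xsd.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the second PropertyData refers to an id that does not exist.
DOC = """<MatML_Doc><Material>
  <BulkDetails>
    <PropertyData property="pr1" source="ds1"/>
    <PropertyData property="pr3" source="ds1"/>
  </BulkDetails>
  <Metadata>
    <DataSourceDetails id="ds1"><Name>source name</Name></DataSourceDetails>
    <PropertyDetails id="pr1"><Units>ksi</Units></PropertyDetails>
    <PropertyDetails id="pr2"><Units>MPa</Units></PropertyDetails>
  </Metadata>
</Material></MatML_Doc>"""

IDREF_ATTRS = {"property", "source"}  # assumed to be declared as xsd:IDREF


def index_ids(root):
    """Map each 'id' value to the element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def dangling_refs(root, ids):
    """Return (element, attribute, value) for IDREF values with no matching id."""
    problems = []
    for el in root.iter():
        for attr in sorted(IDREF_ATTRS & set(el.attrib)):
            if el.get(attr) not in ids:
                problems.append((el.tag, attr, el.get(attr)))
    return problems


root = ET.fromstring(DOC)
ids = index_ids(root)
for pd in root.iter("PropertyData"):
    details = ids.get(pd.get("property"))
    # Follow the reference to PropertyDetails to find the units of the data.
    print(pd.get("property"), "->", details.findtext("Units") if details is not None else None)
print(dangling_refs(root, ids))
```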
6.5 Do validation tools accept the schema?
The web-based XSV system described in section 4.5 was used to check the
MatML v3.0 schema file, matml.xsd. The output generated by XSV on testing
matml.xsd is shown in Figure 11, below.
Figure 11 XSV v1.4 software validation of MatML schema
As stated in section 4.5, Microsoft InfoPath appears to perform a validation
check on a schema when it is imported as a data source for form creation.
An attempt to import the MatML v3.0 schema into Microsoft InfoPath resulted
in the following error message:
“matml.xsd#/schema/element[2][@name =
'Material']/complexType[1]/attribute[3][@name = 'local_frame_of_reference']
Two distinct members of the {attribute uses} must not have {type definition}s
which are or are derived from ID.
<attribute> : 'local_frame_of_reference'.”
This appears to state that a complex type may have no more than one attribute
of type ID. The ‘Material’ element declares two identification attributes, one of
which is ‘local_frame_of_reference’.
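The constraint reported by InfoPath can be checked directly on a schema. The Python sketch below flags element declarations whose inline complex type declares more than one attribute of type ID. It assumes the ‘xsd’ (or ‘xs’) prefix is bound to the XML Schema namespace, and it does not follow attribute groups, derived types or attributes inside simpleContent extensions. The fragment is hypothetical apart from the two names taken from the error message, ‘Material’ and ‘local_frame_of_reference’; the second ID attribute name, ‘id’, is assumed.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
ID_TYPES = {"xsd:ID", "xs:ID"}  # assumes one of these prefixes names the XSD namespace

# Hypothetical fragment reproducing the situation described in the error message.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Material">
    <xsd:complexType>
      <xsd:sequence><xsd:element name="Name" type="xsd:string"/></xsd:sequence>
      <xsd:attribute name="id" type="xsd:ID"/>
      <xsd:attribute name="local_frame_of_reference" type="xsd:ID"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def multiple_id_attributes(root):
    """Map element name -> ID-typed attribute names, where there is more than one."""
    found = {}
    for elem in root.iter(XS + "element"):
        ctype = elem.find(XS + "complexType")
        if ctype is None:
            continue  # simple or named type: not inspected by this sketch
        id_attrs = [a.get("name") for a in ctype.findall(XS + "attribute")
                    if a.get("type") in ID_TYPES]
        if len(id_attrs) > 1:
            found[elem.get("name")] = id_attrs
    return found


result = multiple_id_attributes(ET.fromstring(SCHEMA))
print(result)  # {'Material': ['id', 'local_frame_of_reference']}
```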
6.6 Conclusions
1. The MatML v3.0 schema provides a logical representation of the
materials property domain.
2. A map of the majority of the MatML v3.0 hierarchy is human-readable.
3. Most of the code is efficient; however, one attribute type is defined
multiple times.
4. The MatML v3.0 schema validates with the web-based XSV v2.5-2
software validation tool, but not with Microsoft InfoPath 2003 (beta 2
edition).
7 Project description
7.1 Selection of data source
A dataset for the project is required. A MatML example is considered a suitable
source of data, having the following advantages:
• an XML hierarchy for the data already exists
• the example XML document can be used as a datum for comparison with the
XML documents produced in the project.
The data in MatML example 2 make use of a number of MatML elements, as
shown in Table 4, section 6.3. MatML example 2 was therefore used to provide
the dataset and datum XML document for the project. One element of MatML
example 2, the glossary, was excluded from the project: a glossary of industry
standard terms would not be necessary in a materials test report, as terms
could be looked up in reference texts if required.
The full MatML example 2 document is shown in Appendix A: Original MatML
Example 2 XML Document. MatML example 2 was modified to include the
current URI declaration, ‘http://www.w3.org/2001/XMLSchema’ (see section
4.5).
7.2 Form creation
7.2.1 Selection of elements from data source
In order to provide a simple interface and data entry format for the end-user,
certain elements are left ‘hard-coded’ in the XML document. Hence, the XML
documents produced in the project will not be composed entirely of
form-generated elements.
The ‘hard-coded’ elements are intended to be:
• PropertyData - all attributes
• ParameterValue - all attributes
• DataSourceDetails - attribute ‘ID’
• MeasurementTechniqueDetails - attribute ‘ID’
• PropertyDetails - all attributes and child elements, two instances
• ParameterDetails - all attributes and child elements, three instances.
Embedding the values of the above elements fixes the type of test as axial
fatigue stress (PropertyDetails) and fixes the test conditions
(ParameterDetails). These are reasonable constraints, as the materials test
report is intended to represent one particular test applied to different materials.
7.2.2 Objectives
With MatML example 2 as the data source, the functional aims of the forms are:
1. to enter metadata for:
a. material details
b. test details for axial fatigue stress
2. to enter test data for axial fatigue stress in two units:
a. ksi
b. MPa
3. to produce an XML document from the data, similar to MatML example 2.
These objectives are to be achieved for two systems that can generate XML
documents from forms:
• HTML and Perl
• Microsoft InfoPath.
A number of design criteria were considered, including:
1. grouping related fields in separate tables
2. use of an index to link to the tables
3. holding the data in individual fields rather than groups.
These criteria are intended to simplify data entry and minimise scrolling.
All three criteria are expected to be applicable to the HTML/Perl system.
Only the first is applied to the Microsoft InfoPath system, on the basis of
the capabilities of the menu-driven features found in the feasibility study,
section 5.4.
7.3 Report Generation using XSLT
The aims of the XML document transformation are to format all elements for
display, specifically to:
• provide sections with headings for materials details and test details
• provide a table with titles for test data and parameters
• display metadata such as data source and notes at appropriate places
• produce a single XSLT file that can be used for the XML documents
produced by HTML/Perl and Microsoft InfoPath.
8 Form creation for MatML example 2 data, using HTML
8.1 Index screen
On loading the web-form, an index is presented that provides links to other
parts of the form and to other documents. This acts as a menu for data entry and
for viewing form-generated XML documents. The index is shown in Figure 12.
The first three options in the index point to tables for data entry; these
are described in section 8.2.
All of the data in the web-form are encompassed in a single HTML form.
Hence, there is one ‘submit’ button for all the data, which passes the data to a
Perl script (described in section 8.3). This ‘submit’ button is given a screen to
itself, and a link to it in the index.
The fourth option is a link to the raw XML document that is produced by
processing the form. In Microsoft Internet Explorer v6.0, the XML document is
displayed as a hierarchy that can be collapsed and expanded. In Netscape
v7.02, the XML document is displayed as plain text.
The final option is to view the XML document with the XSLT transformation
described in section 10. This link displays a formatted representation of the
XML document in Internet Explorer v6.0 and Netscape v7.02.
8.2 Data entry tables
Three tables were created, designated tables a to c for discussion:
• table a includes meta-information for material and test details
• table b includes test data in ksi
• table c includes test data in MPa.
Table a uses text fields and list boxes to enter data. List boxes are used in a
number of instances where entries can be limited to a selection from fixed
values. An example is the specification type, for which the values provided are
ASTM, BS, ISO and BS:EN. These are typical standards for materials and
tests, but others could be added in a commercial implementation.
The fields are organised into logical sub-groups, with sub-headings. Some
fields have labels that differ from the element names, to give clearer
descriptions. A number of elements are given two fields instead of one for
greater flexibility. For example, the ‘MeasurementTechniqueDetails’ child
element ‘Name’ is given two separate entries. The value of this element in
MatML example 2 is ASTM E597. The form has one field labelled ‘Type’ for
the ‘ASTM’ value, and another labelled ‘Number’ for the ‘E597’ value. These
entries are under the main heading ‘Test Details’ and the sub-heading
‘Specification’.
Tables b and c display forms for the ‘PropertyData’ and ‘ParameterValue’
elements, which contain the test results and test conditions. In MatML
example 2, the data are grouped in sequences of five values; in tables b and
c, a separate field is provided for each data value. MatML example 2 includes
a dataset in ksi and a dataset in MPa, to be entered in tables b and c
respectively. The sequences of fields in tables b and c are presented
in a column-wise format, for ease of data input.
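Because tables b and c provide one field per value, while MatML example 2 stores each sequence as a single comma-separated string, the form-handling script has to join the fields. The Python sketch below shows that step. The field-naming scheme ‘<table>_<series>_<position>’ is a hypothetical convention, not the naming used in Appendix B; trailing empty fields are dropped, while interior blanks are kept so that values stay aligned with their positions.

```python
def join_series(form, table, series, length=5):
    """Join the fields '<table>_<series>_1' .. '<table>_<series>_<length>' (hypothetical
    names) into one comma-separated string, as held in a MatML 'Data' element."""
    values = [form.get(f"{table}_{series}_{i}", "").strip() for i in range(1, length + 1)]
    while values and not values[-1]:
        values.pop()  # drop unused trailing fields
    return ",".join(values)


# Placeholder values, not data from MatML example 2.
form = {"b_1_1": "10", "b_1_2": " 20 ", "b_1_3": "", "b_1_4": "40"}
print(join_series(form, "b", 1))  # 10,20,,40
```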
The HTML code for the web-form is shown in Appendix B: HTML v4 Code to
Produce Web-Form for MatML example 2.
8.3 XML document generation from HTML form using Perl
A Perl program handles the form data and writes an XML document that
includes the form values, using the technique described in section 5.3.
The Perl script writes all the form data together with the markup needed to
generate the XML elements of MatML example 2.
The Perl script is shown in Appendix C: Perl v5.6.1 Script to Transform HTML
Form Elements into MatML Document.
The XML document generated by the HTML/Perl system is shown in Appendix
D: XML Document Based on MatML Example 2, Produced by HTML/Perl.
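The Perl script itself is given in Appendix C; the general technique of turning form values into MatML elements can also be sketched in other languages. If markup characters are written out directly, characters such as ‘&’ and ‘<’ in free-text fields must be escaped by the script. The Python sketch below instead builds the element with the standard library's ElementTree, which escapes them automatically, and combines the separate ‘Type’ and ‘Number’ fields of section 8.2 into one ‘Name’ value. The form field names (‘test_spec_type’, ‘test_spec_number’, ‘test_notes’) and the ‘id’ attribute name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def technique_details(form):
    """Build a MeasurementTechniqueDetails element from hypothetical form fields."""
    # 'mt1' is hard-coded, as the identification attributes are in section 7.2.1.
    mt = ET.Element("MeasurementTechniqueDetails", id="mt1")
    parts = (form.get("test_spec_type", "").strip(), form.get("test_spec_number", "").strip())
    ET.SubElement(mt, "Name").text = " ".join(p for p in parts if p)  # e.g. "ASTM E597"
    notes = form.get("test_notes", "").strip()
    if notes:
        ET.SubElement(mt, "Notes").text = notes  # '&' and '<' are escaped on output
    return mt


form = {"test_spec_type": "ASTM", "test_spec_number": "E597", "test_notes": "placeholder & note"}
xml_text = ET.tostring(technique_details(form), encoding="unicode")
print(xml_text)
```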
Figure 12 Index of links for HTML form
Figure 13 Material and test details for HTML form
Figure 14 PropertyData element, HTML form
9 Form creation for MatML Example 2 data, using Microsoft InfoPath
9.1 Producing a form template
The data fields describing material and test details were individually placed in
a table, Figure 16. InfoPath automatically inserts element names as text labels
for the fields; these labels can be changed without affecting the XML structure.
The ‘PropertyData’ element is inserted into the form as a group, including all
attributes and elements of ‘PropertyData’. This was necessary because
‘PropertyData’ is a repeatable entity in MatML, and InfoPath can reflect this:
when the template is invoked to create a form, repeatable sections can be
multiplied.
The ‘PropertyData’ element links to meta-information in the ‘PropertyDetails’
and ‘ParameterDetails’ child elements of ‘Metadata’. This necessitated the
inclusion of ‘PropertyDetails’ and ‘ParameterDetails’ as repeating sections in
the form.
The ‘Data’ element was included as a single field for the five data entries
separated by commas.
A number of the data fields were included as dropdown list boxes (indicated by
a downward-pointing arrow), to limit entries to appropriate values. For example,
‘parameter’ can be pa1, pa2 or pa3, and ‘format’ can be either integer or
exponential.
9.2 Generating a form from the template
Once the template had been created, it was invoked to create the form. A
number of form processing steps are required to produce an XML structure that
matches that of MatML example 2. The steps in form production were:
1. repeat ‘PropertyData’ to give four entries
2. repeat ‘ParameterValue’ to give three entries for each ‘PropertyData’
3. repeat ‘ParameterDetails’ to give three entries
4. repeat ‘PropertyDetails’ to give two entries
5. enter fields with appropriate values from MatML example 2.
The material and test data fields are shown in Figure 16. There are four
‘PropertyData’ forms, all of which follow the format in Figure 17. The form for
‘PropertyDetails’ and ‘ParameterDetails’ is shown in Figure 18.
Note that the ‘PropertyData’ element includes attributes that link (via
identification attributes) to meta-information for the data source, measurement
technique, property type, and specimen type. These are ds1, mt1, pr1, and
sp1 in Figure 17. A separate form entry is only given to property type (Figure
18) as this is the only detail that has more than one value, pr1 or pr2.
The form for ‘PropertyData’ requires the user to look up each meta-information
attribute value in the corresponding attribute of ‘PropertyDetails’. For example,
pr1 corresponds to mechanical property data in ksi, and pr2 to mechanical
property data in MPa. A similar look-up requirement exists for the
‘ParameterValue’ fields.
9.3 Comments
Filling in the form requires the user to look up some corresponding field
values. This requirement might be avoided by using more advanced features of
InfoPath. Investigating this would be more appropriate once InfoPath is
released as a final commercial product with full support information and
instructional literature.
The form created with Microsoft InfoPath includes some Microsoft-specific tags
to associate the form with the InfoPath template. The root element also contains
some Microsoft meta-information, as attributes. The full document is shown in
Appendix E: XML Document Based on MatML Example 2, Produced by
Microsoft InfoPath.
Figure 15 Screenshot of Microsoft InfoPath in design mode (annotated areas:
view of data source; Form Design view)
Figure 16 Material and test details, InfoPath form
Figure 17 PropertyData element, InfoPath form
Figure 18 PropertyDetails and ParameterDetails elements
10 Report generation using XSLT and CSS
10.1 Introduction
Style sheet technologies are discussed in section 3.6. A conclusion of the
feasibility study, section 5.9, is that a combination of XSLT and CSS, rather
than CSS alone, is the better method of applying formatting to an XML document.
The application of these two style sheet technologies to the XML documents
created by the HTML/Perl and Microsoft InfoPath forms is described below.
10.2 Implementation
10.2.1 Templates
XSLT files are written as a set of templates, each of which matches an element,
typically by name, and specifies the output for it. A root template contains the
initial code and can invoke other templates, which in turn can apply the
templates of child elements through relative path references. Each template can
also be referred to independently by its full path.
Using separate templates for elements, together with relative references, can
give the code a clear structure. A number of full path references were found to
be appropriate for MatML documents. Meta-information relating to the material
is held in the ‘Metadata’ element and in other elements within ‘BulkDetails’.
For meta-information, references to element content are combined with
descriptive literal text. HTML with inline CSS styles is used to provide
formatting. Where possible, styles are applied in blocks using the HTML <div> tag.
10.2.2 Parsing grouped data
The data held in the ‘PropertyData’ child elements ‘Data’ and ‘ParameterValue’
are in the form of comma-separated lists. These data therefore have to be
parsed to access individual values. The more advanced features of XSLT are
used to achieve the parsing, including string-handling functions and variables.
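XSLT 1.0 has no function for splitting a string into tokens, so comma-separated lists are commonly processed by a named template that calls itself, taking substring-before() of the separator as the current item and passing substring-after() to the next call. The Python function below mirrors that recursion as a sketch of the idea; it is not a transcription of the template in Appendix F.

```python
def split_list(text, sep=","):
    """Split text the way a recursive XSLT 1.0 template would: take the part
    before the first separator (substring-before), then recurse on the rest
    (substring-after) until no separator remains."""
    if sep not in text:
        return [text.strip()] if text.strip() else []
    head, tail = text.split(sep, 1)
    return [head.strip()] + split_list(tail, sep)


print(split_list("1, 2,3"))  # ['1', '2', '3']
```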
10.2.3 Drawing a table
The parsing is combined with the generation of HTML tags to produce a table
for the data. The data are parsed consecutively, row-wise; a method of
column-wise parsing using XSLT was not found. A separate table is created for
the row headings, because the multiple passes the XSLT processor makes over
the ‘PropertyData’ and ‘ParameterValue’ elements make it impractical to
integrate the headings with the data in the same table. The align attribute of
the headings table is set to left so that the two tables are displayed side by side.
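The difficulty is that HTML tables are built row by row, whereas each comma-separated list naturally supplies one column. Once every list has been split, producing columns amounts to a transposition: row n takes the n-th item of each list. The Python sketch below shows the idea; in XSLT 1.0 the equivalent would presumably need a template that extracts the n-th item from each list, which is an assumption about one possible approach rather than the method of Appendix F. The headings are placeholders.

```python
from html import escape
from itertools import zip_longest


def column_table(headings, series):
    """Render each comma-separated series as one column of an HTML table."""
    columns = [[v.strip() for v in s.split(",")] for s in series]
    rows = ["<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headings) + "</tr>"]
    # zip_longest transposes the columns into rows, padding short series with blanks.
    for cells in zip_longest(*columns, fillvalue=""):
        rows.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


html_table = column_table(["Series A", "Series B"], ["1,2,3", "4,5,6"])
print(html_table)
```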
The complete XSLT transformation program is shown in Appendix F: XSLT File
to Transform XML Documents Based on MatML Example 2 to a One-page
Report.
10.2.4 Association of XSLT file with XML documents
Transformation of an XML document by an XSLT file requires an association
between the two files. This is achieved by inserting into the XML document an
‘xml-stylesheet’ processing instruction, which gives the location of the XSLT
file. Loading the XML document into a web browser or other application capable
of XSLT transformations applies the transformation, resulting in a formatted
document.
The appropriate ‘xml-stylesheet’ instruction was inserted into the HTML/Perl
and Microsoft InfoPath XML documents.
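Where many documents need the same association, the processing instruction can be inserted by a script rather than by hand. The Python sketch below places an xml-stylesheet instruction immediately after the XML declaration, replacing any existing one, and can optionally drop processing instructions whose names begin with ‘mso-’. This assumes that the InfoPath-specific tags mentioned in note 2 of Table 5 are processing instructions of that form (InfoPath documents usually carry mso-infoPathSolution and mso-application instructions); the style sheet name ‘report.xsl’ is hypothetical. The simple patterns assume no ‘?’ occurs inside the instructions.

```python
import re

DECLARATION = re.compile(r"<\?xml\s[^?]*\?>\s*")
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet[^?]*\?>\s*")
MSO_PI = re.compile(r"<\?mso-[^?]*\?>\s*")


def add_stylesheet_pi(xml_text, href, drop_mso=False):
    """Return xml_text with <?xml-stylesheet type="text/xsl" href=...?> after the
    XML declaration (or at the start if there is none)."""
    xml_text = STYLESHEET_PI.sub("", xml_text)  # avoid duplicate associations
    if drop_mso:
        xml_text = MSO_PI.sub("", xml_text)     # e.g. <?mso-application ...?>
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>\n'
    decl = DECLARATION.match(xml_text)
    end = decl.end() if decl else 0
    return xml_text[:end] + pi + xml_text[end:]


doc = '<?xml version="1.0"?>\n<?mso-application progid="InfoPath.Document"?>\n<MatML_Doc/>'
print(add_stylesheet_pi(doc, "report.xsl", drop_mso=True))
```

The value type="text/xsl" is the one commonly used with browsers of the period, including Internet Explorer 6.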
10.2.5 Results
The two XML documents were loaded into Microsoft Internet Explorer v6.0 and
Netscape Navigator v7.02, and the formatted documents were examined when
printed. The results are shown in Table 5.
Table 5 XML documents with XSLT transformations, viewed when printed
from web-browsers
Method of creating    Microsoft Internet Explorer    Netscape Navigator v7.02.0
XML document          v6.0.0
HTML/Perl             Fully formatted document,      Tables not side by side, all else
                      Figure 19                      formatted, Figure 20 (note 1),
                                                     Figure 18
Microsoft InfoPath    Fully formatted document       Tables not side by side, all else
                      (note 2), Figure 21            formatted (note 1), Figure 22
Note 1: when viewed on-screen, the tables are side by side.
Note 2: by default, when an attempt is made to load the file into Microsoft
Internet Explorer, control is passed to Microsoft InfoPath. This was
circumvented by manually editing a copy of the document to exclude the extra
tags inserted by Microsoft InfoPath.
Figure 19 HTML/Perl generated XML document, formatted with
XSLT/CSS, as displayed in Microsoft Internet Explorer v6.0, print view
Figure 20 HTML/Perl generated XML document formatted with XSLT/CSS,
as displayed in Netscape Navigator v7.02, partial print view
Figure 21 Microsoft InfoPath generated XML document formatted with
XSLT/CSS, as displayed in Microsoft Internet Explorer v6.0, print view
Figure 22 Microsoft InfoPath generated XML document with XSLT style
sheet, as displayed in Netscape Navigator v7.02, partial print view
11 Assessment of project data
11.1 Introduction
Assessment of the data generated by the project is approached from the
following perspectives:
• does the XML document contain all the data in the original MatML
example 2 file?
• is the XML document valid with respect to the schema?
• do the XSLT file and XML document together generate a satisfactory report?
11.2 Comparison of original MatML example 2 document with project generated XML documents
In order to ascertain whether the XML documents generated in sections 8 and
9 match the original MatML example 2 document, a file comparison exercise
was performed. The file comparison software described in section 4.6 was
used to highlight differences between text files. Note that the glossary element
in MatML example 2 was excluded from the documents produced in the project,
as discussed in section 7.1.
The results are given below.
1. Method of creating XML document: HTML/Perl
File name: Matmle2_htmlPerl_xslt.xml
Datum file: Matmle2.xml
Differences from datum file:
• datum file comments omitted
• datum file glossary content omitted
2. Method of creating XML document: Microsoft InfoPath
File name: Matmle2_infopath.xml
Datum file: Matmle2.xml
Differences from datum file:
• datum file comments omitted
• datum file glossary omitted
• extra Microsoft meta-information included
• extra attributes in root element
• order of attributes altered in some elements
• extra, empty attributes inserted into some tags
• self-closing tag replaced by a start-tag and end-tag pair.
The XML documents produced by HTML/Perl and Microsoft InfoPath included
all the data-bearing elements of MatML example 2, other than the glossary
content.
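Several of the differences listed for the InfoPath document (comments, attribute order, and self-closing versus paired tags) are differences of form rather than content, and a comparison made on the parsed element trees rather than on the text ignores them. The Python sketch below is one way of doing this with the standard library; it is an alternative to, not a description of, the file comparison software of section 4.6, and it ignores text between elements (tails), which is acceptable for data-oriented documents such as MatML.

```python
import xml.etree.ElementTree as ET


def differences(a, b, path=""):
    """Yield differences between two element trees. ElementTree's default parser
    drops comments, attribute order is irrelevant to dict comparison, and <x/> and
    <x></x> both parse to an element with no text, so none of these is reported."""
    here = f"{path}/{a.tag}"
    if a.tag != b.tag:
        yield f"{here}: tag differs ({b.tag})"
        return
    if a.attrib != b.attrib:
        yield f"{here}: attributes {a.attrib} != {b.attrib}"
    text_a, text_b = (a.text or "").strip(), (b.text or "").strip()
    if text_a != text_b:
        yield f"{here}: text {text_a!r} != {text_b!r}"
    if len(a) != len(b):
        yield f"{here}: {len(a)} children != {len(b)} children"
    for child_a, child_b in zip(a, b):
        yield from differences(child_a, child_b, here)


datum = '<Doc><!-- comment --><Material a="1" b="2"><Notes/></Material><Glossary/></Doc>'
other = '<Doc><Material b="2" a="1"><Notes></Notes></Material></Doc>'
diffs = list(differences(ET.fromstring(datum), ET.fromstring(other)))
print(diffs)  # ['/Doc: 2 children != 1 children']
```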
11.3 Validation of XML documents produced by HTML/Perl and Microsoft InfoPath
Validation was performed with the command-line XSV v1.4 system, as
described in section 4.5. Microsoft InfoPath does not accept the MatML v3.0
schema, as noted in section 6.5, so its validation facility cannot be used to
check XML documents against the MatML v3.0 schema.
The XML document generated by HTML and Perl validated against MatML v3.0
without modification, Figure 23. The XML document produced by Microsoft
InfoPath, however, generated an error relating to an attribute ‘lang’ in the XML
namespace (xml:lang), Figure 24. This attribute was deleted with a text editor
and the modified XML document saved to disk. Validation of this modified
Microsoft InfoPath document was successful, Figure 25. A summary of the
results of the validation tests is shown in Table 6.
Table 6 Results of validating XML documents against MatML with XSV
v1.4 tool
Method of creating    File name                    Validation test against
XML document                                       MatML with XSV
HTML/Perl             Matmle2_htmlPerl_xslt.xml    Pass
Microsoft InfoPath    Matmle2.infopathform.xml     Fail
Microsoft InfoPath    Matmle2.infopath_mod.xml     Pass
(lang attribute
removed)
Figure 23 Output of the results from XSV v1.4, for the HTML/Perl generated
XML document.
<?xml version='1.0'?>
<xsv docElt='{None}MatML_Doc' instanceAssessed='true'
instanceErrors='0' rootType='[Anonymous]' schemaDocs='matml.xsd'
schemaErrors='0'
schemaLocs='None -> matml.xsd'
target='file:/d:/XML/XSV/matmle2_htmlperl_raw.xml'
validation='strict' version='XSV 1.203.2.47.2.4.2.14/1.106.2.25.2.6
of 2002/06/15 18:59:35'
xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file:/d:/XML/XSV/matml.xsd' outcome='success'
source='command line'/>
<schemaDocAttempt URI='file:/d:/XML/XSV/matml.xsd'
outcome='redundant' source='schemaLoc'/>
Figure 24 Output of the results from XSV v1.4 for the Microsoft InfoPath
XML document.
<?xml version='1.0'?>
<xsv docElt='{None}MatML_Doc' instanceAssessed='true'
instanceErrors='1' rootType='[Anonymous]' schemaDocs='matml.xsd'
schemaErrors='0'
target='file:/d:/XML/XSV/matmle2_infopathform.xml'
validation='strict' version='XSV 1.203.2.47.2.4.2.14/1.106.2.25.2.6
of 2002/06/15 18:59:35' xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file:/d:/XML/XSV/matml.xsd' outcome='success'
source='command line'/>
<invalid char='296' code='cvc-complex-type.1.3' line='1'
resource='file:/d:/XML/XSV/matmle2_infopathform.xml'>undeclared
attribute {http://www.w3.org/XML/1998/namespace}:lang</invalid>
</xsv>
Figure 25 Output of the results from XSV v1.4, for the modified Microsoft
InfoPath XML document.
d:\XML\XSV>xsv matmle2_infopath_mod.xml matml.xsd.
<?xml version='1.0'?>
<xsv docElt='{None}MatML_Doc' instanceAssessed='true'
instanceErrors='0' rootType='[Anonymous]' schemaDocs='matml.xsd.'
schemaErrors='0'
target='file:/d:/XML/XSV/matmle2_infopath_mod.xml'
validation='strict' version='XSV 1.203.2.47.2.4.2.14/1.106.2.25.2.6
of 2002/06/15
18:59:35' xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file:/d:/XML/XSV/matml.xsd.'
outcome='success' source='command line'/>
</xsv>
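The manual edit described in section 11.3 can also be scripted. The Python sketch below removes xml:lang attributes with a regular expression, so that the rest of the document, including InfoPath's processing instructions and namespace prefixes, is left unchanged (re-serialising with ElementTree would rename prefixes and drop content outside the root element). The ‘xml’ prefix is reserved for this namespace, so matching the literal ‘xml:lang’ is sufficient; as a precaution, the result is re-parsed to confirm that no xml:lang attribute remains. Text content or CDATA containing ‘xml:lang=’ would also be altered, which is a limitation of this textual approach.

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Matches  xml:lang="..."  or  xml:lang='...'  together with its leading whitespace.
XML_LANG_ATTR = re.compile(r"""\s+xml:lang\s*=\s*(?:"[^"]*"|'[^']*')""")


def strip_xml_lang(xml_text):
    """Remove xml:lang attributes textually, then check that none survive."""
    cleaned = XML_LANG_ATTR.sub("", xml_text)
    root = ET.fromstring(cleaned)
    if any(XML_LANG in el.attrib for el in root.iter()):
        raise ValueError("xml:lang attribute still present")
    return cleaned


doc = '<?xml version="1.0"?>\n<MatML_Doc xml:lang="en-us"><Material/></MatML_Doc>'
print(strip_xml_lang(doc))
```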
11.4 Comparison of project generated reports with word-processed report
A word-processed report was prepared from the data in MatML example 2, to
provide a datum for comparing the reports generated by the project. This report
is shown in Figure 26. The criteria for the comparison are defined as follows:
• heading styles for materials details, test details, test data and source
• all source data included
• table for test data, with:
o formatted headings
o structured display of data.
In section 10, XML documents produced by two different systems were
displayed in two different browsers, Figure 19 to Figure 22. The document that
was displayed without any modifications or display problems was that produced
by the HTML/Perl system and displayed in Microsoft Internet Explorer v6.0,
Figure 19. This best project result was compared against the datum
word-processed report, Figure 26.
11.4.1 Results
The results of the comparison between the project generated report and the
word-processed report, for formatted headings and for the data included, are
shown in Table 7 and Table 8 below.
Table 7 Comparison of headings in project generated report and
word-processed report
Heading              XML document    Word-processed document
Materials details    Yes             Yes
Test details         Yes             Yes
Test data            Yes             Yes
Source               Yes             Yes
Table 8 Comparison of included data in project generated report and
word-processed report
                                  Included in document
Data element                      XML document    Word-processed document
Material designation              Yes             Yes
Materials specification           Yes             Yes
Material form                     Yes             Yes
Processing details                Yes             Yes
Test specification                Yes             Yes
Specimen type                     Yes             Yes
Diameter                          Yes             Yes
Test data / ksi (3 parameters)    10 values       10 values
Test data / MPa (3 parameters)    10 values       10 values
Notes                             Yes             Yes
Source                            Yes             Yes
Note:
The word-processed data table presented the data for each element in
column-wise form, whereas the project generated report presented them in
row-wise form. The data tables in both documents contained formatted headings.
11.4.2 Comparison summary
The report generated by the HTML/Perl system and displayed in Microsoft
Internet Explorer v6.0 is essentially similar to the word-processed report. The
main difference is that the data table in the word-processed report is
effectively transposed in the project generated report, as discussed in
section 10.2.3.
MSc dissertation np

  • 1. UQI810TM Project module 2003 M R Shaheedullah Student No. 02975062 An XML Approach Using Forms and Style Sheets to Generate Materials Test Reports
  • 2. Abstract An application that may benefit from the advantages of XML is the creation of materials test reports. An XML schema known as MatML already exists for materials properties. Two form based data entry methods were compared, a HTML v4 with Perl v5.6 system, and the Microsoft InfoPath software package. The resulting XML documents from these systems were formatted to produce reports using style sheets. A feasibility study explored the techniques and highlighted a number of issues, in particular the requirement to use a combination of extensible style language transformation (XSLT) and cascading style sheets (CSS) to format the XML document. A framework for the review of the MatML schema was developed, attempting to focus on its effectiveness and efficiency. The review concluded that MatML provides a logical description of the materials domain and the code is relatively efficient. The two systems of creating XML documents were found to have contrasting characteristics. The HTML/Perl system has good flexibility for manipulating data, at the cost of significant development time. This system is available at no cost. Implementation on multiple platforms, and use over the internet are also feasible. A form was developed in Microsoft InfoPath very simply with the use of a graphical design mode. The XML based approach to report generation has benefits in that the XML document can be transformed into a formatted document with a relatively simple style sheet, and the file format can potentially facilitate data interchange.
  • 3. Table of Contents 1 INTRODUCTION ....................................................................................................................... 1 2 RESEARCH AND PROJECT STRATEGY ............................................................................. 6 3 LITERATURE SURVEY.......................................................................................................... 10 4 REVIEW OF SOFTWARE TOOLS........................................................................................ 19 5 FEASIBILITY STUDY ............................................................................................................. 23 6 REVIEW OF MATML ............................................................................................................. 43 7 PROJECT DESCRIPTION ...................................................................................................... 53 8 FORM CREATION FOR MATML EXAMPLE 2 DATA, USING HTML......................... 56 9 FORM CREATION FOR MATML EXAMPLE 2 DATA, USING MICROSOFT INFOPATH.......................................................................................................................................... 62 10 REPORT GENERATION USING XSLT AND CSS.............................................................. 68 11 ASSESSMENT OF PROJECT DATA..................................................................................... 75 12 DISCUSSION............................................................................................................................. 82 13 SUMMARY OF FINDINGS ..................................................................................................... 91 14 REVIEW..................................................................................................................................... 93 15 REFERENCES .......................................................................................................................... 
95 APPENDIX A: ORIGINAL MATML EXAMPLE 2 XML DOCUMENT APPENDIX B: HTML V4 CODE TO PRODUCE WEB-FORM FOR MATML EXAMPLE 2 APPENDIX C: PERL V5.0.6 SCRIPT TO TRANSFORM HTML FORM ELEMENTS INTO MATML DOCUMENT APPENDIX D: XML DOCUMENT BASED ON MATML EXAMPLE 2, PRODUCED BY HTML/PERL APPENDIX E: XML DOCUMENT BASED ON MATML EXAMPLE 2, PRODUCED BY MICROSOFT INFOPATH APPENDIX F: XSLT FILE TO TRANSFORM XML DOCUMENTS BASED ON MATML EXAMPLE 2 TO A ONE-PAGE REPORT
  • 4. Table of Figures Figure 1 Schematic map of the dissertation.............................................................................. 9 Figure 2 Hierarchy of entities for basketball statistics............................................................. 27 Figure 3 Web-form for basketball data.................................................................................... 30 Figure 4 Microsoft InfoPath form for basketball data, bballinfopathform.xml (from template bball.xsn)................................................................................................................................. 33 Figure 5 Output of bball.xml with CSS style sheet (basket.css), in Microsoft Internet Explorer v6.0.......................................................................................................................................... 36 Figure 6 Output of XML document bball.xml with XSLT style sheet, basket.xsl, in Microsoft Internet Explorer v6.0.............................................................................................................. 39 Figure 7 First level hierarchy of MatML................................................................................... 45 Figure 8 Partially expanded hierarchy of MatML, ‘BulkDetails’ element................................. 46 Figure 9 Partially Expanded Hierarchy of MatML ................................................................... 47 Figure 10 A schematic diagram for the domain of materials property data ............................ 48 Figure 11 XSV v1.4 software validation of MatML schema .................................................... 51 Figure 12 Index of links for HTML form................................................................................... 59 Figure 13 Material and test details for HTML form ................................................................. 
60 Figure 14 PropertyData element, HTML form......................................................................... 61 Figure 15 Screenshot of Microsoft InfoPath in design mode.................................................. 65 Figure 16 Material and test details, InfoPath form .................................................................. 66 Figure 17 PropertyData element, InfoPath form ..................................................................... 67 Figure 18 PropertyDetails and ParameterDetails elements.................................................... 67 Figure 19 HTML/Perl generated XML document, formatted with XSLT/CSS, as displayed in Microsoft Internet Explorer v6.0, print view............................................................................. 71 Figure 20 HTML/Perl generated XML document formatted with XSLT/CSS, as displayed in Netscape Navigator v7.02, partial print view........................................................................... 72 Figure 21 Microsoft InfoPath generated XML document formatted with XSLT/CSS, as displayed in Microsoft Internet Explorer v6.0, print view......................................................... 73 Figure 22 Microsoft InfoPath generated XML document with XSLT style sheet, as displayed in Netscape Navigator v7.02, partial print view........................................................................... 74 Figure 23 Output of the results from XSV, for the HTML/Perl generate XML document........ 77 Figure 24 Output of the results from XSV, for the Microsoft InfoPath XML document. .......... 77 Figure 25 Output of the results from XSV, for the modified Microsoft InfoPath XML document. ................................................................................................................................................ 78 Figure 26 Word-processed materials test report .................................................................... 
81 Figure 27 Schematic diagram to show possible routes to report generation.......................... 87
  • 5. 1 1 Introduction 1.1 What is XML? XML (XML activity, World Wide Web Consortium, 2000) is an abbreviation for eXtensible Markup Language. This markup language facilitates a user- definable, hierarchical data structure. The device independence and customisability of XML give it the potential to profoundly affect data formats and usage. In particular, XML can be used to create entire user-defined markup languages, complete with a schema to define constraints. Database management systems, web based information systems, and any application that uses a hierarchical data structure may be able to utilise XML. The nature of XML is discussed in more detail in the Literature Survey, section 3. 1.2 What are material test reports? In the manufacturing industry, the raw material used to make components must often meet certain property requirements. These requirements may include mechanical, chemical or micro-structural properties. The nature of the properties required is dependent on the application. For example motor engine parts may need to be fatigue resistant whereas a pipeline may require corrosion resistance. Hence, when raw material is produced, test reports can be prepared that selected properties of the material. 1.3 How can XML be used with material test reports? Methods for creating a test report could include manually preparing the document (using data extracted from a separate database), or use of a software database package with a report generator. The use of XML documents has a number of benefits, described in section 3.2. In particular, a
  • 6. 2 schema can be defined to represent a specific domain. Such a schema has been developed for materials properties, MatML (Begley and Sturrock, 2000). MatML is discussed in section 3 and reviewed in section 6. A convenient method of placing data in an XML document would be to use forms. XML documents can be formatted using a technology termed style sheets (Style activity, World Wide Web Consortium, 2003). Forms and style sheets could therefore potentially be used to create materials test reports from raw data, using XML documents. 1.4 What technologies could be used to create materials test reports from XML documents? The proposed method of storing materials test report data in an XML file and subsequently producing a report consists of the following steps; 1. create a form to enter data to an XML document 2. apply formatting to the XML document for display as a report in a web browser. 1.4.1 Creating forms Forms are a convenient method of entering data for end-users. Form element types include text boxes, option lists and check boxes. Database software packages often use forms for data entry. An example is Microsoft Access 2000 (Microsoft, 1999), which has a form design component. Microsoft Access 2000 does not however support the creation of XML documents. XForms (XForms activity, World Wide Web Consortium, 2003) discussed in section 3.6, is a proposed specification for a form based technology for XML.
  • 7. 3 However, the specification is not yet finalised and hence not fully implementable in software. HTML (HTML activity, World Wide Web Consortium 1999) is commonly used to provide web-based forms, via a web browser. A number of scripting technologies are available to manipulate the form data, including Perl (The Perl Foundation, 2003). The combination of HTML and Perl can potentially, therefore create a form based data entry system for XML documents. The development time for a HTML/Perl system is expected to be less than that for a language such as Java or C++, as the web-browser would provide most of the interface and form elements via relatively simple HTML code. Microsoft InfoPath 2003 beta 2 edition (Microsoft (b), 2003) is a software package, in beta test state at the time of writing, which provides a graphical interface for XML-based form design. A short description of its capabilities is given in section 4.1. As HTML with Perl is a widely available development system, and Microsoft have vast influence in the business software market, a comparison of these two methods of creating forms for XML documents would be a very useful exercise. 1.4.2 Formatting the XML file to produce report The final step of formatting the XML document to produce a report, is to be performed by applying style sheets to the XML document. Scripting languages such as Perl could be used to read and format the XML document, but the XML data would need to be parsed. This is achievable but adds complexity to the formatting process. Style sheet technologies can recognise XML components
  • 8. 4 directly, and are specifically designed to apply formatting to markup documents such as those produced with HTML and XML. Two style sheet languages that have support in later versions of Microsoft and Netscape web browsers are Cascading Style Sheets (CSS) and eXtensible Style sheet Language Transformations (XSLT) which is part of eXtensible Style sheet Language, XSL. The specifications of both these languages are defined by the World Wide Web Consortium. The specification of CSS is given in CSS activity, World Wide Web Consortium, 1998 and that for XSLT is given in XSL activity, World Wide Web Consortium, 1999. The style sheet languages are discussed in section 3.6. 1.5 Proposed title for research: An XML Approach Using Forms and Style Sheets to Generate Materials Test Reports. 1.6 Research questions: 1. How does the use of HTML and Perl for data entry and XML document production compare with the use of Microsoft InfoPath for the same purpose? 2. Are style sheets capable of generating a formatted report from a suitable MatML v3.0 data file? 3. What are the benefits and limitations of using XML documents and style sheets to generate materials test reports? 4. How can a review be performed on an XML Schema such as MatML? 5. Is it possible to produce a valid MatML v3.0 data file using HTML and Perl?
  • 9. 5 1.7 Research aims: 1. Develop a framework for review of the MatML schema. 2. Compare the benefits and problems of two approaches to data entry for XML document creation; I. use of HTML forms and Perl script II. use of Microsoft InfoPath Comparison criteria will include the following;  development time  cost of software  flexibility in handling of data  suitability or commercial application  feasibility of use on multiple platforms 3. Assess the benefits and problems of using XML documents with style sheets to generate reports. This comparison will be conducted in a more general context, with criteria that include;  ease of use  use of XSLT for formatting  use of the XML file format  commercial implementation issues.
  • 10. 6 2 Research and project strategy 2.1 Paradigms and methodology Hussey & Hussey (1997)1 describe two research paradigms, the positivist and phenomenological. These authors report that the positivistic paradigm is an approach used by social scientists, is similar to that used by natural scientists. This approach is said by the above authors to be objective, scientific and to assume that laws provide the basis for explanation. The phenomenological approach is said by the above authors to be qualitative, humanistic and subjectivist. It is asserted that the researcher has an effect on and cannot be separated from the research (in social science). The concepts in Hussey and Hussey (1997) are described mainly in the context of business, socially based, research (see definitions of positivist and phenomenological paradigms above). An attempt will be made to map the concepts for this, technological dissertation proposal. A number of research concepts (in research associated with human systems) are described by Hussey and Hussey (1997), including validity, reliability and triangulation. Reliability could be portrayed as a measure of the repeatability of research findings. Validity is stated by Hussey & Hussey (1997) to be the extent to which the research findings accurately represent what is really happening in a situation. This could be described as the effectiveness of the research method, with respect to the stated aims of the researcher. 1 Hussey and Hussey (1997) is the recommended text for the research methods module, UQQ803HM at UWE.
  • 11. 7 Triangulation is the use of different research methods in the same investigation. Where appropriate (perhaps in the study of human systems), use of triangulation could reduce bias effects. This investigation is naturalistic and hence objective in nature. It could be said that there is an element of data triangulation in the multiple comparison techniques for examining the XML documents generated by the two form systems described in 1.4.1. Validity for this objective system could be expected to be high. Good reliability would result in the findings of the research being applicable to multiple projects in form-based XML document production and formatting. There are no particular ethical issues envisaged for the investigation proposed here. 2.2 Plan 2.2.1 Feasibility study An initial feasibility study will be performed. This will be essentially qualitative, the purpose being to explore approaches, highlight problems and investigate achievability of project aims. 2.2.2 Review of MatML v3.0 Use an analytical approach to provide and implement a framework for the review of MatML v3.0. 2.2.3 Project development On completion of the feasibility study, the development of the application system will begin. This will involve; 1. production of a web-based interface in HTML, for entering data
  • 12. 8 2. development of Perl code that will write data into an XML file 3. development of a Microsoft InfoPath application, for entering data 4. development of a style sheet system that will transform the document file into a report format. 2.2.4 Assessment of data This assessment of documents produced in project, to involve a. comparing project XML documents with the source XML document b. testing of XML documents with a validation tool c. comparison of formatted, transformed XML document with a word-processed report based on the same data. A schematic of the intended map of the dissertation is shown in Figure 1.
  • 13. 9 Figure 1 Schematic map of the dissertation Review of MatMLHTML/Perl Microsoft InfoPath Report generation: Style sheet Form creation: Comparison Literature Survey Feasibility StudyResearch questions Objectives/Preparation Analyse XML approach to report creation Key: activity objective process analytical
  • 14. 10 3 Literature Survey 3.1 Introduction to markup languages XML (XML activity, World Wide Web Consortium, 2000) was created as a markup technology. A markup is information embedded with text that describes the document structure. SGML (Standard Generalized Markup Language) is a sophisticated international markup language, developed as an early standard for structuring documents stored on a computer. SGML is now defined by an international standard, (International Organisation for Standardisation, 1986). SGML can be used to create other markup languages, by using a specification known as a DTD (document type definition). Hypertext Markup Language (HTML activity, World Wide Web Consortium, 1999), abbreviated to HTML, is derived from SGML, and can be described as a DTD of SGML. HTML is a much simpler language than SGML that is accessible to end-users. HTML provides many features to format information for display, but cannot create user-defined markups. XML is another, more recent DTD of SGML. XML is a data-centric markup technology that has the ability to create user-defined markup tags. The purpose of these tags is to organise hierarchical data. The data can then be displayed or processed. Constraints can be applied to XML data with validation rule systems. One is document type definitions, DTD (XML activity, World Wide Web Consortium, 1998). DTD for XML is a specific system for applying constraints to XML elements. A more sophisticated constraint system is provided by XML Schema (XML schema activity, World Wide Web Consortium, 2003). User-defined
  • 15. 11 schema, such as MatML can be created using XML Schema. Validation systems are discussed in section 3.3. The body that approves generally accepted markup languages for the internet is currently the World Wide Web Consortium (1994), also known by the abbreviated name W3C. The World Wide Web Consortium is an international industry consortium that creates specifications for open standards on the web. 3.2 Benefits of XML The World Wide Web Consortium (XML activity, 2003) state a number of benefits of XML, including;  enables internationalised media-independent electronic publishing  provides underpinnings of the Semantic Web, enabling a new level of interoperability and information exchange  encourages industries to define platform-independent protocols for the exchange of data, including electronic commerce  allows people to display information the way they want it, under style sheet control. Many authors including Barillot and Achard (2000), Lie and Saarela (1999), and Ceri et al (2000) focus on the ability of XML to add semantics to content. This is a key attribute of XML. The user-defined, descriptive tags, hierarchical data structure and the ability to add constraints and validation rules with a DTD (or XML Schema) could be considered as the foundation for the semantic nature of XML. The structure and meaning provided by the semantics of XML are intended to be machine readable, and from this results other advantages attributed to XML, for example XML as a language for data interchange.
  • 16. 12 Begley and Sturrock (2000) identify three principal features that XML brings to the web;  extensibility: users can identify their own tags and attributes in their documents  structure: users can define their own DTD  validation: users can test the conformance of their documents to the structure defined by the DTD. These points are considered an apt summary of the advantages provided by XML for users intending to create hierarchical data structures. It should be noted that whilst XML is often quoted with respected to web-based information systems, the XML language is not exclusively for web internet usage. An information system that can handle a hierarchical data structure is likely to be usable with XML. Barillot and Achard (2000) claim that XML is a powerful language for data interconnection, and its operating system independence makes it a universal hub between databases. 3.3 DTD and schema for XML A DTD can be produced to define rules for the XML document. These include, for example, what child elements a parent element must have, whether an element is able to have child elements and whether an element must consist of plain text. The World Wide Web Consortium (XML activity, 2003) comment that automated processing of XML documents requires more rigorous and comprehensive facilities than are available with the DTD mechanism of declaring constraints on the use of markup.
Elliot (2003) and Needleman (2001) draw attention to some specific limitations of DTDs. DTDs are said to enforce basic structural rules on documents, but not to provide fine control over the format and data types of elements (Elliot, 2003, p254). Needleman (2001) refers in particular to the limited support of DTDs for data types.

XML Schema is a more recent language designed for users to validate and constrain XML applications. In this report, XML applications based on XML Schema are denoted by ‘schema’. The World Wide Web Consortium produced a requirements document in 1999 (XML schema activity, World Wide Web Consortium, 1999), which states that XML Schema will address specific goals beyond DTD functionality, including integration of structural schemas with primitive data types, and inheritance. The XML Schema language was delivered by the World Wide Web Consortium as a formal recommendation in 2001 (XML schema activity, World Wide Web Consortium (a), (b), 2001). Needleman (2001) reports that XML Schema adds support for many data types, including Boolean, integer and date, in addition to supporting user-defined types. XML schemas are themselves written in XML notation.

Use of XML Schema has a particular advantage with respect to data validation. Applications often use complex code to validate data documents; Needleman (2001) comments that an XML-aware parser would be able to perform data validation before the XML data file reaches the application that has to process it.

3.4 Applications of XML

XML is considered to be a format for data interchange. However, a number of commentators, including Ceri et al (2000) and Barillot and Achard (2000), point
out that XML provides syntax but does not directly provide semantic interoperability. These commentators relate that the semantic descriptions necessary for data interchange are defined for specific domains, for which there is a ‘community agreement’. This type of agreement is not defined, but is expected to be a formal or informal accord between major bodies such as large corporations, government departments and standards institutes.

MathML is a markup language that is supported by the World Wide Web Consortium, which has issued a recommendation for it (Math activity, World Wide Web Consortium, 2001). In general, a successful markup language will require widespread acceptance by those who may use it. Examples of markup languages developed recently include ThermoML, described by Frenkel et al (2003) as an XML-based approach for the storage and exchange of thermophysical and thermochemical property data. According to Frenkel et al, a complete schema has been developed, and institutions such as the Thermodynamics Research Center at the National Institute of Standards and Technology (NIST, USA), along with the Journal of Chemical and Engineering Data, have been involved. NIST is also reported to be involved in the development of MatML, a language for describing materials property data (Begley and Sturrock, 2000). According to Begley and Sturrock, the next steps for MatML would be acceptance testing and dissemination through various standards and professional organisations.

XML can also be used with database management systems (DBMS). Seligman (2001) states that XML and database technology complement rather than compete with each other. Seligman points out that DBMS provide specific facilities including highly tuned query and transaction processing, recovery,
indexes, integrity and triggers. These facilities go beyond the structure of the data. Many DBMS now provide facilities to import from and export to an XML data format.

3.5 Is XML self-describing?

Ceri et al (2000) state that XML is self-describing, as semantic annotations provide information about the content of the document. Simeon (2003), however, asserts that XML is not self-describing, because the internal format (corresponding to an external XML description) depends on the XML Schema (or DTD). Nambiar et al (2002) point out that SGML parsers require stricter adherence to the DTD, and report this as a shortcoming for everyday use. The use of a DTD or schema with XML is optional. Whilst an XML data file may not be fully self-describing in isolation, a unit consisting of an XML data file together with a DTD or XML Schema would be self-describing. Further, according to the World Wide Web Consortium (XML schema activity, World Wide Web Consortium, 1999), XML schemas document their own meaning, usage and function, implying that they are self-describing.

3.6 Forms for XML

XForms is a planned specification of the World Wide Web Consortium (XForms activity, World Wide Web Consortium, 2003) that is still in development. XForms is intended to be the successor to HTML forms (w3cschools (a)), and as such, implementation in future web browsers can be expected. The aim of XForms is to separate the purpose of a form from its presentation. The form would consist of components that describe the functions of the form,
and control its appearance (XForms activity, World Wide Web Consortium, 2003). XForms operates on the basis of XML instance data, represented as form controls (O’Reilly, 2003). XForms also provides some basic functions, such as ‘sum’, and string controls (w3Schools (b)). The current web browsers Netscape Navigator v7.02 and Microsoft Internet Explorer v6.0 do not support XForms. Khare (2000) discusses the aim of separating data, logic and presentation, and in this respect asks the question “Can Xform Transform the Web?”. Khare points out that XForms needs to account for various technologies, including XML schema, style sheets and multimedia. According to Khare, if web authors are required to learn all these associated technologies in order to use XForms, the adoption of XForms will be hindered.

3.7 Style sheets

The World Wide Web Consortium promotes the use of style sheets to format web-based documents (Style activity, World Wide Web Consortium, 2003). Style sheet technology is part of the movement to separate web content from its display; it has practical advantages in use, and it is the primary method available to the end user for formatting XML data for display in a web browser. A style sheet is a set of syntactical rules that applies visual style to web page elements. Before style sheets, custom visual attributes such as typefaces, colour and font size were defined in HTML tags for every instance of the element. Style sheets, using a style sheet language, enable definitions for the
visual attributes to be separated from the web content. A single style sheet can be applied to multiple documents.

A style sheet language that is supported by Microsoft Internet Explorer v6.0 and Netscape Navigator v7.02 is the CSS (Cascading Style Sheets) language (CSS activity, World Wide Web Consortium, 1998). The term cascading refers to the order of precedence among style sources: from the designer (highest) to the user, then to the browser. CSS can apply formatting to both HTML and XML documents. A second style sheet language is XSL (Extensible Stylesheet Language), as defined in XSL activity (World Wide Web Consortium, 1999). XSL currently has less browser support than CSS, but has the additional capability of transforming XML documents. This capability is known as XSLT (XSL Transformations) and is a subset of XSL; for example, an XML document could be transformed into an HTML document using XSLT. Both CSS and XSL can be used to format XML data.

Thomas (1999) describes features in the CSS specification that are expected to be of practical use to web developers. These include paged media, for printable web pages that are not designed to scroll, and page position control. The latter feature may become significant because, according to Thomas, many web designers use tables for position control, a purpose for which tables were not designed.

Some commentators, including Marden and Munson (1999), report that CSS technology has limitations; in particular, CSS does not support arbitrary mathematical expressions for values. The authors comment that XSL does accept arbitrary mathematical expressions for values, but state that XSL notation is verbose (it uses XML syntax) and its usage requires a
steep learning curve. Bos (2001), however, states that XML is verbose by design, noting the advantage that XML data can be human readable if the need arises (although in general this should not be necessary). It should be noted that Marden and Munson advocate the use of another style sheet technology, ‘Proteus and PSL’.

Lie and Saarela (1999) state that the use of style sheets is complementary to structured documents. These authors argue that the separation of content and presentation is a requirement for device independence, with all device-specific information held in the style sheet. Achieving device independence would be a major advantage of using style sheets. Style sheet technologies are under development via the World Wide Web Consortium; some recommendations, such as CSS, are already in place, but further refinements and features are possible.
4 Review of software tools

The purpose of this review is to give an overview of the relevant features of the software tools used.

4.1 Microsoft InfoPath

In 2003 Microsoft Corporation released a beta test version of its integrated applications suite, Microsoft Office 2003 beta 2 edition (Microsoft Corp. (b), 2003). Microsoft reported that this suite supports the XML format. A new component of the Office suite is Microsoft InfoPath (Microsoft Corp. (a), 2003), whose purpose is to create forms that output data as XML documents. InfoPath generates a form from a user-defined template; the native file format of that form is XML. InfoPath is able to create templates from a number of sources, including XML Schema files and XML documents.

InfoPath uses a ‘drag and drop’ graphical user interface (GUI) to create a form template. Components of the source document can be dragged to the design screen, where a form field is automatically created for each component, and InfoPath maps the form elements to the original XML structure. Design features include the optional use of tables, graphics and text labels. Components can be specified as repeating where appropriate; once the basic template is generated, repeating sections can be multiplied. The template can be used to produce any number of forms. Data can be entered into the forms, the native file format being an XML structure based on the original source used to design the template.
4.2 Web browsers

Two web browsers were used: Microsoft Internet Explorer v6.0 (Microsoft Corp., 2001) and Netscape Navigator v7.02 (Netscape Communications Corporation, 2003).

4.3 HTML, CSS and XSLT

A web programming environment that is available at no cost is HTML Kit build 292 (Chami.com, 2002). This software provides assistance with the parameters of HTML v4 tags, but not a ‘WYSIWYG’ (what you see is what you get) editable view of the document. HTML Kit can accept ‘plug-ins’ from third-party developers that provide extra functionality, for example viewing an XML document as a tree. The Microsoft FrontPage component (Microsoft Corp. (c), 2003) of the Microsoft Office 2003 beta edition is a web design package that does include a WYSIWYG editable view of the document, as well as a code view. The WYSIWYG view is particularly useful for designing sophisticated tables and adding form elements. Both packages accept CSS commands in HTML pages. Using XSLT requires an XSLT processor, a software module that applies the transformation to a document. HTML Kit can use a plug-in that performs this task (Glass, 2001), and Microsoft FrontPage can process XSLT transformations.

4.4 Perl

The use of Perl scripts requires a Perl interpreter and a web server on the computer. A Perl system was implemented on a stand-alone computer using
IndigoPerl build 08 from IndigoStar Software (2001). This includes a Perl interpreter, Perl v5.6.1, and a pre-configured web server, Apache v1.3.2. Editing of Perl scripts can benefit from a visual integrated development environment (IDE). One such system is Optiperl by Xarka Software (2003), which includes the following features:
• interactive debugger for Perl
• internal server
• reading of form variable values from an HTML file
• built-in preview facility for output.
These features can significantly shorten program development time.

4.5 Software for validating an XML document and user-defined XML schema

The user-defined schema should conform to the specification for XML Schema (XML schema activity (a), (b), World Wide Web Consortium, 2001). To check this conformance, the schema can be validated with suitable software tools. Many of these tools are commercial software packages, for example XML Spy v5.0 (Altova GmbH, 2002). A tool that is available without cost is the web-based XSV system v2.5-2 (Thompson and Tobin (a), 2003), which can be used via forms on the internet. XSV is described by its authors as being in a beta test state; it is therefore not a final product at the time of writing, but can be used as an indication of schema conformance. Versions of XSV were also made available for use on a stand-alone computer. Command-line XSV version 1.4 (Thompson
and Tobin (a), 2003), dated 9/07/2003, was used in the later stages of this investigation.

The Sun MSV system ver2 (Sun Microsystems, 2003) is stated by Sun Microsystems to support a subset of XML Schema Part 1 (XML schema activity (a), World Wide Web Consortium, 2001), so not all schema errors may be recognised. However, Sun's MSV is available at no cost and can be used on a stand-alone computer, so it could be used for debugging during schema development.

Another method of validating a schema or XML document is to import it into Microsoft InfoPath. Experience with Microsoft InfoPath indicates that validation takes place when XML documents and XML schemas are imported: InfoPath checks a schema as it is imported. An XML document may contain an association with a schema; on importing such a document into Microsoft InfoPath, the software first checks the named schema and, if the schema generates no errors, then checks the conformance of the document to the schema.

XML schemas, and documents that link to a schema, contain a namespace declaration: a uniform resource identifier (URI) that has been assigned to W3C XML Schema. The validation tool processes elements on the basis of this declaration (White et al, 2001). Some documents (including MatML examples) include an obsolete declaration, ‘http://www.w3.org/2000/10/XMLSchema-instance’, whereas current validation tools require an up-to-date declaration.

4.6 Comparing two documents

Beyond Compare 2 (Scooter Software, 2003) is a utility that can compare two text files and highlight any differences.
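Beyond Compare is a graphical utility; a comparable line-by-line comparison of two text files can be sketched with Python's standard difflib module. This is only an illustration of the kind of comparison involved, not the tool used in this study, and the two short documents below are made up for the purpose.

```python
import difflib

# Two short, made-up versions of a document, e.g. one written by a script
# and one saved from a form; each list holds one string per line.
first = [
    "<player>",
    "<playername><firstname>Joe</firstname></playername>",
    "<playerage>24</playerage>",
    "</player>",
]
second = [
    "<player>",
    "<playername><firstname>Joe</firstname></playername>",
    "<playerage>25</playerage>",
    "</player>",
]

# unified_diff yields file headers, an @@ hunk marker, unchanged context
# lines and the changed lines prefixed with '-' (first) or '+' (second).
diff = list(difflib.unified_diff(first, second,
                                 fromfile="first.xml", tofile="second.xml",
                                 lineterm=""))
print("\n".join(diff))

# Keep only the changed lines, skipping the '---'/'+++' file headers.
changed = [line for line in diff
           if line.startswith(("-", "+"))
           and not line.startswith(("---", "+++"))]
```

A purely textual comparison like this reports differences in layout as well as in data, so two XML documents with identical content but different line breaks would still show as different.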
5 Feasibility study

5.1 Introduction

A feasibility study was performed in relation to the project and selected research aims. The objectives of the feasibility study were to:
1. explore possible approaches to achieving the aims
2. investigate whether the aims are achievable
3. highlight practical problems and limitations.

Particular questions that the feasibility study could answer are shown below against the relevant project and research aims.

Review the MatML XML schema for materials test data.
1. How are data organised into a hierarchy suitable for XML?
2. What is the nature of the XML Schema declarations that implement this hierarchy?

Use HTML and Perl to produce a valid XML data file.
1. What form design can be used to accept user data?
2. How can Perl manipulate the form data to create a valid XML document?

Use style sheets to generate formatted reports from XML documents.
1. Which style sheet technology, or combination of technologies, is better at producing a report from the XML document?
2. Are there any particular difficulties in formatting a MatML file?

A method was specified for a program of work that would help to answer these questions. The program uses HTML web pages and Perl scripts produced by Shaheedullah (2003), including an HTML form allowing entry of
basketball player statistics. The statistics are written to a file on the server using a Perl script. Another Perl script reads the data from disk, then formats them for display in a web browser. These web pages and Perl scripts were used as the base material for the feasibility study. The steps in the work program are as follows:
• organise the data structure into a hierarchy
• create an XML schema for the data structure
• use the existing HTML form to collect data and send it to a Perl script
• adapt the Perl script to write the data directly to an XML file
• use a validation tool to test conformance of the XML file to the schema
• use CSS to format and display the XML file
• use XSLT and CSS to format and display the XML file.

5.2 An XML schema for the basketball data

An XML schema is composed of elements and attributes, whose data types can be selected from predefined types or user-defined types. The basic types used in XML Schema are described below.

5.2.1 XML Schema data types

XML Schema provides two core kinds of type definition, simpleType and complexType, from which specific element types are derived.

SimpleType

These are based on built-in data types, such as string, integer, decimal, float, date and time. Further simpleTypes include:
1. customised versions of basic data types, e.g. with maximum and minimum values set
2. enumerated: a set of possible values
3. list: multiple values for a given element, separated by whitespace
4. patterned types: simple coded patterns can be created, e.g. for a telephone number.

ComplexType

ComplexTypes can define the relationships of elements with each other and their own attributes. The four basic complexTypes are listed below:
1. empty elements: may have attributes, but no text value or child elements
2. element-only: can have child elements but no text
3. mixed elements: can be set to have just a text entry, or both text and child elements
4. sequences and choices: specific sequences of elements can be defined, and a series of optional elements can be given.

5.2.2 XML schema definition

An XML schema was produced for the basketball data structure. This involved selecting entities, creating a hierarchical structure and defining constraints. The entities selected for use in the schema are as follows:
• name (first and last)
• age
• height (feet and inches)
• points
• steals
• rebounds
• blocks.

Some additional entities were specified to give a logical hierarchy, based on a player who has personal elements and score elements:
• person: consists of name, age, height
• scores: consists of points, steals, rebounds, blocks
• player: consists of person and scores.

A hierarchy can then be drawn. A judgement needs to be made as to whether related entities will be described by a parent/child relationship or an element/attribute relationship. In this example, ‘first’ and ‘last’ were considered to be entities, which were child entities of ‘name’. The units ‘feet’ and ‘inches’, however, were assigned as attributes of the height element. The selected entities were organised into a hierarchical structure of elements and attributes, shown in Figure 2 following.
Figure 2 Hierarchy of entities for basketball statistics
[Figure 2 diagram: player contains person and scores; person contains name (children first and last), age, and height (attributes :feet and :inches); scores contains points, steals, rebounds and blocks. Key: plain name = element, :name = attribute.]

5.2.3 Constraints

XML Schema allows its entities to be constrained by data type. For example, ‘points’ needs to be an integer; this can be enforced by a declaration in XML Schema. XML Schema data types can also be customised to provide user-defined types; for example, the minimum and maximum integer values for ‘points’ could be defined. A simple set of constraints was defined for the entities in Figure 2; these are shown in Table 1, following.
Table 1 Data Dictionary for basketball statistics hierarchy

Element                        XML Schema data type   Constraint
player                         complexType            Element only
person                         complexType            Element only
scores                         complexType            Element only
name                           complexType            Element only
first                          simpleType             String
last                           simpleType             String
age                            complexType            Integer
height                         complexType            Empty
feet (attribute of height)     simpleType             Integer
inches (attribute of height)   simpleType             Integer
scores                         complexType            Element only
points                         simpleType             Customised integer, min 0, max 9999
blocks                         simpleType             As for points
rebounds                       simpleType             As for points
steals                         simpleType             As for points

5.2.4 Schema implementation

A simple model of the domain now exists, consisting of:
• a hierarchy
• constraints.

An XML schema was developed to implement the model. The final schema is shown in Listing 1 below.

Listing 1 basket.xsd

<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="basketball">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="player" type="person" />
        <xsd:element name="playerscores" type="scores" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="person">
    <xsd:sequence>
      <xsd:element name="playerheight" type="height" />
      <xsd:element name="playername" type="names" />
      <xsd:element name="playerage" type="age" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="names">
    <xsd:sequence>
      <xsd:element name="firstname" type="xsd:string" />
      <xsd:element name="lastname" type="xsd:string" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="age">
    <xsd:simpleContent>
      <xsd:extension base="xsd:int" />
    </xsd:simpleContent>
  </xsd:complexType>
  <xsd:simpleType name="scoreval">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0" />
      <xsd:maxInclusive value="9999" />
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:complexType name="scores">
    <xsd:sequence>
      <xsd:element name="points" type="scoreval" />
      <xsd:element name="blocks" type="scoreval" />
      <xsd:element name="rebounds" type="scoreval" />
      <xsd:element name="steals" type="scoreval" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="height">
    <xsd:complexContent>
      <xsd:extension base="xsd:anyType">
        <xsd:attribute name="feet" type="xsd:integer" use="required" />
        <xsd:attribute name="inches" type="xsd:integer" use="required" />
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>

5.2.5 Schema validation

A validation test of the schema was performed using the web-based XSV tool v2.5-2. The test reported no validity problems with the basketball schema file.
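In this study the Table 1 constraints are enforced by the schema and checked with a validation tool. To make concrete what such a check involves, the sketch below re-implements only the Table 1 type and range rules by hand, using Python's standard library. The function name check_table1 is hypothetical, and the sketch is not a substitute for an XML Schema validator such as XSV or MSV: it ignores element order, occurrence rules and everything else the schema declares.

```python
import re
import xml.etree.ElementTree as ET

# Listing 3 (bball.xml), as written by the HTML/Perl system
BBALL = """<?xml version="1.0"?>
<basketball xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="basket.xsd">
<player>
<playerheight feet="6" inches="4" />
<playername>
<firstname>Joe</firstname><lastname>Blogss</lastname></playername>
<playerage>24</playerage>
</player>
<playerscores><points>35</points>
<blocks>4</blocks>
<rebounds>3</rebounds>
<steals>2</steals>
</playerscores></basketball>"""

INTEGER = re.compile(r"[+-]?[0-9]+")  # lexical form of an integer

def check_table1(xml_text):
    """Return a list of problems found; an empty list means none.

    Hand-written checks of the Table 1 types and ranges only.
    """
    problems = []
    root = ET.fromstring(xml_text)

    def integer_at(path):
        # A missing element and an empty element both give ""
        text = (root.findtext(path) or "").strip()
        if not INTEGER.fullmatch(text):
            problems.append(f"{path}: {text!r} is not an integer")
            return None
        return int(text)

    integer_at("player/playerage")  # age: integer

    height = root.find("player/playerheight")
    for attr in ("feet", "inches"):  # height attributes: integers
        value = height.get(attr, "") if height is not None else ""
        if not INTEGER.fullmatch(value.strip()):
            problems.append(f"playerheight/@{attr}: {value!r} is not an integer")

    for name in ("points", "blocks", "rebounds", "steals"):  # 0 to 9999
        value = integer_at(f"playerscores/{name}")
        if value is not None and not 0 <= value <= 9999:
            problems.append(f"playerscores/{name}: {value} is outside 0..9999")

    return problems

print(check_table1(BBALL))  # []
print(check_table1(BBALL.replace("<playerage>24<", "<playerage>3r<")))
```

Applied to the modifications tested in section 5.5, this sketch flags the non-integer age, the negative points value and the empty age, and, like the schema, accepts an empty first name.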
5.3 Form creation using HTML

The original web form for the basketball data was coded in HTML 4.0 (Shaheedullah, 2003) and is contained in a single table. The code was revised to use cascading style sheets (CSS). Form elements are used for data input. Two types of form element are used: text fields and list boxes. Text fields can accept alphanumeric input. List boxes are used for the height data, where the value is entered as feet and inches. Offering each of these units as a fixed set of options in a list box allows the data values to be constrained to a range of 4 to 7 for ‘feet’ and 1 to 11 for ‘inches’. The web form created with HTML and CSS is shown in Figure 3 below.

Figure 3 Web-form for basketball data

On submitting the query, the data are passed to a Perl script via the ‘post’ method. The Perl script parses the data (matching each form element name
and the value entered) into variables. These variables are combined with additional characters and strings to generate XML elements, which are then written to disk as an XML document. The required document format was known in advance by generating an instance of the basketball XML schema; software packages are available to do this, including XML Spy v5.0 (Altova GmbH, 2002). The segment of the Perl script that writes the file is shown in Listing 2 below.

Listing 2 Perl script segment, writeball.cgi

# write text fields; OUTF is a filehandle already opened on the output
# file, and %FORM holds the parsed form values
print OUTF "<?xml version=\"1.0\"?>";
print OUTF "\n";
print OUTF "<basketball";
print OUTF " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"";
print OUTF " xsi:noNamespaceSchemaLocation=\"basket.xsd\">";
print OUTF "\n";
print OUTF "<player>\n";
#
print OUTF "<playerheight ", "feet=", "\"$FORM{'feet'}\" ", "inches=", "\"$FORM{'inches'}\" />\n";
#
print OUTF "<playername>\n";
print OUTF "<firstname>", "$FORM{'fname'}", "</firstname>";
print OUTF "<lastname>", "$FORM{'lname'}", "</lastname>";
print OUTF "</playername>\n";
#
print OUTF "<playerage>", "$FORM{'age'}", "</playerage>\n";
#
print OUTF "</player>\n";
#
print OUTF "<playerscores>";
print OUTF "<points>", "$FORM{'points'}", "</points>\n";
print OUTF "<blocks>", "$FORM{'block'}", "</blocks>\n";
print OUTF "<rebounds>", "$FORM{'rebound'}", "</rebounds>\n";
print OUTF "<steals>", "$FORM{'steal'}", "</steals>\n";
print OUTF "</playerscores>";
print OUTF "</basketball>\n";

The XML document produced from the web form and Perl script is shown in Listing 3.
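Listing 2 builds each element by concatenating fixed tag strings with the raw form values, so a value containing a markup character (for example an ampersand in a surname) would be written unescaped and the file would no longer be well-formed. As a point of comparison only, and not the method used in this study, the sketch below builds the same structure as bball.xml with Python's standard xml.etree.ElementTree, which escapes such characters when serialising. The dictionary stands in for the %FORM hash and uses the same key names as Listing 2; build_basketball is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)  # serialise the namespace as xmlns:xsi

def build_basketball(form):
    """Build the bball.xml structure from a dict of string form values."""
    root = ET.Element("basketball",
                      {f"{{{XSI}}}noNamespaceSchemaLocation": "basket.xsd"})
    player = ET.SubElement(root, "player")
    ET.SubElement(player, "playerheight",
                  feet=form["feet"], inches=form["inches"])
    name = ET.SubElement(player, "playername")
    ET.SubElement(name, "firstname").text = form["fname"]
    ET.SubElement(name, "lastname").text = form["lname"]
    ET.SubElement(player, "playerage").text = form["age"]
    scores = ET.SubElement(root, "playerscores")
    # (element name, form key) pairs, in the order used by Listing 2
    for tag, key in (("points", "points"), ("blocks", "block"),
                     ("rebounds", "rebound"), ("steals", "steal")):
        ET.SubElement(scores, tag).text = form[key]
    # tostring escapes &, < and > in text (and quotes in attributes)
    return ET.tostring(root, encoding="unicode")

sample = {"fname": "Joe", "lname": "Bloggs & Sons", "age": "24",
          "feet": "6", "inches": "4", "points": "35", "block": "4",
          "rebound": "3", "steal": "2"}
print(build_basketball(sample))
```

The output has no XML declaration; when writing to disk, ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True) would add one.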
Listing 3 XML document, bball.xml

<?xml version="1.0"?>
<basketball xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="basket.xsd">
<player>
<playerheight feet="6" inches="4" />
<playername>
<firstname>Joe</firstname><lastname>Blogss</lastname></playername>
<playerage>24</playerage>
</player>
<playerscores><points>35</points>
<blocks>4</blocks>
<rebounds>3</rebounds>
<steals>2</steals>
</playerscores></basketball>

The Perl script writes the necessary tags as fixed strings, and the content is taken from the form variables. The Perl script does not validate the form values, as this is done by an XML validation tool.

5.4 Form creation using Microsoft InfoPath

Microsoft InfoPath can use an XML schema or an XML document as a source for form creation. An attempt was made to use the basketball schema file, basket.xsd (Listing 1), as a source to create a form for data entry. Microsoft InfoPath gave the following error message:

“Derived type and the base type must have the same content type.
Base type : '{http://www.w3.org/2001/XMLSchema}anyType'
Derived type : 'height'”

The alternative to using the schema as a source was to use the XML document written by the HTML/Perl system, bball.xml (Listing 3). Attempting this resulted in the same error message. The XML document in Listing 3 was therefore manually edited to remove the reference to basket.xsd, and the edited bball.xml file was then successfully imported into Microsoft InfoPath as a source for form creation. A form template (bball.xsn) was created simply by dragging the elements from the data source window into the design window. A
form was instantiated from the template. The form (bballinfopathform.xml) is shown in Figure 4.

Figure 4 Microsoft InfoPath form for basketball data, bballinfopathform.xml (from template bball.xsn)

5.5 Validation of form-generated XML document

5.5.1 HTML/Perl system

There are a number of validation tools for XML schemas, as described in section 4.5. The Sun MSV tool was used for debugging the design of the basketball schema and data file. The XML document generated by the Perl script in Listing 2 was successfully validated against the schema in Listing 1. The XML document was then modified to introduce errors so that the output of the validation tool could be observed; the results are shown in Table 2, following.
Table 2 Sun MSV validation tool output for bball.xml

Change to XML document   Validation tool output
None                     the document is valid.
Age = 3r                 Error at line:7, column:26 of file:///c:/perl5/htdocs/bball.xml
                         "3r" does not satisfy the "int" type
                         the document is NOT valid.
Points = -3              Error at line:9, column:34 of file:///c:/perl5/htdocs/bball.xml
                         the value is out of the range (minInclusive specifies 0).
                         the document is NOT valid.
Age = “” (empty)         validating c:\perl5\htdocs\bball.xml
                         Error at line:7, column:24 of file:///c:/perl5/htdocs/bball.xml
                         Content of element "playerage" is incomplete
                         the document is NOT valid.
Firstname = “” (empty)   the document is valid.

Final validation of the XML documents produced by the HTML/Perl system was performed with the XSV tool described in section 4.5. The bball.xml document passed the validation test, with no errors reported.

5.5.2 Form creation using Microsoft InfoPath

As stated in section 5.4, Microsoft InfoPath did not accept the basket.xsd schema, so instances of the basketball schema could not be validated against it using this software. However, validation of the Microsoft InfoPath XML document against the basketball schema was attempted with the command-line XSV tool. An error was reported concerning a language attribute; the attribute was removed from the document by manual editing, and the modified document passed the validation test.

To observe the schema validation capabilities of Microsoft InfoPath, a copy of the basketball schema was made, excluding the height element. This schema
was accepted by Microsoft InfoPath, and a simple form was prepared for validation test purposes. The changes to the XML document listed in Table 2 were applied, and the results for Microsoft InfoPath are shown in Table 3.

Table 3 Microsoft InfoPath output for form based on modified schema (excluding height element)

Change to XML document   Output
None                     None
Age = 3r                 This form includes errors. Do you still want to save it?
Points = -3              This form includes errors. Do you still want to save it?
Age = “” (empty)         This form includes errors. Do you still want to save it?
Firstname = “” (empty)   None

All non-conformances to the schema resulted in the same error message, indicating that, when an attempt is made to save the form, Microsoft InfoPath checks the form contents against the schema used to design it.

5.6 Using CSS to format the XML document

XML documents can be formatted with CSS for display in a web browser. This technique was evaluated for formatting the basketball XML document, bball.xml. There are limitations in using CSS directly with XML files. These include the absence of a method for accessing information held within tags, such as attributes, and no capability to add text content. Hence, the content of the elements appears without descriptive labels, unless extra elements holding such labels are added to the original XML document.

A simple CSS style sheet was created (Listing 4), and the XML document was edited to associate it with the style sheet. The output on loading the edited XML document into Microsoft Internet Explorer v6.0 is shown in Figure 5.
Listing 4 CSS Style sheet for bball.xml, basket.css

basketball   {display:block;}
playername   {display:block; font:bold 14pt Courier, serif; text-align:center;}
playerage    {display:block; font:italic 12pt Courier, serif; text-align:center;}
playerheight {display:block; font:italic 12pt Courier, serif; text-align:center;}
playerscores {font-family:Times, serif; font-size:12pt; font-weight:bold; text-indent:45%;}
points       {display:block;}
rebounds     {display:block;}
blocks       {display:block;}
steals       {display:block;}

Figure 5 Output of bball.xml with CSS style sheet (basket.css), in Microsoft Internet Explorer v6.0

As the player height data are held in attributes, they are not displayed with CSS. Further, the element content is printed without descriptors. The capabilities of CSS can, however, be used with XML documents that have been transformed using XSLT.

5.7 Using XSLT and CSS to format the XML file

Modern browsers such as Microsoft Internet Explorer v6.0 and Netscape Navigator v7.02 support an XML-related technology, XSLT (XSL Transformations). XSLT is able to effect transformations on XML files; one use is to transform an XML document into HTML/CSS syntax suitable for display. Using XSLT in conjunction with HTML and CSS has a number of advantages over using CSS to format an XML file directly, including:
• attributes of XML tags are accessible
• text can be added as markup tags or literal text
• search operations on tags are possible
• transformation operations include
  o sorting
  o conditional statements
  o functions on numeric data.

An XSLT style sheet was prepared for the basketball XML file; see Listing 5.

Listing 5 XSLT style sheet for bball.xml, basket.xsl

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/TR/REC-html40">

  <xsl:template match="/">
    <html>
      <head><title>Basketball</title></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="player">
    <h3 style="text-align:center">Player Details</h3>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="playername">
    <div style="display:block; font:bold 14pt Courier, serif; text-align:center;">
      Name: <xsl:value-of select="firstname"/>
      <!-- whitespace-only text between instructions is stripped, so the space is explicit -->
      <xsl:text> </xsl:text>
      <xsl:value-of select="lastname"/>
      <p/>
    </div>
  </xsl:template>

  <xsl:template match="playerage">
    <div style="display:block; font:italic 12pt Courier, serif; text-align:center;">
      Age: <xsl:value-of select="."/>
    </div>
  </xsl:template>

  <xsl:template match="playerheight">
    <div style="display:block; font:italic 12pt Courier, serif; text-align:center;">
      <xsl:value-of select="@feet"/> feet
      <xsl:value-of select="@inches"/> inches tall
      <p/>
    </div>
  </xsl:template>

  <xsl:template match="playerscores">
    <h3 style="text-align:center">Scores</h3>
    <p style="text-align:center;">
      <table style="font-family:Times, serif; font-size:12pt; font-weight:bold;">
        <tr><td>points</td><td><xsl:value-of select="./points"/></td></tr>
        <tr><td>blocks</td><td><xsl:value-of select="./blocks"/></td></tr>
        <tr><td>rebounds</td><td><xsl:value-of select="./rebounds"/></td></tr>
        <tr><td>steals</td><td><xsl:value-of select="./steals"/></td></tr>
      </table>
    </p>
  </xsl:template>

</xsl:stylesheet>

The XML document was edited to associate it with the XSLT file. Loading the XML file into Microsoft Internet Explorer v6.0 produces the output shown in Figure 6.
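For comparison, the same transformation can be written in a general-purpose language. The Python sketch below is not how the project formatted documents (that was done with XSLT in the browser); it only makes explicit what basket.xsl does: it reads attribute values, adds literal labels and builds a table. The instance data are hypothetical, with element names taken from Listings 4 and 5.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical bball.xml instance: element names follow Listings 4 and 5,
# values are invented for illustration.
BBALL_XML = """<basketball><player>
  <playername><firstname>Anne</firstname><lastname>Smith</lastname></playername>
  <playerage>24</playerage>
  <playerheight feet="6" inches="2"/>
  <playerscores><points>21</points><rebounds>7</rebounds><blocks>1</blocks><steals>3</steals></playerscores>
</player></basketball>"""

CENTRE = 'style="text-align:center"'


def player_to_html(player):
    """Roughly what the 'player' template and its child templates emit."""
    name = player.find("playername")
    height = player.find("playerheight")
    scores = player.find("playerscores")
    parts = [
        f"<h3 {CENTRE}>Player Details</h3>",
        # Literal text ('Name:', 'Age:') that the CSS-only rendering could not add.
        f"<div {CENTRE}>Name: {escape(name.findtext('firstname'))} "
        f"{escape(name.findtext('lastname'))}</div>",
        f"<div {CENTRE}>Age: {escape(player.findtext('playerage'))}</div>",
        # Attribute values, which the CSS-only rendering could not display.
        f"<div {CENTRE}>{escape(height.get('feet'))} feet "
        f"{escape(height.get('inches'))} inches tall</div>",
        f"<h3 {CENTRE}>Scores</h3>",
        "<table>",
    ]
    for tag in ("points", "blocks", "rebounds", "steals"):
        parts.append(f"<tr><td>{tag}</td><td>{escape(scores.findtext(tag))}</td></tr>")
    parts.append("</table>")
    return "\n".join(parts)


def bball_to_html(xml_text):
    root = ET.fromstring(xml_text)
    body = "\n".join(player_to_html(p) for p in root.iter("player"))
    return f"<html><head><title>Basketball</title></head><body>\n{body}\n</body></html>"


print(bball_to_html(BBALL_XML))
```

The XSLT version has the practical advantage that the browser applies it on loading the document, with no program to run.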
Figure 6 Output of XML document bball.xml with XSLT style sheet, basket.xsl, in Microsoft Internet Explorer v6.0

Figure 6 shows that XSLT can add text (‘literals’ in XSLT terminology) to describe the data. Attributes of XML tags can be accessed, so the height data are displayed. The ability to include markup tags allowed a table to be created to display the scores data.

5.8 Discussion

5.8.1 Schema design issues

The schema developed for the basketball data shows how entities can be organised into a hierarchical structure. The schema was ‘hand-coded’ in a text editor. There are a number of code structures that can achieve a similar result. For a larger schema, a visual representation of the schema would be useful.
Some software packages, e.g. XML Spy v5.0 (Altova GmbH, 2002), have customised visual representations for XML schemas.

The schema creates definitions of all elements. A root element, ‘basketball’, is specified; this is the root of the hierarchy. The definitions are instanced with a new name at the appropriate point in the hierarchy to create the schema. There are two alternative methods of coding a hierarchy. One is to define and instance each element at its point of entry in the hierarchy. This has the advantage that the full hierarchy is presented without the need to cross-reference instances to definitions, but the disadvantage that an element used in several places must be defined multiple times. The other is to create a definition for each element and instance it at the point of use.

A number of the element definitions include constraints on the value. For a fuller implementation, further constraints could be added. For example, ‘feet’ could be limited to a range of 4 to 7, and ‘inches’ to a range of 1 to 11. The constraints implemented illustrate the data typing capability of XML Schema, and provide a basis for XML document validation.

Microsoft InfoPath did not accept the basketball schema, whereas the web-based XSV tool validated it and reported no conformance problems. This suggests that the two tools interpret schemas differently, which is an issue to be considered.

5.8.2 Form for data entry and document generation

The HTML form created for data entry included fields for all elements of the basketball schema. Use of appropriate form fields can limit data entry to valid options, such as the use of list boxes for ‘height’ and ‘feet’. Tables assist in
structuring the display of the labels and fields. The HTML form is capable of providing all the fields necessary for an instance of the relatively simple basketball schema, in a concise tabular layout.

The Perl script that accepts the form data performs straightforward procedures: parsing, followed by combination of the form variables with appropriate text to produce an XML document. Schema features that could make the HTML/Perl system significantly more complex include repeatable elements and, possibly, identification attributes.

Using the file produced by the HTML/Perl system as the source for the Microsoft InfoPath template is considered reasonable for the small basketball schema in the feasibility study. However, for the main project, an original source should be used. The Microsoft InfoPath template created from the XML document successfully produced a form similar to the HTML form. Once a template has been produced, any number of forms can be created, with the basketball XML structure as the native file format.

5.8.3 Validation of the XML document

The validation of the XML document was successful, and useful messages are generated by the Sun Microsystems MSV validation tool. However, for a user entering data in the form, this is a less convenient method of data entry validation than checking ‘inline’ (within the form), which could be performed by a Perl program. Validation of XML files against an XML schema will be persisted with, as this is considered to be one of the strengths of using XML.
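Inline validation of the kind suggested above can be sketched as a set of per-field checks applied before the XML document is written. The sketch is in Python rather than Perl, and the rules are assumptions: ‘age’ is treated like xs:positiveInteger, ‘points’ like xs:nonNegativeInteger, ‘firstname’ like xs:string, and ‘feet’ and ‘inches’ use the ranges suggested in section 5.8.1. They are not taken from the actual schema. Under these assumptions, the checks reproduce the outcomes in Table 3, including the acceptance of an empty first name, which is a valid string.

```python
import re

# Hypothetical rules for the basketball form fields: (kind, minimum, maximum).
# None means unbounded. These are assumptions, not the project's schema.
FIELD_RULES = {
    "firstname": ("string", None, None),
    "age": ("integer", 1, None),      # like xs:positiveInteger
    "points": ("integer", 0, None),   # like xs:nonNegativeInteger
    "feet": ("integer", 4, 7),        # range suggested in section 5.8.1
    "inches": ("integer", 1, 11),     # range suggested in section 5.8.1
}

INTEGER = re.compile(r"[+-]?\d+")
MESSAGE = "This form includes errors. Do you still want to save it?"


def field_ok(name, value):
    kind, low, high = FIELD_RULES[name]
    if kind == "string":
        return True                    # any string, even an empty one, conforms
    value = value.strip()              # integer types collapse whitespace
    if not INTEGER.fullmatch(value):
        return False                   # rejects '3r' and ''
    number = int(value)
    return (low is None or number >= low) and (high is None or number <= high)


def check_form(values):
    """Return the names of non-conforming fields (an empty list if all conform)."""
    return [name for name, value in values.items() if not field_ok(name, value)]


baseline = {"firstname": "Anne", "age": "24", "points": "21", "feet": "6", "inches": "2"}
for change in ({}, {"age": "3r"}, {"points": "-3"}, {"age": ""}, {"firstname": ""}):
    errors = check_form({**baseline, **change})
    print(change or "None", "->", MESSAGE if errors else "None")
```

Unlike the single generic message in Table 3, an inline check of this kind can name the offending field, which is the convenience argued for above.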
5.8.4 Display of XML documents

The use of CSS to format XML documents directly has limitations. Some of these, such as the inability to add text, could be circumvented by designing the XML schema with CSS display as an objective. For example, no data would be held in attributes, and titles and any necessary descriptive text would be included as text elements. The use of XSLT and HTML/CSS to transform a data-centric XML file is considered a superior solution. XSLT code is more complex than CSS style sheets but simpler than scripting languages such as Perl. A disadvantage of the XSLT style sheet used here (in comparison with CSS) is that formatting styles are applied inline, rather than all styles being placed in the header or a separate style sheet document, although an XSLT style sheet can also write a <style> element into the HTML header. This can be partially overcome by placing styles in block elements, such as the <div> or <table> tags.

5.9 Conclusions

1. A full, or large, part of the hierarchy of a schema should be given a visual representation.
2. For full comprehension of a schema, the coding techniques used should be understood.
3. HTML/Perl and Microsoft InfoPath can both be used to produce forms for data entry that generate a valid XML document.
4. The Microsoft InfoPath XML document requires a modification to validate against the basketball schema with the XSV tool.
5. A combination of CSS and XSLT style sheet technologies is more appropriate for formatting XML documents than CSS alone.
6 Review of MatML

The advantages of using XML could be attained by a user developing their own schema, or alternatively by using an existing schema. A simple XML schema for a specific purpose can be readily developed, such as the basketball data system described in section 5.2. However, an XML schema for the domain of the application may already exist. A review of the existing schema would help the user decide between using that schema and developing a new one, and such a review requires a framework.

An existing schema for materials test data, MatML, is mentioned in section 3.4. The MatML v3.0 schema is available for download from the MatML website (NIST (b), 2003). A framework for the review of an XML schema such as MatML could consist of criteria in the form of questions. These criteria would stem from the characteristics of XML, which include its hierarchical nature, its human readability, and how well a schema can be applied to its domain.

In software engineering texts such as Pressman (2000), evaluation techniques for software systems often describe the concepts of verification and validation. Verification is concerned with software testing and quality assurance, whereas validation is concerned with meeting user requirements. The design of the schema would be expected to include full cycles of verification and validation. A parallel could be drawn, matching validation with effectiveness (performing the appropriate task) and verification with efficiency (performing the task correctly). For the purposes of this review, which is from the
perspective of a user, a number of issues based on the above concepts are presented. Criteria that could be used to review the MatML schema follow:

Effectiveness
1. How human-readable is the schema?
   a. Are meaningful names used for elements and attributes?
   b. Can a user plot the full hierarchy?
2. Is the hierarchy logical with respect to the stated domain?
3. Does the schema provide all the necessary components for the application?

Efficiency
4. Is the code efficient?
   a. Does the schema avoid unnecessary repetition?
   b. Are the most appropriate coding techniques used?
5. Do validation tools accept the schema?

These criteria are considered below.

6.1 How human-readable is the schema?

The schema uses meaningful names for attributes and elements. For example, ‘BulkDetails’ relates to the condition of bulk material before it is processed and made into components. Elements such as ‘ProcessingDetails’ and ‘DimensionalDetails’ are self-explanatory. Attributes are given meaningful names, other than those used for identification reference purposes (see
section 6.4). The use of attributes is consistent; for example, a data format is always described as an attribute. The documentation gives a description of each element and attribute.

The full element hierarchy of MatML v3.0 is held within the schema definition, but reading the definition as written does not reveal the full tree. The hierarchy instances a number of element definitions using the XML Schema term ‘ref’. Where ‘ref’ is used, the schema shows only the element referenced; the child elements of that element are shown only in its own definition. The documentation states the child elements and attributes of each element.

A study of the schema was performed with a view to plotting the full hierarchy. A diagram showing only the first level below ‘Material’ is given in Figure 7; it includes ‘BulkDetails’ and ‘ComponentDetails’, the two elements that will contain materials test data. The full hierarchy of each of these would contain many instances of repeated element groups such as ‘Concentration’, which may obscure the view of the schema. Partial hierarchies excluding some such elements are shown in Figure 8 and Figure 9. These fuller hierarchies provide a more human-readable view of the MatML v3.0 schema.

Figure 7 First level hierarchy of MatML

Material
    BulkDetails
    ComponentDetails
    Metadata
    Graphs
    Glossary
Figure 8 Partially expanded hierarchy of MatML v3.0, ‘BulkDetails’ element
[Tree diagram rooted at BulkDetails; labels include Name, Class, Subclass, Specification, Source, Form, ProcessingDetails, ParameterValue, Result, Geometry, Shape, Dimensions, Orientation, Characterization, Formulae, ChemicalComposition, Compound, Element, Symbol, Concentration, PhaseComposition, PropertyData, Data, Qualifier, Uncertainty, DimensionalDetails, Value, Units and Notes.]

Figure 9 Partially expanded hierarchy of MatML v3.0 material
[Tree diagram; labels include ComponentDetails (as BulkDetails, Figure 8) with AssociationDetails, Associate, Relationship and a nested ComponentDetails; Metadata with DataSourceDetails, PropertyDetails, MeasurementTechniqueDetails, SpecimenDetails and ParameterDetails, each with Name and Notes, plus Units for PropertyDetails and ParameterDetails; Graphs with Graph; Glossary with Term (Name, Definition, Abbreviation, Synonym, Notes).]
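The hierarchy diagrams were obtained by following each ‘ref’ back to its definition by hand. The same expansion can be automated. The Python sketch below works on a hypothetical, heavily cut-down schema in the MatML style (not an extract of matml.xsd) and replaces every ‘ref’ with the referenced definition, stopping at recursive content models such as a ‘ComponentDetails’ nested inside ‘ComponentDetails’. It assumes unprefixed ‘ref’ values and anonymous complex types; named types and type extension are not followed.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# A hypothetical, heavily cut-down schema in the MatML style: global element
# definitions instanced elsewhere with ref="...", one of them recursive.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Material">
    <xs:complexType><xs:sequence>
      <xs:element ref="BulkDetails"/>
      <xs:element ref="ComponentDetails" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="BulkDetails">
    <xs:complexType><xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element ref="Notes" minOccurs="0"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="ComponentDetails">
    <xs:complexType><xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element ref="ComponentDetails" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="Notes" type="xs:string"/>
</xs:schema>"""


def child_declarations(node):
    """Yield element declarations in a content model (through sequence,
    choice, etc.) without descending into the declarations themselves."""
    for child in node:
        if child.tag == f"{XS}element":
            yield child
        else:
            yield from child_declarations(child)


def plot_hierarchy(schema_text, root_name):
    """Return an indented outline with every ref replaced by its definition."""
    schema = ET.fromstring(schema_text)
    global_defs = {e.get("name"): e for e in schema.findall(f"{XS}element")}
    lines = []

    def walk(decl, depth, path):
        ref = decl.get("ref")
        name = ref or decl.get("name")
        definition = global_defs[ref] if ref else decl
        if name in path:  # recursive content model: mark it and stop
            lines.append("    " * depth + name + " (recursive)")
            return
        lines.append("    " * depth + name)
        complex_type = definition.find(f"{XS}complexType")
        if complex_type is not None:
            for child in child_declarations(complex_type):
                walk(child, depth + 1, path + (name,))

    walk(global_defs[root_name], 0, ())
    return lines


print("\n".join(plot_hierarchy(SCHEMA, "Material")))
```

Applied to a full schema, such an outline would also include the repeated groups (such as ‘Concentration’) that the partial diagrams deliberately leave out.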
6.2 Is the hierarchy logical with respect to the stated domain?

The fuller hierarchies show the relationships between the elements in the schema. A schematic diagram, Figure 10, is a simple representation of the domain, and is used to assess the schema with respect to the domain. This assessment is relatively qualitative, and is mainly concerned with the primary materials-related elements.

Figure 10 A schematic diagram for the domain of materials property data
[Diagram: bulk material, processing, formed material, processing, component; characterisation and mechanical tests are shown as applicable at each stage.]

Figure 10 indicates that tests and characterisation can be performed on the material at any stage of the processing. Mechanical tests include tensile strength and fatigue strength, and characterisation concerns chemical and micro-structural properties. MatML v3.0 includes a ‘PropertyData’ element to contain mechanical test data, and a ‘Characterization’ element to contain characterisation data. The ‘Characterization’ element handles the chemical and micro-structural properties with its ‘ChemicalComposition’ and ‘PhaseComposition’ child elements respectively (see Figure 8). These latter
child elements include the necessary technical details, such as chemical element names and concentrations, to describe them.

The ‘Characterization’ element is a child of ‘BulkDetails’, which can be used to describe the bulk material, and also a child of ‘ComponentDetails’, which can be used to describe formed material and components. This satisfies the requirement to accept test and characterisation data at all stages of processing. The primary materials-related elements are therefore considered to map logically to the domain.

6.3 Does the schema provide all the necessary components for the application?

The MatML website (NIST (a), 2003) includes a number of examples that indicate provision of the necessary components for materials test reports. One of the examples, MatML example 2 (Begley and Kaufman, 2003), relates to the mechanical properties of an aluminium alloy. The source of the data used in MatML example 2 (Kaufman, 1999) was obtained to determine whether the MatML implementation included all components from that source. The components from the source are shown in Table 4, with the MatML elements that can contain the information.

Table 4 Required data for a materials test report, with corresponding MatML elements

Components in data source | MatML example 2 elements
Materials specification | BulkDetails/Specification
Material reference | Metadata/ParameterDetails
Product form | BulkDetails/Form
Product heat treatment | BulkDetails/ProcessingDetails
Test specification | Metadata/MeasurementTechniqueDetails
Specimen details | Metadata/SpecimenDetails
Test conditions | Metadata/ParameterDetails
Test results | BulkDetails (or ComponentDetails)/PropertyData
Test units | Metadata/PropertyDetails/Units
Notes | Metadata/DataSourceDetails/Notes
Term description | Glossary/Term/Definition

6.4 Is the code efficient?

The root element is ‘MatML_Doc’, which contains ‘Material’. The next level includes five elements: ‘BulkDetails’, ‘ComponentDetails’, ‘Metadata’, ‘Graphs’ and ‘Glossary’. These elements have child elements, and so on. The elements are defined in one of two ways:
• within the parent elements, as they are used
• as separate elements (below ‘Material’ in the hierarchy), referenced with ‘ref’.
The former method has the advantage that the full hierarchy is visible, and the disadvantage that the definition of the child element cannot be reused. A separately defined element can be used multiple times, resulting in a more modular system. Separately defined elements can also be instanced with a new name (as performed for the basketball schema in section 5.2), but this method is not used in MatML v3.0.

An attribute, ‘format’, in each of the elements ‘ParameterValue’, ‘Data’ and ‘Value’ repeats the same type definition. This type definition could be created as a separate custom data type.

Identification attributes are used, defined by ‘xsd:ID’ and referenced with ‘xsd:IDREF’. These attributes are generally used to associate data with related metadata, which contain elements and other attributes with meaningful names.
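The ID/IDREF association can be followed mechanically: each IDREF on a data element points to the metadata element carrying the matching ID, and a validating parser reports an IDREF with no matching ID as an error. The Python sketch below resolves such references in a hypothetical MatML-like fragment; the IDREF attribute names on ‘PropertyData’ (property, technique, source, specimen) and all values are assumptions, and the second ‘PropertyData’ deliberately points at a ‘PropertyDetails’ entry that does not exist.

```python
import xml.etree.ElementTree as ET

# Hypothetical MatML-like fragment. The IDREF attribute names and the id
# values are assumptions; 'pr3' deliberately has no matching id.
DOC = """<Material>
  <BulkDetails>
    <PropertyData property="pr1" technique="mt1" source="ds1" specimen="sp1">
      <Data format="exponential">1.0E+01,2.0E+01</Data>
    </PropertyData>
    <PropertyData property="pr3" technique="mt1" source="ds1" specimen="sp1">
      <Data format="exponential">3.0E+01,4.0E+01</Data>
    </PropertyData>
  </BulkDetails>
  <Metadata>
    <DataSourceDetails id="ds1"><Name>Data source</Name></DataSourceDetails>
    <MeasurementTechniqueDetails id="mt1"><Name>Test method</Name></MeasurementTechniqueDetails>
    <PropertyDetails id="pr1"><Name>Property in ksi</Name></PropertyDetails>
    <PropertyDetails id="pr2"><Name>Property in MPa</Name></PropertyDetails>
    <SpecimenDetails id="sp1"><Name>Specimen</Name></SpecimenDetails>
  </Metadata>
</Material>"""

IDREF_ATTRS = ("property", "technique", "source", "specimen")


def resolve_references(xml_text):
    """Map each PropertyData IDREF to the Name of the element it identifies;
    an unresolved reference is reported as None."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    results = []
    for pd in root.iter("PropertyData"):
        resolved = {}
        for attr in IDREF_ATTRS:
            target = by_id.get(pd.get(attr))
            resolved[attr] = target.findtext("Name") if target is not None else None
        results.append(resolved)
    return results


for entry in resolve_references(DOC):
    print(entry)
```

Because the identifiers themselves (pr1, mt1 and so on) carry no meaning, the meaningful names live only in the referenced metadata, which is why a look-up of this kind is needed to interpret the data.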
6.5 Do validation tools accept the schema?

The web-based XSV system described in section 4.5 was used to check the MatML v3.0 schema file, matml.xsd. The output generated by XSV on testing matml.xsd is shown in Figure 11.

Figure 11 XSV v1.4 software validation of MatML schema

As stated in section 4.5, Microsoft InfoPath appears to perform a validation check on schemas when they are imported as a data source for form creation. An attempt to import the MatML v3.0 schema into Microsoft InfoPath resulted in the following error message:

“matml.xsd#/schema/element[2][@name = 'Material']/complexType[1]/attribute[3][@name = 'local_frame_of_reference'] Two distinct members of the {attribute uses} must not have {type definition}s which are or are derived from ID. <attribute> : 'local_frame_of_reference'.”

This appears to state that an element can have only one identification attribute. According to the message, the ‘Material’ element declares two identification (ID-typed) attributes, one of which is ‘local_frame_of_reference’.
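The message quotes an XML Schema 1.0 constraint: within one complex type, no two attribute uses may have types that are, or derive from, ID. XSV did not report this, whereas InfoPath did. The Python sketch below looks for the simplest form of the pattern in a hypothetical fragment; only the attribute name ‘local_frame_of_reference’ comes from the error message, and the ‘id’ attribute is an assumption. Attribute groups, named types and types derived from ID are not followed, so this is an illustration of the rule rather than a complete check.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical fragment reproducing the pattern reported by InfoPath: one
# complexType declaring two attributes of type ID. Only the attribute name
# 'local_frame_of_reference' comes from the error message; 'id' is assumed.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Material">
    <xsd:complexType>
      <xsd:sequence><xsd:element name="Name" type="xsd:string"/></xsd:sequence>
      <xsd:attribute name="id" type="xsd:ID"/>
      <xsd:attribute name="local_frame_of_reference" type="xsd:ID"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def elements_with_several_id_attributes(schema_text):
    """Find element declarations whose anonymous complexType declares more
    than one ID-typed attribute directly. Simplifications: attribute groups,
    named types and derived types are ignored, and the prefix on the type
    value is assumed to be bound to the XML Schema namespace."""
    schema = ET.fromstring(schema_text)
    findings = []
    for decl in schema.iter(f"{XSD}element"):
        complex_type = decl.find(f"{XSD}complexType")
        if complex_type is None:
            continue
        id_attrs = [a.get("name") for a in complex_type.findall(f"{XSD}attribute")
                    if (a.get("type") or "").split(":")[-1] == "ID"]
        if len(id_attrs) > 1:
            findings.append((decl.get("name"), id_attrs))
    return findings


print(elements_with_several_id_attributes(SCHEMA))
```

One way to satisfy the constraint would be to give the second attribute a type not derived from ID, though whether that suits MatML's intended use of the attribute is a question for the schema designers.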
6.6 Conclusions

1. The MatML v3.0 schema provides a logical representation of the materials property domain.
2. A map of the majority of the MatML v3.0 hierarchy is human-readable.
3. Most of the code is efficient; one attribute type definition (‘format’) is repeated.
4. The MatML v3.0 schema validates with the web-based XSV v2.5-2 software validation tool, but not with Microsoft InfoPath 2003 beta 2 edition.
7 Project description

7.1 Selection of data source

A dataset for the project is required. A MatML example is considered a suitable source of data, having the following advantages:
• an XML hierarchy for the data already exists
• the example XML document can be used as a datum for comparison with the XML documents produced in the project.

The data in MatML example 2 make use of a number of MatML elements, as shown in Table 4, section 6.3. MatML example 2 was therefore used to provide the dataset and datum XML document for the project. One element from MatML example 2, the glossary element, was excluded from the project. A glossary of industry standard terms would not be necessary in a materials test report; terms could be looked up in reference texts if required. The full MatML example 2 document is shown in Appendix A: Original MatML Example 2 XML Document. MatML example 2 was modified to include the current URI declaration, ‘http://www.w3.org/2001/XMLSchema’ (see section 4.5).

7.2 Form creation

7.2.1 Selection of elements from data source

In order to provide a simple interface and data entry format for the end-user, certain elements are left ‘hard-coded’ in the XML document. Hence, the XML documents produced in the project will not be composed entirely of form-generated elements.
The ‘hard-coded’ elements are intended to be:
• PropertyData: all attributes
• ParameterValue: all attributes
• DataSourceDetails: attribute ‘ID’
• MeasurementTechniqueDetails: attribute ‘ID’
• PropertyDetails: all attributes and child elements, two instances
• ParameterDetails: all attributes and child elements, three instances.

An effect of embedding the values of the above elements will be to fix the type of test to axial fatigue stress (PropertyDetails), and to fix the test conditions (ParameterDetails). These are reasonable constraints, as the materials test report is intended to represent a particular test for different materials.

7.2.2 Objectives

The functional aims of the forms, with MatML example 2 as the data source, are:
1. to enter metadata for:
   a. material details
   b. test details for axial fatigue stress
2. to enter test data for axial fatigue stress in two units:
   a. ksi
   b. MPa
3. to produce an XML document from the data, similar to MatML example 2.

These objectives are to be achieved for two systems that can generate XML documents from forms:
• HTML and Perl
• Microsoft InfoPath.

A number of design criteria were considered, including:
1. grouping related fields in separate tables
2. use of an index to link to the tables
3. holding the data in individual fields rather than groups.

These criteria are intended to simplify data entry and minimise scrolling. All three criteria are expected to be applicable to the HTML/Perl system. Only the first criterion is applied to the Microsoft InfoPath system, on the basis of the capabilities of its menu-driven features found in the feasibility study, section 5.4.

7.3 Report generation using XSLT

The aims for the XML document transformation are to format all elements for display, specifically to:
• provide sections with headings for material details and test details
• provide a table with titles for test data and parameters
• display metadata such as data source and notes at appropriate places
• produce a single XSLT file that could be used for XML documents produced by both HTML/Perl and Microsoft InfoPath.
8 Form creation for MatML example 2 data, using HTML

8.1 Index screen

An index is presented on loading the web-form, providing links to other parts of the form and to other documents. This acts as a menu for data entry and for viewing form-generated XML documents. The index is shown in Figure 12.

The first three options shown in the index all point to tables for data entry; these are described in section 8.2. All of the data in the web-form are encompassed in a single HTML form. Hence, there is one ‘submit’ button for all the data, which passes the data to a Perl script (described in section 8.3). This ‘submit’ button is given a screen to itself, with a link to it in the index.

The fourth option is a link to the raw XML document that is produced by processing the form. In Microsoft Internet Explorer v6.0, the XML document is displayed as a hierarchy that can be collapsed and expanded. In Netscape v7.02, the XML document is displayed as plain text.

The final option is to view the XML document with the XSLT transformation described in section 10. This link displays a formatted representation of the XML document in Internet Explorer v6.0 and Netscape v7.02.

8.2 Data entry tables

Three tables were created, designated as tables a to c for discussion:
• table a includes meta-information for material and test details
• table b includes test data in the ksi unit
• table c includes test data in the MPa unit.
Table a uses text fields and list boxes to enter data. List boxes are used in a number of instances where entries could be limited to a selection from fixed values. An example is the specification type, for which the possible values given here are ASTM, BS, ISO and BS:EN. These are typical standards for materials and tests, but others could be added in a commercial implementation. The fields are organised into logical sub-groups, with sub-headings. Some fields have labels that differ from the element names, to improve descriptions. A number of elements are given two fields instead of one for better flexibility. For example, the ‘MeasurementTechniqueDetails’ child element ‘Name’ is given two separate entries. The value of this element in MatML example 2 is ASTM E597. The form has one field labelled ‘Type’ for the ‘ASTM’ value, and another labelled ‘Number’ for the E597 value. These entries are under the main heading ‘Test Details’ and the sub-heading ‘Specification’.

Tables b and c display forms for the ‘PropertyData’ and ‘ParameterValue’ elements. These include the data for the test results and test conditions. In MatML example 2, the data are grouped in sequences of five values. In tables b and c, separate fields are provided for each data value. MatML example 2 includes a dataset in the ksi unit and a dataset in the MPa unit, to be entered in tables b and c respectively. The sequences of fields in tables b and c are presented in a column-wise format, for ease of data input. The HTML code for the web-form is shown in Appendix B: HTML v4 Code to Produce Web-Form for MatML example 2.
8.3 XML document generation from HTML form using Perl

A Perl program handles the form data and writes an XML document that includes the form values. The technique used is that described in section 5.3. The Perl script writes all the form data, together with the markup needed to generate XML elements valid for MatML example 2. The Perl script is shown in Appendix C: Perl v5.6.1 Script to Transform HTML Form Elements into MatML Document. The XML document generated by the HTML/Perl system is shown in Appendix D: XML Document Based on MatML Example 2, Produced by HTML/Perl.
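The Perl script itself is given in Appendix C. As a sketch of the same two steps under stated assumptions, the Python below decodes a submitted form and writes a ‘PropertyData’ element with hard-coded attributes, as in section 7.2.1, rejoining the separately entered data fields into one comma-separated list. The field names (ksi1 to ksi5), the attribute names and values, and the ‘format’ value are hypothetical, not taken from the project's form or from MatML example 2.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Attribute values fixed in the script rather than entered on the form
# (section 7.2.1). Attribute names and values here are assumptions.
FIXED_ATTRS = {"property": "pr1", "technique": "mt1", "source": "ds1", "specimen": "sp1"}


def fields_from_post(body):
    """Decode a URL-encoded form submission into single string values."""
    return {k: v[0] for k, v in parse_qs(body, keep_blank_values=True).items()}


def property_data(fields, prefix, count=5, fmt="exponential"):
    """Build a PropertyData element from separately entered data fields
    (hypothetically named prefix1..prefixN), rejoined into one
    comma-separated list."""
    pd = ET.Element("PropertyData", FIXED_ATTRS)
    data = ET.SubElement(pd, "Data", {"format": fmt})
    data.text = ",".join(fields[f"{prefix}{i}"].strip() for i in range(1, count + 1))
    return pd


# A '+' inside a value arrives percent-encoded (%2B) from a browser; a bare
# '+' would be decoded by parse_qs as a space, corrupting exponential notation.
post = "ksi1=1.2E%2B01&ksi2=3.4E%2B01&ksi3=5.6E%2B01&ksi4=7.8E%2B01&ksi5=9.0E%2B01"
element = property_data(fields_from_post(post), "ksi")
print(ET.tostring(element, encoding="unicode"))
```

Building the element with a library rather than writing markup as text has the advantage that characters such as ‘&’ and ‘<’ in user input are escaped automatically.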
Figure 12 Index of links for HTML form
Figure 13 Material and test details for HTML form
Figure 14 PropertyData element, HTML form
9 Form creation for MatML example 2 data, using Microsoft InfoPath

9.1 Producing a form template

The data fields describing material and test details were individually placed in a table, Figure 16. InfoPath automatically inserts element names as text labels for the fields; these labels can be changed without affecting the XML structure.

The ‘PropertyData’ element is inserted into the form as a group, including all attributes and elements of ‘PropertyData’. This was necessary because ‘PropertyData’ is a repeatable entity in MatML, and InfoPath can reflect this: on invoking the template to create a form, repeatable sections can be multiplied. The ‘PropertyData’ element links to meta-information in the ‘PropertyDetails’ and ‘ParameterDetails’ child elements of ‘Metadata’. This necessitated the inclusion of ‘PropertyDetails’ and ‘ParameterDetails’ as repeating sections in the form. The ‘Data’ element was included as a single field for the five data entries, separated by commas.

A number of the data fields were included as drop-down list boxes (indicated by a downward-pointing arrow), to limit entries to appropriate values. For example, ‘parameter’ can be pa1, pa2 or pa3, and ‘format’ can be either integer or exponential.
9.2 Generating a form from the template

Once the template had been created, it was invoked to create the form. A number of form processing steps are required to produce an XML structure that matches that of MatML example 2. The steps in form production were:
1. repeat ‘PropertyData’ to give four entries
2. repeat ‘ParameterValue’ to give three entries for each ‘PropertyData’
3. repeat ‘ParameterDetails’ to give three entries
4. repeat ‘PropertyDetails’ to give two entries
5. fill in the fields with the appropriate values from MatML example 2.

The material and test data fields are shown in Figure 16. There are four ‘PropertyData’ forms, all following the format in Figure 17. The form for ‘PropertyDetails’ and ‘ParameterDetails’ is shown in Figure 18. Note that the ‘PropertyData’ element includes attributes that link (via identification attributes) to meta-information for the data source, measurement technique, property type and specimen type. These are ds1, mt1, pr1 and sp1 in Figure 17. A separate form entry is given only to property type (Figure 18), as this is the only detail that has more than one value, pr1 or pr2.

The form for ‘PropertyData’ requires the user to look up each meta-information reference against the corresponding identification attribute in ‘PropertyDetails’. For example, pr1 corresponds to mechanical property data in ksi, and pr2 to mechanical property data in MPa. A similar look-up requirement exists for the ‘ParameterValue’ fields.
9.3 Comments

Filling in the form does require the user to look up some corresponding field values. It is possible that this requirement could be removed by using more advanced features of InfoPath. Investigating this would be more appropriate once InfoPath becomes a final commercial product with full support information and instructional literature.

The form created with Microsoft InfoPath includes some Microsoft-specific tags that associate the form with the InfoPath template. The root element also contains some Microsoft meta-information as attributes. The full document is shown in Appendix E: XML Document Based on MatML Example 2, Produced by Microsoft InfoPath.
Figure 15 Screenshot of Microsoft InfoPath in design mode (annotated: view of data source; form design view)
Figure 16 Material and test details, InfoPath form
Figure 17 PropertyData element, InfoPath form

Figure 18 PropertyDetails and ParameterDetails elements
10 Report generation using XSLT and CSS

10.1 Introduction

Style sheet technologies are discussed in section 3.6. A conclusion of the feasibility study (section 5.9) is that XSLT and CSS together, rather than CSS alone, would be the better method of applying formatting to an XML document. The application of these two style sheet technologies to the XML documents created by the HTML/Perl and Microsoft InfoPath forms is described below.

10.2 Implementation

10.2.1 Templates

XSLT files are written in a language that uses template matching, where a template is a segment of code that corresponds to an element name. A root template contains the initial code and can call other templates, which in turn can apply the templates of child elements by a relative path reference. Each template can also be referred to independently by its full path. The use of separate templates for elements, with relative references, helps to give the code a clear structure.

A number of full path references were found to be appropriate for MatML documents. Meta-information relating to the material is held in the ‘Metadata’ element and in other elements within ‘BulkDetails’. For meta-information, references to element content are used with descriptive literals. HTML with inline CSS styles is used to provide formatting. Where possible, styles are applied in blocks using the HTML <div> tag.
10.2.2 Parsing grouped data

The data held in the ‘PropertyData’ child elements ‘Data’ and ‘ParameterValue’ are in the form of lists separated by commas. These data therefore require parsing to access individual values. More advanced features of XSLT, including string-handling functions and variables, are used to achieve the parsing.

10.2.3 Drawing a table

The parsing is combined with the generation of HTML tags to produce a table for the data. The data are parsed consecutively, row-wise; a method of column-wise parsing using XSLT was not found. A separate table is created for the row headings, because the multiple passes the XSLT processor makes over the ‘PropertyData’ and ‘ParameterValue’ elements make it impractical to integrate the headings with the data in the same table. The align attribute of the headings table is set to left so that the two tables can be displayed side by side. The complete XSLT transformation program is shown in Appendix F: XSLT File to Transform XML Documents Based on MatML Example 2 to a One-page Report.

10.2.4 Association of XSLT file with XML documents

Transformation of an XML document by an XSLT file requires an association between the two files. This is achieved by inserting into the XML document an ‘xml-stylesheet’ processing instruction that gives the location of the XSLT file. Loading the XML document into a web browser, or another application capable of XSLT transforms, applies the transformation, resulting in a formatted document. The appropriate ‘xml-stylesheet’ reference was inserted into the HTML/Perl and Microsoft InfoPath XML documents.

10.2.5 Results

The two XML documents were loaded into Microsoft Internet Explorer v6.0 and Netscape Navigator v7.02, and the formatted documents were inspected when printed. The results are shown in Table 5.

Table 5 XML documents with XSLT transformations, viewed when printed from web browsers

Method of creating XML document | Microsoft Internet Explorer v6.0.0 | Netscape Navigator v7.02.0
HTML/Perl | Fully formatted document, Figure 19 | Tables not side by side, all else formatted (note 1), Figure 20
Microsoft InfoPath | Fully formatted document (note 2), Figure 21 | Tables not side by side, all else formatted (note 1), Figure 22

Note 1: when viewed on-screen, the tables are side by side.
Note 2: by default, when an attempt is made to load the file into Microsoft Internet Explorer, control is passed to Microsoft InfoPath. This was circumvented by manually editing a copy of the document to exclude the extra tags inserted by Microsoft InfoPath.
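XSLT 1.0 has no built-in tokenising function, so a comma-separated list such as the content of ‘Data’ is commonly split by a named template that calls itself, taking substring-before(list, ',') as the current item and passing substring-after(list, ',') to the next call. Appendix F is not reproduced here, so the Python sketch below mirrors only that general recursive pattern, together with the row-wise table generation of section 10.2.3; it is not a transcription of the project's style sheet, and in Python itself str.split would be the idiomatic choice.

```python
from html import escape


def split_list(text, sep=","):
    """Split a separated list the way a recursive XSLT 1.0 template does:
    take substring-before(text, sep) as the current item and recurse on
    substring-after(text, sep) until no separator is left."""
    text = text.strip()
    if not text:
        return []
    if sep not in text:
        return [text]                          # final item
    head, _, tail = text.partition(sep)        # substring-before / substring-after
    return [head.strip()] + split_list(tail, sep)


def table_rows(data_lists):
    """Emit one HTML table row per list, i.e. the row-wise order described above."""
    return ["<tr>" + "".join(f"<td>{escape(v)}</td>" for v in split_list(values)) + "</tr>"
            for values in data_lists]


rows = table_rows(["1.0E+01, 2.0E+01, 3.0E+01", "4.0E+01,5.0E+01,6.0E+01"])
print("\n".join(rows))
```

Because each list becomes one row, a column-wise layout would require reading the same position from every list, which is awkward in a single recursive pass and consistent with the difficulty reported above.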
Figure 19 HTML/Perl generated XML document, formatted with XSLT/CSS, as displayed in Microsoft Internet Explorer v6.0, print view
Figure 20 HTML/Perl generated XML document formatted with XSLT/CSS, as displayed in Netscape Navigator v7.02, partial print view
Figure 21 Microsoft InfoPath generated XML document formatted with XSLT/CSS, as displayed in Microsoft Internet Explorer v6.0, print view
Figure 22 Microsoft InfoPath generated XML document with XSLT style sheet, as displayed in Netscape Navigator v7.02, partial print view
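The association step of section 10.2.4, together with the manual clean-up described in note 2 of Table 5, could be scripted rather than done by hand. The sketch below uses Python's standard xml.dom.minidom, which keeps processing instructions. It assumes, without confirmation from this report, that the extra tags inserted by Microsoft InfoPath are processing instructions whose targets begin with ‘mso-’; the function name attach_stylesheet, the sample document and the file name matml_report.xsl are all hypothetical.

```python
from xml.dom import minidom

# Hypothetical InfoPath-style input: the two mso-* processing instructions are
# an assumption about what the "extra tags" were; the element content is a stub.
INFOPATH_XML = """<?xml version="1.0"?>
<?mso-infoPathSolution href="form.xsn"?>
<?mso-application progid="InfoPath.Document"?>
<MatML_Doc><Material/></MatML_Doc>"""

def attach_stylesheet(xml_text, href, drop_prefix="mso-"):
    """Remove document-level processing instructions whose target starts with
    drop_prefix, then insert an xml-stylesheet PI pointing at an XSLT file."""
    doc = minidom.parseString(xml_text)
    for node in list(doc.childNodes):  # copy: we mutate childNodes in the loop
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target.startswith(drop_prefix)):
            doc.removeChild(node)
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/xsl" href="{href}"')
    doc.insertBefore(pi, doc.documentElement)  # must precede the root element
    return doc.toxml()

result = attach_stylesheet(INFOPATH_XML, "matml_report.xsl")
print(result)
```

Editing a copy, as the project did, leaves the original InfoPath file usable by InfoPath itself.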
11 Assessment of project data
11.1 Introduction
Assessment of the data generated by the project is approached from the following perspectives:
• does the XML document contain all the data in the original MatML example 2 file?
• is the XML document valid with respect to the schema?
• is the document generated from the XML document and the XSLT file a satisfactory report?

11.2 Comparison of original MatML example 2 document with project generated XML documents
To ascertain whether the XML documents generated in sections 8 and 9 match the original MatML example 2 document, a file comparison exercise was performed, using the file comparison software described in section 4.6 to highlight differences between the text files. Note that the glossary element in MatML example 2 was excluded from the documents produced in the project, as discussed in section 7.1. The results are given below.
1. Method of creating XML document: HTML/Perl
File name: Matmle2_htmlPerl_xslt.xml
Datum file: Matmle2.xml
Differences to datum file:
• datum file comments omitted
• datum file glossary content omitted
2. Method of creating XML document: Microsoft InfoPath
File name: Matmle2_infopath.xml
Datum file: Matmle2.xml
Differences to datum file:
• datum file comments omitted
• datum file glossary omitted
• extra Microsoft meta-information included
• extra attributes in root element
• order of attributes altered in some elements
• extra, empty attributes inserted into some tags
• self-terminating tags replaced by start-tag/end-tag pairs.
The XML documents produced by HTML/Perl and Microsoft InfoPath included all the critical data-bearing elements of MatML example 2, other than the glossary content.

11.3 Validation of XML documents produced by HTML/Perl and Microsoft InfoPath
Microsoft InfoPath does not accept the MatML v3.0 schema, as noted in section 6.5, so its validation facility cannot be used to check XML documents against MatML v3.0. Validation was therefore performed with the command-line XSV v1.4 system, as described in section 4.5.
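Several of the differences listed in section 11.2 for the Microsoft InfoPath document (comments, attribute order, self-terminating tags written as start/end pairs) are lexical rather than changes to the data. As an alternative to the line-by-line text comparison actually used, a minimal Python sketch could compare parsed element trees instead, so that only substantive differences such as extra attributes remain. The helper names and the sample documents are hypothetical, and the sketch ignores tail text and mixed content, which suits data-centric documents but not narrative XML.

```python
import xml.etree.ElementTree as ET

def canonical(elem):
    """Reduce an element to nested tuples for comparison.

    ElementTree's default parser drops comments; comparing attributes as a
    dict ignores their order; and <a/> and <a></a> parse to the same element.
    """
    return (
        elem.tag,
        dict(elem.attrib),
        (elem.text or "").strip(),
        [canonical(child) for child in elem],
    )

def same_content(xml_a, xml_b):
    """True if two XML strings parse to the same canonical tree."""
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))

# Hypothetical stand-ins for a datum file and two generated files.
DATUM = '<MatML_Doc><!-- datum comment --><Material a="1" b="2"><Name/></Material></MatML_Doc>'
REORDERED = '<MatML_Doc><Material b="2" a="1"><Name></Name></Material></MatML_Doc>'
EXTRA_ATTR = '<MatML_Doc extra="x"><Material a="1" b="2"><Name/></Material></MatML_Doc>'

print(same_content(DATUM, REORDERED))   # → True  (lexical differences only)
print(same_content(DATUM, EXTRA_ATTR))  # → False (extra root attribute)
```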
The XML document generated by the HTML/Perl system validated against MatML v3.0 without modification, Figure 23. The XML document produced by Microsoft InfoPath, however, generated an error relating to the attribute ‘xml:lang’, which XSV reported as undeclared, Figure 24. This attribute was deleted with a text editor and the modified XML document saved to disk. Validation of this modified Microsoft InfoPath document was successful, Figure 25. A summary of the validation results is shown in Table 6.

Table 6 Results of validating XML documents against MatML with the XSV v1.4 tool
Method of creating XML document | File name | Validation test against MatML with XSV
HTML/Perl | Matmle2_htmlPerl_xslt.xml | Pass
Microsoft InfoPath | Matmle2.infopathform.xml | Fail
Microsoft InfoPath (lang attribute removed) | Matmle2.infopath_mod.xml | Pass

Figure 23 Output of the results from XSV v1.4 for the HTML/Perl generated XML document
<?xml version='1.0'?>
<xsv docElt='{None}MatML_Doc' instanceAssessed='true' instanceErrors='0' rootType='[Anonymous]' schemaDocs='matml.xsd' schemaErrors='0' schemaLocs='None -> matml.xsd' target='file:/d:/XML/XSV/matmle2_htmlperl_raw.xml' validation='strict' version='XSV 1.203.2.47.2.4.2.14/1.106.2.25.2.6 of 2002/06/15 18:59:35' xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file:/d:/XML/XSV/matml.xsd' outcome='success' source='command line'/>
<schemaDocAttempt URI='file:/d:/XML/XSV/matml.xsd' outcome='redundant' source='schemaLoc'/>

Figure 24 Output of the results from XSV v1.4 for the Microsoft InfoPath XML document
<?xml version='1.0'?>
<xsv docElt='{None}MatML_Doc' instanceAssessed='true' instanceErrors='1' rootType='[Anonymous]' schemaDocs='matml.xsd' schemaErrors='0' target='file:/d:/XML/XSV/matmle2_infopathform.xml' validation='strict' version='XSV 1.203.2.47.2.4.2.14/1.106.2.25.2.6 of 2002/06/15 18:59:35' xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file:/d:/XML/XSV/matml.xsd' outcome='success' source='command line'/>
<invalid char='296' code='cvc-complex-type.1.3' line='1' resource='file:/d:/XML/XSV/matmle2_infopathform.xml'>undeclared attribute {http://www.w3.org/XML/1998/namespace}:lang</invalid>
</xsv>

Figure 25 Output of the results from XSV v1.4 for the modified Microsoft InfoPath XML document
d:\XML\XSV>xsv matmle2_infopath_mod.xml matml.xsd.
<?xml version='1.0'?>
<xsv docElt='{None}MatML_Doc' instanceAssessed='true' instanceErrors='0' rootType='[Anonymous]' schemaDocs='matml.xsd.' schemaErrors='0' target='file:/d:/XML/XSV/matmle2_infopath_mod.xml' validation='strict' version='XSV 1.203.2.47.2.4.2.14/1.106.2.25.2.6 of 2002/06/15 18:59:35' xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file:/d:/XML/XSV/matml.xsd.' outcome='success' source='command line'/>
</xsv>

11.4 Comparison of project generated reports with word-processed report
A word-processed report was prepared from the data in MatML example 2 to provide a datum for comparing the reports generated by the project. This report is shown in Figure 26. The criteria for the comparison are defined as follows:
• heading styles for materials details, test details, test data and source
• all source data included
• a table for the test data, with:
  o formatted headings
  o a structured display of the data.
In section 10, the XML documents produced by the two systems were displayed in two different browsers, Figure 19 to Figure 22. The document displayed without any modifications or display problems was the one produced by the HTML/Perl system and displayed in Microsoft Internet Explorer v6.0,
Figure 19. This document, the better of the project results, was compared against the datum word-processed report, Figure 26.

11.4.1 Results
The results of the comparison between the project generated report and the word-processed report, for formatted headings and for the data included, are shown in Table 7 and Table 8 below.

Table 7 Comparison of headings in project generated report and word-processed report
Heading | XML document | Word-processed document
Materials details | Yes | Yes
Test details | Yes | Yes
Test data | Yes | Yes
Source | Yes | Yes

Table 8 Comparison of included data in project generated report and word-processed report
Data element | Included in XML document | Included in word-processed document
Material designation | Yes | Yes
Materials specification | Yes | Yes
Material form | Yes | Yes
Processing details | Yes | Yes
Test specification | Yes | Yes
Specimen type | Yes | Yes
Diameter | Yes | Yes
Test data / ksi (3 parameters) | 10 values | 10 values
Test data / MPa (3 parameters) | 10 values | 10 values
Notes | Yes | Yes
Source | Yes | Yes
Note:
The word-processed data table presented the data with each element in column-wise form, whereas the project generated report presented each element in row-wise form. The data tables in both documents contained formatted headings.

11.4.2 Comparison summary
The report generated by the HTML/Perl system and displayed in Microsoft Internet Explorer v6.0 is essentially similar to the word-processed report. The main difference is that the data table of the word-processed report is effectively transposed in the project generated report, as discussed in section 10.2.3.
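Section 10.2.3 records that no column-wise method was found in XSL, and the comparison above shows the resulting transposed table. Once every data list has been split into values, the column-wise layout of the word-processed report is simply a transposition. A minimal Python sketch, assuming each list is already available as a (heading, values) pair; the function names and the placeholder data are hypothetical.

```python
from html import escape

# Placeholder (heading, values) pairs; not data from MatML example 2.
ROWS = [("strength", ["10.0", "20.0", "30.0"]),
        ("temperature", ["100", "200", "300"])]

def transpose_rows(rows):
    """Turn (heading, values) rows into a heading list plus one record per
    data point, i.e. the column-wise layout of the word-processed report."""
    headings = [heading for heading, _ in rows]
    columns = [values for _, values in rows]
    if len({len(values) for values in columns}) > 1:
        raise ValueError("all data lists must have the same number of values")
    return headings, [list(point) for point in zip(*columns)]

def column_table_html(rows):
    """Emit a header row of headings, then one <tr> per data point."""
    headings, records = transpose_rows(rows)
    out = ["<table>",
           "  <tr>" + "".join(f"<th>{escape(h)}</th>" for h in headings) + "</tr>"]
    for record in records:
        out.append("  <tr>" + "".join(f"<td>{escape(v)}</td>" for v in record) + "</tr>")
    out.append("</table>")
    return "\n".join(out)

print(column_table_html(ROWS))
```

Putting the headings in the first row also removes the need for the separate headings table used in the XSLT version.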