These slides cover the history and standardization of HTML. HTML was created by Tim Berners-Lee at CERN in 1990 with science and engineering interests in mind. The arrival of graphical browsers such as Mosaic and Netscape in the early 1990s, and the "Browser Wars" that followed, meant that HTML was defined by what each browser supported rather than by standards. The World Wide Web Consortium (W3C) was founded in 1994 to produce web standards. Standardizing HTML involves two aspects: syntax, which defines the strings of characters that can represent an HTML document, and semantics, which describes what the elements of a syntactically correct document mean.
1. Trinity College
Markup Languages
Timothy Richards
Trinity College, Hartford CT • Department of Computer Science • CPSC 225
2. HTML
Hypertext Markup Language
3. HTML
A Family of Related Languages
4. HTML
Most documents communicated on the web are written using HTML.
5. HTML
After the next few lectures you should be able to...
• Create standards-compliant static HTML documents.
• Know where to find the reference definitions of HTML and XML and be able to understand (most of) these definitions.
• Determine if an XHTML document is syntactically correct by consulting an XML document type definition or schema.
• Describe the history of HTML and the relationship between HTML, XML, and XHTML.
• Discuss pros and cons of following standards.
• Explain the new additions to the next version: HTML 5
6. HTML Example
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HelloWorld</title>
</head>
<body>
<p>Hello World!</p>
</body>
</html>
7. HTML Example
Every HTML document contains two types of information: markup and character data.
8. HTML Example
The markup information (tags): in the example, <html>, <head>, <title>, <body>, <p>, and their matching end tags.
9. HTML Example
The character data of the document (not tags): in the example, the text HelloWorld inside title and Hello World! inside p.
10. HTML Example
Document Type Declaration (more later): the declaration beginning with <!DOCTYPE at the top of the example.
11. HTML Example
Document Instance: everything after the document type declaration, from the html start tag through </html>.
12. HTML Example
Each tag is either a start tag, such as <head>, or an end tag, such as </head>.
13. HTML Example
The “word” in a tag is called the element name: head in <head>, title in <title>, p in </p>.
14. HTML Example
Everything between <head> and </head> is called the content of the head element; in the example that is <title>HelloWorld</title>.
15. HTML Example
Each document has a root element: the one element that contains all the others.
16. HTML Example
The root element is always html in HTML documents.
17. HTML Example
This document strictly conforms to the XHTML 1.0 standard; its document type declaration names the XHTML 1.0 Strict DTD.
18. HTML Example
When viewed as a tree, the html root element of an XHTML 1.0 document always has two children: head and body.
19. HTML Example
The head element is used to provide certain instructions to the browser.
20. HTML Example
The body element defines the content of the page.
21. HTML Example
This document as a tree.
html
  head
    title
      “HelloWorld”
  body
    p
      “Hello World!”
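The tree view can also be produced mechanically. The following is a minimal Python sketch, not part of the original slides: it parses the document instance from the example with the standard library's xml.etree.ElementTree and returns one indented line per element and per piece of character data. The helper names local_name and tree_lines are made up for illustration, and the DOCTYPE is left out of the sample string to keep it short.

```python
import xml.etree.ElementTree as ET

# The document instance from the example slide (DOCTYPE omitted for brevity).
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HelloWorld</title>
</head>
<body>
<p>Hello World!</p>
</body>
</html>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree adds to element names."""
    return tag.split("}", 1)[-1]


def tree_lines(element, depth=0):
    """Return one indented line per element and per piece of character data."""
    lines = ["  " * depth + local_name(element.tag)]
    text = (element.text or "").strip()
    if text:
        # Character data is shown quoted, one level below its element.
        lines.append("  " * (depth + 1) + repr(text))
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(XHTML)
print("\n".join(tree_lines(root)))
```

The root of the parsed tree is html, its two children are head and body, and the character data hangs off title and p. Whitespace between tags is ignored by this sketch.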
22. HTML History
• Tim Berners-Lee (CERN, 1990)
  • CERN - Physics Research Center
  • Originally designed with science and engineering interests in mind.
• 1992 Elements:
  • title
  • paragraph
  • hyperlinks
  • headings
  • simple lists
  • glossaries
  • monospace text
  • address blocks & search terms in URL
23. HTML History
Those 1992 elements were the whole language. That was it!
24. HTML History
• Marc Andreessen, Eric Bina
  • National Center for Supercomputing Applications (NCSA)
  • Graphical Browser: Mosaic (1993)
• Key Developers Left...
  • To form Netscape Communications!
• Microsoft
  • Created a team to develop Internet Explorer.
• The “Browser Wars”!
  • 1993-1997: HTML was defined by browser support
25. HTML History
This was BAD! Why?
26. HTML History
• HTML Developers
  • Required to “code” to each browser’s idiosyncrasies
• World Wide Web Consortium (W3C)
  • Launched in October of 1994 (16 years ago this month!)
  • Founded by Tim Berners-Lee
27. HTML History
• W3C Goal: Produce Web Standards!
28. HTML History
• Standards lagged behind de facto standards
  • HTML 2.0 became a standard 6 months after the draft for 3.0 was released
  • HTML 3.0 never became a standard
  • HTML 3.2 was adopted as a standard by the W3C in 1997
  • The 3.2 specification captured the “practice” of 1996 (a year behind)
• HTML 4 was released in December 1997
• HTML 4.01 is the “standard”
• HTML 5 is up and coming!
29. HTML History
HTML standards are now being adopted from the W3C rather than from browser manufacturers.
30. HTML History
There are two important aspects of standardization for HTML.
31. HTML History
The first aspect: Syntax
32. HTML History
The two aspects: Syntax and Semantics
33. HTML History
The Syntax
Defines the strings of characters that can be
used to represent an HTML document and
those that cannot.
34. HTML History
The strings are built from characters such as: < > A-Z a-z / * & % $ @ ! 0-9
35. HTML History
The Semantics
A description of what the various elements of
a syntactically correct document mean.
36. HTML History
For example:
• The p element represents a paragraph
• The a element represents an anchor
• The href attribute gives the destination of the anchor's hyperlink
37. HTML History
The Semantics
Formal methods do exist for defining semantics; often, however, a language's semantics are defined using natural-language descriptions.
38. HTML History
For the syntax of computer languages, however, we use a metalanguage to describe the components of the language.
39. HTML History
For languages such as Java, a formal notation
known as Backus-Naur Form (BNF) is used.
40. HTML History
BNF could be used to define HTML...
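To make the idea of defining a markup syntax with BNF concrete, here is a hedged Python sketch. The grammar is a toy invented for illustration (it is not the real grammar of HTML, which is defined with SGML), and the function names parse_element, parse_content, and in_language are hypothetical. Each function mirrors one rule of the grammar written in its docstring.

```python
import re

# Toy grammar, in BNF-like notation (an assumption for illustration only):
#   element ::= "<" name ">" content "</" name ">"
#   content ::= ( text | element )*
#   name    ::= one or more letters
#   text    ::= one or more characters other than "<"
NAME = re.compile(r"[A-Za-z]+")
TEXT = re.compile(r"[^<]+")


def parse_element(s, pos):
    """element ::= "<" name ">" content "</" name ">"

    Return the position just past the element, or raise ValueError.
    """
    if not s.startswith("<", pos):
        raise ValueError(f"expected '<' at {pos}")
    m = NAME.match(s, pos + 1)
    if m is None:
        raise ValueError(f"expected element name at {pos + 1}")
    name, pos = m.group(), m.end()
    if not s.startswith(">", pos):
        raise ValueError(f"expected '>' at {pos}")
    pos = parse_content(s, pos + 1)
    # Requiring the end tag to repeat the start tag's name is an extra
    # constraint checked here; the BNF rules alone do not express it.
    end_tag = f"</{name}>"
    if not s.startswith(end_tag, pos):
        raise ValueError(f"expected {end_tag} at {pos}")
    return pos + len(end_tag)


def parse_content(s, pos):
    """content ::= ( text | element )*"""
    while pos < len(s) and not s.startswith("</", pos):
        m = TEXT.match(s, pos)
        pos = m.end() if m else parse_element(s, pos)
    return pos


def in_language(s):
    """True if the whole string is exactly one element of the toy grammar."""
    try:
        return parse_element(s, 0) == len(s)
    except ValueError:
        return False


print(in_language("<p>Hello World!</p>"))  # True
print(in_language("<p>Hello World!"))      # False: end tag missing
```

The grammar decides which strings of characters are in the language and which are not, which is exactly the job the slides assign to syntax. Attributes, the document type declaration, and everything else in real HTML are deliberately left out.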
41. HTML History
But SGML (Standard Generalized Markup Language) is the metalanguage used to define HTML 4.01.
42. HTML History
Turns out SGML is VERY complex!
43. HTML History
The W3C introduced XML, a simplified subset of SGML, in 1998 to describe HTML...
44. HTML History
This resulted in XHTML 1.0, a reformulation of HTML 4.01 in XML with the same elements and attributes.
45. HTML History
With some restrictions: XHTML 1.0 accepts only a restricted form of the HTML 4.01 syntax.
46. HTML History
• XHTML 1.0
  • Semantically identical to HTML 4.01
  • Restricts the generality of the HTML 4.01 syntax
• Abstract Syntax Trees (AST)
  • Representation of HTML elements “abstractly” as trees
• Concrete Syntax Trees (CST)
  • Representation of HTML elements as characters in trees
• XHTML 1.0 AST == HTML 4.01 AST
• XHTML 1.0 CST != HTML 4.01 CST
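The AST/CST distinction can be illustrated with a small Python sketch. This is an assumption-laden stand-in, not a real HTML 4.01 or XHTML parser: it uses the lenient tokenizer in the standard library's html.parser, and the list of events it records plays the role of the abstract structure. The class name EventCollector and the function abstract_structure are hypothetical, as are the two sample strings.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record elements, attributes, and text, ignoring how they were spelled."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase and
        # strips the quotes around attribute values.
        self.events.append(("start", tag, sorted(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def abstract_structure(markup):
    """Return the recorded events for a string of markup."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Two concrete spellings of the same paragraph (sample strings are made up).
html401_style = "<P CLASS=intro>Hello World!</P>"
xhtml10_style = '<p class="intro">Hello World!</p>'

print(html401_style == xhtml10_style)  # False: the characters differ
print(abstract_structure(html401_style) == abstract_structure(xhtml10_style))  # True
```

The two strings differ character by character, so their concrete syntax differs, yet both describe one p element with a class attribute and the same text. Note that html.parser does not fill in omitted tags, so this sketch only covers the case and quoting differences.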
47. HTML History
• XHTML 1.0 Differences
  • Tags may not be omitted (HTML 4.01 lets some start and end tags be left out)
  • All element and attribute names must be lowercase (HTML 4.01 names are case insensitive)
  • All attribute values must be quoted (not always necessary in HTML 4.01)
• Differences are not burdensome
  • They make it easier to write software to process HTML documents
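The claim that these restrictions make documents easier to process by software can be sketched in a few lines of Python. This is a rough check under stated assumptions, not a validator: it relies on the standard library's XML parser to reject omitted end tags and unquoted attribute values, adds a separate lowercase test, and does not consult the XHTML DTD at all. The function name xhtml_syntax_problems is hypothetical.

```python
import xml.etree.ElementTree as ET


def xhtml_syntax_problems(markup):
    """Return a list of problems found; an empty list means none were detected."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # Omitted end tags and unquoted attribute values are XML syntax errors.
        return [f"not well-formed XML: {err}"]
    problems = []
    for element in root.iter():
        # Check the element name and each of its attribute names.
        for name in [element.tag, *element.attrib]:
            local = name.split("}", 1)[-1]  # drop any '{namespace}' prefix
            if local != local.lower():
                problems.append(f"name is not lowercase: {local}")
    return problems


print(xhtml_syntax_problems('<p class="intro">Hello World!</p>'))   # []
print(xhtml_syntax_problems('<P CLASS="intro">Hello World!</P>'))   # two lowercase problems
print(len(xhtml_syntax_problems("<p class=intro>Hello World!</p>")))  # 1 (unquoted value)
print(len(xhtml_syntax_problems("<body><p>Hello World!</body>")))     # 1 (omitted end tag)
```

Because every XHTML document must be well-formed XML, an ordinary XML parser does most of the work. Deciding whether the elements are used in the places the standard allows would still require checking against the DTD, which this sketch does not attempt.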
48. Digging into XHTML
More on this next time!
Any questions?