This document is a brief introduction to HTML and XHTML. W3C (the World Wide Web Consortium) provides specifications of these languages on its web site [HTML4.01, XHTML1.1]. Once you are reasonably familiar with the possibilities the languages provide, these specifications are all you need to understand and write web pages.
HTML (HyperText Markup Language) is the original Web page language. XHTML is an XML language based on HTML, with exactly the same power but a more regular syntax that can be processed by XML tools. Both languages are accepted by current browsers (this page is XHTML). If you are just starting to learn, you should learn XHTML.
Both HTML and XHTML are designed to express the structure of documents, rather than merely their presentation. A separate language, CSS (Cascading Style Sheets), provides powerful and detailed control of presentation, far more than was ever provided by HTML [CSS1, CSS2.1]. As the HTML specification states,
"Experience has shown that separating the structure of a document from its presentational aspects reduces the cost of serving a wide range of platforms, media, etc., and facilitates document revisions."
(X)HTML elements are divided into block elements that are rendered as rectangles of the window, usually in a vertical stack; and inline elements that are catenated horizontally and then broken into lines that are rendered as parts of a block.
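As a small sketch of this distinction (the text is made up for illustration), the fragment below contains two block elements, each a p element. The em and code elements inside the first paragraph are inline elements that flow within its lines.

```html
<!-- Two block elements: browsers usually stack them vertically. -->
<p>This paragraph is a block. Its <em>emphasized words</em> and
   <code>code fragments</code> are inline elements inside it.</p>
<p>This second paragraph starts a new block below the first.</p>
```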
You may find it helpful to save this page and examine it with a text editor to see how a web page is put together. You can experiment with (X)HTML by writing your own test pages and opening them in a browser to see how they may be presented.
The W3C specifications are the definitions of the two languages; some tools and browsers (especially those produced by for-profit groups) produce or read variations on the definitions, but it is to your advantage to stick to the standard.
The XHTML specification is quite brief, and simply explains how to express what HTML expresses in an XML syntax.
The HTML specification contains the meat of both languages. Besides the table of contents, the specification has an index, and useful tables of all the elements and all their attributes. The table of contents is on the main page, and the tables of elements and attributes and the index are accessible through links at the top of each page.
Once you have become reasonably familiar with XHTML and HTML, you probably will want to play with how your pages are presented. The best way to do this is to learn and use CSS. Two versions of CSS have been defined at this writing: CSS1 is much simpler and will do most of what you want, while CSS2.1 is considerably more complex and may be postponed until you need something CSS1 won't do.
You may define a separate style sheet and link to it from your web documents (that is how this document's style is defined) using a link element in the document's head; or you can include style information for a single page in a style element in that page's head; and finally, an individual element's presentation can be controlled by style information in its style attribute. More-local style has higher priority (thus the 'cascading' in the CSS name).
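The three places for style information might look like the sketch below. The file name mystyle.css and the colors are hypothetical, chosen only to show which rule applies where.

```html
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>
  <head>
    <title>Three places for style</title>
    <!-- 1. A separate style sheet; the file name mystyle.css is hypothetical. -->
    <link href='mystyle.css' rel='stylesheet' type='text/css'/>
    <!-- 2. Style that applies to this page only. -->
    <style type='text/css'>
      p { color: navy; }
    </style>
  </head>
  <body>
    <!-- Takes its color from the style element above,
         assuming mystyle.css sets nothing more specific. -->
    <p>An ordinary paragraph.</p>
    <!-- 3. Style for this one element; it overrides the rule in the head. -->
    <p style='color: maroon;'>A paragraph with its own style.</p>
  </body>
</html>
```

Keeping most rules in the separate sheet lets several pages share one presentation, while the style attribute is best kept for rare one-off exceptions.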
W3C provides an online validation service for web pages and style sheets. Click the icons at the bottom right of this page to validate this page and its style information; you will be taken to URLs where you can also validate your own documents, whether they are accessible by URL or are local copies on your machine. You should take advantage of this service.
An XHTML document consists of nested elements. Each element consists of a start tag, the element's content (if any), and an end tag.
A start tag consists of <, the tag's name, optionally some attributes and values, and >. An end tag consists of </, the tag's name, and >. The content can be text, elements, a combination of text and elements, or nothing at all.
An element with no content can be written briefly as an empty-element tag consisting of <, the tag's name, optionally some attributes and values, and />.
Elements must be properly nested: you can write <dfn><i>term</i></dfn> but not <dfn><i>term</dfn></i>.
Examples: start tag <h1 id='example'>, end tag </h1>, empty-element tag <h1 id='example' />.
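Putting these pieces together, here is a minimal sketch of one element with an attribute, mixed content, a nested element, and an empty-element tag. The id value and the text are invented for illustration.

```html
<!-- Start tag with an id attribute, then mixed content, then the end tag. -->
<p id='intro'>
  Plain text, then a nested element: <em>closed before its parent closes</em>.
  <br/>  <!-- an empty-element tag: no content and no separate end tag -->
  More text after the line break.
</p>
```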
The older HTML language differs from XHTML primarily at this level: end tags are optional or forbidden for some elements, and empty-element tags are not allowed. The specific version of HTML being used is specified by an initial !DOCTYPE declaration.
Figure 1. Three successively deeper views of an XHTML document
An HTML document consists of a doctype followed by a single html element. The html element contains a head element and a body element (see Figure 1).
The !DOCTYPE declaration states what version of HTML you believe you are using. If present, it begins your (X)HTML file. If absent, a browser will attempt to guess what version you are using and render it as best it can, but it is better (and produces more reliable results) to state the doctype explicitly.
The standard HTML4.01 !DOCTYPE is
<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01//EN' 'http://www.w3.org/TR/html4/strict.dtd'>
The standard XHTML doctype is discussed below.
W3C provides a list of standard doctypes.
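As a sketch of the overall shape, a very small HTML 4.01 document could look like this. The title and paragraph text are placeholders.

```html
<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01//EN' 'http://www.w3.org/TR/html4/strict.dtd'>
<html>
  <head>
    <!-- General information about the document goes in the head. -->
    <title>A minimal page</title>
  </head>
  <body>
    <!-- What the browser renders in its window goes in the body. -->
    <p>Hello, world.</p>
  </body>
</html>
```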
The head element contains general information about the document, including information that controls how the document as a whole is rendered.
The title element contains the document's title. This is typically displayed at the top of the browser window.
The meta elements specify various kinds of meta-information about the document. Each meta element sets the value of a property.
<meta name='NAME'/>
A meta element with a name attribute sets the property whose NAME is given. Some properties are author (to specify who wrote the page), and description and keywords (to summarize the page for search engines). The value is given by the element's content attribute.
<meta http-equiv='NAME'/>
A meta element with an http-equiv attribute tells an HTTP (HyperText Transfer Protocol) server how to serve the page. Examples:
<meta http-equiv='content-type' content='text/html; charset=ISO-8859-1'/>
states what character set the document uses (this is the standard character set);
<meta http-equiv='Content-Style-Type' content='text/css'/>
states that style information will be in CSS.
link elements specify links to files related to the current document. Example:
<link href='../alspaugh.css' rel='stylesheet' type='text/css'/>
specifies a stylesheet for the current document's presentation.
A style element contains style for the document. Example:
<style type='text/css'>
p { margin: 0 0 1ex .5em; }
</style>
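The head elements described above can be combined as in the following sketch. The author name, the description and keywords, and the file name mystyle.css are all hypothetical.

```html
<head>
  <title>My first page</title>
  <!-- Properties set with name and content; all values here are made up. -->
  <meta name='author' content='A. N. Author'/>
  <meta name='description' content='A sample page for trying out head elements.'/>
  <meta name='keywords' content='XHTML, example, head'/>
  <!-- Information about how the page should be served. -->
  <meta http-equiv='content-type' content='text/html; charset=ISO-8859-1'/>
  <!-- A separate style sheet (hypothetical file name). -->
  <link href='mystyle.css' rel='stylesheet' type='text/css'/>
  <!-- Style for this page only. -->
  <style type='text/css'>
    h1 { margin: 0 0 1ex 0; }
  </style>
</head>
```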
The body element contains what is rendered in the browser window.
The headings of sections of an XHTML document are given as the contents of h1, h2, h3, h4, h5, and h6 elements. Top-level headings are given in h1 elements; second-level headings within the section headed by an h1 element are given in h2 elements; and so on. The sections themselves follow their headings; see Figure 1.
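A short sketch of how headings and the sections that follow them fit together is given below. The topic and text are invented.

```html
<h1>Birds</h1>
<p>Text of the top-level section on birds.</p>

<h2>Songbirds</h2>
<p>Text of a second-level section within "Birds".</p>

<h2>Raptors</h2>
<p>Text of another second-level section within "Birds".</p>

<h1>Mammals</h1>
<p>Text of the next top-level section.</p>
```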
A p element contains the text of a paragraph. In HTML the end tag of a p element is optional.
A blockquote element contains a long quotation of paragraph size. (Compare q.)
An ol element contains a list whose items are numbered. Each item consists of an li element. In HTML the end tag of an li element is optional.
A ul element contains a list whose items are not numbered. Each item consists of an li element. In HTML the end tag of an li element is optional.
A dl element contains a list whose items are terms and their definitions. Each item consists of a dt element containing the term, followed by a dd element containing its definition. In HTML the end tag of a dt or dd element is optional.
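The three kinds of list might be written as in this sketch, with every end tag given as XHTML requires. The list contents are made up.

```html
<!-- Numbered list -->
<ol>
  <li>Write the page.</li>
  <li>Validate it.</li>
</ol>

<!-- List whose items are not numbered -->
<ul>
  <li>apples</li>
  <li>pears</li>
</ul>

<!-- List of terms and their definitions -->
<dl>
  <dt>HTML</dt>
  <dd>The original Web page language.</dd>
  <dt>XHTML</dt>
  <dd>An XML language based on HTML.</dd>
</dl>
```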
Tables are organized within a table element. Briefly, a table element contains tr elements, each of which contains a row of the table, and each tr element contains td elements, one for each data cell of the table, and possibly th elements, one for each heading cell. Ordinarily every tr element contains the same total number of th and td elements.
A td or th cell that is to extend over more than one column is given a colspan attribute, for example colspan='2' for a cell spanning two columns. Similarly, a cell spanning two or more rows is given a rowspan attribute. For example, <td colspan='2' rowspan='3'> is a cell covering two columns and three rows. In such cases, the tr elements for the affected rows must contain correspondingly fewer td and th elements.
table and its subelements have a variety of attributes and other elements for expressing a wide range of tables. See the W3C specification for more details.
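The sketch below shows a three-column table in which one heading cell spans two columns and one data cell spans two rows. The names and contact details are invented. Note how each row that is affected by a span contains fewer cells.

```html
<table>
  <!-- Row 1: one heading cell plus one spanning two columns = three columns. -->
  <tr> <th>Name</th> <th colspan='2'>Contact</th> </tr>
  <!-- Row 2: the first cell also occupies the first column of row 3. -->
  <tr> <td rowspan='2'>Ann</td> <td>email</td> <td>ann@example.org</td> </tr>
  <!-- Row 3: only two cells, because the first column is already filled. -->
  <tr> <td>phone</td> <td>555-0100</td> </tr>
</table>
```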
An img element is rendered as an image. The image is specified by the element's src attribute, whose value is the URL of the image. It is recommended that a brief summary of the image be given in the element's alt attribute, which is shown if the image can't be displayed for some reason. In HTML an img element is not allowed to have an end tag.
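For example, an image could be included as follows. The file name images/views.png and the alt text are hypothetical.

```html
<!-- In XHTML, img is written as an empty-element tag. -->
<img src='images/views.png' alt='Three successively deeper views of a document'/>
```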
These elements have no meaning in terms of the structure of the document; they merely control its presentation. Where appropriate, use the functional elements above instead.
The contents of a pre element are rendered with the line breaks and spacing with which they appear in the source.
A link is expressed using an a (anchor) element. The text that, when clicked, sends the browser to another location is the content of the a element, and the destination is given as the href attribute of the element.
The destination can be an entire URL, or an element within an (X)HTML page (including the page containing the link). In order to be a destination, an element must have an id attribute whose value is a name unique within the page. These names must match the regular expression
[A-Za-z][-_.:A-Za-z0-9]*
Links to a named element have an href attribute value consisting of the URL, a pound sign #, and the name. Links to a named element in the same document may omit the URL, and be just # followed by the name. Finally, a link to the top of the current document may simply be the empty string.
URLs may be the Web address of a page, or a local relative pathname beginning at the directory containing the current page.
Example:
<dl>
<dt>HTML</dt>
<dd>The current version is <a href='#HTML4.01'>HTML4.01</a>.</dd>
<dt id='HTML4.01'>HTML4.01</dt>
<dd>The <a href='http://www.w3.org/TR/1999/REC-html401-19991224'>HTML 4.01 specification</a>. There is also a <a href='../HTML401.html'>local copy</a>.</dd>
</dl>
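For comparison, the different forms of href value can be placed side by side, as in this sketch. All of the URLs, path names, and id values are hypothetical.

```html
<!-- All URLs, path names, and names below are hypothetical. -->
<p>
  <a href='http://www.example.org/guide.html'>An entire URL</a>,
  <a href='http://www.example.org/guide.html#notes'>a named element in another page</a>,
  <a href='#notes'>a named element in this page</a>, and
  <a href='../other/page.html'>a relative pathname</a>.
</p>
<!-- The id value makes this paragraph a possible destination. -->
<p id='notes'>The destination of the links that end in #notes.</p>
```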
These elements identify the function in the document of specific words or phrases. Browsers generally render them with appropriate formatting.
A code element contains a fragment of program input or programming language code.
A dfn element contains the defining instance of a term or phrase, the one that is part of the term or phrase's definition.
An em element contains a word or phrase that is emphasized.
A q element contains a short quotation of less-than-paragraph size. (Compare blockquote.)
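A single paragraph using all four functional elements might look like the sketch below. The sentences are invented for illustration.

```html
<p>A <dfn>tag</dfn> is the markup that begins or ends an element.
   The rule is simple: <q>elements must be nested</q>.
   It is <em>not</em> enough to close them in any order.
   For example, <code>&lt;dfn&gt;&lt;i&gt;term&lt;/i&gt;&lt;/dfn&gt;</code>
   is correctly nested.</p>
```

Inside the code element, the angle brackets are written as character entities so that they are displayed rather than treated as markup.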
These elements have no meaning in terms of the structure of the document; they merely control its presentation. Where appropriate, use the functional elements above instead.
The contents of a b element are rendered in boldface.
The br empty element (<br/>) causes a line break.
The contents of an i element are rendered in italics.
The contents of a tt element are rendered in a teletype or monospace font.
(X)HTML provides for a number of special characters and symbols. These are written using character entities consisting of an ampersand, a code, and a semicolon. Each symbol has a mnemonic code and a hex code. For example, the non-breaking space character is written either as &nbsp; or &#xA0; and is rendered as a space, but unlike a normal space, a line break can't take place there.
Some of the most useful character entities are:
Mnemonic | Hex | Character |
---|---|---|
&amp; | &#x26; | ampersand & |
&lt; | &#x3C; | less-than sign < |
&gt; | &#x3E; | greater-than sign > |
&nbsp; | &#xA0; | non-breaking space |
Hundreds of character entities are listed in [HTML4.01].
In addition, any Unicode character can be placed in an (X)HTML document as a Numeric Character Reference, by enclosing its decimal code number in &# and ;, for example &#9732; for ☄ and &#10087; for ❧.
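A sketch of character entities and numeric character references in running text follows. The sentence itself is made up.

```html
<!-- Renders as: Fish & chips cost < 10 euros. ☄ ❧
     The space before "euros" is non-breaking, so no line break occurs there. -->
<p>Fish &amp; chips cost &lt; 10&nbsp;euros. &#9732; &#10087;</p>
```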
Style classes may be defined in the style section of the head or in separate style files, and may be given specific formatting. An element's class attribute may name a defined class, in which case the element inherits the class's style.
An element may also have a style attribute whose value gives specific CSS style information for that element.
Browsers typically display an element's title attribute value when the mouse hovers over the element.
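The class, style, and title attributes could be used as in this sketch. The class name warning, its formatting, and the paragraph text are hypothetical.

```html
<head>
  <title>Class, style, and title attributes</title>
  <style type='text/css'>
    /* The class name "warning" is hypothetical. */
    .warning { color: red; font-weight: bold; }
  </style>
</head>
<body>
  <!-- Inherits the style of the warning class defined in the head. -->
  <p class='warning'>Check your page with the validator.</p>
  <!-- Style given for this one element only. -->
  <p style='font-style: italic;'>An aside in italics.</p>
  <!-- Browsers typically show the title value on mouse hover. -->
  <p title='Shown when the mouse hovers here'>Hover over this paragraph.</p>
</body>
```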
(X)HTML allows comments enclosed in <!-- and -->. Do not put strings of two or more hyphens inside a comment.
Unnecessary but interesting technical explanation: Because a comment is syntactically a markup declaration in <!> containing a markup declaration comment within -- --, strings of two or more hyphens are not allowed within a comment (they end it).
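A few comments written according to this rule are sketched below. The text is invented.

```html
<!-- A comment: browsers do not render this text. -->
<p>Visible text.</p>
<!-- A comment may span
     several lines. -->
<!-- Inside a comment, write a single - or a word such as "to",
     never two hyphens in a row. -->
```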
It's pretty straightforward to convert an HTML file to XHTML. The W3C's tidy utility will do it for you, if you like. If you wish to write XHTML in the first place, follow these easy steps:
1. Begin the document with the XHTML doctype:
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>
2. Use this start tag for the html element:
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>
3. Write each element that HTML defines as empty (area base basefont br col frame hr img input link meta param) as an empty-element tag, and give every other empty element an end tag.
4. Use the id attribute rather than the name attribute of any element.
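Following these steps, a small complete XHTML page might look like the sketch below. The title, text, and id value are invented for illustration.

```html
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>
  <head>
    <title>A small XHTML page</title>
  </head>
  <body>
    <!-- The id attribute, not name, marks this heading as a link destination. -->
    <h1 id='start'>A small XHTML page</h1>
    <!-- br is written as an empty-element tag. -->
    <p>First line<br/>second line</p>
    <!-- An empty paragraph still gets an explicit end tag. -->
    <p></p>
    <p><a href='#start'>Back to the heading</a></p>
  </body>
</html>
```

You can check a page like this with the W3C validation service mentioned earlier.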
This document has just touched on the most basic parts of (X)HTML. For much more information, see the references below.