How to Write XHTML

HTML has been around for a while, and there is concern about whether older HTML can age gracefully enough to make the transition to the new HTML without looking seedy and worn out.

Versions of HTML before HTML 4 are like The Picture of Dorian Gray. They may still look pretty decent in a browser, but underneath they are turning decrepit (or deprecated, to use the W3C's term).

What we need is HTML that ages gracefully, like Katharine Hepburn or Sean Connery: HTML that just keeps getting better. Luckily, we know what we want our HTML to be when it grows up: XHTML.

XHTML: What Is It?

XHTML stands for Extensible HyperText Markup Language. It is an application of XML that uses <html> as the document root element. The W3C refers to it as "the next version of HTML" and as "a reformulation of HTML 4 in XML 1.0." If you know HTML, it isn't much of a leap to learn XHTML. But XHTML is picky, since it follows XML's rules. You can't write sloppy XHTML and get away with it the way you can in HTML.

Why Bother?

HTML is old. It's been around since 1992. Even with the advent of CSS and HTML 4.0, HTML can't do everything we want it to. XHTML, first proposed in Sept. 1999, can do more. Because it is XML, it works directly with XML tools. It can be extended with modules that do things HTML cannot, such as mathematical notation. It is a step toward markup for devices other than computers, such as phones and handheld devices. And XHTML can be used right now with existing browsers.

The Rules of XHTML

  1. XHTML must be well-formed
  2. All tags must be properly nested
  3. There must be an opening and a closing for every tag
  4. Empty elements must end with />
  5. It must validate against one of three Document Type Definitions (DTD)
  6. The document root element must be HTML
  7. The root element of the document must designate the XHTML 1.0 namespace
  8. Element and attribute names must be in lower case
  9. Attribute values must always be quoted
  10. Attribute-value pairs must be written in full. Attribute names such as compact and checked cannot occur in elements without their value being specified

The Details

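Rule 1: XHTML must be well-formed.

A well-formed document contains at least one element. An element is made up of an opening tag, its content, and a closing tag. Exactly one element, the root element, must enclose everything else in the document. In the case of XHTML, this root element must be <html></html>. Well-formedness also covers the nesting, closing, and quoting requirements described in Rules 2, 3, 4, and 9.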

Rule 2: All tags must be properly nested

Well-formed XHTML is properly nested. Nesting is the use of one or more elements inside other elements. The last element opened must be the first one closed. Or, to think of it in another way, elements must be closed in the reverse order from which they were opened. For example, a bold element <b> can be nested inside a paragraph element <p>. Proper nesting would look like this:

<p><b>Rule 2: All tags must be properly nested</b></p>

Improper nesting would look like this:

<p><b>Rule 2: All tags must be properly nested</p></b>

Rule 3: There must be an opening and a closing for every tag

For every <p> there must be </p>. For every <li> there must be </li>. For every <option> there must be </option>. ALL tags must be closed. In HTML you could leave some tags unclosed and the page would display just fine. An XHTML document is not well-formed unless every tag is closed. Empty elements are closed in a special way (see Rule 4).
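
As a rough before-and-after illustration of this rule (the list and paragraph text are made up for the example), the first fragment relies on HTML's optional end tags and the second closes every element:

```html
<!-- Tolerated by HTML 4 browsers, but not well-formed XHTML: end tags omitted -->
<ul>
<li>First item
<li>Second item
</ul>
<p>First paragraph
<p>Second paragraph

<!-- The same content as XHTML: every element explicitly closed -->
<ul>
<li>First item</li>
<li>Second item</li>
</ul>
<p>First paragraph</p>
<p>Second paragraph</p>
```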

Rule 4: Empty elements must end with />

An empty element has no content and, in HTML, no closing tag. Some of the empty elements you may use are <img>, <hr>, <meta>, and <br>. To make them comply with XML rules, add a space and a forward slash at the end of the tag. For example: <hr />, <br />.

In an empty tag with attributes, add the space and forward slash after the attributes. For example: <img src="images/button.gif" alt="Button" />. (The alt attribute is required on <img> for the page to validate.)

Rule 5: It must validate against one of three Document Type Definitions (DTD)

The three DTDs for XHTML 1.0 are Transitional, Strict, and Frameset. Open the XHTML page with the DOCTYPE declaration for one of them. The example at the end of this article uses Transitional.

This rule differs slightly from XML rules. XML does not have to be valid, that is, there does not have to be a declared DTD in an XML document. But in XHTML, you do need the DTD every time.
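
For reference, these are the three DOCTYPE declarations as given in the W3C's XHTML 1.0 Recommendation. Only the Transitional one appears elsewhere in this article, so the other two are supplied here as a convenience rather than as part of the original listing. A page uses exactly one of them, as its first line of markup:

```html
<!-- Strict: no deprecated presentational elements or attributes -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- Transitional: also allows deprecated items such as align -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- Frameset: for pages that divide the window into frames -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
```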

Rule 6: The document root element must be HTML

Immediately after the DOCTYPE declaration, open the document root element, <html>. The same tag also designates the namespace. See Rule 7.

Rule 7: The root element of the document must designate the XHTML 1.0 namespace

The tag is:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">

This line follows the DOCTYPE declaration from Rule 5, and does the twofold task of declaring the root element to be <html> and designating the XHTML 1.0 namespace. XHTML 1.0 became a W3C recommendation on Jan. 26, 2000.

Rule 8: Element and attribute names must be in lower case

HTML isn't case sensitive. Many HTML editors, including the ubiquitous Netscape Composer, write HTML tags in all caps. XML, however, is case sensitive, so XHTML element and attribute names need to be written in lower case. A good HTML editing tool will allow you to set a preference for lower case tags. When typing by hand in Notepad or some other text editor, you actually save keystrokes by typing lower case tags.
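
A small sketch of the difference, using an invented one-cell table: only the element and attribute names change case. Attribute values such as file names are left exactly as they are.

```html
<!-- HTML 4 style: upper-case names are fine in HTML, but not in XHTML -->
<TABLE WIDTH="400"><TR><TD>Cell</TD></TR></TABLE>

<!-- XHTML: element and attribute names in lower case -->
<table width="400"><tr><td>Cell</td></tr></table>
```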

Rule 9: Attribute values must always be quoted

Browsers can interpret sloppy HTML and understand a tag such as <table width=400>. Valid and well-formed XHTML, however, requires quoted values, for example <table width="400">. Numbers, single words - all values must be quoted. There are no exceptions: target values that begin with an underscore must be quoted too, so <a href="newpage.html" target="_top"> is correct and target=_top is not.

Rule 10: Attribute-value pairs must be written in full. Attribute names such as compact and checked cannot occur in elements without their value being specified

Perhaps because spelling them out seems redundant, some attributes, such as checked, don't require a value in HTML. In XHTML they do, and the value is simply the attribute's own name:
<input type="checkbox" checked="checked" />
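
The same pattern applies to the other attributes that HTML lets you minimize. A brief sketch follows, with illustrative field names and text. Note that selected belongs on <option>, while checked belongs on <input>:

```html
<!-- HTML 4 minimized attributes: not allowed in XHTML -->
<input type="checkbox" name="news" checked>
<option selected>Red</option>
<ul compact><li>Item</li></ul>

<!-- XHTML: each attribute written out with its value -->
<input type="checkbox" name="news" checked="checked" />
<option selected="selected">Red</option>
<ul compact="compact"><li>Item</li></ul>
```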

Example

Here's an example of an XHTML document.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">

<head>

<title>It's XHTML!</title>

</head>

<body>

<h1>Hello world</h1>

<hr />

<p align="center">XHTML looks like HTML.</p>

</body>

</html>

Changing your old HTML files to XHTML

The W3C offers a utility called HTML Tidy, which will convert your existing HTML files into XHTML and clean up bad markup in the process.