Notes on Validation

In an HTML or XHTML context, it's useful to keep in mind the distinction between "Validators" and "Checkers".

"Validators" (properly so called in an SGML context) apply strictly the syntax rules set out in the DTD that is nominated in the DOCTYPE, nothing more and nothing less, without fear or favour. "Checkers" (or linters) may apply checks on both the syntax and some other issues, such as common mistakes of authoring style or known shortcomings in browser support. Both kinds of software are useful to have, but to understand better what they are doing you should know which is which. Beware: some heuristic checkers misleadingly call themselves "validators". Some validators (properly so called) also include some additional checks, which are not strictly part of the validation.

Validators and online validation services

Let me reiterate that HTML validators, properly so called, are engines that apply the rules of the relevant DTD (the one nominated in the DOCTYPE), nothing more and nothing less. In terms of the syntax-validating job they do, they must therefore all be identical; any differences lie in their capabilities, and in the cosmetics of their user interface, error report options and so on.

It should also be noted that the HTML specification texts contain a number of syntactic restrictions that are not expressed in the DTD itself, and which the validators will therefore not be checking for. To take just one example, the specification text for HTML4.0 states that table cell widths must be a number of pixels, but the DTD says that the cell width attribute is CDATA, which means that pretty much any character string will pass the validator. This kind of issue should be kept in mind, but it by no means implies that validation is pointless or useless, as it amuses some people to claim.
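
As an illustration of the kind of extra check that a "checker" could add on top of validation, here is a minimal sketch in Python. It assumes, as stated above, that a cell width should be a plain number of pixels; the names CellWidthChecker and check_cell_widths are invented for this example and do not belong to any real validator.

```python
import re
from html.parser import HTMLParser


class CellWidthChecker(HTMLParser):
    """Collect TD/TH width values that are not a plain pixel count.

    The DTD only says the attribute is CDATA, so a validator accepts
    any string here; this check goes beyond validation.
    """

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag not in ("td", "th"):
            return
        for name, value in attrs:
            if name == "width" and not re.fullmatch(r"[0-9]+", value or ""):
                self.findings.append((tag, value))


def check_cell_widths(html_text):
    """Return a list of (tag, width value) pairs that look suspicious."""
    checker = CellWidthChecker()
    checker.feed(html_text)
    checker.close()
    return checker.findings
```

Calling check_cell_widths on a table whose cells carry width="10" and width="wide" would flag only the second, although both pass the validator.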

The reports from the validators can be quite obscure, so be sure to consult their respective FAQs. The most common confusion is that a failure to close an open element at the appropriate place may cause the validator to report that it is looking for some apparently quite unrelated tag (for example OBJECT), rather than reporting directly that the open element needs to be closed. The WDG's tool gives user-oriented reports; the W3C's validator used to give diagnostics which newcomers found hard to handle, but it has in the meantime been much improved, and its diagnostics now come with useful hints.
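
To show what a more direct report about an unclosed element might look like, here is a rough Python sketch. It is not how any real validator works: it is a naive stack of open elements, it knows nothing of the DTD, and it ignores the end tags that HTML allows to be omitted (P, LI, TD and so on), so it would complain about some perfectly valid documents. The names OpenElementReporter and report_unclosed are made up for the example.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (a partial, assumed list).
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "param"}


class OpenElementReporter(HTMLParser):
    """Naive stack-based reporter for elements left open."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, line number) for each open element
        self.reports = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in [name for name, _ in self.stack]:
            self.reports.append(f"</{tag}> has no matching start tag")
            return
        # Pop until the matching element, naming everything left open.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.reports.append(
                f"<{open_tag}> opened on line {line} was not closed before </{tag}>"
            )


def report_unclosed(html_text):
    """Return a list of plain-language reports for the given markup."""
    reporter = OpenElementReporter()
    reporter.feed(html_text)
    reporter.close()
    return reporter.reports
```

For the fragment `<p><em>text</p>` this names the EM element as the one left open, which is the kind of message a newcomer would hope for.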

The WDG also offer a form for validating snippets of HTML, to help check one's understanding of the rules. This presently comes with an HTML4.01 Transitional DOCTYPE by default, but you can overwrite it according to your requirements.

Where a document contains a DOCTYPE that calls out an inappropriate DTD, there are going to be problems with validation: annoyingly, some would-be "HTML authoring tools" take it upon themselves to insert bogus DOCTYPE declarations into what they produce. If the document contains no DOCTYPE, the W3C validator tries to deduce which DOCTYPE to apply, but this comes with a warning. The WDG's web site also discusses the option of calling out a private, custom DTD via a DOCTYPE that nominates the URL of your DTD.
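
Before submitting a document, it can be handy to see which DTD its DOCTYPE actually nominates, or whether it has a DOCTYPE at all. The following is a minimal Python sketch under that assumption; DoctypeFinder and find_doctype are names invented here, and the code only extracts the public identifier without judging whether it is appropriate.

```python
import re
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Record whether a DOCTYPE is present and which public identifier it names."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.public_id = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE HTML PUBLIC "..."'.
        if decl.upper().startswith("DOCTYPE"):
            self.has_doctype = True
            match = re.search(r'PUBLIC\s+"([^"]*)"', decl, re.IGNORECASE)
            if match:
                self.public_id = match.group(1)


def find_doctype(html_text):
    """Return (has_doctype, public_identifier_or_None)."""
    finder = DoctypeFinder()
    finder.feed(html_text)
    finder.close()
    return finder.has_doctype, finder.public_id
```

A result of (False, None) corresponds to the case described above where the validator would have to guess which DOCTYPE to apply.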

If presented with a URL that represents a relocation (typically, the perennial problem of a URL representing the default document in a subdirectory but missing its trailing slash), the W3C validator delivers an appropriate report.
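
The missing-trailing-slash case is easy to recognise from the server's response. Here is a small Python sketch of that test, assuming you already have the status code and Location header in hand; the function name and the set of status codes are choices made for this example, not a description of what the W3C validator does internally.

```python
from urllib.parse import urljoin, urlsplit

# Status codes treated as relocations in this sketch (an assumption).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_trailing_slash_redirect(requested_url, status, location):
    """Return True if the response merely relocates to the same URL plus '/'."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    # The Location header may be relative, so resolve it first.
    target = urljoin(requested_url, location)
    req, tgt = urlsplit(requested_url), urlsplit(target)
    same_place = (req.scheme, req.netloc, req.query) == (tgt.scheme, tgt.netloc, tgt.query)
    return same_place and not req.path.endswith("/") and tgt.path == req.path + "/"
```

A request for http://example.org/docs answered with a 301 to http://example.org/docs/ would be recognised as this case, whereas a relocation to some other path would not.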

HTML Checkers of various kinds

A personal selection only - by no means intended to be exhaustive.

Nick Kew's Accessibility Valet.

There's a curious notion around that making WWW content accessible to those with disabilities is an exclusively specialised technique that can only be afforded for a tiny subset of pages specifically targeted at disabled persons. Meanwhile, many authors are misguidedly investing time and resources in techniques that are actively hostile to any unusual browsing situation, such as small palmtops, integrated cellphone/browsers (Nokia etc.), character mode browsers, etc., the use of which is by no means confined to a small disabled community: in fact all the evidence points to browsing situations getting more diverse. These authors seem unaware that in doing this, they are also making life hard for the robot indexers, without whose co-operation their pages will be adrift on the WWW with little hope of rescue.

By taking advantage of HTML's ability to morph your content to a wide range of presentation situations, you bring significant benefits to all of those minority browsing situations, and an accessibility checker can help assess how you're doing in that regard.

WebXACT (includes what was formerly known as Bobby).

A long-standing checker, originally from the Center for Applied Special Technology, this tool aimed to track the accessibility recommendations from the W3C and apply them as best an automated tool could do. It subsequently underwent a number of transmogrifications in different hands, and there is disagreement amongst practitioners about its value as measured against current best practice.

My gut feeling is that Nick Kew has done a better job overall with his Accessibility Valet.

The results from Bobby do need to be interpreted with some care and understanding. It draws attention to many potential problems, but it can't in general determine whether you have addressed those problems or not. In many cases there is no need to remove the items that its warnings have drawn to your attention; rather, you need to check out the risk factors and make sure that you've handled them in an appropriate fashion.

HTML Tidy at SourceForge, originally from Dave Raggett of the W3C

Not exactly a "checker"; it concentrates more on fixing problems than on reporting them. A very useful tool, with lots of options. Users report that it can even create usable HTML out of the stuff that's generated by MS products!

Others are working on the HTML Tidy code where Dave Raggett left off, so look around for any current developments.

CSS checking

CSS isn't HTML, and it isn't an SGML application, so the precise meaning of "validation" versus "checking" isn't so clearly defined. You'll find both terms in use.
