The WWW concept

Caution, this material is from 1994 and is now considerably out-of-date.

Servers, browsers and all that.

The WWW concept aims to comprise all information that is available on the world-wide Internet. To this end, it not only supports its own preferred protocol and formats, but also is open to other protocols and formats. WWW clients typically include methods to access Gopher servers, anonymous FTP servers, TELNET, Usenet news, etc., and other methods can be added as required.

Furthermore, it allows many different kinds of document to be handled, either internally (plain text, and certain image formats, as well as its own preferred format html) or by means of external "helper applications" (images, movies, sounds, PostScript documents etc.), and further types can be added as required. Internally, this works along the same lines as the MIME mail extensions.
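
As a rough illustration of the MIME-style dispatch just described, here is a minimal Python sketch. The split between "internal" types and helper applications, and the helper program names, are hypothetical; a real browser has its own tables.

```python
import mimetypes

# Hypothetical: types this imaginary browser can display by itself.
INTERNAL_TYPES = {"text/plain", "text/html", "image/gif"}

# Hypothetical helper applications, keyed by MIME content type.
HELPERS = {
    "image/jpeg": "jpeg-viewer",
    "application/postscript": "postscript-viewer",
    "video/mpeg": "movie-player",
}


def choose_handler(filename):
    """Return 'internal', a helper name, or 'save-to-disk' for a file name."""
    # guess_type maps a file name extension to a MIME content type (or None).
    content_type, _encoding = mimetypes.guess_type(filename)
    if content_type in INTERNAL_TYPES:
        return "internal"
    # Unknown or unhandled types fall back to simply saving the data.
    return HELPERS.get(content_type, "save-to-disk")
```

Adding support for a further document type then amounts to adding one entry to the helper table.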

The CERN approach has now been enthusiastically adopted in many parts of the community, and WWW client software (or "browsers", as such programs are often known) has come from such diverse places as NCSA (the National Center for Supercomputing Applications) and the Cornell Legal Information Institute.

For the moment, NCSA seems to offer the "best" windowing browsers (NCSA Mosaic: for X, for MS-Windows, and for the Mac), although other browsers have their attractive features. There are also several WWW server packages on offer, some of which can act as Gopher servers at the same time. It would be pointless to try to give you details, as the situation is changing too rapidly. The best thing to do is to start with any browser that you can get your hands on for your own particular platform, and start browsing. If you are really stuck, you can make a TELNET call to (no userid or password required) and start reading there: it gives you a basic line-mode browser into the information, which is better than nothing, and should at least get you as far as locating suitable browser software for your own situation.

Once you have a browser, you will probably want to study the material offered by CERN that was already mentioned above, in particular the Illustrated Talk (On-line Seminar).

In normal usage, the browser would have access to the Internet, and network links could be followed wherever they might lead. Each piece of data is described by its URL (Uniform Resource Locator), which contains an access method, a host name, and a local name (for example a file name).
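
To make the three parts of a URL concrete, here is a small Python sketch that splits one up using the standard library. The function name and the example address are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the three parts named in the text."""
    parts = urlsplit(url)
    return {
        "access_method": parts.scheme,   # e.g. http, gopher, ftp
        "host_name": parts.hostname,     # the machine that holds the data
        "local_name": parts.path,        # e.g. a file name on that machine
    }


# Hypothetical address, used only as an example.
print(describe_url("http://www.example.org/docs/talk.html"))
```

For the example address, the access method is "http", the host name is "www.example.org", and the local name is "/docs/talk.html".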

We may note a number of problems with this ambitious approach.

Data may move...
If I see an interesting piece of data, I may decide to put a pointer to it ("link" in WWW terminology) into one of my own documents. Later, the owner may decide to move the data, unaware that it is being pointed to from elsewhere.
Copyright issues...
The copyright laws, originally conceived for documents on paper, produce some strange consequences on a network, and especially on an international network. Some documents can be viewed, but cannot legally be copied. It seems you could not store them to disk, or print them off; you could only view them. Impossible to police! Could they legally be cached?
Distribution limitations...
The links that are included in documents at a distant site may include both items that can be freely viewed, and items that, for licensing or other reasons, are not allowed to be distributed off their site. The server will know where you are, by checking your IP calling address. It is frustrating to be offered an item, and then be refused access to it. If it were just a matter of building menus (as in Gopher), it would not be difficult to build one menu for "outsiders", and a different one for "insiders". With WWW, on the other hand, experienced authors are accustomed to scattering "links" throughout a piece of narrative, as you have seen on the CERN pages for example. It would wreck the flow of explanation if every link had to be accompanied by caveats as to who was allowed to access it.
Network bandwidth...
WWW does not routinely advertise the size of documents which it offers. Attempting to access a huge document (e.g. a movie consisting of several megabytes) from the other side of the world is frustrating (as well as being an awful waste of the network, if it was just idle curiosity). One is well-advised to glance at the URL beforehand (most clients have this facility) to see where the data will be fetched from, if there is any likelihood of the file being embarrassingly large. As I said, the protocol does not advertise the size of documents, but a considerate document provider will include a warning when offering large documents.
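
The distribution limitation mentioned above, where the server checks the caller's IP address, might look something like the following Python sketch. The network range is a reserved documentation range standing in for "our site", and the function name is hypothetical; this is not how any particular server package does it.

```python
import ipaddress

# Hypothetical "insider" network (a range reserved for documentation).
SITE_NETWORK = ipaddress.ip_network("192.0.2.0/24")


def may_fetch(client_address, restricted):
    """Decide whether a caller at client_address may receive an item."""
    if not restricted:
        # Freely distributable items go to anyone.
        return True
    # Restricted items go only to callers inside the site's own network.
    return ipaddress.ip_address(client_address) in SITE_NETWORK
```

Note that the check happens only when the item is requested, which is exactly why an "outsider" can be offered a link and then refused access to it.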

We have already seen some examples of in-line graphics. Not all browsers support this, and there may be limitations on the graphics formats supported in-line. Here is an in-line image, in colour for a change, and what's more it has something to do with physics.

The same thing can be called up out of line, as you may see here. This particular browser supports in-line graphics in GIF format, for example, which is what you just saw. Here, on the other hand, is a link to an image in a format (JPEG) that requires an external "helper application", and could not be displayed in-line with this particular browser.

With a full range of helper applications, you can have sounds and movies, as well as still images. These applications will probably be familiar to you from demonstrations, but (CDs excepted?) there is a limit to the amount of such data you would want to keep on your own PC: this way, you can get the data 'seamlessly' from a shared server somewhere. To take just one example, up-to-date weather satellite images are offered at a JANET site. Of course, even WWW cannot work magic, and retrieving megabyte image files from far-distant sites is no easier with WWW than with any other scheme!

Various text formats are also displayable - not just html (WWW's own markup format) and plain text, but also maybe RTF and various proprietary formats. Viewing of PostScript files is another possibility. This means that information providers are not being asked to re-format their existing data in order to make it available to WWW users: much existing information can be offered in the form in which it happens to be available already.

A link in a WWW document can also specify a TELNET call. WWW browsers know how to access a Usenet News server (NNTP-server), which is sometimes useful, but WWW browsers lack the comprehensive range of facilities that are offered by a fully-fledged Usenet News client package.

Writing for WWW

As may have been painfully obvious during this talk, I am only a beginner at this, so I will just mention a few points that I think are undeniable.

If you are composing original material for WWW, then the aim should be to produce html, by one means or another. Html is a mark-up language, in which a number of principles should be followed. (The original html has quite a limited number of constructs, which can soon be learned. A richer language is being developed, and some of its constructs are already supported by some browsers. Browsers are supposed to ignore constructs that they do not understand. They generally also ignore syntax errors, although as an author you would need to bear in mind that different browsers may display the affected material in quite different ways.)
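
The rule that browsers ignore constructs they do not understand can be sketched in Python with the standard html.parser module. The set of "known" tags here is a deliberately tiny, hypothetical one, and `sparkle` in the usage line is an invented tag; the point is only that the text inside an unknown construct still reaches the reader.

```python
from html.parser import HTMLParser

# Hypothetical: the only constructs this imaginary browser understands.
KNOWN_TAGS = {"h1", "p", "a"}


class TolerantReader(HTMLParser):
    """Collect text, noting which tags were understood and which ignored."""

    def __init__(self):
        super().__init__()
        self.recognised = []
        self.ignored = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags are recorded but otherwise have no effect.
        if tag in KNOWN_TAGS:
            self.recognised.append(tag)
        else:
            self.ignored.append(tag)

    def handle_data(self, data):
        # Text is kept whether or not the enclosing tag was understood.
        if data.strip():
            self.text.append(data.strip())


def read_document(source):
    reader = TolerantReader()
    reader.feed(source)
    reader.close()
    return reader


result = read_document("<h1>Title</h1><p>Some <sparkle>shiny</sparkle> text.</p>")
print(result.ignored, " ".join(result.text))
```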

Most of the constructs in html specify the function of a piece of text, e.g. whether it is a paragraph of running text, a third-level heading, an unnumbered list, a quotation, a "link", etc. It is considered bad form for an author to try to control the cosmetic appearance of the document. Unlike, say, TeX, there is no facility in html for the author to select a font and size, or to specify page size or margins. In this way, the browser can be left to format the document according to the tastes of the reader - for example using large fonts for a sight-impaired user or for projection, while conveying the same information with normal sized fonts for most users. This is quite a different philosophy from that found in desktop publishing, for example. However, the author must still have some control over the layout, for example when presenting a specimen of programming code or output listing. Html allows pre-formatted texts to be presented in a fixed-width font, without further interference.
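
To illustrate the idea that the author states the function of each piece of text and the browser chooses the appearance, here is a toy Python sketch. The three-kind document model and the rendering rules (upper-case headings, re-wrapped paragraphs) are my own assumptions, not anything prescribed by html.

```python
import textwrap

# A document described by function only: (kind, content) pairs.
DOCUMENT = [
    ("heading", "Writing for WWW"),
    ("paragraph", "Most of the constructs specify the function of a piece "
                  "of text, and the browser decides how it looks."),
    ("preformatted", "x = 1\n    y = 2"),
]


def render(document, width):
    """Lay the document out for a display that is `width` characters wide."""
    lines = []
    for kind, content in document:
        if kind == "heading":
            lines.append(content.upper())
        elif kind == "paragraph":
            # Running text is re-wrapped to suit the reader's display.
            lines.extend(textwrap.wrap(content, width=width))
        elif kind == "preformatted":
            # Pre-formatted text keeps the author's own line breaks and spacing.
            lines.extend(content.split("\n"))
        lines.append("")
    return "\n".join(lines)


print(render(DOCUMENT, 30))
print(render(DOCUMENT, 72))
```

The two renderings carry the same words, but only the pre-formatted lines are guaranteed to look the same in both.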

Although there are numerous tools offered for the preparation of html, and for verifying its syntactic correctness, there is no such thing as a WYSIWYG editor for producing html. Certainly, you can preview what you are producing, and see it displayed in ONE possible embodiment, but you can have no idea what browser your various users will be using - whether they have colour or black and white screens - graphics or only characters - nor have you any idea what size of screen they will be using.

When writing html at the keyboard, it is convenient to run the editor and a WWW browser in separate windows, and from time to time to save the edited document and load it into the browser to check it. One must however always be aware that browsers are designed to be tolerant, and may disregard errors that could upset a different browser. Professional authors would probably want to run their final documents through a strict syntax checking program. On top of that, as I already said, one cannot and must not rely on the detailed cosmetics of what one browser displays.
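
To give a flavour of what a strict checker does that a tolerant browser does not, here is a very small Python sketch that reports unbalanced tags. It assumes that every element except a few "void" ones must be explicitly closed, which is stricter than html itself requires, and it is no substitute for a real syntax checking program.

```python
from html.parser import HTMLParser

# Assumption: these tags never take a closing tag.
VOID_TAGS = {"br", "hr", "img"}


class BalanceChecker(HTMLParser):
    """Track open tags and note any closing tag that does not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append("unexpected </%s>" % tag)


def check(source):
    """Return a list of complaints; an empty list means nothing was found."""
    checker = BalanceChecker()
    checker.feed(source)
    checker.close()
    unclosed = ["unclosed <%s>" % tag for tag in checker.open_tags]
    return checker.problems + unclosed


print(check("<h1>Title<p>Text</p>"))
```

A tolerant browser would probably display that last fragment without complaint, which is precisely why a preview in one browser proves so little.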