Day 2

Chapter 3

Begin with the Basics


After yesterday's discussion, with lots of text to read and concepts to digest, you're probably wondering when you're actually going to get to write a Web page. That is, after all, why you bought the book. Welcome to Day 2! Today you'll get to create Web pages and learn about HTML, the language for writing Web pages. Along the way, you'll learn what HTML is (and what it isn't), what HTML files look like, and how to use the tags for overall page structure, titles, headings, and paragraphs.

What HTML Is…and What It Isn't

There's just one more thing to note before you dive into actually writing Web pages: you should know what HTML is, what it can do, and most importantly what it can't do.

HTML stands for HyperText Markup Language. HTML is based on SGML (Standard Generalized Markup Language), a much bigger document-processing system. To write HTML pages, you won't need to know a whole lot about SGML, but it does help to know that one of the main features of SGML is that it describes the general structure of the content inside documents, not that content's actual appearance on the page or on the screen. This will be a bit of a foreign concept to you if you're used to working with WYSIWYG (What You See Is What You Get) editors, so let's go over this slowly.

HTML Describes the Structure of a Page

HTML, by virtue of its SGML heritage, is a language for describing the structure of a document, not its actual presentation. The idea here is that most documents have common elements, such as titles, paragraphs, or lists. Before you start writing, therefore, you can identify and define the set of elements in that document and give them appropriate names (see Figure 3.1).

Figure 3.1 : Document elements.

If you've worked with word processing programs that use style sheets (such as Microsoft Word) or paragraph catalogs (such as FrameMaker), then you've done something similar; each section of text conforms to one of a set of styles that are defined before you start working.

HTML defines a set of common styles for Web pages: headings, paragraphs, lists, and tables. It also defines character styles such as boldface and code examples. Each element has a name and is contained in what's called a tag. When you write a Web page in HTML, you label the different elements of your page with these tags that say "this is a heading" or "this is a list item." It's as if you were working for a newspaper or a magazine where you do the writing but someone else does the layout; you might explain to the layout person that this line is the title, this line is a figure caption, or this line is a heading. It's the same way with HTML.
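
As a rough sketch of what that labeling looks like, here's a small made-up snippet (the gardening text is invented purely for illustration, and you'll learn the individual tags later in this chapter and in the chapters that follow). Each tag simply names the element it surrounds:

<H1>Growing Tomatoes</H1>
<P>Tomatoes need full sun and regular watering.</P>
<UL>
<LI>Choose a sunny spot</LI>
<LI>Water every other day</LI>
</UL>

Notice that nothing in the snippet says how big the heading is or what the list bullets look like; it only says which part is which.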

HTML Does Not Describe Page Layout

When you're working with a word processor or page layout program, styles are not just named elements of a page; they also include formatting information such as the font size and style, indentation, underlining, and so on. So when you write some text that's supposed to be a heading, you can apply the Heading style to it, and the program automatically formats that paragraph for you in the correct style.

HTML doesn't go this far. For the most part, HTML doesn't say anything about how a page looks when it's viewed. All HTML tags indicate is that an element is a heading or a list; they say nothing about how that heading or list is to be formatted. So, as with the magazine example and the layout person who formats your article, it's the layout person's job to decide how big the heading should be and what font it should be in. The only thing you have to worry about is marking which section is supposed to be a heading.

Web browsers, in addition to providing the networking functions to retrieve pages from the Web, double as HTML formatters. When you read an HTML page into a browser such as Netscape or Lynx, the browser reads, or parses, the HTML tags and formats the text and images on the screen. The browser has mappings between the names of page elements and actual styles on the screen; for example, headings might be in a larger font than the text on the rest of the page. The browser also wraps all the text so that it fits into the current width of the window.

Different browsers, running on different platforms, may have different style mappings for each page element. Some browsers may use different font styles than others. So, for example, one browser might display italics as italics, whereas another might use reverse text or underlining on systems that don't have italic fonts. Or it might put a heading in all capital letters instead of a larger font. What this means to you as a Web page designer is that the pages you create using HTML may look radically different from system to system and from browser to browser. The actual information and links inside those pages will still be there, but the appearance on the screen will change. You can design a Web page so that it looks perfect on your computer system, but when someone else reads it on a different system, it may look entirely different (and it may very well be entirely unreadable).

Why It Works This Way

If you're used to writing and designing on paper, this concept may seem almost perverse. No control over the layout of a page? The whole design can vary depending on where the page is viewed? This is awful! Why on earth would it work like this?

Remember in Chapter 1 when I mentioned that one of the cool things about the Web is that it is cross-platform and that Web pages can be viewed on any computer system, on any size screen, with any graphics display? If the final goal of Web publishing is for your pages to be readable by anyone in the world, you can't count on your readers having the same computer system, the same size screen, the same number of colors, or the same fonts as you. The Web takes into account all these differences and allows all browsers and all computer systems to be on equal ground.

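The Web, as a design medium, is not a new form of paper. The Web is an entirely new medium, with new constraints and goals that are very different from working with paper. The number-one rule of Web page design, as I'll keep harping on throughout this book, is this: don't design your pages based on what they look like on your own computer system and in your own browser. Design them so that they work in most browsers, and focus on clear, well-structured content.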

Throughout this book I'm going to be showing you examples of HTML code and what they look like when displayed. In many examples, I'll compare how a snippet of code looks in two very different browsers: Netscape, probably the most popular browser on the market today, and Lynx, a browser for text-only terminals that is less popular but still in common use. Through these examples, you'll get an idea of how different the same page can look from browser to browser.

HTML Is a Markup Language

HTML is a markup language. Writing in a markup language means that you start with the text of your page and add special tags around words and paragraphs. If you've ever worked with other markup languages such as troff or LaTeX, or even older DOS-based word processors where you put in special codes for things such as "turn on boldface," this won't seem all that unusual.

The tags indicate the different parts of the page and produce different effects in the browser. You'll learn more about tags and how they're used in the next section.

HTML has a defined set of tags you can use. You can't make up your own tags to create new appearances or features. And, just to make sure things are really confusing, different browsers support different sets of tags.

The base set of HTML tags, the lowest common denominator, is referred to as HTML 2.0. HTML 2.0 is the current standard for HTML (there's a written specification for it that is developed and maintained by the W3 Consortium) and the set of tags that all browsers must support. For the next couple of chapters, you'll learn primarily about HTML 2.0 tags that you can use anywhere.

HTML 3.0 is considered the "next generation" of HTML and a catch-all for lots of new features that give you more flexibility than HTML 2.0 in how you design your pages. When a browser claims to support HTML 3.0, it usually means that it supports some HTML 3.0 features such as tables and backgrounds. HTML 3.0 is still in development, and there are many HTML 3.0 features that are not supported by any browsers yet. Like HTML 2.0, the HTML 3.0 standard is maintained by the W3 Consortium.

Note
To be exactly correct about it, HTML 3.0 as a uniform standard no longer exists; the existing draft of the proposal has expired, and work on HTML 3.0 has broken into several smaller sub-groups, each handling a different aspect of HTML 3.0. However, the concept of HTML 3.0 is still so strong in the minds of Web designers and the industry that any advanced HTML standards work is usually referred to as part of the proposed HTML 3.0 standard. If you're interested in how HTML development is working, and just exactly what's going on at the W3 Consortium, check out the pages for HTML at the Consortium's site at http://www.w3.org/pub/WWW/MarkUp/.

In addition to the tags defined by HTML 2.0 and 3.0, there are also browser-specific extensions to HTML that are implemented by an individual browser company and are proposed for inclusion in HTML 3.0. Netscape and Microsoft are particularly guilty of this, and they offer many new features unique to their browsers. However, many other browsers may also support other browsers' extensions, to varying degrees; for example, NCSA Mosaic supports many of the Netscape extensions but none of Internet Explorer's.

Confused yet? You're not alone. Even Web designers with years of experience and hundreds of pages under their belts have to struggle with the problem of which set of tags to choose in order to strike a balance between wide support for a design (using HTML 2.0) and more flexibility in layout but less consistency across browsers (HTML 3.0 or the browser extensions). Keeping track of all this can be really confusing. Throughout this book, as I introduce each tag, I'll let you know which version of HTML that tag belongs to, how widely supported it is, and how to use it to best effect in a wide variety of browsers. And later in this book, you'll get hints on how to deal with the different HTML tags to make sure that your pages are readable and still look good in all kinds of browsers.

But even with all these different tags to choose from, HTML is an especially small and simple-to-learn markup language, far smaller than other languages such as PostScript or troff on UNIX. Those languages are so large and complex that it often takes ages to learn enough to write even simple documents. With HTML, you can get started right away.

And with that note, let's get started.

What HTML Files Look Like

Pages written in HTML are plain text files (ASCII), which means they contain no platform- or program-specific information; they can be read by any editor that supports text (which should be just about any editor; more about this later). HTML files contain two things: the text of the page itself, and HTML tags that indicate the elements and structure of the page.

Most HTML tags look something like this:

<TheTagName> affected text </TheTagName>

The tag name itself (here, TheTagName) is enclosed in brackets (<>).

HTML tags generally have a beginning and an ending tag, surrounding the text that they affect. The beginning tag "turns on" a feature (such as headings, bold, and so on), and the ending tag turns it off. Closing tags have the tag name preceded by a slash (/).

New Term
HTML tags are the things inside brackets (<>) that indicate features or elements of a page.
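
For instance, in this small sketch (the store-hours text is made up for illustration), each opening tag turns a feature on, and the matching closing tag, with its slash, turns it off again:

<H2>Store Hours</H2>
We are <B>closed</B> on Sundays.

Here the heading feature applies only to "Store Hours," and the bold feature applies only to the word "closed."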

Not all HTML tags have a beginning and an end. Some tags are only one-sided, and still other tags are "containers" that hold extra information and text inside the brackets. You'll learn about these tags as the book progresses.
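
To give you a rough preview, here's a sketch using two such tags that do exist in HTML 2.0: <HR>, a one-sided tag that draws a rule across the page, and <A>, which carries extra information inside its opening brackets. The file name menu.html is a made-up placeholder, and you don't need to understand the details yet:

<HR>
<A HREF="menu.html">Today's menu</A>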

All HTML tags are case-insensitive; that is, you can specify them in uppercase, lowercase, or in any mixture. So, <HTML> is the same as <html> is the same as <HtMl>. I like to put my tags in all caps (<HTML>) so I can pick them out from the text better. That's how I show them in the examples in this book.

Exercise 3.1: Take a look at HTML sources.

Before you actually start writing your own HTML pages, it helps to get a feel for what an HTML page looks like. Luckily, there's plenty of source material out there for you to look at: every page that comes over the wire to your browser is in HTML format. (You almost never see the codes in your browser; all you see is the final result.)

Most Web browsers have a way of letting you see the HTML source of a Web page. You may have a menu item or a button for View Document Source or View HTML. In Lynx, the \ (backslash) command toggles between source view and formatted view.

Tip
Some browsers do not have the capability to directly view the source of a Web page, but do allow you to save the current page as a file to your local disk. Under a dialog box for saving the file, there may be a menu of formats, such as Text, PostScript, or HTML. You can save the current page as HTML and then open that file in a text editor or word processor to see the HTML source.

Try going to a typical home page and then viewing the source for that page. For example, Figure 3.2 shows the home page for Alta Vista, which is a popular search page at http://www.altavista.digital.com/:

Figure 3.2 : Alta Vista home page.

The HTML source of that page looks something like Figure 3.3.

Figure 3.3 : Some HTML source.

Try viewing the source of your own favorite Web pages. You should start seeing some similarities in the way pages are organized and get a feel for the kinds of tags that HTML uses. You can learn a lot about HTML by comparing the text on the screen with the source for that text.

Exercise 3.2: Create an HTML page.

You've seen what HTML looks like; now it's your turn to create your own Web page. Let's start with a really simple example so you can get a basic feel for HTML.

To get started writing HTML, you're not going to need a Web server, a Web provider, or even a connection to the Web itself. All you really need is something to create your HTML files, and at least one browser to view them. You can write, link, and test whole suites of Web pages without even touching a network. In fact, that's what we're going to do for the majority of this book. I'll talk later about publishing everything on the Web so other people can see.

First, you'll need a text editor. A text editor is a program that saves files in ASCII format. ASCII format is just plain text, with no font formatting or special characters. On UNIX, vi, emacs, and pico are all text editors. On Windows, Notepad, Microsoft Write, and DOS edit are good basic text editors (and free with your system!); a shareware editor such as WED or WinEdit will work as well. On the Macintosh, you can use the SimpleText application that came with your system, or a more powerful text editor such as BBEdit or Alpha (both of which are shareware).

If all you have is a word processor such as Microsoft Word, don't panic. You can still write pages in word processors just as you would in text editors, although it'll be more complicated to do so. When you use the Save or Save As command, there will be a menu of formats you can use to save the file. One of those should be "Text Only," "Text Only with Line Breaks," or "DOS Text." All these options will save your file as plain ASCII text, just as if you were using a text editor. For HTML files, if you have a choice between DOS Text and just Text, use DOS Text, and use the Line Breaks option if you have it.

Note
If you do choose to use a word processor for your HTML development, be very careful. Many recent word processors are including HTML modes or mechanisms for creating HTML code. The word processor may decide to take over your HTML coding for you, or mysteriously put you into that mode without telling you first. This may produce unusual results or files that simply don't behave as you expect. If you find that you're running into trouble with a word processor, try using a text editor and see if that helps.

What about the plethora of free and commercial HTML editors that claim to help you write HTML more easily? Most of them are actually simple text editors with some buttons that stick the tags in for you. If you've got one of those, go ahead and use it. If you've got a fancier editor that claims to hide all the HTML for you, put that one aside for the next couple of days and try using a plain-text editor just for a little while. I'll talk more about HTML editors after this example.

Open up that text editor, and type the following code. You don't have to understand what any of this means at this point. You'll learn about it later in this chapter. This is just a simple example to get you started:

<HTML><HEAD>
<TITLE>My Sample HTML page</TITLE></HEAD>
<BODY>
<H1>This is an HTML Page</H1>
</BODY></HTML>

Note
Many of the examples from this book, including this one, are included on the CD-ROM. For this example, it's a good idea to type it in to get a feel for it, but for future examples you might want to use the versions on the CD-ROM so you don't have to retype everything.

After you create your HTML file, save it to disk. Remember that if you're using a word processor, do Save As and make sure you're saving it as text only. When you pick a name for the file, there are two rules to follow:

Exercise 3.3: View the result.

Now that you have an HTML file, start up your Web browser. You don't have to be connected to the network since you're not going to be opening pages at any other site. Your browser or network connection software may complain about the lack of a network connection, but usually it will eventually give up and let you use it anyway.

Tip
If you're using a Web browser from Windows, using that browser without a network is unfortunately more complicated than on other systems. Most Windows browsers are unable to run without a network, preventing you from looking at your local files without running up online charges. Try starting up your browser while not online to see if this is the case. If your browser has this problem, there are several things you can try. Depending on your network software, you may be able to start your network package (Trumpet or Chameleon), but not actually dial the network. This often is sufficient for many browsers.
If this doesn't work, you'll have to replace the file winsock.dll in your Windows directory with what's called a "null sock," a special file that makes your system think it's on a network when it's not. The CD with this book contains a nullsock.dll file you can use with your Windows browser. If you use Netscape, you'll want to use mozock.dll instead.
First, put your original winsock.dll in a safe place; you'll need to put everything back the way it was to get back onto the Web. Next, rename the null sock file to winsock.dll and copy it to your Windows directory. With the fake winsock file installed, you should be able to use your Windows browser without a network (it may still give you errors, but it should work).

Once your browser is running, look for a menu item or button labeled Open Local, Open File, or maybe just Open. It's a menu item that will let you browse your local disk. (If you're using Lynx, cd to the directory that contains your HTML file and use the command lynx myfile.html to start Lynx.) The Open File command (or its equivalent) tells the browser to read an HTML file from your disk, parse it, and display it, just as if it were a page on the Web. Using your browser and the Open Local command, you can write and test your HTML files on your computer in the privacy of your own home.

If you don't see something like what's shown in Figure 3.4 (for example, if parts are missing or if everything looks like a heading), go back into your text editor and compare your file to the example. Make sure that all your tags have closing tags and that all your < characters are matched by > characters. You don't have to quit your browser to do this; just fix the file and save it again under the same name.

Then go back to your browser. There should be a menu item or button called Reload. (In Lynx, it's Control+R.) The browser will read the new version of your file, and voilà, you can edit and preview and edit and preview until you get it right.

If you're getting the actual HTML text repeated in your browser rather than what's shown in Figure 3.4, make sure your HTML file has a .html or .htm extension. This file extension is what tells your browser that this is an HTML file. The extension is important.

Figure 3.4 : The sample HTML file.

If things are going really wrong (if you're getting a blank screen or some really strange characters), something is wrong with your original file. If you've been using a word processor to edit your files, try opening your saved HTML file in a plain-text editor (again, Notepad or SimpleText will work just fine). If the text editor can't read it or if the result is garbled, you haven't saved the original file in the right format. Go back into your original editor and try saving it as text only again, and then try it again in your browser until you get it right.

A Note About Formatting

When an HTML page is parsed by a browser, any formatting you may have done by hand (that is, any extra spaces, tabs, returns, and so on) is ignored. The only thing that formats an HTML page is an HTML tag. If you spend hours carefully editing a plain text file to have nicely formatted paragraphs and columns of numbers, but you don't include any tags, when you read the page into an HTML browser, all the text will flow into one paragraph. All your work will have been in vain.

Note
There's one exception to this rule: a tag called <PRE>. You'll learn about this tag tomorrow in Chapter 5, "More Text Formatting with HTML."

The advantage of having all white space (spaces, tabs, returns) ignored is that you can put your tags wherever you want.

The following examples all produce the same output. (Try it!)

<H1>If music be the food of love, play on.</H1>

<H1>
If music be the food of love, play on. </H1>

<H1>
If music be the food of love, play on.              </H1>

<H1>     If    music     be      the     food    of    love,
play     on. </H1>
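
The flip side is just as easy to try. In this sketch (the price list is invented for illustration), the lines contain no tags at all, so a browser should run them together into a single paragraph despite the careful spacing:

Apples      $1.20
Pears       $0.95
Plums       $2.10

Wrapped in the <PRE> tag mentioned in the earlier note, the same lines would be expected to keep their spacing and line breaks:

<PRE>
Apples      $1.20
Pears       $0.95
Plums       $2.10
</PRE>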

Programs To Help You Write HTML

You may be thinking that all this tag stuff is a real pain, especially if you didn't get that small example right the first time. (Don't fret about it; I didn't get that example right the first time, and I created it.) You have to remember all the tags. And you have to type them in right and close each one. What a hassle.

Many freeware and shareware programs are available for editing HTML files. Most of these programs are essentially text editors with extra menu items or buttons that insert the appropriate HTML tags into your text. HTML-based text editors are particularly nice for two reasons: you don't have to remember all the tags, and you don't have to take the time to type them all.

I'll discuss some of the available HTML-based editors in Chapter 6, "HTML Assistants: Editors and Converters." For now, if you have a simple HTML editor, feel free to use it for the examples in this book. If all you have is a text editor, no problem; it just means you'll have to do a little more typing.

What about WYSIWYG editors? There are lots of editors on the market that purport to be WYSIWYG. The problem is, as you learned earlier in this chapter, that there's really no such thing as WYSIWYG when you're dealing with HTML because WYG can vary wildly based on the browser that someone is using to read your page. With that said, as long as you're aware that the result of working in those editors may vary, WYSIWYG editors can be a quick way to create simple HTML files. However, for professional Web development and for using many of the very advanced features, WYSIWYG editors usually fall short, and you'll need to go "under the hood" to play with the HTML code anyhow. Even if you intend to use a WYSIWYG editor for the bulk of your HTML work, I recommend you bear with me for the next couple of days and try these examples in text editors so you get a feel for what HTML really is before you decide to move on to an editor that hides the tags.

In addition to the HTML editors, there are also converters, which take files from many popular word-processing programs and convert them to HTML. With a simple set of templates, you can write your pages entirely in your favorite program, and then convert the result when you're done.

In many cases, converters can be extremely useful, particularly for putting existing documents on the Web as fast as possible. However, converters suffer from many of the same problems as WYSIWYG editors: the result can vary from browser to browser, and many newer or advanced features aren't available in the converters. Also, most converter programs are fairly limited, not necessarily by their own features, but mostly by the limitations in HTML itself. No amount of fancy converting is going to make HTML do things that it can't yet do. If a particular capability doesn't exist in HTML, there's nothing the converter can do to solve that (and it may end up doing strange things to your HTML files, causing you more work than if you just did all the formatting yourself).

Structuring Your HTML

HTML defines three tags that are used to describe the page's overall structure and provide some simple "header" information. These three tags identify your page to browsers or HTML tools. They also provide simple information about the page (such as its title or its author) before loading the entire thing. The page structure tags don't affect what the page looks like when it's displayed; they're only there to help tools that interpret or filter HTML files.

According to the strict HTML 2.0 definition, these tags are optional. If your page does not contain them, browsers will usually be able to read it anyway. However, it is possible that these page structure tags might become required elements in the future. It's also possible that tools may come along that need them. If you get into the habit of including the page structure tags now, you won't have to worry about updating all your files later.

<HTML>

The first page structure tag in every HTML page is the <HTML> tag. It indicates that the content of this file is in the HTML language.

All the text and HTML commands in your HTML page should go within the beginning and ending HTML tags, like this:

<HTML>
...your page...
</HTML>

<HEAD>

The <HEAD> tag specifies that the lines within the beginning and ending points of the tag are the prologue to the rest of the file. There generally are only a few tags that go into the <HEAD> portion of the page (most notably, the page title, described later). You should never put any of the text of your page into the header.

Here's a typical example of how you would properly use the <HEAD> tag (you'll learn about <TITLE> later):

<HTML>
<HEAD>
<TITLE>This is the Title.</TITLE>
</HEAD>
....
</HTML>

<BODY>

The remainder of your HTML page, including all the text and other content (links, pictures, and so on) is enclosed within a <BODY> tag. In combination with the <HTML> and <HEAD> tags, this looks like:

<HTML>
<HEAD>
<TITLE>This is the Title. It will be explained later on</TITLE>
</HEAD>
<BODY>
....
</BODY>
</HTML>

You may notice here that each HTML tag is nested; that is, both <BODY> and </BODY> tags go inside both <HTML> tags; same with both <HEAD> tags. All HTML tags work like this, forming individual nested sections of text. You should be careful never to overlap tags (that is, to do something like this: <HTML><HEAD><BODY></HEAD></BODY></HTML>); make sure whenever you close an HTML tag that you're closing the most recently opened tag (you'll learn more about this as we go on).
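
The same rule applies to tags inside the body of the page. As a sketch, using the bold (<B>) and italic (<I>) character tags, which you'll meet properly later in the book, the first line below closes the most recently opened tag first. The second line overlaps its tags and is the kind of thing to avoid:

<B><I>Properly nested text</I></B>
<B><I>Overlapping tags</B></I>

A handy way to check your own pages is to read the closing tags from the inside out; they should appear in the reverse order of the opening tags.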

The Title

Each HTML page needs a title to indicate what the page describes. The title is used by your browser's bookmarks or hotlist program, and also by other programs that catalog Web pages. To give a page a title, use the <TITLE> tag.

New Term
The title indicates what your Web page is about and is used to refer to that page in bookmark or hotlist entries.

<TITLE> tags always go inside the page header (the <HEAD> tags) and describe the contents of the page, like this:

<HTML>
<HEAD>
<TITLE>The Lion, The Witch, and the Wardrobe</TITLE>
</HEAD>
<BODY>
....
</BODY>
</HTML>

You can have only one title in the page, and that title can contain only plain text; that is, there shouldn't be any other tags inside the title.
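
For example, the first line in this sketch (the recipe title is invented for illustration) puts an italic tag inside the title, which is exactly what to avoid. The second line is the plain-text version you should use instead:

<TITLE>My <I>Favorite</I> Recipes</TITLE>
<TITLE>My Favorite Recipes</TITLE>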

When you pick a title, try to pick one that is both short and descriptive of the content on the page. Additionally, your title should be relevant out of context. If someone browsing on the Web followed a random link and ended up on this page, or if they found your title in a friend's browser history list, would they have any idea what this page is about? You may not intend the page to be used independently of the pages you specifically linked to it, but because anyone can link to any page at any time, be prepared for that consequence and pick a helpful title.

Also, because many browsers put the title in the title bar of the window, you may have a limited number of words available. (Although the text within the <TITLE> tag can be of any length, it may be cut off by the browser when it's displayed.) Here are some other examples of good titles:

<TITLE>Poisonous Plants of North America</TITLE>
<TITLE>Image Editing: A Tutorial</TITLE>
<TITLE>Upcoming Cemetery Tours, Summer 1995</TITLE>
<TITLE>Installing The Software: Opening the CD Case</TITLE>
<TITLE>Laura Lemay's Awesome Home Page</TITLE>

And some not-so-good titles:

<TITLE>Part Two</TITLE>
<TITLE>An Example</TITLE>
<TITLE>Nigel Franklin Hobbes</TITLE>
<TITLE>Minutes of the Second Meeting of the Fourth Conference of the
Committee for the Preservation of English Roses, Day Four, After Lunch</TITLE>

The following examples show how titles look in both Netscape (Figure 3.5) and Lynx (Figure 3.6).

<TITLE>Poisonous Plants of North America</TITLE>

Figure 3.5: The output in Netscape.

Figure 3.6: The output in Lynx.

Headings

Headings are used to divide sections of text, just like this book is divided. ("Headings," above, is a heading.) HTML defines six levels of headings. Heading tags look like this:

<H1>Installing Your Safetee Lock</H1>

The numbers indicate heading levels (H1 through H6). The headings, when they're displayed, are not numbered. They are displayed in bigger or bolder text, or are centered, underlined, or capitalized: anything that makes them stand out from regular text.

Think of the headings as items in an outline. If the text you're writing has a structure, use the headings to indicate that structure, as shown in the next code lines. (Notice that I've indented the headings in this example to show the hierarchy better. They don't have to be indented in your page, and, in fact, the indenting will be ignored by the browser.)

<H1>Engine Tune-Up</H1>
   <H2>Change The Oil</H2>
   <H2>Adjust the Valves</H2>
   <H2>Change the Spark Plugs</H2>
      <H3>Remove the Old Plugs</H3>
      <H3>Prepare the New Plugs</H3>
         <H4>Remove the Guards</H4>
         <H4>Check the Gap</H4>
         <H4>Apply Anti-Seize Lubricant</H4>
         <H4>Install the Plugs</H4>
   <H2>Adjust the Timing</H2>

Unlike titles, headings can be any length, including many lines of text (although because headings are emphasized, having many lines of emphasized text may be tiring to read).

It's a common practice to use a first-level heading at the top of your page to either duplicate the title (which is usually displayed elsewhere), or to provide a shorter or less contextual form of the title. For example, if you had a page that showed several examples of folding bedsheets, part of a long presentation on how to fold bedsheets, the title might look something like this:

<TITLE>How to Fold Sheets: Some Examples</TITLE>

The top-most heading, however, might just say:

<H1>Examples</H1>
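
Put together in a complete page, that might look like the following sketch. The title and heading come from the example above; the surrounding page structure tags and the .... placeholder for the rest of the page are filled in here only for illustration:

<HTML>
<HEAD>
<TITLE>How to Fold Sheets: Some Examples</TITLE>
</HEAD>
<BODY>
<H1>Examples</H1>
....
</BODY>
</HTML>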

Don't use headings to display text in boldface type, or to make certain parts of your page stand out more. Although it may look cool on your browser, you don't know what it'll look like when other people use their browsers to read your page. Other browsers may number headings, or format them in a manner that you don't expect. Also, tools to create searchable indexes of Web pages may extract your headings to indicate the important parts of a page. By using headings for something other than an actual heading, you may be foiling those search programs and creating strange results.
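
As a sketch of the difference (the warning text is made up), the first line below misuses a heading just to grab attention, while the second marks the same text as an ordinary paragraph with the bold character tag instead:

<H3>Warning: unplug the toaster first!</H3>
<P><B>Warning: unplug the toaster first!</B></P>

The second version may look less dramatic in your browser, but it doesn't claim to be a section heading when it isn't one.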

The following examples show headings and how they appear in Netscape (Figure 3.7) and Lynx (Figure 3.8):

<H1>Engine Tune-Up</H1>
    <H2>Change the Oil</H2>
    <H2>Change the Spark Plugs</H2>
        <H3>Prepare the New Plugs</H3>
            <H4>Remove the Guards</H4>
            <H4>Check the Gap</H4>

Figure 3.7: The output in Netscape.

Figure 3.8: The output in Lynx.

Paragraphs

Now that you have a page title and several headings, let's add some ordinary paragraphs to the page.

The first version of HTML specified the <P> tag as a one-sided tag. There was no corresponding </P>, and the <P> tag was used to indicate the end of a paragraph (a paragraph break), not the beginning. So paragraphs in the first version of HTML looked like this:

The blue sweater was reluctant to be worn, and wrestled with her as
she attempted to put it on. The collar was too small, and would not
fit over her head, and the arm holes moved seemingly randomly away
from her searching hands.<P>
Exasperated, she took off the sweater and flung it on the floor.
Then she vindictively stomped on it in revenge for its recalcitrant
behavior.<P>

Most browsers that were created early on in the history of the Web assume that paragraphs will be formatted this way. When they come across a <P> tag, these browsers start a new line and add some extra vertical space between the line they just ended and the one that they just began.

In HTML 2.0 and the proposed HTML 3.0 specifications, and as supported by most current browsers, the paragraph tag has been revised. In these versions of HTML, the paragraph tags are two-sided (<P>...</P>), and <P> indicates the beginning of the paragraph rather than the end. The closing tag (</P>) is optional. So the sweater story would look like this in the newer versions of HTML:

<P>The blue sweater was reluctant to be worn, and wrestled with her as
she attempted to put it on. The collar was too small, and would not
fit over her head, and the arm holes moved seemingly randomly away
from her searching hands.</P>
<P>Exasperated, she took off the sweater and flung it on the floor.
Then she vindictively stomped on it in revenge for its recalcitrant
behavior.</P>

It's a good idea to get into the habit of using <P> at the start of a paragraph; this will become important when you learn how to align text left, right, or centered. Older browsers will accept this form of paragraphs just fine. Whether you use the </P> tag or not is up to you; it may help you remember where a paragraph ends, or it may seem unnecessary. I'll be using the closing </P> throughout this book.

Some people like to use extra <P> tags between paragraphs to spread out the text on the page. Once again, the cardinal reminder: Design for content, not for appearance. Someone with a text-based browser or a small screen is not going to care much about the extra space you so carefully put in, and some browsers may even collapse multiple <P> tags into one, erasing all your careful formatting.
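
As a sketch of what to avoid (the text is made up for illustration), the empty <P> tags below are there only to add space, and a browser may or may not honor them:

<P>The sweater lay on the floor.</P>
<P>
<P>
<P>She left the room.</P>

The same content, marked up for structure alone, is shorter and behaves predictably:

<P>The sweater lay on the floor.</P>
<P>She left the room.</P>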

The following example shows a sample paragraph and how it appears in Netscape (Figure 3.9) and Lynx (Figure 3.10):

<P>The sweater lay quietly on the floor, seething from its ill
treatment. It wasn't its fault that it didn't fit right. It hadn't
wanted to be purchased by this ill-mannered woman.</P>

Figure 3.9: The output in Netscape.

Figure 3.10: The output in Lynx.

Lists, Lists, and More Lists

In addition to headings and paragraphs, probably the most common HTML element you'll be using is the list. After this section, you'll know not only how to create a list in HTML, but how to create five different kinds of lists: a list for every occasion!

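HTML defines five kinds of lists:

Numbered, or ordered, lists, in which each item is numbered
Bulleted, or unordered, lists, in which each item is marked with a bullet or some other symbol
Glossary lists, in which each item has a term and a definition for that term
Menu lists, for short lists of single items or short paragraphs
Directory lists, for lists of especially short items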

List Tags

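All the list tags have common elements:

The entire list is surrounded by the opening and closing tags for that kind of list (for example, <UL> and </UL>, or <MENU> and </MENU>).
Each item within the list has its own tag: <DT> and <DD> for glossary lists, and <LI> for all the other lists.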

Although the tags and the list items can appear in any arrangement in your HTML code, I prefer to arrange the HTML for producing lists so that the list tags are on their own lines and each new item starts on a new line. This makes it easy to pick out the whole list as well as the individual elements. In other words, I find an arrangement like this:

<P>Dante's Divine Comedy consists of three books:</P>
<UL>
<LI>The Inferno
<LI>The Purgatorio
<LI>The Paradiso
</UL>

easier to read than an arrangement like this, even though both result in the same output in the browser:

<P>Dante's Divine Comedy consists of three books:</P>
<UL><LI>The Inferno<LI>The Purgatorio<LI>The Paradiso</UL>

Numbered Lists

Numbered lists are surrounded by the <OL>...</OL> tags (OL stands for Ordered List), and each item within the list begins with the <LI> (List Item) tag.

The <LI> tag is one-sided; you do not have to specify the closing tag. The existence of the next <LI> (or the closing </OL> tag) indicates the end of that item in the list.

When the browser displays an ordered list, it numbers (and often indents) each of the elements sequentially. You do not have to do the numbering yourself, and if you add or delete items, the browser will renumber them the next time the page is loaded.

New Term
Ordered lists are lists in which each item is numbered.

So, for example, here's an ordered list of steps (a recipe) for creating nachos, with each list item a step in the set of procedures:

<P>Laura's Awesome Nachos</P>
<OL>
<LI>Warm up refried beans with chili powder and cumin.
<LI>Glop refried beans on tortilla chips.
<LI>Grate equal parts Jack and Cheddar cheese, spread on chips.
<LI>Chop one small onion finely, spread on chips.
<LI>Heat under broiler 2 minutes.
<LI>Add guacamole, sour cream, fresh chopped tomatoes, and cilantro.
<LI>Drizzle with hot green salsa.
<LI>Broil another 1 minute.
<LI>Nosh.
</OL>

Use numbered lists only when you want to indicate that the elements are ordered; that is, that they must appear or occur in that specific order. Ordered lists are good for steps to follow or instructions to the reader. If you just want to indicate that something has some number of elements that can appear in any order, use an unordered list instead.
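
Here is one way that choice might look for the nachos (the ingredient list is a shortened summary added for illustration, and unordered lists are described in the next section). The ingredients can be gathered in any order, so they go in an unordered list; the steps cannot be shuffled, so they go in an ordered list:

<P>What you need:</P>
<UL>
<LI>Tortilla chips
<LI>Refried beans
<LI>Jack and Cheddar cheese
</UL>
<P>What you do:</P>
<OL>
<LI>Glop refried beans on tortilla chips.
<LI>Spread grated cheese on chips.
<LI>Heat under broiler 2 minutes.
</OL>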

The following input and output example shows a simple ordered list and how it appears in Netscape (Figure 3.11) and Lynx (Figure 3.12):

<P>To summon the demon, use the following steps:</P>
<OL>
<LI>Draw the pentagram
<LI>Sacrifice the goat
<LI>Chant the incantation
</OL>

Figure 3.11: The output in Netscape.

Figure 3.12: The output in Lynx.

Unordered Lists

Unordered lists are lists in which the elements can appear in any order. Unordered lists look just like ordered lists in HTML except that the list is indicated using <UL>...</UL> tags instead of <OL>...</OL>. Each element of the list begins with <LI>, just as with ordered lists. For example:

<P>Lists in HTML</P>
<UL>
<LI>Ordered Lists
<LI>Unordered Lists
<LI>Menus
<LI>Directories
<LI>Glossary Lists
</UL>

Browsers usually format unordered lists by inserting bullets or some other symbolic marker; Lynx inserts an asterisk (*).

New Term
Unordered lists are lists in which the items are bulleted or marked with some other symbol.

The following input and output example shows an unordered list and how it appears in Netscape (Figure 3.13) and Lynx (Figure 3.14):

<P>The three Erinyes, or Furies, were:</P>
<UL>
<LI>Tisiphone
<LI>Megaera
<LI>Alecto
</UL>

Figure 3.13: The output in Netscape.

Figure 3.14: The output in Lynx.

Glossary Lists

Glossary lists, sometimes called definition lists, are slightly different from other lists. Each list item in a glossary list has two parts:

A term
That term's definition

Each part of the glossary list has its own tag: <DT> for the term ("definition term"), and <DD> for its definition ("definition description"). <DT> and <DD> are both one-sided tags, and they usually occur in pairs, although most browsers can handle single terms or definitions. The entire glossary list is indicated by the tags <DL>...</DL> ("definition list").

New Term
Glossary lists are lists in which each list item has two parts: a term and a definition. Glossary lists are sometimes called definition lists.

Here's a glossary list example with a set of herbs and descriptions of how they grow:

<DL>
<DT>Basil<DD>Annual. Can grow four feet high; the scent of its tiny white
flowers is heavenly.
<DT>Oregano<DD>Perennial. Sends out underground runners and is difficult
to get rid of once established.
<DT>Coriander<DD>Annual. Also called cilantro, coriander likes cooler
weather of spring and fall.
</DL>

Glossary lists are usually formatted in browsers with the terms and definitions on separate lines, and the left margins of the definitions are indented.

Glossary lists don't have to be used for terms and definitions, of course. They can be used anywhere that the same sort of list is needed. Here's an example:

<DL>
<DT>Macbeth<DD>I'll go no more. I am afraid to think of
what I have done; look on't again I dare not.
<DT>Lady Macbeth<DD>Infirm of purpose! Give me the daggers.
The sleeping and the dead are as but pictures. 'Tis the eye
of childhood that fears a painted devil. If he do bleed, I'll
gild the faces of the grooms withal, for it must seem their
guilt. (Exit. Knocking within)
<DT>Macbeth<DD>Whence is that knocking? How is't with me when
every noise appalls me? What hands are here? Ha! They pluck out
mine eyes! Will all Neptune's ocean wash this blood clean from
my hand? No. This my hand will rather the multitudinous seas
incarnadine, making the green one red. (Enter Lady Macbeth)
<DT>Lady Macbeth<DD>My hands are of your color, but I shame to
wear a heart so white.
</DL>

HTML also defines a "compact" form of glossary list in which less space is used for the list, perhaps by placing the terms and definitions on the same line and highlighting the term, or by lessening the amount of indent used by the definitions.

Note
Most browsers seem to ignore the COMPACT attribute and format compact glossary lists in the same way that normal glossary lists are formatted.

To use the compact form of the glossary list, use the COMPACT attribute inside the opening <DL> tag, like this:

<DL COMPACT>
<DT>Capellini<DD>Round and very thin (1-2mm)
<DT>Vermicelli<DD>Round and thin (2-3mm)
<DT>Spaghetti<DD>Round and thin, but thicker than vermicelli (3-4mm)
<DT>Linguine<DD>Flat (5-6mm)
<DT>Fettucini<DD>Flat (8-10mm)
</DL>

This input and output example shows how a glossary list is formatted in Netscape (Figure 3.15) and Lynx (Figure 3.16):

<DL>
<DT>Basil<DD>Annual. Can grow four feet high; the scent
of its tiny white flowers is heavenly.
<DT>Oregano<DD>Perennial. Sends out underground runners
and is difficult to get rid of once established.
<DT>Coriander<DD>Annual. Also called cilantro, coriander
likes cooler weather of spring and fall.
</DL>

Figure 3.15: The output in Netscape.

Figure 3.16: The output in Lynx.

Menu and Directory Lists

Menus are lists of items or short paragraphs with no bullets or numbers or other label-like things. They are similar to simple lists of paragraphs, except that some browsers may indent them or format them in some way differently from normal paragraphs. Menu lists are surrounded by <MENU> and </MENU> tags, and each list item is indicated using <LI>, as shown in this example:

<MENU>
<LI>Go left
<LI>Go right
<LI>Go up
<LI>Go down
</MENU>

Directory lists are for items that are even shorter than those in menu lists, and are intended to be formatted by browsers horizontally in columns, like a directory listing on a UNIX system. As with menu lists, directory lists are surrounded by <DIR> and </DIR>, with <LI> for the individual list items, as shown in this example:

<DIR>
<LI>apples
<LI>oranges
<LI>bananas
</DIR>

New Term
Menu lists are used for short lists of single items. Directory lists are even shorter lists of items such as those you'd find in a UNIX or DOS directory listing.

Note
Although menu and directory lists exist in the HTML 2.0 specification, they are not commonly used in Web pages, and in the proposed HTML 3.0, they no longer exist (there are other available tags that produce the same effect). Considering that most browsers seem to format menus and directories in similar ways to the glossary lists (or as unordered lists), and not in the way they are described in the specification, it is probably best to stick with the other three forms of lists.
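
For instance, the menu list shown earlier could be written as an unordered list with no other changes (a sketch; expect bullets in front of each item, and possibly different indentation than <MENU> would have produced in your browser):

<UL>
<LI>Go left
<LI>Go right
<LI>Go up
<LI>Go down
</UL>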

The following input and output example shows a menu list and a directory list and how they appear in Netscape (Figure 3.17) and Lynx (Figure 3.18):

<MENU>
<LI>Canto 1: The Dark Wood of Error
<LI>Canto 2: The Descent
<LI>Canto 3: The Vestibule
<LI>Canto 4: Circle One: Limbo
<LI>Canto 5: Circle Two: The Carnal
</MENU>

<DIR>
<LI>files
<LI>applications
<LI>mail
<LI>stuff
<LI>phone_numbers
</DIR>

Figure 3.17: The output in Netscape.

Figure 3.18: The output in Lynx.

Nesting Lists

What happens if you put a list inside another list? This is fine as far as HTML is concerned; just put the entire list structure inside another list as one of its elements. The nested list just becomes another element of the first list, and it is indented from the rest of the list. Lists like this work especially well for menu-like entities in which you want to show hierarchy (for example, in tables of contents), or as outlines.

Indenting nested lists in HTML code itself helps show their relationship to the final layout:

<OL>
   <LI>WWW
   <LI>Organization
   <LI>Beginning HTML
   <UL>
      <LI>What HTML is
      <LI>How to Write HTML
      <LI>Doc structure
      <LI>Headings
      <LI>Paragraphs
      <LI>Comments
   </UL>
   <LI>Links
   <LI>More HTML
</OL>

Many browsers format nested ordered lists and nested unordered lists differently from their enclosing lists. For example, they might use a symbol other than a bullet for a nested list, or number the inner list with letters (a, b, c) instead of numbers. Don't assume that this will be the case, however, and don't refer back to "section 8, subsection b" in your text, because you cannot know what the exact formatting will be in the final output.
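
As a sketch, here is an ordered list nested inside another ordered list (the outline itself is invented for illustration). One browser might label the inner items a, b, c and another 1, 2, 3, so the text around the list should not depend on either:

<OL>
<LI>Change the oil
<LI>Change the spark plugs
    <OL>
    <LI>Remove the old plugs
    <LI>Install the new plugs
    </OL>
<LI>Adjust the timing
</OL>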

Here's an input and output example of a nested list and how it appears in Netscape (Figure 3.19) and Lynx (Figure 3.20):

<H1>Peppers</H1>
<UL>
<LI>Bell
<LI>Chile
    <UL>
    <LI>Serrano
    <LI>Jalapeno
    <LI>Habanero
    <LI>Anaheim
    </UL>
<LI>Szechuan
<LI>Cayenne
</UL>

Figure 3.19: The output in Netscape.

Figure 3.20: The output in Lynx.

Comments

You can put comments into HTML pages to describe the page itself or to provide some kind of indication of the status of the page; some source code control programs can put page status into comments, for example. Text in comments is ignored when the HTML file is parsed; comments don't ever show up on screen, and that's why they're comments. Comments look like this:

<!-- This is a comment -->

Each line of text should be individually commented, and it's usually a good idea not to include other HTML tags within comments. (Although this practice isn't strictly illegal, many browsers may get confused when they encounter HTML tags within comments and display them anyway.)

Here are some examples:

<!-- Rewrite this section with less humor -->
<!-- Neil helped with this section -->
<!-- Go Tigers! -->
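
When a note runs longer than one line, the line-by-line advice above works out to something like this sketch (the note text is invented for illustration), with each line opened and closed as its own comment:

<!-- This page is a draft. -->
<!-- The section on spark plugs still needs figures. -->
<!-- Check the gap measurements before publishing. -->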

Exercise 3.4: Creating a real HTML page.

At this point, you know enough to get started creating simple HTML pages. You understand what HTML is, you've been introduced to a handful of tags, and you've even tried browsing an HTML file. You haven't done any links yet, but you'll get to that soon enough, in the next chapter.

This exercise shows you how to create an HTML file that uses the tags you've learned about up to this point. It will give you a feel for what the tags look like when they're displayed on-screen and for the sorts of typical mistakes you're going to make. (Everyone makes them, and that's why it's often useful to use an HTML editor that does the typing for you. The editor doesn't forget the closing tags, or leave off the slash, or misspell the tag itself.)

So, create a simple example in that text editor of yours. It doesn't have to say much of anything; in fact, all it needs to include are the structure tags, a title, a couple of headings, and a paragraph or two. Here's an example:

<HTML>
<HEAD>
<TITLE>Company Profile, Camembert Incorporated</TITLE>
</HEAD>
<BODY>
<H1>Camembert Incorporated</H1>
<P>"Many's the long night I dreamed of cheese -- toasted, mostly."
-- Robert Louis Stevenson</P>
<H2>What We Do</H2>
<P>We make cheese. Lots of cheese; more than eight tons of cheese
a year.</P>
<H2>Why We Do It</H2>
<P>We are paid an awful lot of money by people who like cheese.
So we make more.</P>
<H2>Our Favorite Cheeses</H2>
<UL>
<LI>Brie
<LI>Havarti
<LI>Camembert
<LI>Mozzarella
</UL>
</BODY>
</HTML>

Save your example to an HTML file, open it in your browser, and see how it came out.

If you have access to another browser on your computer or, even better, one on a different computer, I highly recommend opening the same HTML file there so you can see the differences in appearance between browsers. Sometimes the differences can surprise you; lines that looked fine in one browser might look strange in another browser.

Here's an illustration for you: The cheese factory example looks like Figure 3.21 in Netscape (the Macintosh version) and like Figure 3.22 in Lynx.

Figure 3.21 : The cheese factory in Netscape.

Figure 3.22 : The cheese factory in Lynx.

See what I mean?

Summary

HTML, a text-only markup language used to describe hypertext pages on the World Wide Web, describes the structure of a page, not its appearance.

In this chapter, you've learned what HTML is and how to write and preview simple HTML files. You've also learned about the HTML tags shown in Table 3.1.

Table 3.1. HTML tags from this chapter.
Tag                    Use
<HTML>...</HTML>       The entire HTML page.
<HEAD>...</HEAD>       The head, or prologue, of the HTML page.
<BODY>...</BODY>       All the other content in the HTML page.
<TITLE>...</TITLE>     The title of the page.
<H1>...</H1>           First-level heading.
<H2>...</H2>           Second-level heading.
<H3>...</H3>           Third-level heading.
<H4>...</H4>           Fourth-level heading.
<H5>...</H5>           Fifth-level heading.
<H6>...</H6>           Sixth-level heading.
<P>...</P>             Paragraph.
<OL>...</OL>           An ordered (numbered) list. Items in the list each begin with <LI>.
<UL>...</UL>           An unordered (bulleted or otherwise-marked) list. Items in the list each begin with <LI>.
<MENU>...</MENU>       A menu list (a list of short items or paragraphs).
<DIR>...</DIR>         A list of especially short (1-2 word) items. Directory lists are not often used in most HTML files.
<LI>                   Individual list items in ordered, unordered, menu, or directory lists.
<DL>...</DL>           A glossary or definition list. Items in the list consist of pairs of elements: a term and its definition.
<DT>                   The term part of an item in a glossary list.
<DD>                   The definition part of an item in a glossary list.
<!-- ... -->           Comment.

Q&A

Q: Can I do any formatting of text in HTML?
A: You can do some formatting to strings of characters; for example, making a word or two bold. And the Netscape extensions allow you to change the font size and color of the text in your Web page (for readers using Netscape). You'll learn about these features tomorrow, in Chapters 5 and 6.

Q: I'm using Windows. My word processor won't let me save a text file with an extension that's anything except .txt. If I type in index.html, it saves the file as index.html.txt. What can I do?
A: You can rename your files after you've saved them so they have an html or htm extension, but this can be annoying with lots of files. Consider using a text editor or HTML editor for your Web pages.

Q: I've noticed in many Web pages that the page structure tags (<HTML>, <HEAD>, <BODY>) aren't used. Do I really need to include them if pages work just fine without them?
A: You don't need to, no. Most browsers will handle plain HTML without the page structure tags. But including the tags will allow your pages to be read by more general SGML tools and to take advantage of features of future browsers. And, it's the "correct" thing to do if you want your pages to conform to true HTML format.

Q: I've seen comments in some HTML files that look like this:
<!-- this is a comment>
Is that legal?
A: That's the old form of comments that was used in very early forms of HTML. Although many browsers may still accept it, you should use the new form (and comment each line individually) in your pages.

Q: My glossaries came out formatted really strangely! The terms are indented farther in than the definitions!
A: Did you mix up the <DD> and <DT> tags? The <DT> tag is always used first (the definition term), and then the <DD> follows (the definition). I mix these up all the time. There are too many D tags in glossary lists.

Q: I've seen HTML files that use <LI> outside of a list structure, alone on the page, like this:
<LI>And then the duck said, "put it on my bill"
A: Most browsers will at least accept this tag outside a list tag and will format it either as a simple paragraph or as a non-indented bulleted item. However, according to the true HTML definition, using an <LI> outside a list tag is illegal, so "good" HTML pages shouldn't do this. And because we are all striving to write good HTML (right?), you shouldn't do this either. Always put your list items inside lists where they belong.
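
As a sketch of the fix for that last case, the stray item goes inside a list, even if it is the only item (which kind of list to use is an assumption here; an unordered list is the simplest choice):

<UL>
<LI>And then the duck said, "put it on my bill"
</UL>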