Chapter 24

SGML Presentations


Ever since computer technology became available on multiple platforms, users have been plagued by the difficulty, or downright impossibility, of transferring documents from one computer system or software program to another.

The beginning of the cure for this plague came about in 1986 when the Standard Generalized Markup Language (SGML) was adopted by the International Organization for Standardization (ISO) as a standard for the exchange of data worldwide. Since that time, use of SGML has increased rapidly, particularly among the defense, aerospace, automotive, electronics, and telecommunications industries.

How SGML Works

SGML is a standard language for "marking" or coding electronic documents and files that allows users to access information regardless of the system or platform they are using. SGML works by treating the content, format, and structure of a document as three distinct elements, as shown in Figures 24.1 and 24.2.

Figure 24.1: SoftQuad's Panorama, showing an SGML document.

Figure 24.2: The SGML source code for the same document. Note the similarities to HTML code.

Content is the actual information, such as text and images, in the document. The format determines how the words and images appear on the screen or on paper (for example, font, point size, italics, and bold). The structure of a document indicates the relationships among the various pieces of content, such as paragraphs, headings, subheadings, or lists. SGML is designed to preserve the structure and content of the document.
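The separation of these three elements can be sketched in a few lines of Python. This is only an illustration, not SGML itself: the `document` list, the two format tables, and the `render` function are all hypothetical names invented for the example. Structure and content are stored once, and the format is chosen separately by whoever displays the document.

```python
# Structure and content: each piece of content is labeled with its structural role.
document = [
    ("heading", "How SGML Works"),
    ("paragraph", "SGML separates content, structure, and format."),
]

# Format: two hypothetical tables mapping a structural role to an appearance.
screen_format = {
    "heading": str.upper,
    "paragraph": lambda text: text,
}
print_format = {
    "heading": lambda text: "== " + text + " ==",
    "paragraph": lambda text: "    " + text,
}


def render(doc, fmt):
    """Apply a format table to structured content without changing the content."""
    return [fmt[role](text) for role, text in doc]


for line in render(document, screen_format):
    print(line)
for line in render(document, print_format):
    print(line)
```

The same `document` is rendered twice with different formats, while its structure and content stay untouched.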

For example, in this book you know that you are now reading Chapter 24. From the Table of Contents and from the section heading you also know that Chapter 24 is in Part III. In the same way that the editors have organized this book into logical sections and subsections of information, SGML organizes and divides electronic documents into a recognizable and retrievable structure.

Because SGML is an international standard, any computer platform can be made capable of interpreting SGML code regardless of the document's source. The universality of SGML allows for the efficient and accurate transfer of all content and structural information from one computer system to another, while still allowing individual users to modify the format of a document to suit their needs and requirements. No matter how technology changes in the future, SGML will keep documents durable and exchangeable.

Using SGML to Preserve File Structure

The capability to preserve the structural integrity of a document is what makes SGML so revolutionary. For SGML to preserve the document structure, the document must contain discrete markers that identify the structural elements. These markers, called tags, are located at the beginning and end of each structural element. For example, suppose you have a paragraph such as the following:

This is the first paragraph of my document.

You can tell any computer with SGML capabilities to preserve this information as a paragraph element by marking it like this:

<par> This is the first paragraph of my document. </par>

It's no coincidence that this looks a lot like HyperText Markup Language (HTML), the language used to create documents for the World Wide Web. HTML is simply an application of SGML. Thanks to the original SGML research, Web browsers on different platforms can all interpret the same HTML files.

The specific structural designations of HTML, and of every other application of SGML, are defined in that application's Document Type Definition, or DTD.
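To make the idea of paired tags concrete, here is a minimal Python sketch that recovers structural elements from tagged text like the paragraph example above. It is not a real SGML parser: the `extract_elements` function is a hypothetical helper, and it assumes simple, non-nested elements with no attributes and no omitted end tags.

```python
import re

# Matches <name> ... </name>, where the end tag repeats the start tag's name.
TAGGED = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_elements(source):
    """Return (element name, content) pairs for each tagged element found."""
    return [(name, text.strip()) for name, text in TAGGED.findall(source)]


sample = "<par> This is the first paragraph of my document. </par>"
print(extract_elements(sample))
```

Because the start and end tags delimit the element, a program can recover both the kind of element (`par`) and its content without knowing anything about how the text will eventually be formatted.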

Document Type Definition

If you regularly use documents that have an exact, uniform structure, such as Web documents, time sheets, or product specification forms, you will want a template to ensure that these documents remain identical in structure as they pass from one computer system to another and as they are updated and re-created.

The template or framework for the various elements in an SGML application is the DTD. The DTD not only preserves the structure of a specific type of electronic file, but also enforces the rules of that structure. For example, if you had to submit a specific report, the DTD for that type of report could specify that the report must contain sections A, B, and C, and that each of these sections must contain at least one paragraph. In this way, the DTD helps ensure that documents have a uniform, logical structure. A document whose content has been tagged to conform to a particular DTD is called a document instance.
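The report example can be sketched as a small Python check. This is an illustration under stated assumptions, not how an SGML parser validates against a DTD: the names `secA`, `secB`, `secC`, `par`, and `validate_report` are hypothetical, and the sketch assumes the three sections must appear in that fixed order.

```python
# Hypothetical rules, loosely mirroring DTD declarations along the lines of
#   <!ELEMENT report - - (secA, secB, secC)>
#   <!ELEMENT secA   - - (par+)>
REQUIRED_SECTIONS = ["secA", "secB", "secC"]


def validate_report(report):
    """Return a list of rule violations; an empty list means the report conforms.

    `report` is a list of (section name, list of paragraphs) pairs in document order.
    """
    errors = []
    names = [name for name, _ in report]
    if names != REQUIRED_SECTIONS:
        errors.append(f"expected sections {REQUIRED_SECTIONS}, found {names}")
    for name, paragraphs in report:
        if len(paragraphs) < 1:
            errors.append(f"section {name} must contain at least one paragraph")
    return errors


good = [("secA", ["Intro."]), ("secB", ["Body."]), ("secC", ["Close."])]
bad = [("secA", ["Intro."]), ("secC", [])]
print(validate_report(good))
print(validate_report(bad))
```

The first report conforms to the rules; the second is rejected both for its missing section and for its empty section, which is the kind of enforcement a DTD provides.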

The best SGML software programs allow you to tag information by choosing from pull-down menus. When you work within the confines of a particular DTD, the pull-down menus list only those tags that are valid at the cursor's current position in the document. Therefore, you cannot diverge from the structure of the DTD even if you want to.
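The way such a menu might decide which tags to offer can be sketched as follows. This is a simplified, hypothetical model rather than the behavior of any particular SGML editor: `ALLOWED_CHILDREN` and `valid_tags_at` are invented names, and the sketch ignores ordering and repetition rules that a real DTD would also enforce.

```python
# Hypothetical content model: each element maps to the elements allowed inside it.
ALLOWED_CHILDREN = {
    "report": ["secA", "secB", "secC"],
    "secA": ["par"],
    "secB": ["par"],
    "secC": ["par"],
    "par": [],
}


def valid_tags_at(open_elements):
    """Return the tags a menu could offer, given the open elements at the cursor.

    `open_elements` lists the currently open elements, outermost first.
    """
    if not open_elements:
        return ["report"]  # only the top-level element can start a document
    return ALLOWED_CHILDREN.get(open_elements[-1], [])


print(valid_tags_at([]))
print(valid_tags_at(["report"]))
print(valid_tags_at(["report", "secA"]))
```

With the cursor inside `secA`, the only tag on offer is `par`, so the author has no way to insert an element the content model does not permit there.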

Different industries and companies require different types of DTDs to manage their information. Many SGML systems ship with a set of predefined DTDs. Relatively few people will want to write their own DTDs, because the process can be difficult. However, a high-quality SGML product also lets you define your own document types, that is, user-defined DTDs.

Benefits of SGML

The primary benefit of SGML is that it dramatically increases the ease with which people can access the information that your company creates. However, other benefits can be found in improved information collection, compilation, and dissemination, as well as in increased cost efficiency.

Increased Access

Creating documents in SGML gives everyone who needs the information efficient and accurate access to it. At a time when different computers, operating systems, and applications abound, only a language that is hardware- and software-independent, like SGML, will allow all your users to exchange documents with ease. This way, if your art department creates a file on a Mac, your CEO, who uses a PC, can view and edit it before it goes to press.

Information Collection and Compilation

Structured guidelines for the creation of new documents increase productivity by eliminating the time spent formatting them. SGML also improves data integrity by reducing the need to filter data from one format to another, and it lengthens the period of time that stored information can be used by ensuring that the data will be retrievable regardless of future changes in hardware or software. Remember all those files you have on 5 1/4-inch floppies that are in some ancient DOS word processing format nobody uses anymore? With SGML, your files will have a much longer shelf life.

Information Dissemination

With electronic publishing sweeping the world, SGML enables you to translate information that was prepared for traditional printing methods into a wide variety of formats suitable for publishing on everything from CD-ROM to the World Wide Web. SGML also can improve information dissemination by allowing users to share whole documents or sections of documents without the need for wasteful hard copy reproduction and duplication.

Cost Efficiency

All the previously mentioned benefits of SGML can translate into direct and indirect cost savings by providing greater information accessibility, improved data integrity, increased life span of archival information, and a reduced need for printed products. Think about all the money your company spends on printing human resources documents, corporate directories, and internal memos every year. These costs can be all but eliminated by creating and duplicating your documents in electronic format using SGML. And any changes mean a few keystrokes, not thousands of dollars in printing and distribution costs.

Drawbacks of SGML

When using SGML, you might find the enforced structure of the DTD somewhat limiting. If you have a small number of DTDs available to you and do not have the ability to write new DTDs, working in SGML can be frustrating.

The universality of SGML is great, as long as your systems know the code. SGML translators are not native to most computer systems, so the ability to create and read documents with SGML tags requires an investment in systems and software.

Although SGML is an English-based language, it is not always intuitive and can be as complex as the documents you are trying to tag. Thorough knowledge of SGML does not come easily. For industries that are document intensive, the use of SGML must be considered part of an entire information management strategy. Although standardizing on SGML may require significant time and investment, the benefits (as explained in the preceding section) can make the transition worthwhile.

Does Your Company Need SGML?

When deciding whether to standardize some of your company's information on SGML, consider several factors. The following questions can help you define your current information management needs:

How you answer these questions will help you determine how SGML fits into your information management strategy. Remember that not all your documents will need to be standardized on SGML, only those that have a definable structure and need to last.

Summary

Over the years, the growth of SGML and its applications has affected many areas of the digital information explosion. SGML and its implementation are often complex and cryptic, requiring well-trained and informed professionals who can tailor SGML to specific needs. This chapter is only an introduction to SGML that should help you understand its uses and inspire a more in-depth investigation of this technology.