Day 5

Chapter 10 Sound and Video Files

An Introduction to Digital Sound
Common Sound Formats
Getting Sound Files
Sampling Sound
Converting Sound Files
Audio for the Web
An Introduction to Digital Video
- Analog and Digital Video
- Compression and Decompression (Codecs)
Movie Formats
Movie Compression
Codec Formats
Digitizing Video
Getting and Converting Video
Video for the Web
For More Information
Summary
Q&A

After an afternoon of Web exploring, you've just reached a page that has a long list of movie samples you can download. Neat, you think, scanning over the list. The problem, however, is that beside the name of each file, there's a description, and it looks something like this:

'Luther's Banana' is a 1.2 megabyte AVI file with a CinePak codec and an 8-bit 22KHz two-channel audio track.

If you understood that, you don't need this chapter. If, on the other hand, you're interested in learning about sound and video and how they relate to the Web, or if you've decided that you must know what all those strange words and numbers mean, read on.

In this chapter, I'll talk about digital audio and video: the basics of how they work, the common file formats in use on the Web and in the industry, and some ideas for obtaining sound and video and using it in your Web pages. Here are some of the things you'll learn in this chapter:

Digital audio and video: what they are and how they work
The common sound formats: µ-law, AIFF, WAVE, and RealAudio
The common video formats: QuickTime, Video for Windows, and MPEG
Video codecs: what they are and which ones are the most popular and useful
Creating and modifying sound and video files for use on the Web

An Introduction to Digital Sound

Want to know something about how sound on the computer works? Want to create your own audio clips for the Web (be they music, voice, sound effects, or other strange noises)? You've come to the right place. In the first part of the chapter, you'll learn about what digital audio is and the sort of formats that are popular on the Web, and you'll have a quick lesson in how to get sound into your computer so you can put it on the Web.

Sound Waves

You might remember from high school physics that the basic definition of sound is that sound is created by disturbances in the air that produce waves. Those pressure waves are what is perceived as sound by the human ear. In its simplest form, a sound wave looks something like what you see in Figure 10.1.

Figure 10.1 : A basic sound wave.

There are two important things to note about the basic sound wave. First, it has an amplitude, which is the distance between the middle line (silence) and the top or bottom of the wave crests. The greater the amplitude, the louder the sound.

It also has a frequency, which is the speed the wave moves (or, more precisely, the number of waves that move past a point during a certain amount of time). Higher frequencies (that is, faster waves moving past that point) produce high-pitched sounds, and lower frequencies produce low-pitched sounds.

Real sounds are much more complicated than that, of course, with lots of different complex wave forms making up a single sound as you hear it. With the combinations of lots of sound waves and different ways of describing them, there are many other words and concepts I could define here. But frequency and amplitude are the two most important ones, and are the ones that will matter most in the next section.

Converting Sound Waves to Digital Samples

An analog sound wave (the one you just saw in Figure 10.1) is a continuous line with an infinite number of amplitude values along its length. To convert it to a digital signal, your computer takes measurements of the wave's amplitude at particular points in time. Each measurement it takes is called a sample; therefore, converting an analog sound to digital audio is called sampling that sound. Figure 10.2 shows how values along the wave are sampled over time.

Figure 10.2 : Sampling a sound wave.

The more samples you take, the more amplitude values you have and the closer you are to capturing something close to the original sound wave. But because the original wave has an infinite number of values, you can never exactly re-create the original. With very high sampling rates, you can create a representation of the original sound wave so close that the human ear can't tell the difference.

The number of samples taken per second is called the sample rate and is usually measured in kilohertz (KHz). There are several different possible sample rates in use today, but the most popular are 11KHz, 22KHz, and 44KHz.

NOTE

Those numbers are rounded off for simplicity. The actual numbers are usually 11.025KHz, 22.050KHz, and 44.1KHz.
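To make sampling concrete, here is a minimal Python sketch that measures a pure tone at two different sample rates. The function name sample_wave and the 1,000Hz test tone are my own choices for illustration; a real sound card does this in hardware on whatever signal comes in.

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Measure a sine wave's amplitude at evenly spaced points in time."""
    count = int(sample_rate_hz * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]

# One cycle of a 1,000Hz tone lasts one millisecond.
coarse = sample_wave(1000, 8000, 0.001)    # telephone-style rate
fine = sample_wave(1000, 44100, 0.001)     # CD-style rate
print(len(coarse), "samples versus", len(fine), "samples for the same cycle")
```

The higher rate simply gives you more measurements of the same stretch of wave, which is why it tracks the original shape more closely.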

In addition to the sample rate, you also have the sample size, sometimes called the sample resolution. There are generally two choices for sample resolutions, 8-bit and 16-bit. Think of sample resolution in terms of increments between the top and bottom of the wave form. The values don't actually change, but if you have 8-bit increments and 16-bit increments across that distance, the latter are smaller and provide finer detail (see Figure 10.3). It is much the same way that 8-bit versus 16- or 24-bit color works. You can get a much broader range of colors with the higher color resolution, but you always get close to the same color with each.

Figure 10.3 : Sample resolution.

New Term

The sample rate is the number of sound samples taken per second and is measured in KHz. The sample size is usually either 8-bit or 16-bit; 16-bit provides finer "details" in the sound.

When a sound sample is taken, the actual value of the amplitude is rounded off to the nearest increment (in audio jargon, the rounding off is called quantizing). If you're using a 16-bit sample, you're much more likely to get closer to the original value than in an 8-bit sample because the increments are smaller (see Figure 10.4).

Figure 10.4 : Taking a sample.

The difference between the actual amplitude value and the rounded-off value is called quantization error (more audio jargon). Lots of quantization error results in a sort of hissing noise in the final sound file.

All this is a complicated way of saying that 16-bit is better than 8-bit. (So why didn't I just say that? Well, now you know why it's better.) The overall quality of a digital audio sound is loosely related to both its sample size and sample rate. However, because the human ear can pick up quantization errors more easily than errors in a low sample rate, it's always better to go with 16-bit over 8-bit. If you use 8-bit, use the highest possible sample rate to adjust for the errors.

Finally, sounds can also have multiple channels, usually used for creating stereo effects. Typically, one channel is mono, two channels are stereo, four channels are quad, and so on, just as in your stereo.

The higher the sample rate and size, and the more channels, the better the quality of the resulting sound. For example, an 8-bit sound sample at 8KHz is about the quality you get over the telephone, whereas 16-bit stereo at 44KHz is CD-quality audio. Unfortunately, just as with image files, greater sound quality in your audio means larger file sizes. A minute of mono music at 22KHz with 8-bit sample resolution takes up about 1.25MB on your disk, whereas a minute of CD-quality audio (16-bit, 44KHz, stereo) runs you about 10MB. Stereo, of course, is twice the size of mono.
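If you want to check sizes like these for your own recordings, the arithmetic for uncompressed audio is just rate times bytes per sample times channels times seconds. Here is a small sketch; the function name is made up, and it ignores the few bytes of header that real file formats add.

```python
def audio_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed audio data, ignoring any file header."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds

MB = 1024 * 1024

mono_8bit = audio_size_bytes(22050, 8, 1, 60)    # 1,323,000 bytes, about 1.26MB
cd_stereo = audio_size_bytes(44100, 16, 2, 60)   # 10,584,000 bytes, about 10.09MB

print(round(mono_8bit / MB, 2), "MB for a minute of 8-bit, 22KHz mono")
print(round(cd_stereo / MB, 2), "MB for a minute of 16-bit, 44KHz stereo")
```

Note that the 10MB figure only works out with two channels; a minute of 16-bit, 44KHz mono is about half that.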

So what about compression? If these files take up so much room, why not do as the image folks have done and create compression algorithms that reduce the size of these files? Word from the experts is that audio is notoriously difficult to compress. (This makes sense. Unlike images, audio sound waves are incredibly complex, and there aren't the same sort of repeated patterns and consistent variations that allow images to be compressed so easily.) Only a few of the common sound file formats have built-in compression.

Digital Back to Analog

So now you have an analog sound encoded digitally on your computer, and you're going to play it. When you play a digital audio sound, the computer translates the digital samples back into an analog sound wave.

Because digital audio represents the sound wave as a long series of separate values, each of which is held for the same amount of time as the interval at which the sound was originally sampled, playback can produce a jaggy sound wave and a funny-sounding result (see Figure 10.5).

Figure 10.5 : A jaggy analog signal.

Analog filters are used to smooth out the jags in the wave (see Figure 10.6), which is then sent to your computer speakers.

Figure 10.6 : The jaggy wave smoothed out.
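You can get a feel for the jaggy wave and its smoothing with a toy example. The real smoothing is done by analog circuitry, not software; the moving average below is only a stand-in I've chosen to show the idea, and both function names are hypothetical.

```python
def hold(samples, factor):
    """Hold each sample value for `factor` output steps, producing a staircase."""
    return [s for s in samples for _ in range(factor)]

def smooth(signal, window):
    """Average each point with its neighbors to round off the steps."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

staircase = hold([0, 4, 8, 4, 0], 3)   # the jaggy version
rounded = smooth(staircase, 3)         # the corners are softened
print(staircase)
print(rounded)
```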

Common Sound Formats

Now that you know how digital sound works, let's go over how digital sound is stored. Unfortunately, even now there isn't a standard for audio on the Web that is similar to the way GIF and JPEG are standard now for images. It's still a hodgepodge of formats, all of them used at different times. This section will at least give you an idea of what's out there and what it means.

µ-law (Mu-law), AU

The most common and readily available sound format that works cross-platform is µ-law, pronounced myew-law (or sometimes you-law, because the Greek µ character looks like a u). Used by both Sun and NeXT for their standard audio format, µ-law format was designed for the telephone industry in the United States. Its European equivalent is called A-law and is, for the most part, the same format. µ-law also has several variations that all come under the same name, but all should be readable and playable by a player that claims to support µ-law. µ-law files are sometimes called AU files because of their .au filename extension.

Samples in µ-law format are mono, 8-bit, and 8KHz. But the encoding of the sample is different from most other formats, which allows µ-law to have a wider dynamic range (variation between soft and loud parts of a sound) than most sounds encoded with such a small sample size and rate. On the other hand, µ-law samples tend to have more hiss than other sound formats.
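How does the different encoding buy a wider dynamic range? The chapter doesn't go into it, so take the following as a sketch under an assumption: that the encoding follows the usual logarithmic µ-law curve with µ = 255, as used in telephony. Real encoders work on integer samples with a stepped approximation of this curve, and the function names here are my own.

```python
import math

MU = 255  # assumed value; the usual choice for 8-bit telephone audio

def mu_law_compress(x):
    """Map an amplitude in -1.0..1.0 onto a logarithmic scale in -1.0..1.0."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Undo mu_law_compress, recovering the original amplitude."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

quiet = 0.01   # a sound at 1 percent of full volume
coded = mu_law_compress(quiet)
print("coded value:", coded)
print("recovered:  ", mu_law_expand(coded))
```

The quiet sound ends up using a much larger share of the coded range than its 1 percent of the amplitude range, so soft passages keep more detail when squeezed into 8 bits. The price is coarser steps for loud sounds.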

NOTE

Some sound applications enable you to record µ-law samples at a higher sample rate than 8KHz. However, this might make them unplayable across platforms. If you're going to choose µ-law, stick with the standard 8-bit, 8KHz sample.

The only advantage of µ-law sound samples is their wide cross-platform support. Many sites providing sound samples in a higher-fidelity format such as AIFF or MPEG will provide a µ-law sample as well to reach a wider audience.

AIFF/AIFC

AIFF stands for Audio Interchange File Format. AIFF was developed by Apple and is primarily a Macintosh format, but SGI has adopted it as well. In terms of flexibility, AIFF is an excellent format, which allows for 8- or 16-bit samples at many sample rates, in mono or stereo. AIFF files have a .aiff or .aif filename extension.

AIFC is AIFF with compression built in. The basic compression algorithm is MACE (Macintosh Audio Compression/Expansion), with two variations, MACE3 (3-to-1 compression) and MACE6 (6-to-1 compression). Both are lossy compression schemes, so AIFC compressed files will lose some of the sound quality of the original. Most AIFF players also play AIFC, so using one over the other is only a question of file size or sound quality.

Macintosh SND Files

The SND format, sometimes called just plain Macintosh System Sounds, is the format used only on the Macintosh for many simple sounds such as the beeps and quacks that come with the system. SND files are actually files with SND resources (the Macintosh has a resource and data fork for many files) which can contain digital samples or a series of commands playable by the Macintosh Sound Manager. SND files are not widely used on the Web because they are limited to the Macintosh, but SND files are widely available and easily converted to other sound formats.

Windows WAVE

WAVE or RIFF WAVE format, sometimes called WAV from the .wav extension, was developed by Microsoft and IBM, and its inclusion in Windows 3.1 has made it the audio standard on the PC platform. WAVE and AIFF have much in common, mostly in their flexibility. WAVE files can also accommodate samples in any rate, size, and number of channels. In addition, WAVE files can include several different compression schemes.

MPEG Audio

MPEG stands for Moving Picture Experts Group, which is a standards committee interested primarily in compression for digital video. But, because video usually includes an audio track, the group considers issues in audio compression as well. The MPEG audio compression algorithm is far too complex to explain here (in other words, I don't understand it). However, you can get all the technical information you want from the MPEG FAQ, available at most sites that carry Usenet FAQs (one is listed at the end of this chapter).

MPEG audio has become popular on the Web mostly because of the Internet Underground Music Archive, which uses it for its sound samples (visit IUMA at http://www.iuma.com/IUMA/). Using MPEG, you can get excellent sound quality without needing enormous amounts of disk space. The files are still rather large, but the quality is excellent. On the other hand, your readers (listeners) will also need an MPEG audio player for their platform and might need to configure their browser in order to properly use the samples.

RealAudio

RealAudio format, playable using the RealAudio player or plug-in and the RealAudio server, currently comes in two flavors. The 14.4 format, playable over 14.4Kbps modems, provides "monophonic AM quality sound." The 28.8 format, playable over 28.8Kbps modems or faster connections, provides "monophonic near-FM quality sound."

Both 14.4 and 28.8 formats are highly compressed using a lossy compression algorithm of RealAudio's own design. RealAudio files tend to be much smaller than their AIFF or WAVE equivalents, but the sound quality is not as good.

Getting Sound Files

Where can you get sound files to use on the Web? You can get them from a variety of sources:

Some platforms with CD-ROM drives may allow you to record digital sounds directly off a standard audio CD; you'll need a CD-ROM drive that supports this, of course. Keep in mind if you go this route that most published audio material is copyrighted, and its owners may not appreciate your making their songs or sounds available for free on the Internet.
Many Internet archives have collections of small, digitized samples in the appropriate format for the platform they emphasize (for example, SND format files for Macintosh archives, WAV format for Windows, AU for Sun's UNIX, and so on).

Warning

Keep in mind that, like images, sounds you find on the Net may be owned by someone who won't like your using them. Use caution when using "found" sounds.

Commercial "clip sound" products are available, again, in appropriate formats for your platform. These sounds have the advantage of usually being public domain or royalty-free, meaning that you can use them anywhere without needing to get permission or pay a fee.

Sampling Sound

The most interesting sounds for your Web presentation, of course, are those you make yourself. As I mentioned earlier, the process of recording sounds to digital files is called sampling. In this section, you'll learn about the sort of equipment you can get and the software available to sample and save sounds.

New Term

Sampling is the process of encoding analog sound into a digital format.

Note that to get truly high-quality production digital audio for the Web or for any other use, you'll need to spend a lot of money on truly high-quality production equipment, and the choices are very broad. Also note that as time goes on, better technology becomes more widespread and cheaper, so the best I can hope to provide here is a general rundown of the technology. Shop your local computer store or magazines for more information.

Sampling on PCs

To sample sound on a PC, you'll need a sound card. Most sound cards can give you audio capabilities from 8-bit mono at 11 or 22KHz all the way up to 16-bit 44KHz stereo. Go for the 16-bit cards. Not only will you get better quality for the sounds you input, but more games and multimedia titles for the PC are taking advantage of 16-bit sound, and the better quality is much more impressive. Once you have a sound card, you can connect your tape deck or microphone to the line-in jacks on the card or just plug in a standard microphone. Then, it's a question of software.

Windows comes with a simple sound recorder called Sound Recorder (an apt choice for a name), which can record simple sounds in 8-bit mono at 11KHz. For very simple sound recordings such as voices and small sound effects, this might be all you need (see Figure 10.7).

Figure 10.7 : The Windows Sound Recorder.

Your sound card also should be packaged with sound tools that will enable you to record and edit sounds. The standard Sound Blaster card comes with several applications for capturing and editing sound, including the WaveEditor program (see Figure 10.8), which allows sound recording and editing across a broad range of rates and sizes.

Figure 10.8 : Sound Blaster's WaveEditor.

For serious sound editing and processing, you might want to check out CoolEdit. CoolEdit is a shareware sound editor with an enormous number of features. It supports full recording on most sound cards; it can read, convert, and save to a wide range of sound formats; and it even has built-in controls for your CD player. For $25 or $35 with one free upgrade, it's a great deal if you're doing Windows sound editing. You can find out more about CoolEdit at http://www.netzone.com/syntrillium/.

If you're planning to work extensively with both sound and video, you might want to look into Adobe Premiere. Long the choice of multimedia developers for the Macintosh, Premiere provides a great deal of power over both audio and video capture and integration, and it works with most sound boards. It is more expensive, but it's one of the best tools out there.

Sampling on Macintoshes

Macintoshes have had sound capabilities built in for many years now, and most Macs are shipped with either a built-in microphone or a separate plug-in microphone. You can record directly into the microphone (for mono 22KHz, 8-bit sounds) or plug a standard stereo audio jack into the back of the computer. Most newer Macs are capable of recording 16-bit stereo at up to 48KHz (44KHz is CD quality; 48KHz is DAT, or Digital Audio Tape, quality); check the specifications for your model to see what it's capable of.

For basic 8-bit mono, 22KHz sounds that are under 10 seconds, you can record using the Sound control panel, which is part of the standard Mac system software. Just select Add and click the Record button (see Figure 10.9).

Figure 10.9 : Recording from the Sound control panel.

For more control over your sounds, you'll need different software. Lots of tools exist for recording sound on the Mac, from the excellent freeware SoundMachine (for recording and sound conversion) and the shareware SoundHack (for editing), to commercial tools that do both, such as Macromedia's SoundEdit 16. As I mentioned in the Windows section, Adobe Premiere is also an excellent tool, particularly if you intend to work with video as well (see Figure 10.10).

Figure 10.10: Premiere's audio options.

Sampling on UNIX Workstations

Most newer UNIX workstations come with built-in microphones and provide 16-bit sampling for audio. Check with your manufacturer for specifics.

Converting Sound Files

Once you have a sound file, it may not be in the right format (that is, the format you want it to be in). The programs mentioned in this section can read and convert many popular sound formats.

For UNIX and PC-compatible systems, a program called SOX by Lance Norskog can convert between many sound formats (including AU, WAV, AIFF, and Macintosh SND) and perform some rudimentary processing including filtering, changing the sample rate, and reversing the sample.

On DOS, WAVany by Bill Neisius converts most common sound formats (including AU and Macintosh SND) to WAV format.

Waveform Hold and Modify (WHAM), for Windows, is an excellent sound player, editor, and converter that also works really well as a helper application for your browser.

For the Macintosh, the freeware SoundApp by Norman Franke reads and plays most sound formats, and converts to WAV, Macintosh SND, AIFF, and NeXT sound formats (but mysteriously, not Sun AU). The freeware program Ulaw (yes, it's spelled with a U) will convert Macintosh sounds (SND) to AU format.

FTP sources for each of these programs are listed in Appendix A, "Sources for Further Information."

To convert any sound format to RealAudio format, you'll need the RealAudio Encoder. It's available free with the RealAudio Server package, or you can download a copy from RealAudio's site at http://www.realaudio.com/.

Audio for the Web

Now that I've presented all the options you have for recording and working with audio, I should give some cautions for providing audio files on the Web.

Just as with images, you won't be able to provide as much as you would like on your Web pages because of limitations in your readers' systems and in the speed of their connections. Here are some hints for using audio on the Web:

Few systems on the Web have 16-bit sound capabilities, and listening to 16-bit sounds on an 8-bit system can result in some strange effects. To provide the best quality of sound for the widest audience, distribute only 8-bit sounds on your Web page. Or, provide each sound in both 8-bit and 16-bit versions.
To provide the best quality of 8-bit sounds, record in the highest sampling rate and size you can, and then use a sound editor to process the sound down to 8-bit. A lot of sound converter programs and editors enable you to downsample the sound in this way. Check out, in particular, a package called SOX for UNIX and DOS systems that includes several filters for improving the quality of 8-bit sound.
Try to keep your file sizes small by downsampling to 8-bit, using a lower sampling rate, and providing mono sounds instead of stereo.
As I noted in the last chapter, always indicate on the page where you describe your sounds what format those sounds are in, whether it is WAVE, AIFF, or other format. Keep in mind that because there is no generic audio standard on the Web, your readers will be annoyed at you if they spend a lot of time downloading a sound and they don't have the software to play it. Providing the file size in the description is also a common politeness for your readers so they know how long they will have to wait for your sound.
If you are very concerned about sound quality and you must provide large audio files on your Web page, consider including a smaller sound clip in µ-law format as a preview or for people who don't have the capabilities to listen to the higher-quality sample.
Creating sounds for RealAudio format? Most of these same hints apply. However, you'll also want to check out the hints and suggestions RealAudio gives for getting the best sound quality out of RealAudio files at http://www.realaudio.com/help/content/audiohints.html.
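To show what processing a sound down involves, here is a rough Python sketch of the three size-saving steps from the hints above: mixing stereo to mono, halving the sample rate, and reducing 16-bit samples to 8-bit. It works on plain lists of numbers and makes some assumptions of its own (signed 16-bit input, unsigned 8-bit output, simple averaging). Real sound editors apply proper filters, so treat this as an illustration rather than a replacement for them.

```python
def stereo_to_mono(left, right):
    """Average two channels of signed samples into one."""
    return [(l + r) // 2 for l, r in zip(left, right)]

def halve_rate(samples):
    """Halve the sample rate by averaging each pair of neighboring samples."""
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]

def to_8bit(samples_16bit):
    """Reduce signed 16-bit samples (-32768..32767) to unsigned 8-bit (0..255)."""
    return [(s >> 8) + 128 for s in samples_16bit]

# Four made-up stereo samples, recorded at the higher quality first.
left = [0, 1000, -32768, 32767]
right = [0, 3000, -32768, 32767]

small = to_8bit(halve_rate(stereo_to_mono(left, right)))
print(small)
```

The 8-bit conversion comes last on purpose: doing the averaging while you still have 16-bit values keeps the rounding errors from piling up.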

An Introduction to Digital Video

Digital video is tremendously exciting to many in the computer industry at the moment, from hardware manufacturers to software developers (particularly of games and multimedia titles) to people who just like to play with cutting-edge technology. On the Web, digital video usually takes the form of small movie clips, usually in media archives.

I can't provide a complete overview of digital video technology in this book, partly because much of it is quite complicated, and mostly because the digital video industry is changing nearly as fast as the Web is. But for producing small, short videos for the purposes of publishing on the Web, I can provide some of the basics and hints for creating and using digital video.

Analog and Digital Video

Analog video, like analog audio, is a continuous stream of sound and images. In order to get an analog video source into your computer, you'll need a video capture board that samples the analog video at regular intervals to create a digital movie, just as the audio sampling board does for audio. At each interval, the capture board encodes an individual image, called a frame, at a given resolution. When the video is played back, the frames are played in sequence and give the appearance of motion. The number of frames per second (the speed at which the frames go by) is called the frame rate and is analogous to the sampling rate in digital audio. The higher the frame rate, the closer you can get to the original analog source.

In addition to the frame rate, frame size (the actual size in pixels of the frame on your screen) is also important (see Figure 10.11).

Figure 10.11: Frame rates and sizes.

New Term

A frame is an individual image in a video file. The frame rate is how many frames go by per second, and the frame size is the actual pixel dimension of each frame.

The frame rate of standard full-screen video, such as what you get on your VCR, is 30 frames per second. This frame rate is sometimes called full-motion video. Achieving full-screen, full-motion video (the sort of standard that is easy with a $700 camcorder) is the Holy Grail for programmers and authors working with digital video. Most of the time, they must settle for significantly less in frame rates and frame sizes to get smooth playback.

Why? On an analog video source, 30 frames per second is no big deal. The frames go by, and they're displayed. With digital video, each frame must be read from disk, decompressed if necessary, and then spat onto the screen as fast as possible. Therefore, a lot of processing power, a fast hard drive, and an even faster graphics system in your computer are required in order for it to work correctly, even more so for larger frame sizes and faster frame rates.

So what happens if the movie is playing faster than your computer can keep up? Usually your computer will drop frames; that is, it will throw them away without displaying them. And when frames are being dropped, the frame rate goes down, creating jerkier motions or outright halts in the action. This is not a good situation for your video clip.
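Here is a toy model of frame dropping in Python. It is not how any particular player works; it simply assumes that each frame has a time slot, that processing a frame takes a fixed number of milliseconds, and that a frame is thrown away if its whole slot has already gone by. All the names and numbers are hypothetical.

```python
def play(frame_count, interval_ms, process_ms):
    """Return the numbers of the frames that actually get displayed."""
    shown = []
    free_at = 0   # time at which the computer can start work on another frame
    for n in range(frame_count):
        due = n * interval_ms
        if free_at >= due + interval_ms:
            continue                      # this frame's slot has passed: drop it
        start = max(free_at, due)         # wait if the computer is ahead
        free_at = start + process_ms
        shown.append(n)
    return shown

# A movie at 10 frames per second has a 100-millisecond slot per frame.
print(play(10, 100, 50))    # a fast machine shows every frame
print(play(10, 100, 150))   # a slow machine has to skip some
```

In this model the slow machine displays only seven of the ten frames, which the viewer sees as jerky motion.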

What you'll discover when you start playing with it is that producing digital video is often a series of compromises in order to fit into the constraints of the platform you are working with. You'll learn more about these compromises later in this section.

Compression and Decompression (Codecs)

Image and audio formats, as I've noted previously, take up an enormous amount of space. Now combine the two (hundreds, if not thousands, of images, plus an audio soundtrack) and you can begin to imagine how much disk space a digital video file can take up. The bigger the file, the harder it is for the computer system to process it with any amount of speed, and the more likely it is that playback quality will suffer. For these reasons, compression and decompression technology is especially important to digital video files, and lots of work has been done in this area.
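To put rough numbers on that, here is a short calculation of how much space uncompressed video frames would need. The frame sizes, color depth, and frame rates are example values I picked, not figures from any particular format, and the audio track is left out.

```python
def raw_video_bytes(width, height, bits_per_pixel, frames_per_second, seconds):
    """Size of the uncompressed picture data, ignoring audio and file headers."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second * seconds

# A small 160x120 clip in 24-bit color at 15 frames per second, 10 seconds long.
small_clip = raw_video_bytes(160, 120, 24, 15, 10)    # 8,640,000 bytes

# One second of 640x480, 24-bit, 30-frames-per-second video.
full_screen = raw_video_bytes(640, 480, 24, 30, 1)    # 27,648,000 bytes

print(small_clip, "bytes for the small clip")
print(full_screen, "bytes for one second of full-screen video")
```

Even the tiny clip would be far too large to download comfortably over a modem, which is why nearly all movies on the Web are compressed.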

In digital video, the algorithm for compression and decompression is usually referred to as a single thing called a codec (short for COmpression/DECompression, pronounced coh-deck). Unlike with image compression, video codecs are not tightly coupled with video file formats. A typical format can use many different kinds of codecs and can usually choose the right one on the fly when the video is played back.

New Term

A video codec is the algorithm used for compressing and decompressing that video file.

You'll learn more about codecs, how they work, and the popular kinds of codecs in use, later in this chapter in the section "Movie Compression."

Movie Formats

Digital video in a file ready to be played back on a computer is often referred to as a movie. A movie contains digital video data (just as a sound file contains digital audio data), but that data can be a live-action film or an animation; movie is simply a generic term to refer to the file itself.

Right now the Big Three movie formats on the Web and in the computer industry at large are QuickTime, Video for Windows (VfW), and MPEG.

QuickTime

Although QuickTime was developed by Apple for the Macintosh, QuickTime files are the closest thing the Web has to a standard cross-platform movie format (with MPEG a close second). The Apple system software includes QuickTime and a simple player (called MoviePlayer or SimplePlayer). For PCs, QuickTime files can be played through the QuickTime for Windows (QTfW) package, and the freely available Xanim program will play them under the X Window System and UNIX. QuickTime movies have the extension .qt or .mov.

QuickTime supports many different codecs, particularly CinePak and Indeo, both of which can be used cross-platform. See the "Codec Formats" section later in this chapter for more information on these formats.

NOTE

If you produce your QuickTime videos on the Macintosh, you must make sure that they are flattened before they can be viewable on other platforms. See the section "Getting and Converting Video" later in this chapter for more information on programs that will flatten QuickTime files for you.

Video for Windows

Video for Windows (VfW) was developed by Microsoft and is the PC standard for desktop video. VfW files are sometimes called AVI files from the .avi extension (AVI stands for Audio/Video Interleave). VfW files are extremely popular on PCs, and hordes of existing files are available in AVI format. However, outside of the PC world, few players exist for playing AVI files directly, making VfW less suitable than QuickTime for video on the Web.

The MPEG Video Format

MPEG is both a file format and a codec for digital video. There are actually three forms of MPEG: MPEG video, for picture only; MPEG audio, which was discussed earlier in this chapter; and MPEG systems, which includes both audio and video tracks.

MPEG files provide excellent picture quality but can be very slow to decompress. For this reason, many MPEG decoding systems are hardware-assisted, meaning that you need a board to play MPEG files reliably without dropping a lot of frames. Although software decoders definitely exist (and there are some very good ones out there), they tend to require a lot of processor power on your system and also usually support MPEG video only (they have no soundtrack).

A third drawback of MPEG video as a standard for the Web is that MPEG movies are very expensive to encode. You need a hardware encoder to do so, and the price ranges for encoders are in the thousands of dollars. As MPEG becomes more popular, those prices are likely to drop. But for now, unless you already have access to the encoding equipment or you're really serious about your digital video, a software-based format is probably the better way to go.

NOTE

An alternative to buying encoding hardware is to contract a video production service bureau to do it for you. Some service bureaus may have the MPEG encoding equipment and can encode your video into MPEG for you, usually charging you a set rate per minute. Like the costs of MPEG hardware, costs for these service bureaus are also dropping and may provide you a reasonable option if you must have MPEG.

Movie Compression

As with images and audio, compression is very important for being able to store digital video data, perhaps even more so because movie files have so much data associated with them. Fortunately, lots of compression technologies exist for digital video, so you have lots to choose from.

As I mentioned early on in this section, video compression methods are called codecs, which include both compression and decompression as a pair. Compression generally occurs when a movie is saved or produced; decompression occurs on the fly when the movie is played back. The codec is not part of the movie file itself; the movie file can use one of several codecs, and you can usually choose which one you want to use for your movie when you create it. (When the movie is played, the right codec to decompress it is chosen automatically.)

In this section, I'll talk about methods of video compression and, in the next section, about specific codecs you have available for use in your own files.

Asymmetric and Symmetric Codecs

Codecs are often referred to as being symmetric or asymmetric (see Figure 10.12). These terms refer to the balance between the speed of compression and the speed of decompression. A symmetric codec takes the same amount of time to compress a movie as it does to decompress it, which is good for production time but not as good for playback. Asymmetric codecs usually take a very long time to compress, but make up for it by being fast to decompress (and remember, the less time it takes to decompress a movie, the better frame rate you can get, so asymmetric codecs tend to be more desirable). Most codecs are at least a little asymmetric on the compression side; some are very much so.

Figure 10.12: Symmetric versus asymmetric codecs.

New Term

Symmetric codecs take as long to compress a digital video file as they do to decompress it. With asymmetric codecs, either the compression or the decompression takes longer than the other.

Frame Differencing

But how do codecs work for video? They can either work in much the same way image compressing works, with individual frames being compressed and then decompressed at playback, or they can support what is called frame differencing. Frame differencing is simply a method of movie compression that many codecs use; it is not a codec itself.

Much of the processing time required by digital video during playback is taken up in decompressing and drawing individual frames and then spitting them to the screen at the best frame rate possible. If the CPU gets behind in rendering frames, frames can get dropped, resulting in jerky motion. Frame differencing, therefore, is a way of speeding up the time it takes to uncompress and draw a frame. Differenced frames do not have all the information that a standard frame has; instead, they have only the information that is different from that in the frame before it in the movie. Because the differences are usually a lot smaller than the full frame, that means your computer doesn't have to take as long to process it, which can help to minimize dropped frames. Of course, because a differenced frame is also a lot smaller in terms of information, the resulting file size of the movie is a lot smaller as well. Figure 10.13 shows a simple example of frame differencing.

Figure 10.13: Frame differencing.

New Term

Frame differencing involves storing only the portions of frames that have changed since the previous frame, rather than storing the entire frame.

Frame differencing works best in what are called talking head movies: movies with a lot of static backgrounds, with only a small portion of the frame changing from frame to frame. For movies with a lot of change between frames, frame differencing might not work quite as well.
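If you'd like to see the idea in miniature, here is a rough Python sketch. It is not how any real codec is implemented; it simply treats a frame as a flat list of grayscale pixel values (an assumption made for illustration) and stores only the pixels that changed. The function names diff_frame and apply_diff are made up for this example.

```python
def diff_frame(previous, current):
    """Return {pixel_index: new_value} for every pixel that changed."""
    return {i: new
            for i, (old, new) in enumerate(zip(previous, current))
            if old != new}


def apply_diff(previous, changes):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in changes.items():
        frame[i] = value
    return frame


# A tiny 4x2 "talking head": the background stays put, two pixels change.
frame1 = [10, 10, 10, 10,
          10, 200, 200, 10]
frame2 = [10, 10, 10, 10,
          10, 210, 190, 10]

changes = diff_frame(frame1, frame2)
print(changes)                                   # {5: 210, 6: 190}
print(len(changes), "of", len(frame2), "pixels stored")
print(apply_diff(frame1, changes) == frame2)     # True
```

Only two of the eight pixels need to be stored for the second frame. In a movie where most of the picture changes from frame to frame, the dictionary of changes would be nearly as large as the frame itself, which is why differencing helps less there.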

Key Frames

Frame differencing relies on the existence of what are called key frames in the movie file. Key frames are complete frames upon which the differences in differenced frames are based. Each time a differenced frame comes along, the differences are calculated from the frame before it, which is calculated from the frame before it, and so on, back to the key frame. Figure 10.14 shows how the differenced frames are created.

Figure 10.14: Key frames and differencing.

New Term

Key frames are the frames that differenced frames are different from. Key frames are always complete frames and are inserted at appropriate intervals in the file.

Of course, the further away from the key frame you get, the more information will be different, the more information your computer has to keep track of with every frame, and the more likely it is that you'll start taking up too much processing time and dropping frames. So, having key frames at regular intervals is crucial to making sure that you get the best level of compression and that your movie plays smoothly and consistently. On the other hand, because key frames contain a lot more information than differenced frames, you don't want too many of them; key frames take longer to process in the first place. Usually, you can set the number of key frames in a movie in your movie-editing software. The general rule is to allow one key frame per second of video (or one every 15 frames for 15fps movies).
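To make the trade-off concrete, here is a small Python sketch under the same simplifying assumption as before: a frame is just a list of pixel values, and a differenced frame stores only the pixels that changed from the frame before it. The names encode and decode_frame are hypothetical, and real codecs are far more sophisticated.

```python
def encode(frames, key_interval):
    """Store a complete key frame every key_interval frames, diffs otherwise."""
    stored = []
    for n, frame in enumerate(frames):
        if n % key_interval == 0:
            stored.append(("key", list(frame)))
        else:
            previous = frames[n - 1]
            changes = {i: v
                       for i, (p, v) in enumerate(zip(previous, frame))
                       if p != v}
            stored.append(("diff", changes))
    return stored


def decode_frame(stored, n):
    """Rebuild frame n by starting at the nearest earlier key frame."""
    start = n
    while stored[start][0] != "key":
        start -= 1
    frame = list(stored[start][1])
    # Apply every differenced frame between the key frame and frame n.
    for _kind, changes in stored[start + 1:n + 1]:
        for i, value in changes.items():
            frame[i] = value
    return frame, n - start


# Six four-pixel frames in which only the first pixel changes.
frames = [[t, 0, 0, 0] for t in range(6)]
stored = encode(frames, key_interval=3)

print(decode_frame(stored, 5))   # ([5, 0, 0, 0], 2)
```

The second value returned is the number of differenced frames that had to be applied to get back to the picture. With the one-key-frame-per-second rule, a 15fps movie would use key_interval=15, so no frame is ever more than 14 differences away from a complete frame.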

Hardware Assistance

As I stated earlier, because of the enormous amount of information that needs to be processed when a movie is captured, compressed, and played back, only very fast and powerful computers can handle good-quality video with a decent frame rate and size. Although software codecs exist and are popular for video with small frame rates and sizes, when you move toward the higher end of the technology, you'll usually want to invest in a hardware-assisted codec.

Hardware assistance usually takes the form of a video board you can plug into your computer that has special chips on it for processing digital video (usually files with the MPEG or JPEG codecs, as you'll learn about later in this chapter). In the future, video processing chips could very well be standard in many computers. But, for now, hardware assistance is rare in computers on the Web, and you should not rely upon it for the video you produce.

Codec Formats

There are several excellent codecs available for digital video today, both for software-only and for hardware-assisted recording and playback. The two biggest, CinePak and Indeo, are both cross-platform (Mac, Windows, and UNIX), but motion JPEG is quite popular as well, particularly with capture cards.

CinePak

CinePak, formerly called Compact Video, is the most popular codec for QuickTime files and is available in VfW as well. CinePak is a form of lossy compression, so if you use CinePak, you should make sure your original, uncompressed source is of the best quality possible.

CinePak supports frame differencing and is highly asymmetric, taking an enormous amount of time to compress. (I once saw a 15-second movie take an hour to compress.) On the other hand, when the compression is done, the playback is quite smooth and the file sizes are excellent.
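To put a number on "highly asymmetric," here is the arithmetic for that 15-second movie, worked in Python. The 15fps frame rate is my assumption for the example; the anecdote doesn't record the actual rate.

```python
clip_seconds = 15              # length of the movie
compress_seconds = 60 * 60     # one hour of compression time

# Seconds of compression needed for each second of finished video.
ratio = compress_seconds / clip_seconds
print(ratio)                   # 240.0

assumed_fps = 15               # assumption: not stated in the anecdote
frame_count = clip_seconds * assumed_fps
print(frame_count)             # 225
print(compress_seconds / frame_count)   # 16.0 seconds per frame
```

Playback, by contrast, has to decompress each of those frames in about a fifteenth of a second to keep up.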

Indeo

Second to CinePak, but catching up fast, is Indeo Video. Indeo was developed by Intel as part of the Intel Smart Video Recorder, an excellent video capture card. Indeo can be lossy or lossless, supports frame differencing, and is much less asymmetric than CinePak. However, it requires more processor time on decompression, making it more likely to drop frames on lower-end computers.

Indeo was initially available only for VfW files, but QuickTime 2.0 now supports it as well, which has helped make it a close second to CinePak as the most popular codec for digital video.

JPEG

JPEG compression? Isn't that the image standard? Yes, it is, and it's exactly the same form of compression when it is used in digital video (where it's sometimes called motion JPEG). Remember, movies are a set of frames, and each one is an image, usually a photographic-quality image. Each of those images can be compressed quite well using JPEG compression.

There are two drawbacks to JPEG compression as a codec: lack of frame differencing, and slow decompression. Because JPEG is a compression method for still images, it treats each frame as if it were a still image and does no differencing between frames. For playback, this means that each frame must be individually decompressed and displayed, making it more likely that frames will be dropped and performance will degrade. With hardware assistance, however, JPEG decompression speeds can easily surpass those of software-only codecs with frame differencing, and with hardware assistance JPEG provides probably the best quality and the most widely available video format. But, as with all hardware-assisted codecs, few computers on the Web have JPEG capabilities, so producing JPEG files for the Web is probably not a good idea.

On the other hand, JPEG might be appropriate for video capture. Many video boards support JPEG compression for video capture. If you're planning on using CinePak as your final codec, capturing to JPEG first is an excellent first pass (if you have the disk space to store the movie before you finish compressing it).

The MPEG Codec

I'll mention MPEG here as well because MPEG is both a format and a codec. As I mentioned in the section on formats, MPEG provides excellent high-quality compression for digital video, but usually requires hardware assistance in order to decompress well. Also, MPEG encoders tend to be quite expensive, so creating MPEG movies is no small task. For Web purposes, you should probably go with a software codec such as CinePak or Indeo.

NOTE

MPEG compression is extremely complicated and far beyond the scope of this book; if you have interest in MPEG and how it works, I highly recommend you look at the MPEG FAQ (referenced at the end of this chapter).

Digitizing Video

Getting fancy enough that you want to produce your own video for the Web? The process of actually capturing video into your computer, like audio capture, is pretty easy with the right equipment. You install a capture board, hook up your VCR or camera, start your software for doing captures, and off you go.

The specifics, of course, vary from platform to platform, and in recent months there has been an explosion of products available. In this section I'll provide a general overview of the technology; for more information on specific products, you may want to consult with your local computer store or look into reports in computer magazines.

Analog Video Signals and Formats

You won't need to know much about analog video itself unless you intend to get heavily involved in aspects of video production. But you should be aware of two analog video standards: the video signal and the broadcast format.

How you hook up your video equipment to your computer is determined by the video signal your equipment uses. There are two kinds of video signals: composite and S-video. Composite is the standard signal you get out of your TV, VCR, or camcorder, and, for basic video, it's probably the signal you're going to end up using. S-video, which uses a different cable, is a higher-end standard that separates color and brightness, providing a better-quality picture. If you can use S-video, your final movies will be of much better quality. But you'll have to buy special S-video equipment to do it.

After you have everything hooked up, you'll have to know what broadcast format you're sending to your computer. There are three standard formats in use: NTSC (National Television System Committee), which is used in most of North America and Japan; PAL (Phase Alternating Line), which is used in western Europe, the UK, and the Middle East; and SECAM (Séquentiel Couleur à Mémoire), which is used in France and Russia.

Most video capture cards support NTSC and PAL, so most of the time you won't have to worry about the format you have in your camera or your VCR. If you're not sure what format you have, and you are in the United States, it's likely you have NTSC. Outside the United States, make sure you know what you have and if your video card can handle it.

Video on the PC

The market for low-cost desktop video capture cards on the PC has exploded in recent months. If you're interested in doing video on the PC, I strongly recommend that you check with the trade magazines to see what is currently out there and what is recommended.

On a very basic level of video production, an awesome tool for doing very simple video on the PC (and on the Mac) is the QuickCam from Connectix. This little $100 camera sits on your desktop, and can capture both audio and video or take video still pictures. It operates only in grayscale, and the frame rate is rather low for all but tiny pictures. For simple applications such as small files for the Web or video-conferencing, however, it's a great deal.

In terms of video software, VidCap and VidEdit come with the Video for Windows package. VidCap is used to capture video to VfW format (appropriately) and provides a choice of several codecs; it can capture video stills as well. VidEdit (shown in Figure 10.15) is used to edit existing video clips. For example, you can change the frame rate, frame size, codec, or audio qualities, as well as cut, copy, and paste portions of the movie itself.

Figure 10.15: VidEdit.

Also available is SmartVid from Intel, part of the Indeo Video system and the Intel Smart Video Recorder (see Figure 10.16). You can get an evaluation copy of SmartVid Beta from Intel's FTP site (ftp://ftp.intel.com/pub/IAL/Indeo_video/smartv.exe) and use it for capturing, converting, and editing video files. SmartVid also has the edge over VidCap for being able to capture to both VfW and QuickTime files using the Indeo codec.

Figure 10.16: Intel's SmartVid.

Finally, there is Adobe Premiere, whose capture options for version 3.0 are shown in Figure 10.17 (version 4 is out). It is wildly popular on the Macintosh among video professionals, and if you plan on doing much video work, you should look into this application. It can capture and extensively edit both audio and video, combine the two from separate sources, add titles, and save files with varying key frames and codecs.

Figure 10.17: Adobe Premiere.

Video on the Mac

Many newer Macintoshes contain a built-in video card to which you can connect a composite video camera or VCR. In addition, you can spend anywhere from a couple hundred to several thousand dollars on video capture systems for the Macintosh.

The Connectix QuickCam, which I mentioned in the previous section, is also available for the Macintosh, and is of great use for very simple black-and-white video.

For software capturing and simple editing, FusionRecorder comes with many Macintoshes and can capture, edit, and save simple audio and video files. For more serious editing work, Adobe Premiere is (appropriately) the premier editing program for the Mac, and the one used by most professionals. Also available are Avid's VideoShop, which is cheaper and claims to be easier to use, and Radius's VideoFusion (which is also bundled with the Video Vision system).

Video on UNIX

Depending on your workstation, you may have video built into your box, or you may need to buy a third-party card. High-end SGI and Sun systems now come with video input jacks, video capture software, and sometimes even with small color video cameras. Again, check with your manufacturer for details.

Getting and Converting Video

Just as with images and sound, you can get video clips by making them yourself, downloading them from the Net, or purchasing royalty-free clips that you can read on your platform. Sometimes you may need to convert a video file from one format to another, or from one codec to another. For these sorts of operations, often the software you used to capture the original video is the best to use, but if you don't have that software, or if you got a video file from another source, you'll need simpler tools.

To convert video files between formats on Windows systems, a commercial program called XingCD enables you to convert AVI files to MPEG. AVI to QuickTime converters are also available; one is a program called SmartCap from Intel, which can convert between AVI and QuickTime files that use the Indeo compression method. To use AVI files, you'll need the Video for Windows package, available from Microsoft. To use QuickTime movies, you'll need the QuickTime for Windows package, available from Apple. You'll need both to convert from one format to the other.

To convert video files between formats on the Macintosh, you can use the freeware program Sparkle. Sparkle can read and play both MPEG and QuickTime files, and convert between them. In addition, the program AVI->Quick can convert AVI (Video for Windows) files to QuickTime format.

If you're using QuickTime for your movie files and you want that movie to be read on a platform other than the Macintosh, you will need to "flatten" that movie. On the Macintosh, files contain resource and data forks for different bits of the file. Flattening a QuickTime file involves moving all the data in the QuickTime file into the data fork so that other platforms can read it.

A small freeware program called FastPlayer will flatten QuickTime movies on the Mac; on Windows, try a program called Qflat. FTP locations and other information for these programs are in Appendix A.

Video for the Web

Using a basic desktop computer and simple video equipment you might have lying about, you're never going to get really high-quality video at a large frame rate and size. Even professional desktop video researchers are having trouble achieving that goal, and they're spending several thousands of dollars to get there.

What you can get with everyday household items, however, is a short video sample (less than a minute) in a small window with a high enough frame rate to avoid serious jerkiness. But, even then, the file sizes you'll end up with are pretty large. As I've emphasized time and time again, this is not a good thing over the Web where larger file sizes take longer to transmit over a network connection.

So plan to make some compromises now. The physical size of desktop video files depends on several factors:

Frame size: The smaller the area of the video, the less space you take up on the disk. Shoot for 240×180, 160×120, or even smaller.
Frame rate: The fewer frames per second, the less disk space the file takes; but the lower the frame rate, the jerkier the action. Frame rate tends to be one of the more important factors for good video, so when you have a choice, try to save space in areas other than the frame rate. 15fps is considered an excellent rate for digital video, but you can get down to 10fps before things start looking really bad.
Color depth: Just as with images, the fewer colors in the movie, the smaller the file size.
Audio soundtrack: All the hints that I mentioned in the previous section apply here. Or, avoid having a soundtrack altogether if you can.
Compression algorithm: Some codecs are better than others for different kinds of video. Codecs that use frame differencing, for example, are better for movies in which the background doesn't change overly much. Most software programs let you play with different codecs and different key frames, so try several experiments to see what kind of file sizes you can get.
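To get a feel for how the first three factors multiply together, here is a back-of-the-envelope calculation in Python. It estimates the size of the video before any codec has touched it and ignores the soundtrack and file overhead, so treat the numbers as upper bounds rather than what you'll actually publish. The function name raw_video_bytes and the 28.8Kbps modem speed are my own choices for the example.

```python
def raw_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size of uncompressed video: bytes per frame times number of frames."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps * seconds


# Thirty seconds of video at three different settings.
big = raw_video_bytes(240, 180, 24, 15, 30)     # larger frame, 24-bit color
small = raw_video_bytes(160, 120, 8, 15, 30)    # smaller frame, 8-bit color
slower = raw_video_bytes(160, 120, 8, 10, 30)   # same, but at 10fps

print(big)      # 58320000
print(small)    # 8640000
print(slower)   # 5760000

# Rough transfer time over an assumed 28.8Kbps modem link, in minutes.
modem_bits_per_second = 28800
print(small * 8 / modem_bits_per_second / 60)   # 40.0
```

Shrinking the frame and the color depth cuts the raw data to roughly a seventh of the larger version, and dropping from 15fps to 10fps saves another third. Compression then has far less to chew on.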

Of course, file size isn't the only consideration. Picture quality and speed of playback are both crucial factors that can affect some or all of these compromises. You might be willing to give up picture quality for smooth playback, or give up color for having audio as well as video.

In terms of actually producing the video, there are several hints for improving picture and sound quality and keeping the file sizes small so they can be more easily transferred over the Web:

Record direct from a camera to the capture card instead of recording from tape. If you must use tape, use the best quality tape you can find.
If you can get S-video equipment, use it.
Record the audio track separately, using the hints in the audio section of this chapter, and then add it later using a video processing program.
As with audio, capture the video at the highest possible quality, and then use software to shrink the frame size, frame rate, number of colors, and so on. The result will be better than if you sampled at the lower rate. Note that you might need a very large hard drive to store the file while you're processing it; multiple gigabyte drives are not uncommon in the video-processing world.
Do your compression last. Capture with JPEG compression if you can, at the highest quality possible. You can then compress the raw file later. Again, you'll need lots and lots of disk space for this.

For More Information

Alison Zhang's Multimedia File Formats on the Internet is an excellent resource for file formats and tools for playing both audio and video. Check it out at http://ac.dal.ca/~dong/contents.htm.

For information about audio formats, there are audio formats FAQs at the usual FAQ sites, including ftp://rtfm.mit.edu/pub/usenet/news.answers/ and ftp://ftp.uu.net/usenet/news.answers/.

Finally, for a more technical introduction to digital audio and video and aspects of both, the Desktop Multimedia Bible by Jeff Burger, Addison Wesley, is exhaustive and covers all aspects of analog and digital audio and video, as well as audio and video production.

If you're interested in learning more about digital video and video production in general, I highly recommend a book called How to Digitize Video, by Nels Johnson with Fred Gault and Mark Florence, from John Wiley & Sons. This book is an extensive reference to all aspects of digital video, contains lots of information about hardware and software solutions, and includes a CD-ROM with Mac and Windows software you can use.

If you're interested in MPEG (which isn't covered very much in the previously mentioned book), your best source for information is probably the MPEG FAQ, which you can get anywhere that archives Usenet FAQs. One source is http://www.cis.ohio-state.edu/hypertext/faq/usenet/mpeg-faq/top.html.

For more information on QuickTime, definitely check out http://quicktime.apple.com/. This site has plenty of information on QuickTime itself as well as sample movies and the terribly excellent QuickTime FAQ, and you can even order the QuickTime software online from here.

Summary

Even though most audio and video files are stored offline in external files on the Web, sound and video can provide an extra bit of "oomph" to your Web presentation, particularly if you have something interesting to be played or viewed. And with many simple low-cost audio and video sampling tools available on the market today, creating sound and video is something you can accomplish even if you don't have an enormous amount of money or a background in audio and video production.

Here's a recap of topics covered in this chapter.

For digital audio files, there is no firm cross-platform standard. AU files can be played on more platforms than any other format, but the sound quality is not very good. AIFF and WAVE are about equal in terms of sound quality, but neither is well supported outside its native platform (Mac and Windows, respectively). MPEG Audio has become more popular because of the Internet Underground Music Archive, but encoding MPEG audio is expensive. Finally, RealAudio can be used to play audio on the fly as it's being downloaded but requires extra software on both the server and browser side in order to work.

For digital video, QuickTime and MPEG are the most popular formats, with QuickTime drawing a greater lead because of its wide cross-platform support and software-based players. For QuickTime files, either the CinePak or Indeo Video codecs are preferred, although CinePak is slightly more supported, particularly on UNIX players.

For both audio and video, always choose the best recording equipment you can afford and record or sample at the best rate you can. Then use editing software to reduce the picture quality and size to a point at which the file sizes are acceptable for publishing in an environment such as the Web. And because sound and video files tend to be large, always provide a good description of the file you are linking to, including its format and file size.

Q&A

Q I want to create one of those pages that has a spy camera that takes pictures of me, or the fish tank, or the toilet, or wherever, every couple of minutes. How can I do that?

A It depends, of course, on the system that you're working on and the capabilities of that system. When you have a camera attached to your computer that can take video stills, you'll need some way to take those pictures once every few minutes. On UNIX systems you can use cron; on Macs and PCs you'll have to look into macro recorders and programs that can capture your mouse and keyboard movements (or your video software might have a timer option, although I haven't seen any that do at the moment).
Then, when you have the image file, converting it to GIF or JPEG format and moving it automatically to your Web server might not be so easy. If your Web server is on the same machine as the camera, this isn't a problem. But if you're FTPing your regular files to your Web server, you'll have to come up with some system of automatically transferring those files to the right location.
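If your system can run Python, one possible approach to the timing and transfer steps looks something like the sketch below. It uses Python's standard ftplib module for the upload. Capturing the still itself is left to your camera software, and the host name, login, and file names shown in the comment are placeholders, not real values. The function names upload_still and run_every are made up for this example.

```python
import ftplib
import time


def upload_still(local_path, host, user, password, remote_name):
    """Copy one image file to the Web server over FTP."""
    with ftplib.FTP(host, user, password) as ftp:
        with open(local_path, "rb") as image:
            ftp.storbinary("STOR " + remote_name, image)


def run_every(interval_seconds, action, count, sleep=time.sleep):
    """Call action() count times, pausing interval_seconds between calls."""
    for n in range(count):
        action()
        if n < count - 1:
            sleep(interval_seconds)


# Example wiring (placeholders only; nothing is uploaded when this file loads):
# run_every(120,
#           lambda: upload_still("still.jpg", "ftp.example.com",
#                                "username", "password", "fishcam.jpg"),
#           count=30)
```

On a UNIX system you could skip run_every altogether and have cron run a script that calls upload_still once each time it is started.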
