Apache The Definitive Guide, 3rd EditionApache: The Definitive GuideSearch this book

Chapter 3. Toward a Real Web Site

Contents:

More and Better Web Sites: site.simple
Butterthlies, Inc., Gets Going
Block Directives
Other Directives
HTTP Response Headers
Restarts
.htaccess
CERN Metafiles
Expirations

Now that we have the server running with a basic configuration, we can start to explore more sophisticated possibilities in greater detail. Fortunately, the differences between the Windows and Unix versions of Apache fade as we get past the initial setup and configuration, so it's easier to focus on the details of making a web site work.

3.1. More and Better Web Sites: site.simple

We are now in a position to start creating real(ish) web sites, which can be found in the sample code at the web site for the book, http://oreilly.com/catalog/apache3/. For the sake of a little extra realism, we will base the site loosely round a simple web business, Butterthlies, Inc., that creates and sells picture postcards. We need to give it some web addresses, but since we don't yet want to venture into the outside world, they should be variants on your own network ID. This way, all the machines in the network realize that they don't have to go out on the Web to make contact. For instance, we edited the \windows\hosts file on the Windows 95 machine running the browser and the /etc/hosts file on the Unix machine running the server to read as follows:

127.0.0.1 localhost
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-IP.butterthlies.com
192.168.124.1 www.faraway.com

localhost is obligatory, so we left it in, but you should not make any server requests to it since the results are likely to be confusing.

You probably need to consult your network manager to make similar arrangements.

site.simple is site.toddle with a few small changes. The script go will work anywhere. To get started, do the following, depending on your operating environment:

Figure

test -d logs || mkdir logs
httpd -d 'pwd' -f 'pwd'/conf/httpd.conf

Figure

Open an MS-DOS window and from the command line, type:

c>cd \program files\apache group\apache
c>apache -k start
c>Apache/1.3.26 (Win32) running ... 

Figure

To stop Apache, open a second MS-DOS window:

c>apache -k stop 
c>cd logs 
c>edit error.log 

Figure

This will be true of each site in the demonstration setup, so we will not mention it again.

From here on, there will be minimal differences between the server setups necessary for Win32 and those for Unix. Unless one or the other is specifically mentioned, you should assume that the text refers to both.

It would be nice to have a log of what goes on. In the first edition of this book, we found that a file access_log was created automatically in ...site.simple/logs. In a rather bizarre move since then, the Apache Group has broken backward compatibility and now requires you to mention the log file explicitly in the Config file using the TransferLog directive.

The ... /conf/httpd.conf file now contains the following:

User webuser
Group webgroup

ServerName www.butterthlies.com

DocumentRoot /usr/www/APACHE3/APACHE3/site.simple/htdocs

TransferLog logs/access_log

In ... /htdocs we have, as before, 1.txt :

hullo world from site.simple again!

Type ./go on the server. Become the client, and retrieve http://www.butterthlies.com. You should see:

Index of /
. Parent Directory
. 1.txt

Click on 1.txt for an inspirational message as before.

This all seems satisfactory, but there is a hidden mystery. We get the same result if we connect to http://sales.butterthlies.com. Why is this? Why, since we have not mentioned either of these URLs or their IP addresses in the configuration file on site.simple, do we get any response at all?

The answer is that when we configured the machine on which the server runs, we told the network interface to respond to anyof these IP addresses:

192.168.123.2
192.168.123.3

By default Apache listens to all IP addresses belonging to the machine and responds in the same way to all of them. If there are virtual hosts configured (which there aren't, in this case), Apache runs through them, looking for an IP name that corresponds to the incoming connection. Apache uses that configuration if it is found, or the main configuration if it is not. Later in this chapter, we look at more definite control with the directives BindAddress, Listen, and <VirtualHost>.

It has to be said that working like this (that is, switching rapidly between different configurations) seemed to get Netscape or Internet Explorer into a rare muddle. To be sure that the server was functioning properly while using Netscape as a browser, it was usually necessary to reload the file under examination by holding down the Control key while clicking on Reload. In extreme cases, it was necessary to disable caching by going to Edit Figure Preferences Figure Advanced Figure Cache. Set memory and disk cache to 0, and set cache comparison to Every Time. In Internet Explorer, set Cache Compares to Every Time. If you don't, the browser tends to display a jumble of several different responses from the server. This occurs because we are doing what no user or administrator would normally do, namely, flipping around between different versions of the same site with different versions of the same file. Whenever we flip from a newer version to an older version, Netscape is led to believe that its cached version is up-to-date.

Back on the server, stop Apache with ^C, and look at the log files. In ... /logs/access_log, you should see something like this:

192.168.123.1--- [<date-time>] "GET / HTTP/1.1" 200 177

200 is the response code (meaning "OK, cool, fine"), and 177 is the number of bytes transferred. In ... /logs/error_log, there should be nothing because nothing went wrong. However, it is a good habit to look there from time to time, though you have to make sure that the date and time logged correspond to the problem you are investigating. It is easy to fool yourself with some long-gone drama.

Life being what it is, things can go wrong, and the client can ask for something the server can't provide. It makes sense to allow for this with the ErrorDocument command.

3.1.1. ErrorDocument

The ErrorDocument directive lets you specify what happens when a client asks for a nonexistent document.

ErrorDocument error-code "document(" in Apache v2)
Server config, virtual host, directory, .htaccess

In the event of a problem or error, Apache can be configured to do one of four things:

  1. Output a simple hardcoded error message.

  2. Output a customized message.

  3. Redirect to a local URL to handle the problem/error.

  4. Redirect to an external URL to handle the problem/error.

The first option is the default, whereas options 2 through 4 are configured using the ErrorDocument directive, which is followed by the HTTP response code and a message or URL. Messages in this context begin with a double quotation mark ("), which does not form part of the message itself. Apache will sometimes offer additional information regarding the problem or error.

URLs can be local URLs beginning with a slash (/ ) or full URLs that the client can resolve. For example:

ErrorDocument 500 http://foo.example.com/cgi-bin/tester
ErrorDocument 404 /cgi-bin/bad_urls.pl
ErrorDocument 401 /subscription_info.html
ErrorDocument 403 "Sorry can't allow you access today" 

Note that when you specify an ErrorDocument that points to a remote URL (i.e., anything with a method such as "http" in front of it), Apache will send a redirect to the client to tell it where to find the document, even if the document ends up being on the same server. This has several implications, the most important being that if you use an ErrorDocument 401 directive, it must refer to a local document. This results from the nature of the HTTP basic authentication scheme.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.