Apache offers a wide range of options for controlling the format of the log files. In line with current thinking, older methods (RefererLog, AgentLog, and CookieLog) have now been replaced by the config_log_module. To illustrate this, we have taken ... /site.authent and copied it to ... /site.logging so that we can play with the logs:
User webuser
Group webgroup
ServerName www.butterthlies.com
IdentityCheck on
NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
LogFormat "customers: host %h, logname %l, user %u, time %t, request %r, status %s,bytes %b,"
CookieLog logs/cookies
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.logging/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/APACHE3/site.logging/logs/customers/error_log
TransferLog /usr/www/APACHE3/site.logging/logs/customers/access_log
ScriptAlias /cgi_bin /usr/www/APACHE3/cgi_bin
</VirtualHost>

<VirtualHost sales.butterthlies.com>
LogFormat "sales: agent %{User-Agent}i, cookie: %{Cookie}i, referer: %{Referer}i, host %!200h, logname %!200l, user %u, time %t, request %r, status %s,bytes %b,"
CookieLog logs/cookies
ServerAdmin sales_mgr@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.logging/htdocs/salesmen
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/site.logging/logs/salesmen/error_log
TransferLog /usr/www/APACHE3/site.logging/logs/salesmen/access_log
ScriptAlias /cgi_bin /usr/www/APACHE3/cgi_bin
<Directory /usr/www/APACHE3/site.logging/htdocs/salesmen>
AuthType Basic
AuthName darkness
AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups
require valid-user
</Directory>
<Directory /usr/www/APACHE3/cgi_bin>
AuthType Basic
AuthName darkness
AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups
#AuthDBMUserFile /usr/www/APACHE3/ok_dbm/sales
#AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/groups
require valid-user
</Directory>
</VirtualHost>
There are a number of directives.
ErrorLog
ErrorLog filename|syslog[:facility]
Default: ErrorLog logs/error_log
Server config, virtual host
The ErrorLog directive sets the name of the file to which the server will log any errors it encounters. If the filename does not begin with a slash (/), it is assumed to be relative to the server root.
If the filename begins with a pipe (|), it is assumed to be a command that the server spawns to handle the error log.
Using syslog instead of a filename enables logging via syslogd(8) if the system supports it. The default is to use syslog facility local7, but you can override this by using the syslog:facility syntax, where facility can be one of the names usually documented in syslog(1). Using syslog allows you to keep logs for multiple servers in a centralized location, which can be very convenient in larger installations.
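The three forms of the argument described above can be sketched as follows. This is only an illustration: the handler path /usr/local/bin/errorlog_handler is a hypothetical program, not something shipped with Apache, and only one ErrorLog line should be active in any one scope.

```apache
# Log errors to a file; the path is relative to the server root
# because it does not begin with a slash.
ErrorLog logs/error_log

# Alternatively, hand errors to syslogd using the default facility (local7):
# ErrorLog syslog

# ...or name the facility explicitly:
# ErrorLog syslog:user

# Or pipe errors to a program that reads them on its standard input.
# The handler path is hypothetical:
# ErrorLog "|/usr/local/bin/errorlog_handler"
```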
Your security could be compromised if the directory where log files are stored is writable by anyone other than the user who starts the server.
TransferLog
TransferLog [ file | "| command "]
Default: none
Server config, virtual host
TransferLog specifies the file in which to store the log of accesses to the site. If it is not explicitly included in the Config file, no log will be generated.
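As a minimal sketch, either of the following lines would turn access logging on. The pipe target /usr/local/bin/access_handler is a hypothetical program named purely for illustration.

```apache
# Write the access log to a file relative to the server root.
TransferLog logs/access_log

# Or send each access record to a program's standard input instead.
# The handler path is hypothetical:
# TransferLog "|/usr/local/bin/access_handler"
```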
[34]Written by one of the authors of this book (BL).
AgentLog
AgentLog file-pipe
Default: AgentLog logs/agent_log
Server config, virtual host
Not in Apache v2
The AgentLog directive sets the name of the file to which the server will log the User-Agent header of incoming requests. file-pipe is one of the following:
A filename
    A filename relative to the ServerRoot.
"| <command>"
    This is a program to receive the agent log information on its standard input. Note that a new program will not be started for a VirtualHost if it inherits the AgentLog from the main server.
WARNING: If a program is used, then it will be run under the user who started httpd. This will be root if the server was started by root; be sure that the program is secure.
Also, see the Apache security tips document discussed in Chapter 11 for details on why your security could be compromised if the directory where log files are stored is writable by anyone other than the user that starts the server.
This directive is provided for compatibility with NCSA 1.4.
LogLevel
LogLevel level
Default: error
Server config, virtual host
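LogLevel controls the amount of information recorded in the error_log file. The levels, from most to least severe, are as follows, each with an example of the kind of message it produces:
emerg: Emergencies; the system is unusable. "Child cannot open lock file. Exiting"
alert: Action must be taken immediately. "getpwuid: couldn't determine user name from uid"
crit: Critical conditions. "socket: Failed to get a socket, exiting child"
error: Error conditions. "Premature end of script headers"
warn: Warning conditions. "child process 1234 did not exit, sending another SIGHUP"
notice: Normal but significant conditions. "httpd: caught SIGBUS, attempting to dump core in ..."
info: Informational messages. "Server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers)..."
debug: Debug-level messages.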
Each level will report errors that would have been printed by higher levels. Use debug for development, then switch to, say, crit for production. Remember that if each visitor on a busy site generates one line in the error_log, the hard disk will soon fill up and stop the system.
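The advice above might be applied like this. It is a sketch only, and the right production level depends on how much detail you are prepared to store.

```apache
# While developing: record everything, down to debug messages.
# LogLevel debug

# In production: record only crit and the more severe levels above it
# (alert and emerg), which keeps the error_log small on a busy site.
LogLevel crit
```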
LogFormat
LogFormat format_string [nickname]
Default: "%h %l %u %t \"%r\" %s %b"
Server config, virtual host
LogFormat sets the information to be included in the log file and the way in which it is written. The default format is the Common Log Format (CLF), which is expected by off-the-shelf log analyzers such as wusage (http://www.boutell.com/) or ANALOG, so if you want to use one of them, leave this directive alone.[35] The CLF is as follows:
host ident authuser date request status bytes
The date field has the form [day/month/year:hour:minute:second tzoffset].
[35] Actually, some log analyzers support some extra information in the log file, but you need to read the analyzer's documentation for details.
The log format can be customized using a format_string. The commands in it have the format %[condition]key_letter; the condition need not be present. If it is present and the specified condition is not met, the output will be a -. The key_letters are as follows:
%...a: Remote IP-address
%...A: Local IP-address
%...B: Bytes sent, excluding HTTP headers.
%...b: Bytes sent, excluding HTTP headers. In CLF format, i.e., a '-' rather than a 0 when no bytes are sent.
%...{Foobar}C: The contents of cookie "Foobar" in the request sent to the server.
%...D: The time taken to serve the request, in microseconds.
%...{FOOBAR}e: The contents of the environment variable FOOBAR
%...f: Filename
%...h: Remote host
%...H: The request protocol
%...{Foobar}i: The contents of Foobar: header line(s) in the request sent to the server.
%...l: Remote logname (from identd, if supplied)
%...m: The request method
%...{Foobar}n: The contents of note "Foobar" from another module.
%...{Foobar}o: The contents of Foobar: header line(s) in the reply.
%...p: The canonical Port of the server serving the request
%...P: The process ID of the child that serviced the request.
%...q: The query string (prepended with a ? if a query string exists, otherwise an empty string)
%...r: First line of request
%...s: Status. For requests that got internally redirected, this is the status of the *original* request; use %...>s for the last.
%...t: Time, in common log format time format (standard English format)
%...{format}t: The time, in the form given by format, which should be in strftime(3) format. (potentially localized)
%...T: The time taken to serve the request, in seconds.
%...u: Remote user (from auth; may be bogus if return status (%s) is 401)
%...U: The URL path requested, not including any query string.
%...v: The canonical ServerName of the server serving the request.
%...V: The server name according to the UseCanonicalName setting.
%...X: Connection status when response is completed.
    'X' = connection aborted before the response completed.
    '+' = connection may be kept alive after the response is sent.
    '-' = connection will be closed after the response is sent.
    (This directive was %...c in late versions of Apache 1.3, but this conflicted with the historical ssl %...{var}c syntax.)
The format string can contain ordinary text of your choice in addition to the % directives.
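To show how the key letters combine, here is one possible format string. The choice of fields is ours, made purely for illustration: it is the CLF layout with the final status, followed by the time taken and the client's User-Agent header.

```apache
# CLF fields, using %>s so that the status logged is the one finally
# sent to the client after any internal redirect.
# Appended: %T (seconds taken to serve the request) and the User-Agent
# request header in quotes. Literal quotes inside the string are
# escaped with a backslash.
LogFormat "%h %l %u %t \"%r\" %>s %b %T \"%{User-Agent}i\""
```

Bear in mind that a format extended like this is no longer plain CLF, so an analyzer that expects CLF may not read it.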
CustomLog
CustomLog file|pipe format|nickname
Server config, virtual host
The first argument is the filename to which log records should be written. This is used exactly like the argument to TransferLog; that is, it is either a full path, a path relative to the current server root, or a pipe to a program.
The format argument specifies a format for each line of the log file. The options available for the format are exactly the same as those for the argument of the LogFormat directive. If the format includes any spaces (which it will in almost all cases), it should be enclosed in double quotes.
Instead of an actual format string, you can use a format nickname defined with the LogFormat directive.
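A short sketch of both styles follows. The nickname common is simply a label of our choosing, and the pipe target /usr/local/bin/log_handler is a hypothetical program.

```apache
# Define a format once and give it a nickname...
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# ...then refer to it by that nickname.
CustomLog logs/access_log common

# The format can also be given inline, in double quotes.
CustomLog logs/timing_log "%h %t \"%r\" %T"

# The first argument may be a pipe to a program; this path is hypothetical:
# CustomLog "|/usr/local/bin/log_handler" common
```

Several CustomLog directives can appear together, so one request can be recorded in more than one file, each with its own format.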
site.authent is set up with two virtual hosts, one for customers and one for salespeople, and each has its own logs in ... /logs/customers and ... /logs/salesmen. We can follow that scheme and apply one LogFormat to both, or each can have its own logs with its own LogFormats inside the <VirtualHost> directives. They can also share common log files, set up by moving ErrorLog and TransferLog outside the <VirtualHost> sections, with different LogFormats within the sections to distinguish the entries. In this last case, the LogFormat directives could look like this:
<VirtualHost www.butterthlies.com>
LogFormat "Customer:..."
...
</VirtualHost>

<VirtualHost sales.butterthlies.com>
LogFormat "Sales:..."
...
</VirtualHost>
Let's experiment with a format for customers, leaving everything else the same:
<VirtualHost www.butterthlies.com>
LogFormat "customers: host %h, logname %l, user %u, time %t, request %r, status %s, bytes %b,"
...
We have inserted the words host, logname, and so on to make it clear in the file what is doing what. In real life you probably wouldn't want to clutter the file up in this way because you would look at it regularly and remember what was what or, more likely, process the logs with a program that would know the format. Logging on to www.butterthlies.com and going to summer catalog produces this log file:
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/1996:14:28:46 +0000], request GET / HTTP/1.0, status 200,bytes -
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/1996:14:28:49 +0000], request GET /hen.jpg HTTP/1.0, status 200, bytes 12291,
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/1996:14:29:04 +0000], request GET /tree.jpg HTTP/1.0, status 200, bytes 11532,
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/1996:14:29:19 +0000], request GET /bath.jpg HTTP/1.0, status 200, bytes 5880,
This is not too difficult to follow. Notice that while we have logname unknown, the user is -, the usual report for an unknown value. This is because customers do not have to give an ID; the same log for salespeople, who do, would have a value here.
We can improve things by inserting lists of conditions, based on the HTTP status codes, after the % and before the key letter. The status codes are defined in the HTTP 1.0 specification:
200 OK
302 Found
304 Not Modified
400 Bad Request
401 Unauthorized
403 Forbidden
404 Not found
500 Server error
501 Not Implemented
502 Bad Gateway
503 Out of resources
The list from HTTP 1.1 is as follows:
100 Continue
101 Switching Protocols
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content
300 Multiple Choices
301 Moved Permanently
302 Moved Temporarily
303 See Other
304 Not Modified
305 Use Proxy
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Time-out
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Large
415 Unsupported Media Type
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Time-out
505 HTTP Version not supported
You can use ! before a code to mean "if not." !200 means "log this if the response was not OK." Let's put this in salesmen:
<VirtualHost sales.butterthlies.com>
LogFormat "sales: host %!200h, logname %!200l, user %u, time %t, request %r, status %s,bytes %b,"
...
An attempt to log in as fred with the password don't know produces the following entry:
sales: host 192.168.123.1, logname unknown, user fred, time [19/Aug/1996:07:58:04 +0000], request GET HTTP/1.0, status 401, bytes -
However, if it had been the infamous bill with the password theft, we would see:
host -, logname -, user bill, ...
because we asked for host and logname to be logged only if the request was not OK. We can combine more than one condition, so that if we only want to know about security problems on sales, we could log usernames only if they failed to authenticate:
LogFormat "sales: bad user: %400,401,403u"
We can also extract data from the HTTP headers in both directions:
%[condition]{user-agent}i
This prints the user agent (i.e., the software the client is running) if condition is met. The old way of doing this was AgentLog logfile and RefererLog logfile.
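As a sketch of how the older directives might be replaced, the lines below use CustomLog with header keys. The log filenames are our own choices, and the status lists are only examples of the condition syntax.

```apache
# In place of AgentLog: one User-Agent header per request.
CustomLog logs/agent_log "%{User-Agent}i"

# In place of RefererLog: the referring page and the URL path it led to.
CustomLog logs/referer_log "%{Referer}i -> %U"

# With a condition: the Referer is logged only for 404 responses,
# which helps to trace broken links; otherwise a - is written.
CustomLog logs/broken_links "%404{Referer}i -> %U"

# Headers in the other direction: %{...}o reads the reply, here the
# Content-Type header the server sent back.
CustomLog logs/reply_types "%U %{Content-Type}o"
```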
Copyright © 2003 O'Reilly & Associates. All rights reserved.