Apache The Definitive Guide, 3rd EditionApache: The Definitive GuideSearch this book

8.2. Rewrite

The preceding section described the Alias module and its allies. Everything these directives can do, and more, can be done instead by mod_rewrite.c, an extremely compendious module that is almost a complete software product in its own right. But for simple tasks Alias and friends are much easier to use.

The documentation is thorough, and the reader is referred to http://www.engelschall.com/pw/apache/rewriteguide/ for any serious work. You should also look at http://www.apache.org/docs/mod/mod_rewrite.html. This section is intended for orientation only.

Rewrite takes a rewriting pattern and applies it to the URL. If it matches, a rewriting substitution is applied to the URL. The patterns are regular expressions familiar to us all in their simplest form — for example, mod.*\.c, which matches any module filename. The complete science of regular expressions is somewhat extensive, and the reader is referred to ... /src/regex/regex.7, a manpage that can be read with nroff -man regex.7 (on FreeBSD, at least). Regular expressions are also described in the POSIX specification and in Jeffrey Friedl's Mastering Regular Expressions (O'Reilly, 2002).

It might well be worth using Perl to practice with regular expressions before using them in earnest. To make complicated expressions work, it is almost essential to build them up from simple ones, testing each change as you go. Even the most expert find that convoluted regular expressions often do not work the first time.

The essence of regular expressions is that a number of special characters can be used to match parts of incoming URLs. The substitutions available in mod_rewrite can include mapping functions that take bits of the incoming URL and look them up in databases or even apply programs to them. The rules can be applied repetitively and recursively to the evolving URL. It is possible (as the documentation says) to create "rewriting loops, rewriting breaks, chained rules, pseudo if-then-else constructs, forced redirects, forced MIME-types, forced proxy module throughout." The functionality is so extensive that it is probably impossible to master it in the abstract. When and if you have a problem of this sort, it looks as if mod_rewrite can solve it, given enough intellectual horsepower on your part!

The module can be used in four situations:

The directives look simple enough.

RewriteEngine

RewriteEngine on_or_off
Server config, virtual host, directory

Enables or disables the rewriting engine. If off, no rewriting is done at all. Use this directive to switch off functionality rather than commenting out Rewrite-Rule lines.

RewriteLog


RewriteLog filename
Server config, virtual host

Sends logging to the specified filename. If the name does not begin with a slash, it is taken to be relative to the server root. This directive should appear only once in a Config file.

RewriteLogLevel

RewriteLogLevel number
Default number: 0
Server config, virtual host

Controls the verbosity of the logging: 0 means no logging, and 9 means that almost every action is logged. Note that any number above 2 slows Apache down.

RewriteMap

RewriteMap mapname {txt,dbm,prg,rnd,int}: filename
Server config, virtual host

Defines an external mapname file that inserts substitution strings through key lookup.Keys may be stored in a variety of formats, described as follows. The module passes mapname a query in the form:

$(mapname : Lookupkey | DefaultValue) 

If the Lookupkey value is not found, DefaultValue is returned.

The type of mapname must be specified by the next argument:

txt
Indicates plain-text format — that is, an ASCII file with blank lines, comments that begin with #, or useful lines, in the format:

MatchingKey
SubstituteValue
dbm
Indicates DBM hashfile format — that is, a binary NDBM (the "new" dbm interface, now about 15 years old, also used for dbm auth) file containing the same material as the plain-text format file. You create it with any ndbm tool or by using the Perl script dbmmanage from the support directory of the Apache distribution.

prg
Indicates program format — that is, an executable (a compiled program or a CGI script) that is started by Apache. At each lookup, it is passed the key as a string terminated by newline on stdin and returns the substitution value, or the word NULL if lookup fails, in the same way on stdout. The manual gives two warnings:

  • Keep the program or script simple because if it hangs, it hangs the Apache server.

  • Don't use buffered I/O on stdout because it causes a deadlock. In C, use:

    setbuf(stdout,NULL)

    In Perl, use:

    select(STDOUT); $|=1;]

rnd
Indicates randomized plain text, which is similar to the standard plain-text variant but has a special postprocessing feature: after looking up a value, it is parsed according to contained "|" characters that have the meaning of "or". In other words, they indicate a set of alternatives from which the actual returned value is chosen randomly. Although this sounds crazy and useless, it was actually designed for load balancing in a reverse-proxy situation, in which the looked-up values are server names — each request to a reverse proxy is routed to a randomly selected server behind it. See also Section 12.6 in Chapter 12.

int
Indicates an internal Apache function. Two functions exist: toupper( ) and tolower( ), which convert the looked-up key to all upper- or all lowercase.

RewriteBase

RewriteBase BaseURL
directory, .htaccess

The effects of this command can be fairly easily achieved by using the rewrite rules, but it may sometimes be simpler to encapsulate the process. It explicitly sets the base URL for per-directory rewrites. If RewriteRule is used in an .htaccess file, it is passed a URL that has had the local directory stripped off so that the rules act only on the remainder. When the substitution is finished, RewriteBase supplies the necessary prefix. To quote the manual's example in .htaccess:

Alias /xyz /abc/def"
RewriteBase   /xyz
RewriteRule   ^oldstuff\.html$  newstuff.html

In this example, a request to /xyz/oldstuff.html gets rewritten to the physical file /abc/def/newstuff.html. Internally, the following happens:

Request
/xyz/oldstuff.html

Internal processing
/xyz/oldstuff.html     -> /abc/def/oldstuff.html  (per-server Alias)
/abc/def/oldstuff.html -> /abc/def/newstuff.html  (per-dir    RewriteRule)
/abc/def/newstuff.html -> /xyz/newstuff.html      (per-dir    RewriteBase)
/xyz/newstuff.html     -> /abc/def/newstuff.html  (per-server Alias)
Result
/abc/def/newstuff.html

RewriteCond

RewriteCond TestString CondPattern
Server config, virtual host, directory

One or more RewriteCond directives can precede a RewriteRule directive to define conditions under which it is to be applied. CondPattern is a regular expression matched against the value retrieved for TestString, which contains server variables of the form %{NAME_OF_VARIABLE}, where NAME_OF_VARIABLE can be one of the following list:

API_VERSION

PATH_INFO

SERVER_PROTOCOL

AUTH_TYPE

QUERY_STRING

SERVER_SOFTWARE

DOCUMENT_ROOT

REMOTE_ADDR

THE_REQUEST

ENV:any_environment_variable

REMOTE_HOST

TIME

HTTP_ACCEPT

REMOTE_USER

TIME_DAY

HTTP_COOKIE

REMOTE_IDENT

TIME_HOUR

HTTP_FORWARDED

REQUEST_FILENAME

TIME_MIN

HTTP_HOST

REQUEST_METHOD

TIME_MON

HTTP_PROXY_CONNECTION

REQUEST_URI

TIME_SEC

HTTP_REFERER

SCRIPT_FILENAME

TIME_WDAY

HTTP_USER_AGENT

SERVER_ADMIN

TIME_YEAR

HTTP:any_HTTP_header

SERVER_NAME

 

IS_SUBREQ

SERVER_PORT

 
These variables all correspond to the similarly named HTTP MIME headers, C variables of the Apache server, or the current time. If the regular expression does not match, the RewriteRule following it does not apply.

RewriteLock

RewriteLock Filename
Server config

This directive sets the filename for a synchronization lockfile, which mod_rewrite needs to communicate with RewriteMap programs. Set this lockfile to a local path (not on a NFS-mounted device) when you want to use a rewriting map program. It is not required for other types of rewriting maps.

RewriteOptions

RewriteOptions Option
Default: None
Server config, virtual host, directory, .htaccess

The RewriteOptions directive sets some special options for the current per-server or per-directory configuration. Currently, there is only one Option:

inherit

This forces the current configuration to inherit the configuration of the parent. In per-virtual-server context this means that the maps, conditions, and rules of the main server are inherited. In per-directory context this means that conditions and rules of the parent directory's .htaccess configuration are inherited.

RewriteRule

RewriteRule Pattern Substitution [flags]
Server config, virtual host, directory

This directive can be used as many times as necessary. Each occurrence applies the rule to the output of the preceding one, so the order matters. Pattern is matched to the incoming URL; if it succeeds, the Substitution is made. An optional argument, flags, can be given. The flags, which follow, can be abbreviated to one or two letters:

redirect|R
Force redirect.

proxy|P
Force proxy.

last|L
Last rule — go to top of rule with current URL.

chain|C
Apply following chained rule if this rule matches.

type|T=mime-type
Force target file to be mime-type.

nosubreq|NS
Skip rule if it is an internal subrequest.

env|E=VAR:VAL
Set an environment variable.

qsappend|QSA
Append a query string.

passthrough|PT
Pass through to next handler.

skip|S=num
Skip the next num rules.

next|N
Next round — start at the top of the rules again.

gone|G
Returns HTTP response 410 — "URL Gone."

forbidden|F
Returns HTTP response 403 — "URL Forbidden."

nocase|NC
Makes the comparison case insensitive.

For example, say we want to rewrite URLs of the form:

/Language/~Realname/.../File 

into:

/u/Username/.../File.Language 

We take the rewrite map file and save it under /anywhere/map.real-to-user. Then we only have to add the following lines to the Apache server Config file:

RewriteLog   /anywhere/rewrite.log 
RewriteMap   real-to-user  txt:/anywhere/map.real-to-host 
RewriteRule  ^/([^/]+)/~([^/]+)/(.*)$   /u/${real-to-user:$2|nobody}/$3.$1

8.2.1. A Rewrite Example

The Butterthlies salespeople seem to be taking their jobs more seriously. Our range has increased so much that the old catalog based around a single HTML document is no longer workable because there are too many cards. We have built a database of cards and a utility called cardinfo that accesses it using the arguments:

cardinfo cardid query

where cardid is the number of the card and query is one of the following words: "price," "artist," or "size." The problem is that the salespeople are too busy to remember the syntax, so we want to let them log on to the card database as if it were a web site. For instance, going to http://sales.butterthlies.com/info/2949/price would return the price of card number 2949. The Config file is in ... /site.rewrite :

User webuser
Group webgroup
# Apache requires this server name, although in this case it will 
# never be used.
# This is used as the default for any server that does not match a
# VirtualHost section.
ServerName www.butterthlies.com

NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.rewrite/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/APACHE3/site.rewrite/logs/customers/error_log
TransferLog /usr/www/APACHE3/site.rewrite/logs/customers/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
ServerAdmin sales_mgr@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.rewrite/htdocs/salesmen
Options ExecCGI indexes
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/site.rewrite/logs/salesmen/error_log
TransferLog /usr/www/APACHE3/site.rewrite/logs/salesmen/access_log
RewriteEngine on
RewriteLog logs/rewrite
RewriteLogLevel 9
RewriteRule ^/info/([^/]+)/([^/]+)$   /cgi-bin/cardinfo?$2+$1 [PT]
ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
</VirtualHost>

In real life cardinfo would be an elaborate program. However, here we just have to show that it could work, so it is extremely simple:

#!/bin/sh
#
echo "content-type: text/html"
echo sales.butterthlies.com
echo "You made the query $1 on the card $2"

To make sure everything is in order before we do it for real, we turn RewriteEngine off and access http://sales.butterthlies.com/cgi-bin/cardinfo. We get back the following message:

The requested URL /info/2949/price was not found on this server.

This is not surprising. We now stop Apache, turn RewriteEngine on and restart with ./go. Look at the crucial line in the Config file:

RewriteRule ^/info/([^/]+)/([^/]+)$ /cgi-bin/cardinfo?$2+$1 [PT]

Translated into English, this means the following: at the start of the string, match /info/, followed by one or more characters that aren't /, and put those characters into the variable $1 (the parentheses do this; $1 because they are the first set). Then match a /, then one or more characters aren't /, and put those characters into $2. Then match the end of the string, and pass the result through [PT] to the next rule, which is ScriptAlias. We end up as if we had accessed http://sales.butterthlies.com/cgi-bin/cardinfo?<card ID>+<query>.

If the CGI script is on a different web server for some reason, we could write:

RewriteRule ^/info/([^/]+)/([^/]+)$ http://somewhere.else.com/cgi-bin/
    cardinfo?$2+$1 [PT]

Note that this pattern won't match /info/123/price/fred because it has too many slashes in it.

If we run all this with ./go and access http://sales.butterthlies.com/info/2949/price from the client, we see the following message:

You made the query price on card 2949


Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.