The preceding section described the Alias module and its allies. Everything these directives can do, and more, can be done instead by mod_rewrite.c, an extremely compendious module that is almost a complete software product in its own right. But for simple tasks Alias and friends are much easier to use.
The documentation is thorough, and the reader is referred to http://www.engelschall.com/pw/apache/rewriteguide/ for any serious work. You should also look at http://www.apache.org/docs/mod/mod_rewrite.html. This section is intended for orientation only.
mod_rewrite takes a rewriting pattern and applies it to the URL. If the pattern matches, a rewriting substitution is applied to the URL. The patterns are regular expressions, familiar to us all in their simplest form: for example, mod.*\.c matches module filenames such as mod_rewrite.c. The complete science of regular expressions is somewhat extensive, and the reader is referred to ... /src/regex/regex.7, a manpage that can be read with nroff -man regex.7 (on FreeBSD, at least). Regular expressions are also described in the POSIX specification and in Jeffrey Friedl's Mastering Regular Expressions (O'Reilly, 2002).
It might well be worth using Perl to practice with regular expressions before using them in earnest. To make complicated expressions work, it is almost essential to build them up from simple ones, testing each change as you go. Even the most expert find that convoluted regular expressions often do not work the first time.
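Here is a small sketch of the build-up-and-test habit. It uses Python's re module in place of Perl. That choice is ours: Apache uses POSIX regular expressions, but the simple constructs used here behave the same in both. The sketch grows the /info/... pattern from the sales example one piece at a time and checks each stage against the same sample URL. The names STAGES and trace are illustrative only.

```python
import re

SAMPLE = "/info/2949/price"

# Each stage adds one piece to the previous pattern.
STAGES = [
    r"^/info/",                   # literal prefix only
    r"^/info/([^/]+)",            # plus one segment, captured as group 1
    r"^/info/([^/]+)/([^/]+)$",   # plus a second segment, anchored at the end
]

def trace(stages, sample):
    """Return the captured groups for each stage, or None where it fails."""
    results = []
    for pattern in stages:
        m = re.search(pattern, sample)
        results.append(m.groups() if m else None)
    return results

if __name__ == "__main__":
    for pattern, groups in zip(STAGES, trace(STAGES, SAMPLE)):
        print(f"{pattern:30} {groups}")
    # The pattern from the text, unanchored and then anchored:
    print(bool(re.search(r"mod.*\.c", "mod_rewrite.c")))   # True
    print(bool(re.search(r"mod.*\.c", "module.conf")))     # True: ".c" occurs inside ".conf"
    print(bool(re.search(r"^mod.*\.c$", "module.conf")))   # False once anchored
```

The last two lines show why testing each change pays off. The unanchored mod.*\.c also matches names like module.conf, and adding ^ and $ tightens it.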
The essence of regular expressions is that a number of special characters can be used to match parts of incoming URLs. The substitutions available in mod_rewrite can include mapping functions that take bits of the incoming URL and look them up in databases or even apply programs to them. The rules can be applied repetitively and recursively to the evolving URL. It is possible (as the documentation says) to create "rewriting loops, rewriting breaks, chained rules, pseudo if-then-else constructs, forced redirects, forced MIME-types, forced proxy module throughout." The functionality is so extensive that it is probably impossible to master it in the abstract. If you have a problem of this sort, mod_rewrite can probably solve it, given enough intellectual horsepower on your part!
The module can be used in four situations:
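By the administrator inside the server Config file, outside any <VirtualHost> block. The rules are applied to all URLs of the main server. A virtual server uses them only if its own <VirtualHost> block contains RewriteEngine on and RewriteOptions inherit (described below).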
By the administrator inside <VirtualHost> blocks. The rules are applied only to the URLs of the virtual server.
By the administrator inside <Directory> blocks. The rules are applied only to the specified directory.
By users in their .htaccess files. The rules are applied only to the directory containing the .htaccess file.
The directives look simple enough.
RewriteEngine
RewriteEngine on_or_off Server config, virtual host, directory, .htaccess
Enables or disables the rewriting engine. If off, no rewriting is done at all. Use this directive to switch off rewriting rather than commenting out RewriteRule lines.
RewriteLog
RewriteLog filename Server config, virtual host
Sends logging to the specified filename. If the name does not begin with a slash, it is taken to be relative to the server root. This directive should appear only once in a Config file.
RewriteLogLevel
RewriteLogLevel number Default number: 0 Server config, virtual host
Controls the verbosity of the logging: 0 means no logging, and 9 means that almost every action is logged. Any level above 2 slows Apache down noticeably, so use higher levels only for debugging.
RewriteMap
RewriteMap mapname {txt,dbm,prg,rnd,int}:filename Server config, virtual host
Defines a rewriting map called mapname, which inserts substitution strings through key lookup. Keys may be stored in a variety of formats, described as follows. A map is queried from a RewriteRule substitution in the form:
${mapname:LookupKey|DefaultValue}
If the LookupKey value is not found, DefaultValue is returned. If no DefaultValue is given, the result is an empty string.
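The lookup-with-default behavior can be modeled in a few lines. This Python sketch is an illustration, not mod_rewrite's code. It assumes a txt-style map made of whitespace-separated key/value pairs, with # comments and blank lines skipped. It also expands literal ${name:key|default} references, whereas in a real RewriteRule, backreferences such as $2 are filled in before the lookup happens. The function names and sample entries are hypothetical.

```python
import re

def parse_txt_map(text):
    """Build a dict from lines of the form 'MatchingKey SubstituteValue'.
    Blank lines and lines starting with '#' are skipped."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            table[fields[0]] = fields[1]
    return table

# ${mapname:LookupKey} with an optional |DefaultValue before the closing brace
MAP_REF = re.compile(r"\$\{([^:}]+):([^|}]*)(?:\|([^}]*))?\}")

def expand_maps(text, maps):
    """Replace each ${mapname:LookupKey|DefaultValue} with the looked-up value."""
    def lookup(m):
        name, key, default = m.groups()
        value = maps.get(name, {}).get(key)
        if value is not None:
            return value
        return default if default is not None else ""   # no default: empty string
    return MAP_REF.sub(lookup, text)

# Hypothetical map file contents
SAMPLE_MAP = "# real name -> username\nJohn.Doe jdoe\n\nMary.Smith msmith\n"
```

With maps = {"real-to-user": parse_txt_map(SAMPLE_MAP)}, the string /u/${real-to-user:John.Doe|nobody} expands to /u/jdoe. An unknown name falls back to /u/nobody.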
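The type of map must be specified by the prefix before the colon:
txt
A plain-text file containing blank lines, comments beginning with #, and lines of the form:
MatchingKey SubstituteValue
rnd
Like txt, except that SubstituteValue may list several alternatives separated by |, one of which is chosen at random.
dbm
A DBM hash file with the same key/value contents as a txt file.
int
One of mod_rewrite's built-in functions, such as toupper or tolower, applied to the key.
prg
An executable program, started once when Apache starts. It reads each lookup key from stdin as a newline-terminated line. It must write the result to stdout as a newline-terminated line, or write the string NULL if there is no match. Keep the program or script simple, because if it hangs, it hangs the Apache server.
Don't use buffered I/O on stdout, because it causes a deadlock. In C, use:
setbuf(stdout, NULL);
In Perl, use:
select(STDOUT); $|=1;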
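Here is a sketch of what a prg map program might look like in Python. Python is our choice; the text shows only the C and Perl idioms for unbuffered output. The USERS table and its entries are hypothetical, and a real map program might consult a database instead. Calling flush() after every reply plays the role of setbuf(stdout, NULL) or $|=1.

```python
import sys

# Hypothetical lookup table for illustration only.
USERS = {"John.Doe": "jdoe", "Mary.Smith": "msmith"}

def answer(key):
    """Return the value for key, or the string NULL when there is no match."""
    return USERS.get(key, "NULL")

def serve(instream, outstream):
    """Answer one newline-terminated key per line until EOF."""
    # readline() hands back each line as soon as it arrives; "" means EOF.
    for line in iter(instream.readline, ""):
        outstream.write(answer(line.rstrip("\n")) + "\n")
        # Flush after every reply: Apache blocks waiting for this line,
        # so a reply left sitting in a buffer would deadlock both sides.
        outstream.flush()
```

Installed as a map, the script would end with the call serve(sys.stdin, sys.stdout). It would be named in something like RewriteMap users prg:/anywhere/usermap.py, where the map name and path are placeholders. RewriteLock must also be set, as described below.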
RewriteBase
RewriteBase BaseURL directory, .htaccess
The effects of this command can be fairly easily achieved by using the rewrite rules, but it may sometimes be simpler to encapsulate the process. It explicitly sets the base URL for per-directory rewrites. If RewriteRule is used in an .htaccess file, it is passed a URL that has had the local directory stripped off, so that the rules act only on the remainder. When the substitution is finished, RewriteBase supplies the necessary prefix. To quote the manual's example, with the Alias in the server Config file and the other directives in /abc/def/.htaccess:
# In the server Config file:
Alias /xyz /abc/def
# In /abc/def/.htaccess:
RewriteEngine on
RewriteBase /xyz
RewriteRule ^oldstuff\.html$ newstuff.html
In this example, a request to /xyz/oldstuff.html gets rewritten to the physical file /abc/def/newstuff.html. Internally, the following happens:
/xyz/oldstuff.html -> /abc/def/oldstuff.html (per-server Alias)
/abc/def/oldstuff.html -> /abc/def/newstuff.html (per-dir RewriteRule)
/abc/def/newstuff.html -> /xyz/newstuff.html (per-dir RewriteBase)
/xyz/newstuff.html -> /abc/def/newstuff.html (per-server Alias)
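The four steps can be modeled as plain string operations. This Python sketch is a hypothetical simplification, not mod_rewrite's internals. Each function performs one line of the trace, and resolve() returns every intermediate URL.

```python
import re

# Settings from the example (hypothetical model, not mod_rewrite internals).
ALIAS_URL, ALIAS_DIR = "/xyz", "/abc/def"            # Alias /xyz /abc/def
REWRITE_BASE = "/xyz"                                # RewriteBase /xyz
RULE_PATTERN, RULE_SUBST = r"^oldstuff\.html$", "newstuff.html"

def alias(url):
    """Per-server Alias: map the /xyz URL prefix onto /abc/def."""
    if url == ALIAS_URL or url.startswith(ALIAS_URL + "/"):
        return ALIAS_DIR + url[len(ALIAS_URL):]
    return url

def per_dir_rule(path):
    """Strip the directory prefix, match the rule against the remainder,
    and put the physical prefix back on the result."""
    prefix = ALIAS_DIR + "/"
    if not path.startswith(prefix):
        return path
    local = path[len(prefix):]
    if re.search(RULE_PATTERN, local) is None:
        return path
    return prefix + re.sub(RULE_PATTERN, RULE_SUBST, local)

def rewrite_base(path):
    """Swap the physical prefix for RewriteBase so the URL is re-processed.
    (Simplification: real mod_rewrite does this only after a rule matched.)"""
    prefix = ALIAS_DIR + "/"
    if path.startswith(prefix):
        return REWRITE_BASE + "/" + path[len(prefix):]
    return path

def resolve(url):
    """Return the URL after each step of the trace, in order."""
    steps = [url]
    for step in (alias, per_dir_rule, rewrite_base, alias):
        steps.append(step(steps[-1]))
    return steps
```

Calling resolve("/xyz/oldstuff.html") reproduces the five URLs of the trace and ends at /abc/def/newstuff.html.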
RewriteCond
RewriteCond TestString CondPattern Server config, virtual host, directory, .htaccess
One or more RewriteCond directives can precede a RewriteRule directive to define conditions under which it is to be applied. CondPattern is a regular expression matched against the value retrieved for TestString. TestString can contain server variables of the form %{NAME_OF_VARIABLE}, where NAME_OF_VARIABLE can be one of the following:
API_VERSION
AUTH_TYPE
DOCUMENT_ROOT
ENV:any_environment_variable
HTTP_ACCEPT
HTTP_COOKIE
HTTP_FORWARDED
HTTP_HOST
HTTP_PROXY_CONNECTION
HTTP_REFERER
HTTP_USER_AGENT
HTTP:any_HTTP_header
IS_SUBREQ
PATH_INFO
QUERY_STRING
REMOTE_ADDR
REMOTE_HOST
REMOTE_USER
REMOTE_IDENT
REQUEST_FILENAME
REQUEST_METHOD
REQUEST_URI
SCRIPT_FILENAME
SERVER_ADMIN
SERVER_NAME
SERVER_PORT
SERVER_PROTOCOL
SERVER_SOFTWARE
THE_REQUEST
TIME
TIME_DAY
TIME_HOUR
TIME_MIN
TIME_MON
TIME_SEC
TIME_WDAY
TIME_YEAR
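How a run of RewriteCond lines gates a rule can be sketched as follows. This Python model is hypothetical and makes three simplifications. It expands %{NAME} references from a dictionary that stands in for the server variables. It requires every condition to match, which is the default implicit AND; the [OR] flag is not modeled. It treats a leading ! on CondPattern as negation. The sample conditions are invented for illustration.

```python
import re

VAR_REF = re.compile(r"%\{([^}]+)\}")

def expand(test_string, server_vars):
    """Replace each %{NAME} with its value; unknown names become ''."""
    return VAR_REF.sub(lambda m: server_vars.get(m.group(1), ""), test_string)

def conditions_hold(conds, server_vars):
    """True only if every (TestString, CondPattern) pair matches."""
    for test_string, cond_pattern in conds:
        negate = cond_pattern.startswith("!")
        pattern = cond_pattern[1:] if negate else cond_pattern
        matched = re.search(pattern, expand(test_string, server_vars)) is not None
        if matched == negate:      # failed match, or a negated pattern that matched
            return False
    return True

# Hypothetical conditions: Mozilla browsers not on the local 192.168 network.
CONDS = [
    ("%{HTTP_USER_AGENT}", r"^Mozilla"),
    ("%{REMOTE_ADDR}", r"!^192\.168\."),
]
```

The rule that follows the conditions would be tried only when conditions_hold(CONDS, request_vars) is true for the current request.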
RewriteLock
RewriteLock Filename Server config
This directive sets the filename for a synchronization lockfile, which mod_rewrite needs to communicate with RewriteMap programs. Set this lockfile to a local path (not on an NFS-mounted device) when you want to use a rewriting map program. It is not required for other types of rewriting maps.
RewriteOptions
RewriteOptions Option Default: None Server config, virtual host, directory, .htaccess
The RewriteOptions directive sets some special options for the current per-server or per-directory configuration. Currently, there is only one Option:
inherit
This forces the current configuration to inherit the configuration of the parent. In per-virtual-server context this means that the maps, conditions, and rules of the main server are inherited. In per-directory context this means that conditions and rules of the parent directory's .htaccess configuration are inherited.
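RewriteRule
RewriteRule Pattern Substitution [flags] Server config, virtual host, directory, .htaccess
This directive can be used as many times as necessary. Each occurrence applies its rule to the output of the preceding one, so the order matters. Pattern is matched against the incoming URL; if it matches, the Substitution replaces the URL. An optional third argument, flags, modifies the rule's behavior. Flags are written in square brackets, separated by commas, and most can be abbreviated to one or two letters. Examples include R (redirect), F (forbidden), L (last rule), NC (nocase, for case-insensitive matching), and PT (passthrough to the next handler).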
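The effect of rule order can be seen in a small Python model, which is hypothetical and simplified. Each rule is a (pattern, substitution, flags) tuple, and $N in the substitution is filled from the match. The substitution replaces the whole URL, NC makes matching case-insensitive, and L stops processing. The sample rules are invented for illustration.

```python
import re

def to_python_template(substitution):
    """Turn mod_rewrite-style $N backreferences into Python's \\g<N> form."""
    return re.sub(r"\$(\d)", r"\\g<\1>", substitution)

def apply_rules(url, rules):
    """Apply rules in order; each rule sees the previous rule's output."""
    for pattern, substitution, flags in rules:
        re_flags = re.IGNORECASE if "NC" in flags else 0
        m = re.search(pattern, url, re_flags)
        if m is None:
            continue                                       # rule does not apply
        url = m.expand(to_python_template(substitution))   # replaces the whole URL
        if "L" in flags:
            break                                          # last rule: stop here
    return url

# Hypothetical rule set showing chaining and the L flag.
RULES = [
    (r"^/old/(.*)$", "/new/$1", ()),
    (r"^/new/(.*)\.htm$", "/new/$1.html", ("L",)),
    (r"^/new/(.*)$", "/archive/$1", ()),
]
```

Here /old/a.htm becomes /new/a.htm, then /new/a.html, and the L flag stops it there. /old/b.txt skips the second rule and ends up as /archive/b.txt.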
For example, say we want to rewrite URLs of the form:
/Language/~Realname/.../File
into:
/u/Username/.../File.Language
We save a txt rewrite map, whose lines pair each real name with a username, as /anywhere/map.real-to-user. Then we only have to add the following lines to the Apache server Config file:
RewriteLog /anywhere/rewrite.log
RewriteMap real-to-user txt:/anywhere/map.real-to-user
RewriteRule ^/([^/]+)/~([^/]+)/(.*)$ /u/${real-to-user:$2|nobody}/$3.$1
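To see what this rule does to a URL, here is a Python sketch that mimics it. The map entries are hypothetical because the real map's contents are not given. The |nobody default becomes the fallback of a dictionary lookup.

```python
import re

# Hypothetical contents of /anywhere/map.real-to-user: real name -> username.
REAL_TO_USER = {"John.Doe": "jdoe", "Mary.Smith": "msmith"}

RULE = re.compile(r"^/([^/]+)/~([^/]+)/(.*)$")

def rewrite(url):
    """Mimic /u/${real-to-user:$2|nobody}/$3.$1 for URLs the pattern matches."""
    m = RULE.search(url)
    if m is None:
        return url                                   # rule does not apply
    language, realname, rest = m.groups()            # $1, $2, $3
    username = REAL_TO_USER.get(realname, "nobody")  # the |nobody default
    return f"/u/{username}/{rest}.{language}"
```

For example, /de/~John.Doe/docs/index.html would become /u/jdoe/docs/index.html.de.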
The Butterthlies salespeople seem to be taking their jobs more seriously. Our range has increased so much that the old catalog based around a single HTML document is no longer workable because there are too many cards. We have built a database of cards and a utility called cardinfo that accesses it using the arguments:
cardinfo cardid query
where cardid is the number of the card and query is one of the following words: "price," "artist," or "size." The problem is that the salespeople are too busy to remember the syntax, so we want to let them log on to the card database as if it were a web site. For instance, going to http://sales.butterthlies.com/info/2949/price would return the price of card number 2949. The Config file is in ... /site.rewrite:
User webuser
Group webgroup
# Apache requires this server name, although in this case it will
# never be used.
# This is used as the default for any server that does not match a
# VirtualHost section.
ServerName www.butterthlies.com

NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.rewrite/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/APACHE3/site.rewrite/logs/customers/error_log
TransferLog /usr/www/APACHE3/site.rewrite/logs/customers/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
ServerAdmin sales_mgr@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.rewrite/htdocs/salesmen
Options ExecCGI indexes
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/site.rewrite/logs/salesmen/error_log
TransferLog /usr/www/APACHE3/site.rewrite/logs/salesmen/access_log
RewriteEngine on
RewriteLog logs/rewrite
RewriteLogLevel 9
RewriteRule ^/info/([^/]+)/([^/]+)$ /cgi-bin/cardinfo?$2+$1 [PT]
ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
</VirtualHost>
In real life cardinfo would be an elaborate program. However, here we just have to show that it could work, so it is extremely simple:
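#!/bin/sh
# Stub cardinfo: $1 is the query word, $2 is the card number.
echo "content-type: text/html"
# A blank line must end the CGI headers.
echo
echo sales.butterthlies.com
echo "You made the query $1 on the card $2"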
To make sure everything is in order before we do it for real, we turn RewriteEngine off and access http://sales.butterthlies.com/info/2949/price. We get back the following message:
The requested URL /info/2949/price was not found on this server.
This is not surprising. We now stop Apache, turn RewriteEngine on and restart with ./go. Look at the crucial line in the Config file:
RewriteRule ^/info/([^/]+)/([^/]+)$ /cgi-bin/cardinfo?$2+$1 [PT]
Translated into English, this means the following. At the start of the string, match /info/, followed by one or more characters that aren't /, and put those characters into the variable $1 (the parentheses do this; $1 because they are the first set). Then match a /, then one or more characters that aren't /, and put those characters into $2. Then match the end of the string. The substitution builds /cgi-bin/cardinfo?$2+$1, and the [PT] (passthrough) flag hands this new URL on to the other URL-translation directives, here ScriptAlias. We end up as if we had accessed http://sales.butterthlies.com/cgi-bin/cardinfo?<query>+<card ID>. Because this query string contains no =, Apache passes its +-separated words to the script as command-line arguments, so $1 is the query and $2 is the card ID.
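The whole path from URL to output can be traced in a Python sketch. It is an illustration, not Apache's code. rewrite() mimics the RewriteRule, and isindex_argv() models how a query string with no = is split on + into command-line arguments. cardinfo() stands in for the shell script. All three names are hypothetical.

```python
import re
from urllib.parse import unquote

RULE = re.compile(r"^/info/([^/]+)/([^/]+)$")

def rewrite(path):
    """Mimic: RewriteRule ^/info/([^/]+)/([^/]+)$ /cgi-bin/cardinfo?$2+$1 [PT]"""
    m = RULE.search(path)
    if m is None:
        return None                        # no match: the rule leaves the URL alone
    card_id, query = m.groups()            # $1, $2
    return f"/cgi-bin/cardinfo?{query}+{card_id}"

def isindex_argv(url):
    """A query string without '=' is split on '+' into script arguments."""
    _, _, qs = url.partition("?")
    if "=" in qs:
        return []
    return [unquote(word) for word in qs.split("+")]

def cardinfo(argv):
    """Stand-in for the shell script: report the query ($1) and the card ($2)."""
    return f"You made the query {argv[0]} on the card {argv[1]}"
```

Feeding /info/2949/price through the three functions yields /cgi-bin/cardinfo?price+2949, then the arguments price and 2949, and finally the message the script prints.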
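If the CGI script is on a different web server for some reason, we could send the request there instead:
RewriteRule ^/info/([^/]+)/([^/]+)$ http://somewhere.else.com/cgi-bin/cardinfo?$2+$1 [R]
Because the substitution names another host, mod_rewrite answers with an external redirect, so [R] rather than [PT] is the appropriate flag. To fetch the result through this server instead, use the [P] (proxy) flag, which requires mod_proxy.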
Note that this pattern won't match /info/123/price/fred because it has too many slashes in it.
If we run all this with ./go and access http://sales.butterthlies.com/info/2949/price from the client, we see the following message:
You made the query price on the card 2949
Copyright © 2003 O'Reilly & Associates. All rights reserved.