The proxy server's performance can be improved by caching incoming pages so that the next time one is called for, it can be served straight up without having to waste time going over the Web. We can do the same thing for outgoing pages, particularly pages generated on the fly by CGI scripts and database accesses (bearing in mind that this can lead to stale content and is not invariably desirable).
Another reason for using a proxy server is to cache data from the Web to save the bandwidth of the world's clogged telephone systems and therefore to improve access time on our server. Note, however, that in practice it often saves bandwidth at the expense of increased access times.
The directive CacheRoot, cunningly inserted in the Config file shown earlier, and the provision of a properly permissioned cache directory allow us to show this happening. We start by providing the directory ... /site.proxy/cache, and Apache then improves on it with some sort of directory structure like ... /site.proxy/cache/d/o/j/gfqbZ@49rZiy6LOCw.
The file gfqbZ@49rZiy6LOCw contains the following:
    320994B6 32098D95 3209956C 00000000 0000001E
    X-URL: http://192.168.124.1/message
    HTTP/1.0 200 OK
    Date: Thu, 08 Aug 1996 07:18:14 GMT
    Server: Apache/1.1.1
    Content-length: 30
    Last-modified Thu, 08 Aug 1996 06:47:49 GMT
    I am a web site far out there
Next time someone wants to access http://192.168.124.1/message, the proxy server does not have to lug bytes over the Web; it can just go and look it up.
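The first line of the cached file is a row of hexadecimal numbers whose layout the text does not spell out. As a rough sketch, and assuming (this is an inference, not something the source states) that the fields are seconds since the Unix epoch plus a length, they can be decoded as follows. The function names `decode_header` and `as_utc` are made up for this illustration.

```python
from datetime import datetime, timezone

# First line of the cache file shown above: five hexadecimal fields.
HEADER = "320994B6 32098D95 3209956C 00000000 0000001E"


def decode_header(line):
    """Return the hexadecimal fields of the header line as integers."""
    return [int(field, 16) for field in line.split()]


def as_utc(seconds):
    """Interpret an integer as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


fields = decode_header(HEADER)
print(as_utc(fields[0]))  # 1996-08-08 07:18:14+00:00, same as the Date header
print(as_utc(fields[1]))  # 1996-08-08 06:47:49+00:00, same as Last-modified
print(as_utc(fields[2]))  # 1996-08-08 07:21:16+00:00
print(fields[4])          # 30, same as Content-length
```

Under that reading, the third value falls 182 seconds after the first, which is about one tenth of the 1,825 seconds between the Last-modified and Date values. That is at least consistent with an expiry time estimated using the default CacheLastModifiedFactor of 0.1.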
There are a number of housekeeping directives that help with caching.
CacheRoot
CacheRoot directory
Default: none
Server config, virtual host
This directive sets the directory that contains the cache files; the directory must be writable by Apache.
CacheSize
CacheSize size_in_kilobytes
Default: 5
Server config, virtual host
This directive sets the size of the cache area in kilobytes. More may be stored temporarily, but garbage collection reduces it to less than the set number.
CacheGcInterval
CacheGcInterval hours
Default: never
Server config, virtual host
This directive specifies how often, in hours, Apache checks the cache and does a garbage collection if the amount of data exceeds CacheSize.
CacheMaxExpire
CacheMaxExpire hours
Default: 24
Server config, virtual host
This directive specifies how long cached documents are retained. This limit is enforced even if a document is supplied with an expiration date that is further in the future.
CacheLastModifiedFactor
CacheLastModifiedFactor factor
Default: 0.1
Server config, virtual host
If no expiration time is supplied with the document, Apache estimates one by multiplying the time since last modification by factor. CacheMaxExpire takes precedence.
CacheDefaultExpire
CacheDefaultExpire hours
Default: 1
Server config, virtual host
If the document is fetched by a protocol that does not support expiration times, Apache uses this number of hours. CacheMaxExpire does not override it.
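The way CacheMaxExpire, CacheLastModifiedFactor, and CacheDefaultExpire interact can be sketched in a few lines. This is only an illustration of the rules as described here, not Apache's actual code; the function name `cache_lifetime` and its parameters are invented for the example, and the fallback for a document with neither an expiration date nor a modification time is a guess.

```python
HOUR = 3600  # seconds


def cache_lifetime(expires_in=None, age=None, protocol_has_expiry=True,
                   max_expire_hours=24, factor=0.1, default_expire_hours=1):
    """Return how many seconds a document may stay in the cache.

    expires_in: seconds until the supplied expiration date, or None
    age:        seconds since the document was last modified, or None
    """
    if not protocol_has_expiry:
        # CacheDefaultExpire applies and is not capped by CacheMaxExpire.
        return default_expire_hours * HOUR
    if expires_in is not None:
        lifetime = expires_in
    elif age is not None:
        # CacheLastModifiedFactor: estimate from time since last modification.
        lifetime = factor * age
    else:
        # Not covered by the text; using the default here is an assumption.
        lifetime = default_expire_hours * HOUR
    # CacheMaxExpire takes precedence over supplied and estimated times.
    return min(lifetime, max_expire_hours * HOUR)


print(cache_lifetime(age=1825))               # about 182.5 seconds
print(cache_lifetime(expires_in=48 * HOUR))   # capped at 86400 seconds
```

With the defaults, a document last modified 1,825 seconds before it was fetched would be kept for roughly three minutes, while one that claims to be valid for two days is still discarded after 24 hours.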
CacheDirLevels and CacheDirLength
CacheDirLevels number
Default: 3
CacheDirLength number
Default: 1
Server config, virtual host
The proxy module stores its cache with filenames that are a hash of the URL. The filename is split into CacheDirLevels of directory using CacheDirLength characters for each level. This is for efficiency when retrieving the files (a flat structure is very slow on most systems). So, for example:
    CacheDirLevels 3
    CacheDirLength 2
converts the hash "abcdefghijk" into ab/cd/ef/ghijk. A real hash is actually 22 characters long, each character being one of a possible 64 (2^6), so that three levels, each with a length of 1, give 2^18 directories. This number should be tuned to the anticipated number of cache entries (2^18 being roughly a quarter of a million, and therefore good for caches up to several million entries in size).
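The splitting rule and the directory arithmetic can be checked with a short sketch. The helper names `cache_path` and `directory_count` are hypothetical, and the code only mimics the layout described above; it does not reproduce the hash function itself.

```python
def cache_path(hashed_name, levels=3, length=1):
    """Split a hashed filename into `levels` directories of `length` characters."""
    parts = []
    rest = hashed_name
    for _ in range(levels):
        parts.append(rest[:length])
        rest = rest[length:]
    parts.append(rest)  # whatever remains is the filename itself
    return "/".join(parts)


def directory_count(levels=3, length=1, alphabet=64):
    """Number of lowest-level directories, given 64 possible characters."""
    return alphabet ** (levels * length)


print(cache_path("abcdefghijk", levels=3, length=2))  # ab/cd/ef/ghijk
print(directory_count(levels=3, length=1))            # 262144
```

Raising CacheDirLength to 2 with three levels would already allow 64^6 directories (about 69 billion), far more than any realistic cache needs, which is why the defaults of 3 and 1 suit most installations.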
CacheNegotiatedDocs
CacheNegotiatedDocs
Default: none
Server config, virtual host
If present in the Config file, this directive allows content-negotiated documents to be cached by proxy servers. This could mean that clients behind those proxies could retrieve versions of the documents that are not the best match for their abilities, but it will make caching more efficient.
This directive only applies to requests that come from HTTP 1.0 browsers. HTTP 1.1 provides much better control over the caching of negotiated documents, and this directive has no effect on responses to HTTP 1.1 requests. Note that very few browsers are HTTP 1.0 anymore.
Copyright © 2003 O'Reilly & Associates. All rights reserved.