Apache: The Definitive Guide, 3rd Edition

12.5. Scalability

Moving a web site from one machine serving a few test requests to an industrial-strength site capable of serving the full flood of web demand may not be a simple matter.

12.5.1. Performance

A busy site will have performance issues, which boil down to the question: "Are we serving the maximum number of customers at the minimum cost?"

12.5.1.1. Tools

Under Unix, you can see how resources are being used with utilities such as top, vmstat, swapinfo, and iostat. (See Essential System Administration, by Aeleen Frisch [O'Reilly, 2002].)

12.5.1.2. Apache's mod_info

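mod_info reports how the running httpd is configured (the loaded modules and the directives that apply to them), which helps when diagnosing the server. Live per-process activity is reported by mod_status. See Chapter 10.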

12.5.1.3. Bandwidth

Your own hardware may be working wonderfully yet still be strangled by bandwidth limitations between you and the Web backbone. You should be able to make a rough estimate of the bandwidth you need by multiplying the number of transactions per second by the number of bytes transferred per transaction (making allowance for the substantial HTTP headers that go with each web page). Having done that, check what is actually happening by using a utility like ipfm from http://www.via.ecp.fr/~tibob/ipfm/:

HOST                    IN        OUT      TOTAL 
host1.domain.com        12345     6666684  6679029 
host2.domain.com        1232314   12345    1244659 
host3.domain.com        6645632   123      6645755
...

Or use cricket (http://cricket.sourceforge.net/) to produce pretty graphs.
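The rough estimate described at the start of this section can be sketched in a few lines of Python. This is only an illustration: the function name, the 500-byte header allowance, and the sample load figures are assumptions, not measurements from any real site.

```python
def estimate_bandwidth_bps(requests_per_second, avg_body_bytes, avg_header_bytes=500):
    """Return a rough outbound bandwidth estimate in bits per second.

    avg_header_bytes is an assumed allowance for HTTP headers; measure
    your own traffic to find a realistic figure.
    """
    bytes_per_second = requests_per_second * (avg_body_bytes + avg_header_bytes)
    return bytes_per_second * 8  # 8 bits per byte


# Hypothetical load: 50 requests per second, 20,000-byte average response body.
bps = estimate_bandwidth_bps(50, 20_000)
print(f"{bps / 1_000_000:.1f} Mbit/s")  # prints "8.2 Mbit/s"
```

Compare the result with the totals reported by a traffic monitor. A large gap suggests that the assumed request rate or response size is wrong.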

12.5.1.4. Load balancing

mod_backhand is free software for load balancing, covered later in this chapter. For commercial (and expensive) alternatives, search the Web for ServerIron, BigIP, and LoadDirector.

12.5.1.5. Image server, text server

The amount of RAM at your disposal limits the number of copies of Apache (as httpd or httpsd) that you can run, and that in turn limits the number of simultaneous clients you can serve. You can reduce the size of some of the httpd instances by running a cut-down version for images, PDF files, or text alongside a big version for scripts.

What normally makes the difference in size is the necessity to load a scripting language such as Perl or PHP into httpd. Because these provide persistent storage of modules and variables between requests, they tend to consume far more RAM than servers that only serve static pages and images. The normal answer is to run two copies of Apache, one for the static stuff and one for the scripts. Each copy has to bind to a different IP and port combination, of course, and usually the number of instances of the dynamic one has to be limited to avoid thrashing.
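To see why splitting the servers helps, here is a minimal back-of-the-envelope sketch in Python. All the figures are hypothetical: 768 MB of RAM left over for Apache, 20 MB per httpd with a scripting language loaded, and 2 MB per cut-down static httpd. Measure your own processes (with top, for instance) before relying on numbers like these.

```python
MIB = 1024 * 1024


def max_processes(budget_bytes, per_process_bytes):
    """Return how many processes of a given size fit in a RAM budget."""
    if per_process_bytes <= 0:
        raise ValueError("per_process_bytes must be positive")
    return max(budget_bytes, 0) // per_process_bytes


# Hypothetical figures, for illustration only.
usable_ram = 768 * MIB   # RAM left after the OS and other services
big_httpd = 20 * MIB     # httpd with a scripting language loaded
small_httpd = 2 * MIB    # cut-down httpd for static files

# One big server handling everything:
single = max_processes(usable_ram, big_httpd)

# Two servers: 128 MB set aside for static files, the rest for scripts.
static = max_processes(128 * MIB, small_httpd)
dynamic = max_processes(usable_ram - 128 * MIB, big_httpd)

print(single, static, dynamic)  # prints "38 64 32"
```

With these assumed sizes, the split configuration serves 96 simultaneous clients instead of 38. The count for the dynamic server is also the sort of ceiling you would set on it to avoid thrashing.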

12.5.2. Shared Versus Replicated DBs

You may want to speed up database accesses by replicating your database across several machines so that they can serve clients independently. Replication is easy if the data is static: catalogs, texts, libraries of images, and so on. Replication is hard if the database is updated often, as it would be with active clients. However, you can sidestep replication by dividing your client database into chunks (for instance, by surname: A-D, E-G, and so on), each served by a single machine. To increase speed, you divide it into smaller chunks and add more hardware.
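Dividing the client database by surname might be sketched like this in Python. The range boundaries and the host names (db1.example.com and so on) are hypothetical, as is the choice to send surnames that do not start with A-Z to the last machine.

```python
import bisect

# Hypothetical layout: shard i holds surnames whose initial falls between
# the previous upper bound (exclusive) and SHARD_UPPER_BOUNDS[i] (inclusive),
# i.e., A-D, E-G, H-M, N-S, T-Z.
SHARD_UPPER_BOUNDS = ["D", "G", "M", "S", "Z"]
SHARD_HOSTS = [
    "db1.example.com",
    "db2.example.com",
    "db3.example.com",
    "db4.example.com",
    "db5.example.com",
]


def shard_for_surname(surname):
    """Return the host that holds the record for this surname."""
    initial = surname.strip()[:1].upper()
    if not ("A" <= initial <= "Z"):
        # Empty or non-A-Z initials go to the last shard (an assumption).
        return SHARD_HOSTS[-1]
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, initial)
    return SHARD_HOSTS[index]


print(shard_for_surname("Evans"))  # prints "db2.example.com"
```

Dividing the database further then means adding a boundary and a host to the two lists and moving the affected records to the new machine.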




Copyright © 2003 O'Reilly & Associates. All rights reserved.