A major problem in managing a big site is that it is always in flux. The person in charge therefore has to manage a constant flow of new material from the development machines, through the beta test systems, to the live site. This process can be very complicated and he will need as much help from automation as he can get.
The development hardware has to address two issues: the functionality of the code — running on any machine — and the interaction of the different machines on the live site.
The development of the code — by one or several programmers — will benefit enormously from using a version control system like CVS (see http://www.cvshome.org/). CVS allows you to download files from the archive, work on them, and upload them again. The changes are logged and a note is broadcast to everyone else in the project.[48] At any time you can go back to any earlier version of a file. You can also create "branches" — temporary diversions from the main development that run in parallel.
[48]Notes can be broadcast if you've added scripts to do it — these are widely available, though they don't come with CVS itself.
CVS can operate through a secure shell so that developers can share code securely over the Internet. We used it to control the writing of this edition of this book. It is also used to manage the development of Apache itself, and, in fact, most free software.
The network of development machines needs to resemble the network of live machines so that load balancing and other intersystem activities can be verified. It is possible to simulate multiple machines by running multiple services on one machine. However, this can miss accidental dependences that arise, so it is not a good idea for the beta test stage.
The beta test site should be separate from the development machines. It should be a replica of the real site in every sense (though perhaps scaled down — e.g., if the live site is 10 load-balanced machines, the beta test site might only have 2), so that all the different ways that networked computers can interfere with each other can have full rein. It should be set up by the sys admins but tested by a very special sort of person: not a programmer, but someone who understands both computing and end users. Like a test pilot, she should be capable of making the crassest mistakes while noting exactly what she did and what happened next.
The configuration of the live site will be dictated by a number of factors — the functionality of the site plus the expected traffic. Quite often a site can be divided into several parts, which are best handled on different machines. One might handle data-intensive actions — serving a large stock of images for instance. Another might be concerned with computations and a database, while a third might handle secure access. They might be replicated for backup and maybe mirrored in another continent to minimize long-haul web traffic and improve client access. Load sharing and automatic-backup software will be an issue here (see later).
An established site will have its own upgrade procedure. If not, it should — and do so by incorporating at least some elements that follow.
As development goes ahead, the transfer of data and scripts between the three sites should be managed by scripts that produce comprehensive logs. This way, when something goes wrong, it can be traced and fixed. These scripts should also explicitly set ownerships and permissions for all the files transferred.
Once you have an active web site, you — or your marketing people — will want to know as much as you can about who is using it, why they are, and what they think of the experience. Apache has comprehensive logging facilities, and you can write scripts to analyze them; alternatively, you can write scripts to accumulate data in your database as you go along. Either way, you do not want your business rivals finding their way to this sensitive information or monitoring your web traffic while you look at it, so you may want to use SSL to protect your access to your maintenance pages. These pages may well allow you to view, alter, and update confidential customer information: normal prudence and the demands of data protection laws would suggest you screen these activities with SSL.
Copyright © 2003 O'Reilly & Associates. All rights reserved.