
Chapter 12. Databases and Perl

Contents:

DBM Databases and DBM Hashes
Design of DBI
DBI Methods
DBI Environment Variables

Since one of Perl's greatest strengths is working with text, a genuine concern is how to store data. Flat files are one possibility, but they don't scale very well for large amounts of data. When working with lots of data, you'll likely need database software to satisfy your capacity and performance requirements.

There are two general solutions to using databases with Perl. For simple database purposes, Database Management (DBM) will serve your needs. DBM is a library supported by many (if not all) Unix systems and many non-Unix systems as well. DBM is more efficient than flat text files because it packs records compactly into the database files and can store and retrieve large amounts of data quickly by key. Perl's interface to your system's DBM is based on a hash, so you can store, look up, and delete data by key.

When you need to store a larger variety of data and need other goodies such as the ability to search across multiple records, you'll probably need to use a database that supports SQL. To this end, you can buy a prepackaged database product such as Oracle or Sybase, or use a freely available open source equivalent such as MySQL or PostgreSQL. For these larger database projects, you should use DBI and DBD. DBI is a module that provides a consistent interface for database solutions. DBD is a database-specific driver that translates DBI calls as needed for that database.

In this chapter, we'll cover DBM and talk at length about DBI/DBD.

12.1. DBM Databases and DBM Hashes

DBM is a simple database management facility for Unix systems. It allows programs to store a collection of key/value pairs in binary form, thus providing rudimentary database support for Perl. Practically all Unix systems ship with built-in DBM support, some with a separate libdb and others with DBM calls built into libc. In the absence of DBM support on your system, you can use gdbm from GNU, which is an extension to vanilla DBM, or BerkeleyDB-3.x from http://www.sleepycat.com/.

To use DBM databases in Perl, you can associate a hash with a DBM database through the AnyDBM_File module, which uses tie( ). This hash (called a DBM array) is then used to access and modify the DBM database. Previously, you could use dbmopen( ) to open, read, write, and delete a database. While dbmopen( ) remains available, you should use the AnyDBM_File module, which selects a DBM implementation that is available on your system.[6]

[6]If you're using BerkeleyDB-2.x or newer, you should not use AnyDBM_File, but should instead install and use the BerkeleyDB module.

For example, with AnyDBM_File:

#!/usr/local/bin/perl -w

    use AnyDBM_File;
    use Fcntl; # needed for O_ thingies

    my %h;
    my $db_name = 'perl_in_a_nutshell2.dbmx';

    # tie %h. This fails if $db_name can't be created or can't be
    # written
    tie(%h, 'AnyDBM_File', $db_name, O_RDWR|O_CREAT, 0640)
        or die("can't create \%h: $!");

    # Populate %h
    foreach my $letter ('a' .. 'z') {
        $h{$letter} = uc($letter);
    }

    while(my($key, $value) = each(%h)) {
        print "$key -> $value\n";
    }

    untie(%h);

The first argument to tie( ), %h in this example, is a Perl hash. (If it already has values, the values are discarded under DBM_File modules, although you can have multiple keys under BerkeleyDB-2.x and newer.) %h is tied to a DBM database called $db_name. This database may be stored on disk as a single file, or as two files called $db_name.dir and $db_name.pag, depending on the DBM implementation.

The last argument (0640 in the example above) is the mode, a number that controls the permissions of the database file or files if they need to be created. The number must be written in octal, so make sure that you use permissions such as 0640 instead of 640; without the leading zero, Perl reads 640 as a decimal number, which is a different value. If the files already exist, the mode has no effect. For example:

tie(%BOOKS, 'AnyDBM_File', "bookdb", O_RDWR|O_CREAT, 0666); # Open %BOOKS onto bookdb

This invocation associates the hash %BOOKS with the disk files bookdb.dir and bookdb.pag in the current directory. If the files don't already exist, they are created with a mode of 0666, modified by the current umask.
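The effect of the leading zero and of the umask can be checked with a few lines of arithmetic. The following is a minimal sketch; effective_mode( ) is a hypothetical helper written for this illustration, not part of Perl or of any DBM module, and the umask value 022 is only an assumed example.

```perl
use strict;
use warnings;

# Hypothetical helper: the permission bits a newly created file ends up
# with when it is requested with $mode under the given $umask.
sub effective_mode {
    my ($mode, $umask) = @_;
    return $mode & ~$umask & 0777;
}

# 0640 is an octal literal; 640 is a decimal literal. They differ.
printf "0640 is decimal %d; decimal 640 is octal %o\n", 0640, 640;

# A requested mode of 0666 under an assumed umask of 022.
printf "0666 under umask 022 gives %04o\n", effective_mode(0666, 022);
```

With the assumed umask of 022, a requested mode of 0666 produces files with mode 0644.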

tie( ) returns undef if it fails to open $db_name, and sets $! to describe the failure.
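As a minimal sketch of checking that return value without calling die( ), the code below tries to tie a hash to a database in a directory that is assumed not to exist. The helper try_tie( ) and the path are hypothetical names chosen for illustration only.

```perl
use strict;
use warnings;
use AnyDBM_File;
use Fcntl;    # exports O_RDWR and O_CREAT

# Hypothetical helper: try to tie a hash to the database at $path.
# Returns (1, '') on success or (0, error text from $!) on failure.
sub try_tie {
    my ($path) = @_;
    my %h;
    my $obj = tie(%h, 'AnyDBM_File', $path, O_RDWR|O_CREAT, 0640);
    return (0, "$!") unless $obj;    # copy $! before anything resets it
    undef $obj;                      # drop our reference before untie
    untie(%h);
    return (1, '');
}

# Assumption: this directory does not exist, so the open should fail.
my ($ok, $err) = try_tie('/no/such/directory/example_db');
print $ok ? "opened\n" : "tie failed: $err\n";
```

Copying $! into a string straight away matters because later system calls can overwrite it.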

Once the database is opened, anything you do to the DBM hash is immediately written to the database. See Chapter 4, "The Perl Language" for more information on hashes.

tie(%BOOKS, 'AnyDBM_File', "bookdb", O_RDWR|O_CREAT, 0666)
    or die("can't open bookdb: $!"); # Open %BOOKS onto bookdb
$BOOKS{"1-56592-286-7"} = "Perl in a Nutshell";

The DBM array stays open throughout the program. When the program terminates, the association is terminated. You can also break the association in a manner similar to closing a filehandle by using untie( ). See Chapter 5, "Function Reference" for more information on tie.
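To illustrate that data survives untie( ), the sketch below stores a record, breaks the association, and then ties a second hash to the same database to read the record back. It assumes a scratch directory created with File::Temp so that nothing is left behind; the variable names are illustrative.

```perl
use strict;
use warnings;
use AnyDBM_File;
use Fcntl;                        # exports O_RDWR and O_CREAT
use File::Temp qw(tempdir);

# Assumption: work in a temporary directory that is removed at exit.
my $dir = tempdir(CLEANUP => 1);
my $db  = "$dir/bookdb";
my $isbn = "1-56592-286-7";

# Store one record, then break the association.
my %BOOKS;
tie(%BOOKS, 'AnyDBM_File', $db, O_RDWR|O_CREAT, 0666)
    or die("can't open $db: $!");
$BOOKS{$isbn} = "Perl in a Nutshell";
untie(%BOOKS);

# Tie a different hash to the same database and read the record back.
my %again;
tie(%again, 'AnyDBM_File', $db, O_RDWR, 0666)
    or die("can't reopen $db: $!");
my $title = $again{$isbn};
print "$isbn -> $title\n";

# Deleting a key from the hash removes it from the database as well.
delete $again{$isbn};
my $still_there = exists $again{$isbn} ? 1 : 0;
untie(%again);
```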



Copyright © 2002 O'Reilly & Associates. All rights reserved.