Perl Cookbook

Chapter 14. Database Access

Contents:

Introduction
Making and Using a DBM File
Emptying a DBM File
Converting Between DBM Files
Merging DBM Files
Sorting Large DBM Files
Storing Complex Data in a DBM File
Persistent Data
Saving Query Results to Excel or CSV
Executing an SQL Command Using DBI
Escaping Quotes
Dealing with Database Errors
Repeating Queries Efficiently
Building Queries Programmatically
Finding the Number of Rows Returned by a Query
Using Transactions
Viewing Data One Page at a Time
Querying a CSV File with SQL
Using SQL Without a Database Server
Program: ggh—Grep Netscape Global History

I only ask for information.

Charles Dickens, David Copperfield

14.0. Introduction

Everywhere you find data, you find databases. At the simplest level, every file can be considered a database. At the most complex level, expensive and complex relational database systems handle thousands of transactions per second. In between are countless improvised schemes for fast access to loosely structured data. Perl can work with all of them.

Early in the history of computers, people noticed that flat file databases don't scale to large data sets. Flat files were tamed using fixed-length records or auxiliary indices, but updating became expensive, and previously simple applications bogged down with I/O overhead.

After some head-scratching, clever programmers devised a better solution. Just as hashes in memory provide more flexible access to data than arrays do, hashes on disk offer more convenient kinds of access than array-like text files. These benefits in access time cost you space, but disk space is cheap these days (or so the reasoning goes).

The DBM library gives Perl programmers a simple, easy-to-use database. You use the same standard operations on hashes bound to DBM files as you do on hashes in memory. In fact, that's how you use DBM databases from Perl: you use tie to associate a hash with a class and a file. Then whenever you access the hash, the class consults or changes the DBM database on disk. The old dbmopen function also did this, but it let you use only one DBM implementation in your program, so you couldn't copy from one format to another.
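Here is a minimal sketch of that tie-based interface. It uses SDBM_File because, per Table 14-1, its source is bundled with Perl, so it should be available everywhere; the file name colors and the temporary directory are illustrative assumptions. The other DBM classes take the same tie arguments, and DB_File accepts an optional extra argument naming the database type.

```perl
#!/usr/bin/perl
# Bind a hash to an on-disk DBM file with tie.
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);   # open flags for the DBM file
use SDBM_File;                           # DBM class bundled with Perl
use File::Temp qw(tempdir);

# Scratch directory and hypothetical file name; SDBM_File actually
# creates two files on disk, colors.pag and colors.dir.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/colors";

tie(my %color, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Can't tie $file: $!";

$color{apple}  = 'red';       # written to the file, not just to memory
$color{banana} = 'yellow';
delete $color{banana};        # removes the record from the file

my @keys = sort keys %color;  # keys are read back from the file
untie %color;                 # flush and close the database

# Reopen read-only to show the data outlived the first tie.
tie(my %reopened, 'SDBM_File', $file, O_RDONLY, 0)
    or die "Can't reopen $file: $!";
my $persisted = $reopened{apple};
untie %reopened;

print "keys: @keys; apple is $persisted\n";
```

After the untie, reopening the file read-only shows that the pair written through %color is still there, and only apple survives because the delete removed banana from the file itself.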

Recipe 14.1 shows how to create a DBM database and gives tips on using it efficiently. Although you can do with DBM files the same things you do with regular hashes, their disk-based nature leads to performance concerns that don't exist with in-memory hashes. Because DBM files can be shared between processes, use a sentinel lock file (see Recipe 7.24) to regulate concurrent access to them. Recipe 14.2 and Recipe 14.4 explain these concerns and show how to work around them. DBM files also make possible operations that aren't available using regular hashes. Recipe 14.5 explains two of these things.
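A sentinel lock might look like the following sketch, which shows one reasonable approach rather than the code from Recipe 7.24. Each process takes an exclusive flock on a separate file (here the hypothetical counter.lock) before tying the database, and releases it only after untie has flushed the changes. Locking a separate file rather than the DBM file itself matters because the library may read from the database as soon as you tie it, before you would have a chance to lock its handle. The helper name bump_counter is likewise hypothetical.

```perl
#!/usr/bin/perl
# Serialize access to a shared DBM file with a sentinel lock file.
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);   # LOCK_EX plus open flags
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $db   = "$dir/counter";        # hypothetical DBM file name
my $lock = "$db.lock";            # hypothetical sentinel file name

# Hypothetical helper: add one to the count stored under $key.
sub bump_counter {
    my ($key) = @_;

    # Take the lock BEFORE tying, so no cooperating process
    # reads the database in the middle of an update.
    open(my $lock_fh, '>>', $lock) or die "Can't open $lock: $!";
    flock($lock_fh, LOCK_EX)       or die "Can't lock $lock: $!";

    tie(my %count, 'SDBM_File', $db, O_RDWR | O_CREAT, 0666)
        or die "Can't tie $db: $!";
    my $n = ($count{$key} // 0) + 1;
    $count{$key} = $n;
    untie %count;                 # flush changes while still holding the lock

    close($lock_fh);              # closing the handle releases the lock
    return $n;
}

bump_counter('hits') for 1 .. 3;
my $final = bump_counter('hits');
print "hits = $final\n";
```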

DBM implementations differ in their features. Table 14-1 compares several DBM libraries you can choose from.

Table 14-1. DBM libraries and their features

Feature                   NDBM     SDBM     GDBM     DB
Linkage comes with Perl   yes      yes      yes      yes
Source bundled with Perl  no       yes      no       no
Source redistributable    no       yes      gpl[25]  yes
FTPable                   no       yes      yes      yes
Easy to build             N/A      yes      yes      ok[26]
Often comes with Unix     yes[27]  no       no[28]   no[28]
Builds okay on Unix       N/A      yes      yes      yes[29]
Builds okay on Windows    N/A      yes      yes      yes[30]
Code size                 [31]     small    big      big[32]
Disk usage                [31]     small    big      ok
Speed                     [31]     slow     ok       fast
Block size limits         4k       1k[33]   none     none
Byte-order independent    no       no       no       yes
User-defined sort order   no       no       no       yes
Partial key lookups       no       no       no       yes

[25]Using GPLed code in your program places restrictions upon you. See http://www.gnu.org for more details.

[26]See the DB_File library method. Requires symbolic links.

[27]On mixed-universe machines, this may be in the BSD compatibility library, which is often shunned.

[28]Except for free Unix ports such as Linux and FreeBSD.

[30]Multiple versions of Perl on Windows systems were widely available, including the standard port built from the normal Perl distribution and several proprietary ports. Like most CPAN modules, DB builds only on the standard port.

[31]Depends on how much your vendor has tweaked it.

[32]Can be reduced if you compile for one access method.

[33]By default; the limit can be redefined (at the expense of compatibility with older files).

NDBM comes with most BSD-derived machines. GDBM is a GNU DBM implementation. SDBM is part of the X11 distribution and also the standard Perl source distribution. DB refers to the Berkeley DB library. While the others are essentially reimplementations of the original DBM library, the Berkeley DB code gives you three different types of database on disk and attempts to solve many of the disk, speed, and size limitations that hinder the other implementations.
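From Perl, the three on-disk types Berkeley DB offers are reached through the DB_File module, whose tie takes a sixth argument naming the type: $DB_HASH, $DB_BTREE, or $DB_RECNO. The sketch below opens one of each, using hypothetical file names in a scratch directory. It assumes your Perl was built with DB_File, which requires the Berkeley DB library.

```perl
#!/usr/bin/perl
# The three Berkeley DB database types, via DB_File.
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;                 # exports $DB_HASH, $DB_BTREE, $DB_RECNO
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);   # scratch directory for hypothetical files

# Hash: unordered key/value pairs, the classic DBM model.
tie(my %hash, 'DB_File', "$dir/hash.db", O_RDWR | O_CREAT, 0666, $DB_HASH)
    or die "Can't tie hash.db: $!";
$hash{pear} = 'green';

# B-tree: key/value pairs kept in sorted key order.
tie(my %tree, 'DB_File', "$dir/tree.db", O_RDWR | O_CREAT, 0666, $DB_BTREE)
    or die "Can't tie tree.db: $!";
$tree{$_} = length $_ for qw(cherry apple banana);
my @sorted = keys %tree;     # already sorted, no sort() needed

# Recno: numbered records tied to an array; by default, each
# element is one newline-terminated line of a text file.
tie(my @lines, 'DB_File', "$dir/lines.txt", O_RDWR | O_CREAT, 0666, $DB_RECNO)
    or die "Can't tie lines.txt: $!";
push @lines, 'first line', 'second line';
my $count = @lines;

untie %hash;
untie %tree;
untie @lines;

print "btree keys: @sorted; recno records: $count\n";
```

Notice that the B-tree hands back its keys in sorted order without an explicit sort, and that a RECNO database ties to an array rather than a hash.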

Code size refers to the size of the compiled libraries. Disk usage refers to the size of the database files it creates. Block size limits refer to the database's maximum key or value size. Byte-order independence refers to whether the database system relies on hardware byte order or whether it instead creates portable files. A user-defined sort order lets you tell the library in what order to return lists of keys. Partial key lookups let you make approximate searches on the database, such as finding the first key that begins with a given prefix.
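The two features in Table 14-1 that only DB offers are easiest to understand in code. This sketch, again assuming DB_File is installed and using hypothetical file and helper names, gives a B-tree a case-insensitive comparison function and then performs a prefix search with the seq method: R_CURSOR positions the cursor at the first key greater than or equal to the one supplied, and R_NEXT steps forward from there.

```perl
#!/usr/bin/perl
# User-defined sort order and partial key lookup with a DB_File B-tree.
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;                 # also exports R_CURSOR and R_NEXT
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# A private B-tree description, so the shared $DB_BTREE is untouched.
# Compare case-insensitively, breaking ties with a plain cmp so keys
# that differ only in case still count as different keys.
my $btree = DB_File::BTREEINFO->new();
$btree->{compare} = sub { lc($_[0]) cmp lc($_[1]) or $_[0] cmp $_[1] };

my $db = tie(my %words, 'DB_File', "$dir/words.db",
             O_RDWR | O_CREAT, 0666, $btree)
    or die "Can't tie words.db: $!";

$words{$_} = 1 for qw(Banana apple Cherry apricot blueberry);
my @ordered = keys %words;   # returned in the custom order

# Hypothetical helper: all keys starting with $prefix (case-insensitive).
sub keys_with_prefix {
    my ($db, $prefix) = @_;
    my ($key, $value) = ($prefix, '');
    my @found;
    # seq() updates $key and $value in place and returns 0 on success.
    for (my $status = $db->seq($key, $value, R_CURSOR);
         $status == 0 && lc(substr($key, 0, length $prefix)) eq lc $prefix;
         $status = $db->seq($key, $value, R_NEXT)) {
        push @found, $key;
    }
    return @found;
}

my @ap = keys_with_prefix($db, 'ap');

undef $db;                   # drop the extra reference before untie
untie %words;

print "order: @ordered\nprefix 'ap': @ap\n";
```

Building a fresh DB_File::BTREEINFO object, rather than modifying $DB_BTREE, keeps the custom order from leaking into other ties in the same program.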

Most Perl programmers prefer the Berkeley DB implementation. Many systems already have this library installed, and Perl can use it. On other systems, you are advised to install the Berkeley DB library and then the DB_File module from CPAN. It will make your life much easier.
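Before settling on an implementation, you may want to know which DBM modules your Perl can actually load. Here is a sketch of one way to check, assuming only that the module names in the list correspond to the libraries in Table 14-1 (DB_File being the interface to Berkeley DB):

```perl
#!/usr/bin/perl
# Report which DBM modules this Perl can load.
use strict;
use warnings;

my @candidates = qw(DB_File GDBM_File NDBM_File SDBM_File);

my %available;
for my $module (@candidates) {
    # Load at run time; a failed require just means "not installed".
    $available{$module} = eval "require $module; 1" ? 1 : 0;
}

printf "%-10s %s\n", $_, $available{$_} ? 'available' : 'missing'
    for @candidates;
```

SDBM_File should always report as available, since its source ships with Perl; the others depend on how your Perl was built. From the shell, perl -MDB_File -e 1 gives a quicker yes-or-no answer for a single module.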

DBM files provide key/value pairs. In relational database terms, you get a database with one table that has only two columns. Recipe 14.6 shows you how to use the MLDBM module from CPAN to store arbitrarily complex data structures in a DBM file.
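MLDBM works by serializing each complex value into a string before storing it and deserializing it on the way back out. The sketch below does the same thing by hand with the core Storable module, to show what MLDBM automates; it is not MLDBM's own code. It uses SDBM_File with a small record to stay within SDBM's 1k limit from Table 14-1, and the key alice and its fields are hypothetical.

```perl
#!/usr/bin/perl
# Store a nested data structure in a DBM file by serializing it,
# the same idea MLDBM automates (this is a hand-rolled sketch).
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Storable qw(freeze thaw);     # core serializer
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
tie(my %people, 'SDBM_File', "$dir/people", O_RDWR | O_CREAT, 0666)
    or die "Can't tie: $!";

# DBM values must be strings, so freeze the record before storing it.
$people{alice} = freeze({ city => 'Boston', langs => ['perl', 'c'] });

# Thaw on the way out to get a real data structure back.
my $record      = thaw($people{alice});
my $second_lang = $record->{langs}[1];

# Changing the thawed copy doesn't touch the file; store it again.
push @{ $record->{langs} }, 'sql';
$people{alice} = freeze($record);

my $lang_count = @{ thaw($people{alice})->{langs} };
untie %people;

print "second language: $second_lang; languages now: $lang_count\n";
```

The second freeze is the important part: modifying the thawed copy changes only memory, so the record must be stored again. MLDBM behaves the same way when you change nested data.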

As good as MLDBM is, it doesn't get around the limitation that you can retrieve rows based on only one column, the hash key. If you need complex queries, the difficulties can be overwhelming. In these cases, consider a separate database management system (DBMS). The DBI project provides modules to work with Oracle, Sybase, mSQL, MySQL, Ingres, and others.

An interesting middle ground between a full relational database server and a DBM file is the DBD::SQLite module. It provides an SQL interface to a relational database, but without a server process: the module itself reads and writes the single file that contains all your tables. This gives you the power of SQL and multiple tables without the inconvenience of RDBMS administration. Because all table access happens within your one process, with no client/server communication, you can also see a considerable gain in speed.
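As a sketch of that middle ground, the following assumes DBI and DBD::SQLite are installed from CPAN. It uses SQLite's in-memory database so it leaves nothing behind, and the table and its rows are invented for illustration. The final query filters on two non-key columns, the kind of lookup that a single-key DBM file can't express directly.

```perl
#!/usr/bin/perl
# SQL without a server: DBI plus DBD::SQLite.
use strict;
use warnings;
use DBI;

# ":memory:" keeps the sketch self-contained; a real program would
# name a file instead, e.g. dbname=people.db (hypothetical).
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE people (name TEXT PRIMARY KEY, city TEXT, age INTEGER)');

my $insert = $dbh->prepare('INSERT INTO people (name, city, age) VALUES (?, ?, ?)');
$insert->execute(@$_)
    for ['alice', 'Boston', 34], ['bob', 'Denver', 27], ['carol', 'Boston', 41];

# Select on two non-key columns; placeholders handle the quoting.
my $names = $dbh->selectcol_arrayref(
    'SELECT name FROM people WHERE city = ? AND age > ? ORDER BY name',
    undef, 'Boston', 30);

$dbh->disconnect;
print "@$names\n";
```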

For more on DBI, see http://dbi.perl.org/doc/index.html and http://search.cpan.org/modlist/Database_Interfaces. DBI supports most major and minor databases, including Oracle, ODBC, Sybase, Informix, MySQL, PostgreSQL, and XBase. There are also DBD interfaces to data sources such as SQLite, Excel files, and CSV files.
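To see which DBD drivers are installed on a given machine, you can ask DBI to list them. A short sketch, assuming only that DBI itself is installed; the exact list depends on your system:

```perl
#!/usr/bin/perl
# List the DBD drivers this machine has installed.
use strict;
use warnings;
use DBI;

# Driver names without the DBD:: prefix, e.g. "SQLite" for DBD::SQLite.
# The true argument suppresses warnings about drivers installed twice.
my @drivers = DBI->available_drivers(1);
print "$_\n" for @drivers;
```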




Copyright © 2003 O'Reilly & Associates. All rights reserved.