Book HomeManaging and Using MySQLSearch this book

3.9. Advanced Features

Using the SQL presented thus far in this chapter should handle 90% of your database programming needs. On occasion, however, you will need some extra power not available in the basic SQL functionality. We close out the chapter with a discussion of a few of these features.

3.9.1. Full Text Searching

MySQL introduced the ability to search on text elements within a text field in Version 3.23.23 through a special index called a FULLTEXT index. It specifically enables you to do something like:

INSERT INTO Document (url, page_text )
VALUES ('index.html', 'The page contents.');
SELECT url FROM Document WHERE MATCH ( page_text ) AGAINST ('page');

INSERT adds a row to a Document table containing the URL of a web page and its text content. SELECT then looks for the URLs of all documents with the word page embedded in their text.

3.9.1.1. The Basics

The magic behind full text searching lies in a FULLTEXT index. The CREATE statement for the Document table might look like this:

CREATE TABLE Document (
    url       VARCHAR(255) NOT NULL PRIMARY KEY,
    page_text TEXT         NOT NULL,
    FULLTEXT ( page_text )
);

The FULLTEXT index enables you to search the index using words or phrases that will not match exactly and then weigh the relevance of any matches. As with other indexes, you can create multicolumn FULLTEXT indexes:

CREATE TABLE Document (
    url       VARCHAR(255) NOT NULL PRIMARY KEY,
    title     VARCHAR(100) NOT NULL,
    page_text TEXT         NOT NULL,
    FULLTEXT ( title, page_text )
);

With this table structure, you can now search for documents that have the word MySQL anywhere in the title or body of the page. You must keep your searches structured against the index, not against the columns. In other words, you can match against title and page_text together with this table, but you cannot look for words that exist only in the title unless you create a separate FULLTEXT index on it alone. Your combined search will look like the following:

SELECT url FROM Document
WHERE MATCH ( title, page_text ) AGAINST ('MySQL');

3.9.1.2. Relevance values

The search that occurs here is a natural language search against the text in the specified columns. It is case insensitive. The result of the match is actually a relevance value that MySQL uses to rank the results. By default, MySQL shows the results with the most relevant results listed first and eliminates only those results with no relevance at all.

mysql> SELECT url FROM Document 
    -> WHERE MATCH ( title, page_text ) AGAINST ('java');
+------------+
| url        |
+------------+
| java.html  |
+------------+

You can use the relevance to your advantage. To get a better picture of the relevance values, you can execute the following query:

mysql> SELECT url, WHERE MATCH ( title, page_text ) AGAINST ('java') 
    -> FROM Document;
+------------+---------------------------------------------+
| url        | MATCH ( title, page_text ) AGAINST ('java') |
+------------+---------------------------------------------+
| index.html |                                           0 |
| java.html  |                             1.8016148151891 |
| perl.html  |                                           0 |
| c.html     |                                           0 |
+------------+---------------------------------------------+

In this case, the index.html file is a web page about MySQL and java.html is about how MySQL and Java work together. As you might expect, the results show that index.html has no relevance to Java, while java.html has quite a bite of relevance.

TIP: In the above example, you can include MATCH in both the SELECT clause and the WHERE clause without incurring any extra overhead. MySQL is smart enough to notice that the two matches are identical and thus execute them a single time. Using MATCH in both places is useful when you want to make use of relevance.

You might expect that a match against MySQL would turn up high relevance for both documents. In reality, however, it turns up zero relevance. Because the phrase MySQL is present in more than half the rows, it is considered a stopword and thus discounted. A stopword is simply a word with no value for text matching. Common stopwords are "the" and "but." They are critical to achieving meaningful results in a full text search. Unfortunately, our sample database has four rows about MySQL and thus MySQL itself is considered a stopword.

3.9.1.3. Boolean mode

MySQL 4.0.1 introduced the ability to perform more complex searches with MySQL full text searching using much the same syntax you would use in Internet search engines. These complex searches are called Boolean mode searches. To execute a Boolean mode search, you use the same syntax except:

For example:

SELECT url, title FROM Document
WHERE MATCH ( title, page_text ) AGAINST ( '+MySQL -Java' IN BOOLEAN MODE );

This query enables you to look for all documents that include MySQL but exclude Java. Common words that might otherwise be used as stopwords can be used in Boolean mode searches. Without any modifiers, a term in a Boolean mode search is considered optional. You can modify your Boolean mode searches with the following operators:

+
The word must be in the matched index.

-
The word must not be in the matched index.

~
The word's relevance is inverted. This operator is useful for removing words that MySQL is giving high relevance to, but you do not want in your hits. Unlike the - operator, returned rows can have the term in question.

<
Decreases the word's contribution to the overall relevance of a match.

>
Increases a word's contribution to the overall relevance of a match.

*
This operator comes after the word it modifies. It works much like a wildcard, matching any words that start with the specified word.

()
Groups words into subexpressions.

Now you can run complex queries such as:

SELECT url, title FROM Document
WHERE MATCH ( title, page_text ) AGAINST ( '+MySQL -optimiz* +(>Java <Perl)' IN BOOLEAN MODE);

This query asks for all of the rows about MySQL and either Java or Perl that do not have words beginning with optimiz. It will then rank Java documents higher than Perl documents.

3.9.1.4. Tips

MySQL determines relevance based on several criteria. It ignores any stopwords as well as words of three characters or less. A match against an index that finds a single hit in the database will have a higher relevance than a match against an index with many hits. In other words, rare words have greater value than common words, and overly common words become stopwords that have no value at all. It is therefore important to the utility of MySQL's full text searching that you have a large set of data to search against. Small data sets such as the one above with only four rows will produce odd results—like no hits on the word MySQL!

When you are adding a lot of data at once, such as when you are indexing the web pages on your web site, you should drop the FULLTEXT index, insert your updated web pages, then recreate the index. Inserts on tables with FULLTEXT indexes are quite expensive and work better if you do the indexing all at once.

The MySQL team is working hard on this fairly recent addition to MySQL. In the near future, you can expect the ability to look for phrases instead of just words as well as the ability to define your own words that must always be indexed. In fact, by the time you read this book, some new features have probably already been added to MySQL full text searching.

3.9.2. Transactions

MySQL recently introduced transactions along with SQL for executing statements in a transactional context. By default, MySQL is in a state called autocommit. Autocommit mode means that any SQL you send to MySQL is executed immediately. In some cases, however, you may want to execute two or more SQL statements together as a single unit of work.

A transfer between two bank accounts is the perfect example of such a transaction. The banking system needs to make sure that the debit from the first account and the credit to the second account occur as a single unit of work. If they are treated separately, the server could in theory crash between the debit and the credit. The result would be that you would lose that money!

By making sure the two statements occur as a single unit of work, transactions ensure that the first statement can be "rolled back" in the event that the second statement fails. To use transactions in MySQL, you first need to create a table using a transactional table type such as BDB or InnoDB.[10]

[10]Not all platforms support these table types.

If your MySQL install was not compiled with support for these table types, you cannot use transactions unless you reinstall. The SQL to create a transactional table specifies one of the transactional types:

CREATE TABLE ACCOUNT (
    ACCOUNT_ID BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    BALANCE    DOUBLE) 
TYPE = BDB;

For a transaction against a transactional table to work, you need to turn off autocommit. You can do this with the command:

SET AUTOCOMMIT=0;

Now you are ready to begin using MySQL transactions. Transactions start with the BEGIN command:

BEGIN;

Your mysql client is now in a transactional context with respect to the server. Any change you make to a transactional table will not be made permanent until you commit it. Changes to nontransactional tables, however, will take place immediately. In the case of the account transfer, we issue the following statements:

UPDATE ACCOUNT SET BALANCE = 50.25 WHERE ACCOUNT_ID = 1;
UPDATE ACCOUNT SET BALANCE = 100.25 WHERE ACCOUNT_ID = 2;

Once you're done with any changes, complete the transaction by using the COMMIT command:

COMMIT;

The true advantage of transactions, of course, comes into play should an error occur in executing the second statement. To abort the entire transaction before a commit, issue the ROLLBACK command:

ROLLBACK;

In reality, the logic behind such complex transactional operations, including commits and rollbacks, requires solid design and well-structured error handling. We will cover these programmatic elements of transaction management in Chapter 8.

Of course, it would be useful if MySQL performed the actual math. It can do just that as long as you store the values you want with a SELECT call:

SELECT @FIRST := BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = 1;
SELECT @SECOND := BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = 2;
UPDATE ACCOUNT SET BALANCE = @FIRST - 25.00 WHERE ACCOUNT_ID = 1;
UPDATE ACCOUNT SET BALANCE = @SECOND + 25.00 WHERE ACCOUNT_ID = 2;

In addition to the COMMIT command, a handful of other commands will automatically end any current transaction as if a COMMIT had been issued. These commands are:

Chapter 8 covers some of the more intricate details of using transactions in database applications.

3.9.3. Table Locking

Table locking is the poor man's transaction. In short, MySQL lets you lock down a group of tables so that only a single client can use it. Unlike transactions, you are not limited by the type of table. You cannot, however, roll back any actions taken against a locked table.

Locking has two basic functions:

MySQL supports three kinds of locks: read, read local, and write. Both kinds of read locks lock the table for reading by a client and all other clients. As long as the lock is in place, no one can write to the locked tables. Read and read local locks differ in that read local allows a client to execute nonconflicting INSERT statements as long as no changes to the MySQL files from outside of MySQL occur while the lock is held. If changes might occur by agents outside of MySQL, a read lock is required.

A write lock locks the specified tables against all access—read or write—by any other client. To lock a table, use the following command:

LOCK TABLES ACCOUNT WRITE;

Now that the ACCOUNT table is locked, you can read from it and modify the data behind it and be certain that no one else will change the data you read between your read and write operations:

SELECT @BAL:=BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = 1;
UPDATE ACCOUNT SET BALANCE = @BAL * 0.03 WHERE ACCOUNT_ID = 1;

Finally, you need to release the locks:

UNLOCK TABLES;

It is really important that you unlock the tables! Failing to do so can result in preventing further access to those tables. Finally, any table locking should be short lived. Long-lived locks seriously degrade database performance.

3.9.4. Functions

Functions in SQL are similar to functions in other programming languages such as C and Perl. The function takes zero or more arguments and returns some value. For example, the function SQRT(16) returns 4. Within a MySQL SELECT statement, functions may be used in one of two ways:

As a value to be retrieved
This form involves a function in the place of a column in the list of columns to be retrieved. The return value of the function, evaluated for each selected row, is part of the returned result set as if it were a column in the database.[11]

[11]You can use aliasing, covered earlier in the chapter, to give the resulting columns "friendly" names.

For example:

SELECT name, FROM_UnixTIME(date) 
FROM events
WHERE time > 90534323

This query selects the name of each event and the date of the event formatted in human-readable form for all events more recent than the given time. FROM_UnixTIME( ) transforms a standard Unix time value into a human-readable form.[12]

[12]Remember that SQL is case insensitive. This particular function is simply written FROM_UnixTIME( ) by convention. You can use FROM_UNIXTIME() or From_UnixTime( ) if they feel more natural to you.

# The LENGTH( ) function returns the character length of
# a given string.
SELECT title, text, LENGTH(text)
FROM papers
WHERE author = 'Stacie Sheldon'

This query selects the title of a paper, the full text of the paper, and the length of the text in bytes for all of the papers authored by Stacie Sheldon. The LENGTH( ) function returns the character length of a given string.

As part of a WHERE clause
This form involves a function used in place of a constant when evaluating a WHERE clause. The value of the function is used for comparison for each row of the table. For example:

SELECT name 
FROM entries 
WHERE id = ROUND( (RAND( )*34) + 1 )

This query randomly selects the name of an entry from a pool of 35 entries. The RAND( ) function generates a random number between 0 and 1. This random value is then multiplied by 34 to turn the value into a number between 0 and 34. Incrementing the value by 1 provides a number between 1 and 35. The ROUND( ) function rounds the result to the nearest integer. The result is a whole number between 1 and 35 and will therefore match one of the ID numbers in the table.

SELECT name, FROM_UnixTIME(date)
FROM events
WHERE time > (Unix_TIMESTAMP( ) - (60 * 60 * 24) )

You may use functions in both the value list and the WHERE clause. This query selects the name and date of each event less than a day old. With no arguments, the UNIX_TIMESTAMP( ) function returns the current time in Unix format.

SELECT name
FROM people
WHERE password = ENCRYPT(name, LEFT(name, 2))

You may also use the value of a table field within a function. This example returns the names of people who used their names as passwords. The ENCRYPT( ) function returns a Unix password-style encryption of the specified string using the supplied two-character salt. The LEFT( ) function returns the left-most n characters of the specified string.

3.9.4.1. Date functions

The most common functions you use will likely be the MySQL functions that enable you to manipulate dates. You already saw some of these functions earlier for translating a Unix-style date into a human-readable form of the date. MySQL, of course, provides more powerful functions for doing things such as calculating the time between two dates:

SELECT TO_DAYS(NOW( )) - TO_DAYS('2000-12-31');

This example provides the number of days that have passed in this millennium. The NOW( ) function returns the DATETIME representing the moment in time when the command was executed. Less obviously, the TO_DAYS( ) function returns the number of days since the year 1 B.C., represented by the specified DATE or DATETIME.[13]

[13]MySQL is actually incapable of representing this date. Valid date ranges in MySQL are from January 1, 1000, to December 31, 9999. There is also no support in MySQL for alternative calendar systems such as the Hebrew, Chinese, or Muslim calendars.

Not everyone likes to see dates formatted the way MySQL provides them by default. Fortunately, MySQL lets you format dates to your own liking using the DATE_FORMAT function. It takes a DATE or DATETIME and a format string indicating how you want the date formatted:

mysql> SELECT DATE_FORMAT('1969-02-17', '%W, %M %D, %Y');
+--------------------------------------------+
| DATE_FORMAT('1969-02-17', '%W, %M %D, %Y') |
+--------------------------------------------+
| Monday, February 17th, 1969                |
+--------------------------------------------+
1 row in set (0.39 sec)

Chapter 15 contains a full list of valid tokens for the DATE_FORMAT( ) function.

3.9.4.2. String functions

In addition to date functions, you are likely to use string functions. We saw one such function above: the LENGTH( ) function. This function provides the number of characters in the specified string. The most common string function you are likely to use, however, is the TRIM( ) function, which removes extra spaces from columns.

One interesting function is the SOUNDEX( ) function. It translates a word into its soundex representation. The soundex representation is a way of representing the sound of a string so that you can compare two strings to see if one is misspelled:

mysql> SELECT SOUNDEX('too');
+----------------+
| SOUNDEX('too') |
+----------------+
| T000           |
+----------------+
1 row in set (0.42 sec)

mysql> SELECT SOUNDEX('two');
+----------------+
| soundex('two') |
+----------------+
| T000           |
+----------------+
1 row in set (0.00 sec)

For these two homonyms, the SOUNDEX( ) function provided the same value. Consequently, an application can leverage this function to check spelling variations.

3.9.5. Outer Joins

MySQL supports a more powerful joining than the simple inner joins we saw earlier. Specifically, MySQL supports something called a left outer join (also known as simply an outer join), which you specify with the keywords LEFT JOIN. This type of join is similar to an inner join, except that it includes data in the first column named that does not match any in the second column. If you remember our author and book tables from earlier in the chapter, you will remember that our join would not list any authors who did not have books in our database. You may want to show entries from one table that have no corresponding data in the table to which you are joining. That is where an outer join comes into play:

SELECT book.title, author.name 
FROM author
LEFT JOIN book ON book.author = author.id

This query is similar to the inner join that you already understand:

SELECT book.title, author.name
FROM author, book
WHERE book.author = author.id

Note that an outer join uses the keyword ON instead of WHERE. The key difference in results is that the new syntax of the outer join will include authors such as Neil Gaiman, for whom no book is in our database. The results of the outer join would therefore look like this:

+----------------+----------------+
| book.title     | author.name    |
+----------------+----------------+
| The Green Mile | Stephen King   |
| Guards, Guards!| Terry Pratchett|
| Imzadi         | Peter David    |
| Gold           | Isaac Asimov   |
| Howling Mad    | Peter David    |
| NULL           | Neil Gaiman    |
+----------------+----------------+

MySQL takes this concept one step further by using a natural outer join. A natural outer join will combine the rows from two tables that have identical column names with identical types and identical values:

SELECT my_prod.name 
FROM my_prod
NATURAL LEFT JOIN their_prod

This natural join will list all product names with identical entries in the my_prod and their_prod tables.

3.9.6. Unions

One of the newest MySQL features as of MySQL 4.0 is the support for SQL unions. A union is simply a tool for combining the results from multiple selects into a single result set listing. A MySQL union looks like this:

SELECT first_name, last_name
FROM Author
UNION
    SELECT fname, lname
    FROM Editor;

This query will provide a list of all authors and editors in the database. The list will include in the first column the values of the first_name column for authors and the values of the fname column for editors. The second column will include the last_name values for authors and lname values for editors.

If one person is an author and an editor, he will appear a single time in the list. You can, however, get MySQL to show the person twice in the results by using the ALL keyword:

SELECT first_name, last_name
FROM Author
UNION ALL
    SELECT fname, lname
    FROM Editor;

3.9.7. Batch Processing

Batch loading is the act of loading a lot of data into or pulling a lot of data out of MySQL all at once. MySQL supports two types of batch loading.

3.9.7.1. Command-line loads

In the simplest kind of batch load, you stick all your SQL commands in a file and send the contents of that file to MySQL:

mysql -h somehost -u uid -p < filename

In other words, you are using the command line to pipe the SQL commands into the mysql command-line utility. The examples on this book's web site contain several SQL command files that you can load into MySQL in this manner before you run the examples.

3.9.7.2. The LOAD command

The LOAD command enables you to load data from a file containing only data (no SQL commands). For example, if you had a file containing the names of all the books in your collection with one book on each line and the title and author separated by a tab, you could use the following command to load that data into your book table:

LOAD DATA LOCAL INFILE 'books.dat' INTO TABLE BOOK;

This command assumes that the file books.dat has one line for each database record to be inserted. It further assumes that there is a value for every column in the table or \N for null values. So, if the BOOK table has three columns, each line of books.dat should have three tab-separated values.

The LOCAL keyword tells the mysql command line to look for the file on the same machine as the client.[14] Without it, MySQL looks for the file on the server. Of course, if you are trying to load something on the server, you need to have been granted the special FILE privilege. Finally, keep in mind that nonlocal loads refer to files relative to the installation directory of MySQL.

[14]Reading from files local to the client is available only on MySQL 3.22.15 and later.

If you have a comma-separated value file such as an Excel file, you can change the delimiter of the LOAD command:

LOAD DATA LOCAL INFILE 'books.dat' 
INTO TABLE BOOK 
FIELDS TERMINATED BY ',';

If a file contains values that would cause duplicate records in the database, you can use the REPLACE and IGNORE keywords to dictate the correct behavior. REPLACE will cause the values from the file to replace the ones in the database, where the IGNORE keyword will cause the duplicate values to be ignored. The default behavior is to ignore duplicates.

3.9.7.3. Pulling data from MySQL

Finally, MySQL provides a tool for copying the results of a SELECT from the database into a file:

SELECT * INTO OUTFILE 'books.dat' 
FIELDS TERMINATED BY ','
FROM BOOK;

This query copies all rows in the BOOK table into the file books.dat. You could then use this file to load into an Excel spreadsheet or another database. Because this file is created on the server, it is created relative to the base directory for the database in use. On a Mac OS X basic installation, for example, this file is created as /usr/local/var/test/test.dat.

A more complex version of this command enables you to put quotes (or any other characters) around fields:

SELECT * INTO OUTFILE 'books.dat' 
FIELDS ENCLOSED BY '"' TERMINATED BY ',' 
FROM BOOK;

Of course, you probably want only the string fields (CHAR, VARCHAR, etc.) enclosed in quotes. You can accomplish this by adding the OPTIONALLY keyword:

SELECT * INTO OUTFILE 'books.dat' 
FIELDS OPTIONALLY ENCLOSED BY '"' TERMINATED BY ',' 
FROM BOOK;

Chapter 15 contains a full range of options for loading and extracting data from MySQL.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.