In a table, each column has a type. As we mentioned earlier, an SQL data type is similar to a data type in traditional programming languages. While many languages define a bare-minimum set of types necessary for completeness, SQL goes out of its way to provide types such as DATE that will be useful to everyday users. You could store a DATE type in a more basic numeric type, but having a type specifically dedicated to the nuances of date processing adds to SQL's ease of use—one of SQL's primary goals.
Chapter 16 provides a full reference of SQL types supported by MySQL. Table 3-1 is an abbreviated listing of the most common types.
MySQL supports the UNSIGNED attribute for all numeric types. This modifier forces the column to accept only positive (unsigned) numbers. Unsigned fields have an upper limit that is double that of their signed counterparts. For instance, an unsigned TINYINT—MySQL's single-byte numeric type—has a range of 0 to 255 instead of the -128 to 127 range of its signed counterpart.
MySQL provides more types than those mentioned in Table 3-1. In day-to-day programming, however, you will use these types most often. The size of the data you wish to store plays a large role in the design of your MySQL tables.
Before you create a table, you should know what kind of data you wish to store in the table. Beyond obvious decisions about whether your data is character-based or numeric, you should know the approximate size of the data to be stored. If it is a numeric field, what is its maximum possible value? What is its minimum possible value? Could that change in the future? If the minimum is always positive or zero, you should consider an unsigned type. You should always choose the smallest numeric type that can support your largest conceivable value. If, for example, you have a field that represents the population of a state, use an unsigned INT field. No state can have a negative population. Furthermore, an unsigned INT field is certainly large enough to represent a state's population, unless that population grows to be roughly the current population of the entire Earth.
Managing character types is a little more complicated. Not only do you have to worry about the minimum and maximum string lengths, but you also have to worry about the average size and the amount of variation. For our current purposes, an index is a field or combination of fields on which you plan to search—basically, the fields in your WHERE clause. Indexing is, however, much more complicated, so we will provide further details later in the chapter. What's important to note here is that indexing on character fields works best when the field is a fixed length. If there is little or, preferably, no variation in the length of your character-based fields, then a CHAR type is appropriate. An example of a good candidate for a CHAR field is a country code. The ISO provides a comprehensive list of standard two-character representations of country codes (US for the U.S., FR for France, etc.).[7] Because these codes are always exactly two characters, a CHAR(2) is the best way to maintain the country code based on the ISO representation
[7]States and provinces do not work the same way in internationalized applications. If you want to write an application that works in an international environment, make the columns for state and province codes CHAR(3), because Australia uses three-character state codes. Also note that there is a three-character ISO country-code standard.
A value does not need to be constant length to use a CHAR field. It should, however, have very little variance. Phone numbers, for example, can be stored safely in a CHAR(13) field even though phone number lengths vary from nation to nation. The variance is little enough that there is no point in making a phone number field variable in length. Keep in mind that with a CHAR field, no matter how big the actual string being stored is, the field always takes up exactly the number of characters specified as the field's size—no more, no less. Any difference between the length of the text being stored and the length of the field is made up by padding the value with spaces. While the few potential extra characters being wasted on a subset of the phone number data is not anything to worry about, you do not want to be wasting much more.
Variable-length text fields are appropriate for text fields with widely varying lengths. A good, common example of a field that demands a variable-length data type is a web URL. Most web addresses are relatively short (e.g., http://www.ora.com, http://www.imaginary.com, http://www.mysql.com) and consequentially do not pose problems. Occasionally, however, you will run into web addresses such as:
http://www.winespectator.com/Wine/Spectator/>_notes|5527293926834323221480431354?Xv11=&Xr5=&Xv1=&type-region-search-code=&Xa14=flora+springs&Xv4=
If you construct a CHAR field large enough to hold this URL, you will be wasting a significant amount of space for almost every other URL being stored. A variable-length field lets you define a field length that can store the odd, long-length value while not wasting all that space for the common, short-length values.
Variable-length text fields in MySQL use only as much space as necessary to store an individual value into the field. A VARCHAR(255) field that holds the string "hello world," for example, takes up only 12 bytes (1 byte for each character plus an extra byte to store the length).
TIP: MySQL varies from the ANSI standard by not padding VARCHAR fields. Any extra spaces are removed from a value before it is stored.
You cannot store strings with lengths greater than the field length you have specified. With a VARCHAR(4) field, you can store at most a string with four characters. If you attempt to store the string "happy birthday," MySQL will truncate the string to "happ." The downside is that there is no way to store the odd string that exceeds your designated field size. Table 3-2 shows the storage space required by the different text data types to store the 144-character Wine Spectator URL shown earlier, the space required to store an average-sized 30-character URL, and the maximum string size for that data type.
Data type |
Storage for 144-char string |
Storage for 30-char string |
Maximum string size |
---|---|---|---|
150 |
150 |
255 |
|
145 |
31 |
255 |
|
145 |
31 |
255 |
|
146 |
32 |
65535 |
|
147 |
33 |
16777215 |
|
148 |
34 |
4294967295 |
In this table, note that storage requirements grow 1 byte at a time for the variable-length types of MEDIUMTEXT and LONGTEXT. This growth is due to the space required to store the size in variable-length fields. TEXT uses an extra byte to store the potentially greater length of the text it contains. Similarly, MEDIUMTEXT uses an extra 2 bytes over VARCHAR, and LONGTEXT uses an extra 3 bytes.
If after years of uptime with your database, you find that the world has changed and a field that once comfortably existed as a VARCHAR(25) must now hold strings as long as 30 characters, you are not out of luck. MySQL provides a command called ALTER TABLE that enables you to redefine a field type without losing any data:
ALTER TABLE mytable MODIFY mycolumn LONGTEXT
MySQL provides a set of binary data types that closely mirror their character counterparts. The MySQL binary types are CHAR BINARY, VARCHAR BINARY, TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. The practical distinction between character types and their binary counterparts is the concept of encoding. Binary data is basically a chunk of data that MySQL makes no effort to interpret. Character data, on the other hand, is assumed to represent textual data from human alphabets. It is thus encoded and sorted based on rules appropriate to the character set in question. On an ASCII system, MySQL sorts binary data in a case-sensitive, ASCII order.
MySQL provides two other special kinds of types. The ENUM type allows you specify (enumerate) at table creation a list of possible values that can be inserted into that field. For example, if you have a column named fruit into which you want to allow only the values apple, orange, kiwi, or banana, you would assign this column the type ENUM:
CREATE TABLE meal(meal_id INT NOT NULL PRIMARY KEY, fruit ENUM('apple', 'orange', 'kiwi', 'banana'))
When you insert a value into that column, it must be one of the specified fruits. Because MySQL knows ahead of time which values are valid for the column, it can abstract them to some underlying numeric type. In other words, instead of storing apple in the column as a string, MySQL stores it internally as a single-byte number. However, you still refer to it as apple in a query or when you retrieve the value from MySQL. You also use apple when you call the table or view results from the table.
The MySQL ET type works in the same way, except it lets you store multiple values in a field at the same time and uses bits instead of bytes.
Every piece of data you will ever encounter can be stored using numeric or character types. Technically, you could even store numbers as character types. Just because you can do so, however, does not mean you should. Consider, for example, storing a date in the database. You could store that value as a Unix-style BIGINT or as a combination of several columns for the day, month, and year. How do you look for rows with a date value greater than two days after a specific date? Either you calculate the numeric representation of that date or employ a complex operation for a simple query mixing day, month, and year values.
Isn't all of that a major pain? Wouldn't it be nice if MySQL handled all of these issues for you? In fact, MySQL does. It provides several complex data types to help with abstract common concepts. It supports the concept of dates through the DATE data type. Other such data types include DATETIME and TIMESTAMP.
Copyright © 2003 O'Reilly & Associates. All rights reserved.