Learning Perl, 3rd Edition

2.3. Strings

Strings are sequences of characters (like hello). Strings may contain any combination of any characters.[48]

[48]Unlike C or C++, there's nothing special about the NUL character in Perl, because Perl uses length counting, not a null byte, to determine the end of the string.

The shortest possible string has no characters. The longest string fills all of your available memory (although you wouldn't be able to do much with that). This is in accordance with the principle of "no built-in limits" that Perl follows at every opportunity. Typical strings are printable sequences of letters and digits and punctuation in the ASCII 32 to ASCII 126 range. However, the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings -- something with which many other utilities would have great difficulty. For example, you could update a graphical image or compiled program by reading it into a Perl string, making the change, and writing the result back out.
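As a small illustration of the binary-data point, the sketch below builds a string containing a NUL character and changes one character in place. It borrows variables and the length and substr functions, which this section has not introduced; the sample data and variable names are invented for the example.

```perl
use strict;

# A string may hold any character, including NUL ("\0"). Perl records the
# length itself rather than scanning for a terminating byte.
my $data = "ab\0cd";
my $len  = length($data);        # the NUL counts like any other character

# Hypothetical "patch": overwrite the character at offset 2 in a copy.
my $patched = $data;
substr($patched, 2, 1) = "X";

print "length: $len\n";          # length: 5
print "patched: $patched\n";     # patched: abXcd
```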

Like numbers, strings have a literal representation, which is the way you represent the string in a Perl program. Literal strings come in two different flavors: single-quoted string literals and double-quoted string literals.

2.3.1. Single-Quoted String Literals

A single-quoted string literal is a sequence of characters enclosed in single quotes. The single quotes are not part of the string itself -- they're just there to let Perl identify the beginning and the ending of the string. Any character other than a single quote or a backslash between the quote marks (including newline characters, if the string continues onto successive lines) stands for itself inside a string. To get a backslash, put two backslashes in a row, and to get a single quote, put a backslash followed by a single quote. In other words:

'fred'    # those four characters: f, r, e, and d
'barney'  # those six characters
''        # the null string (no characters)
'Don\'t let an apostrophe end this string prematurely!'
'the last character of this string is a backslash: \\'
'hello\n' # hello followed by backslash followed by n
'hello
there'    # hello, newline, there (11 characters total)
'\'\\'    # single quote followed by backslash

Note that the \n within a single-quoted string is not interpreted as a newline, but as the two characters backslash and n. Only when the backslash is followed by another backslash or a single quote does it have special meaning.
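One way to check these rules is to count characters. This is only a sketch: it uses variables and the length function, neither of which has been introduced yet, and the variable names are made up for the example.

```perl
use strict;

my $plain    = 'hello\n';   # backslash and n stay as two separate characters
my $two_line = 'hello
there';                     # a real newline sits between the two words
my $escaped  = '\'\\';      # one single quote, then one backslash

my $len_plain    = length($plain);
my $len_two_line = length($two_line);
my $len_escaped  = length($escaped);

print "$len_plain $len_two_line $len_escaped\n";   # 7 11 2
```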

2.3.2. Double-Quoted String Literals

A double-quoted string literal is similar to the strings you may have seen in other languages. Once again, it's a sequence of characters, although this time enclosed in double quotes. But now the backslash takes on its full power to specify certain control characters, or even any character at all through octal and hex representations. Here are some double-quoted strings:

"barney"        # just the same as 'barney'
"hello world\n" # hello world, and a newline
"The last character of this string is a quote mark: \""
"coke\tsprite"  # coke, a tab, and sprite

Note that the double-quoted literal string "barney" means the same six-character string to Perl as does the single-quoted literal string 'barney'. It's like what we saw with numeric literals, where 0377 was another way to write 255.0. Perl lets you write the literal in the way that makes more sense to you. Of course, if you wish to use a backslash escape (like \n to mean a newline character), you'll need to use the double quotes.
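The difference between the two quoting styles can be seen by comparing and measuring a few literals. This sketch assumes the eq string comparison and the length function, which are covered elsewhere; the variable names are illustrative only.

```perl
use strict;

# With no escapes involved, both quoting styles give the same string.
my $same = ("barney" eq 'barney') ? "same" : "different";

# With an escape, only the double-quoted form produces a newline.
my $dq_len = length("hello world\n");   # 11 characters plus one newline
my $sq_len = length('hello world\n');   # 11 characters plus backslash and n

print "$same $dq_len $sq_len\n";        # same 12 13
```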

The backslash can precede many different characters to mean different things (generally called a backslash escape). The nearly complete[49] list of double-quoted string escapes is given in Table 2-1.

[49]Recent versions of Perl have introduced "Unicode" escapes, which we aren't going to be talking about here.

Table 2-1. Double-quoted string backslash escapes

Construct   Meaning
\n          Newline
\r          Return
\t          Tab
\f          Formfeed
\b          Backspace
\a          Bell
\e          Escape (ASCII escape character)
\007        Any octal ASCII value (here, 007 = bell)
\x7f        Any hex ASCII value (here, 7f = delete)
\cC         A "control" character (here, Ctrl-C)
\\          Backslash
\"          Double quote
\l          Lowercase next letter
\L          Lowercase all following letters until \E
\u          Uppercase next letter
\U          Uppercase all following letters until \E
\Q          Quote non-word characters by adding a backslash until \E
\E          Terminate \L, \U, or \Q
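A few of the escapes from Table 2-1 can be exercised as follows. The sketch relies on the ord function (which returns a character's numeric code) and on eq, neither of which is covered in this section; the sample strings are invented.

```perl
use strict;

my $bell_same = ("\007" eq "\a") ? "yes" : "no";  # octal 007 is the bell
my $delete    = ord("\x7f");                      # hex 7f is 127
my $ctrl_c    = ord("\cC");                       # Ctrl-C is character 3

my $lower  = "\LFRED\E and BARNEY";               # \L ... \E lowercases FRED only
my $upper  = "\ufred \Ubarney\E rubble";          # \u affects one letter, \U a run
my $quoted = "\Qa.b*c\E";                         # backslashes added before . and *

print "$bell_same $delete $ctrl_c\n";             # yes 127 3
print "$lower\n";                                 # fred and BARNEY
print "$upper\n";                                 # Fred BARNEY rubble
print "$quoted\n";                                # a\.b\*c
```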

Another feature of double-quoted strings is that they are variable interpolated, meaning that some variable names within the string are replaced with their current values when the strings are used. We haven't formally been introduced to what a variable looks like yet, so we'll get back to this later in this chapter.

2.3.3. String Operators

String values can be concatenated with the . operator. (Yes, that's a single period.) This does not alter either string, any more than 2+3 alters either 2 or 3. The resulting (longer) string is then available for further computation or to be stored into a variable. For example:

"hello" . "world"       # same as "helloworld"
"hello" . ' ' . "world" # same as 'hello world'
'hello world' . "\n"    # same as "hello world\n"

Note that the concatenation must be explicitly requested with the . operator, unlike in some other languages where you merely have to stick the two values next to each other.
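To confirm that concatenation leaves its operands alone, here is a brief sketch. It stores the pieces in variables, which are introduced later in the chapter; the names are arbitrary.

```perl
use strict;

my $left   = "hello";
my $right  = "world";

# Build a new string; neither operand is modified.
my $joined = $left . ' ' . $right . "\n";

print $joined;                 # hello world
print "$left, $right\n";       # hello, world
```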

A special string operator is the string repetition operator, consisting of the single lowercase letter x. This operator takes its left operand (a string) and makes as many concatenated copies of that string as indicated by its right operand (a number). For example:

"fred" x 3       # is "fredfredfred"
"barney" x (4+1) # is "barney" x 5, or "barneybarneybarneybarneybarney"
5 x 4            # is really "5" x 4, which is "5555"

That last example is worth spelling out slowly. The string repetition operator wants a string for a left operand, so the number 5 is converted to the string "5" (using rules described in detail later), giving a one-character string. This new string is then copied four times, yielding the four-character string 5555. Note that if we had reversed the order of the operands, as 4 x 5, we would have made five copies of the string 4, yielding 44444. This shows that string repetition is not commutative.

The copy count (the right operand) is first truncated to an integer value (4.8 becomes 4) before being used. A copy count of less than one results in an empty (zero-length) string.
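The repetition rules above can be tried out with a few assignments. This is a sketch with made-up variable names; warnings are left off, since they are discussed later.

```perl
use strict;

my $three     = "fred" x 3;        # three copies
my $truncated = "fred" x 4.8;      # count is truncated to 4
my $fives     = 5 x 4;             # 5 becomes "5", then four copies
my $fours     = 4 x 5;             # operands reversed: five copies of "4"
my $none      = "fred" x 0.9;      # count truncates to 0, so no copies at all
my $ruler     = ("-" x 20) . "\n"; # a common use: drawing a line of dashes

print "$three\n";                  # fredfredfred
print "$truncated\n";              # fredfredfredfred
print "$fives $fours\n";           # 5555 44444
print "[", $none, "]\n";           # []
print $ruler;
```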

2.3.4. Automatic Conversion Between Numbers and Strings

For the most part, Perl automatically converts between numbers and strings as needed. How does it know whether a number or a string is needed? It all depends upon the operator being used on the scalar value. If an operator expects a number (like + does), Perl will see the value as a number. If an operator expects a string (like . does), Perl will see the value as a string. So you don't need to worry about the difference between numbers and strings; just use the proper operators, and Perl will make it all work.

When a string value is used where an operator needs a number (say, for multiplication), Perl automatically converts the string to its equivalent numeric value, as if it had been entered as a decimal floating-point value.[50] So "12" * "3" gives the value 36. Trailing nonnumber stuff and leading whitespace are discarded, so "12fred34" * " 3" will also give 36 without any complaints.[51] At the extreme end of this, something that isn't a number at all converts to zero. This would happen if you used the string "fred" as a number.

[50]The trick of using a leading zero to mean a nondecimal value works for literals, but never for automatic conversion. Use hex() or oct() to convert those kinds of strings.

[51]Unless you request warnings, which we'll discuss in a moment.
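The string-to-number rules, including the leading-zero caveat from the footnote, might be demonstrated like this. Warnings are deliberately not enabled, so the sloppy conversions pass silently; the variable names are illustrative.

```perl
use strict;

my $product = "12" * "3";           # both strings convert cleanly
my $sloppy  = "12fred34" * " 3";    # trailing "fred34" and leading space dropped
my $nothing = "fred" + 0;           # not a number at all: converts to zero

my $decimal = "0377" + 0;           # leading zero is NOT treated as octal here
my $octal   = oct("0377");          # explicit octal conversion
my $hexval  = hex("ff");            # explicit hex conversion

print "$product $sloppy $nothing\n";   # 36 36 0
print "$decimal $octal $hexval\n";     # 377 255 255
```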

Likewise, if a numeric value is given when a string value is needed (say, for string concatenation), the numeric value is expanded into whatever string would have been printed for that number. For example, if you want to concatenate the string Z followed by the result of 5 multiplied by 7,[52] you can say this simply as:

"Z" . 5 * 7 # same as "Z" . 35, or "Z35"

[52]We'll see about precedence and parentheses shortly.
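The following sketch shows number-to-string conversion running in both directions within one expression. The extra expressions and variable names are invented for illustration and are not from the text.

```perl
use strict;

my $label    = "Z" . 5 * 7;      # multiplication happens first, then 35 becomes "35"
my $glued    = (5 . 7) * 2;      # 5 and 7 become "57", which converts back to 57
my $thousand = 1e3 . " items";   # the number is written out as it would be printed

print "$label\n";                # Z35
print "$glued\n";                # 114
print "$thousand\n";             # 1000 items
```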

In other words, you don't really have to worry about whether you have a number or a string (most of the time). Perl performs all the conversions for you.[53] And if you're worried about efficiency, don't be. Perl generally remembers the result of a conversion so that it's done only once.

[53]It's usually not an issue, but these conversions can cause small round-off errors. That is, if you start with a number, convert it to a string, then convert that string back to a number, the result may not be the same number as you started with. It's not just Perl that does this; it's a consequence of the conversion process, so it happens to any powerful programming language.
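The round-off effect described in the footnote can be observed with a value that has no exact binary representation. This sketch assumes a typical Perl build that stores numbers as 64-bit doubles; other builds may behave differently. The variable names are made up.

```perl
use strict;

my $number = 0.1 + 0.2;        # slightly more than 0.3 in binary floating point
my $string = $number . "";     # number to string: digits are rounded for display
my $back   = $string + 0;      # string back to number

# On a typical build the string reads "0.3", yet the round trip
# does not reproduce the original number exactly.
my $round_trip = ($back == $number) ? "same" : "different";

print "$string\n";
print "$round_trip\n";
```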




Copyright © 2002 O'Reilly & Associates. All rights reserved.