Learning Perl, 3rd Edition

A.6. Answers to Chapter 7 Exercises

  1. Here's one way to do it:

    /fred/

    Of course, you have to put that into the test program! This is pretty simple. The more important part of this exercise is trying it out on the sample strings. It doesn't match Fred, showing that regular expressions are case-sensitive. (We'll see how to change that later.) It does match frederick and Alfred, since both of those strings contain the four-letter string fred. (Matching whole words only, so that frederick and Alfred wouldn't match, is another feature we'll see later.)

    If the test program is working correctly,[388] it should show those two matches as something like |<fred>erick| and |Al<fred>|, using the angle brackets to show where fred was found inside each string.

    [388] If the test program didn't work correctly, you probably didn't download it as we suggested. And you probably didn't test what you typed, as we also suggested. But in that case, you probably didn't do the exercises either; you're just reading these answers in the back of the book, and so the test program (which you didn't actually run) performed flawlessly. In that case, this footnote is pointless.
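
    If you'd like to see that bracketed output without feeding lines to a program by hand, here is a minimal sketch. The helper name show_match is our own invention for illustration; it relies on Perl's three match variables ($` for the text before the match, $& for the match itself, and $' for the text after it).

```perl
use strict;
use warnings;

# Hypothetical helper (the name is ours): returns the string with the
# matched part wrapped in angle brackets, or nothing when /fred/ fails.
sub show_match {
    my ($text) = @_;
    if ($text =~ /fred/) {
        # $` is the text before the match, $& the match, $' the text after
        return "|$`<$&>$'|";
    }
    return;
}

for my $sample ('Fred', 'frederick', 'Alfred') {
    my $shown = show_match($sample);
    print defined $shown ? "$sample: $shown\n" : "$sample: no match\n";
}
```

    Run as written, this should report no match for Fred and show |<fred>erick| and |Al<fred>| for the other two.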

  2. Here's one way to do it:

    /a+b*/

    That matches the letter a one or more times (that's the plus), followed by b zero or more times (that's the star). Well, that's what the exercise asked for, but you may have come up with something different. After all, if you're looking for any number of b's, you know you'll always find what you're looking for. So you could have written /a+/ instead, and matched the same strings.[389]

    [389] To be sure, you'll match different parts of the strings. But any string that matches /a+b*/ will also match /a+/, and vice versa.

    For that matter, when you want to match one or more a's, you know that the match will succeed when you find even the first one. So, /a/ will match the same set of strings as the first two. The description "any string containing at least one a followed by any number of b's" means the exact same thing as "any string containing a." Of the sample strings, this matches all of them except fred.

    There are even more ways to make this pattern than we show here. Often, in trying to write a pattern, you will need to decide which one of many possible patterns best suits your needs.
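
    To see that the three patterns accept the same strings while matching different parts of them, you could try a small sketch like this one. The helper matched_part and the strings in @samples are made up for illustration; they are not the exercise's sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text that the pattern matched,
# or undef when the pattern fails.
sub matched_part {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? $& : undef;
}

my @patterns = (qr/a+b*/, qr/a+/, qr/a/);

# Made-up strings for illustration only.
my @samples = ('fred', 'barney', 'aaabbb', 'dabba');

foreach my $sample (@samples) {
    print "$sample:";
    foreach my $pattern (@patterns) {
        my $part = matched_part($pattern, $sample);
        print defined $part ? " <$part>" : " (no match)";
    }
    print "\n";
}
```

    Every string with an a in it succeeds under all three patterns; only the bracketed part changes from one pattern to the next.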

  3. Here's one way to do it:

    /\\*\**/

    That's what the text asked for: a backslash (typed twice, since we mean a real backslash[390]) zero or more times (that's the first star), followed by an asterisk (backslashed, since star is a metacharacter) zero or more times (that's the last star). Whew!

    [390] Whenever you mean a real backslash in Perl, type two of them. A lone backslash may try to do something magical, but two of them will always mean a real backslash.

    And what about the sample strings? Did it match any of them? You bet: it matches all of them! It's because the backslashes and asterisks aren't required in the pattern; that is, this pattern can match the empty string. Here's a rule you can rely upon: when a pattern may freely match the empty string, it'll always match, since the empty string can be found in any string. In fact, it'll always match in the first place that you look.

    So, this pattern matches all four characters in \\**, as you'd expect. It matches the empty string at the beginning of fred, which you may not have expected. In the string barney \\\***, it matches the empty string at the beginning. You might wish it would hunt down the backslashes and stars at the end of that string, but it doesn't bother. It looks at the beginning, sees zero backslashes followed by zero asterisks, declares the match a success, and goes home to watch television. And in *wilma\, it matches just the star at the beginning; as you see, this pattern never gets away from the beginning of the string, since it always matches at the first opportunity.
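
    One way to check this behavior is to print where each match starts and how long it is. This is only a sketch: the helper first_match is a name we made up, and the sample strings are typed in single quotes, where each doubled backslash stands for one real backslash.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (offset, matched text) for the first match,
# or an empty list when the pattern fails.
sub first_match {
    my ($pattern, $text) = @_;
    return unless $text =~ $pattern;
    return (length($`), $&);
}

# In single quotes, each \\ is one real backslash.
my @samples = ('\\\\**', 'fred', 'barney \\\\\\***', '*wilma\\');

foreach my $sample (@samples) {
    my ($offset, $matched) = first_match(qr/\\*\**/, $sample);
    printf "%-16s offset %d, length %d\n", $sample, $offset, length $matched;
}
```

    Every line should report offset 0, with lengths 4, 0, 0, and 1: the pattern never leaves the beginning of the string.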

    Now, if someone asked you for a pattern to match any number of backslashes followed by any number of asterisks, you'd be technically correct to give them this one. But chances are, that's not what they really wanted. Spoken languages like English may be ambiguous and not say exactly what they mean, but regular expressions always mean exactly what they say they mean.

    In this case, maybe the person who asked for the pattern forgot to say that he or she always wants to match at least one character, when the pattern matches at all. We can do that. If there's at least one backslash, /\\+\**/ will match. (That's just like what we had before, but there's a plus in place of the first star, meaning one or more backslashes.) If there's not at least one backslash, then in order to match at least one character, we'll need at least one asterisk, so we want /\*+/. When you put those two possibilities together, you get:

    /\\+\**|\*+/

    Ugly, isn't it? Regular expressions are powerful but not beautiful. And they've contributed to Perl being maligned as a "write-only language." To be sure that no one criticizes your code in that way, though, it's kind to put an explanatory comment near any pattern that's not obvious. On the other hand, when you've been using these for a year, you will have a different definition of "obvious" than you have today.

    How does this new pattern work with the sample strings? With \\**, it matches all four characters, just like the last one. It won't match fred, which is probably the right behavior given the problem description. For barney \\\***, it matches the six characters at the end, as you hoped. And for *wilma\, it matches the asterisk at the beginning.
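
    Here is the same kind of check for the improved pattern, with the sort of explanatory comment we just recommended. As before, this is a sketch, and first_match is a hypothetical helper of our own.

```perl
use strict;
use warnings;

# One or more backslashes followed by any number of asterisks,
# or else (with no backslash at all) one or more asterisks.
my $pattern = qr/\\+\**|\*+/;

# Hypothetical helper: returns (offset, matched text) for the first match,
# or an empty list when the pattern fails.
sub first_match {
    my ($regex, $text) = @_;
    return unless $text =~ $regex;
    return (length($`), $&);
}

# In single quotes, each \\ is one real backslash.
my @samples = ('\\\\**', 'fred', 'barney \\\\\\***', '*wilma\\');

foreach my $sample (@samples) {
    my ($offset, $matched) = first_match($pattern, $sample);
    if (defined $offset) {
        printf "%-16s matched %d character(s) at offset %d\n",
            $sample, length $matched, $offset;
    } else {
        printf "%-16s no match\n", $sample;
    }
}
```

    This time fred fails to match, and the match in the barney string starts at offset 7, right after the space.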

  4. Here's one way to do it:

    while (<>) {
      if (/wilma/) {
        print;
      }
    }

    This is a grep-like program. For each line of text (contained in $_), we check to see whether the pattern matches. If it matches, we print it. This program uses print's default: if you don't tell it to print something else, it prints $_. So we have written a program that uses $_ all the way through, but never mentions it anywhere. Perl folks love to use the defaults and save time typing, so you'll see a lot of programs that do this.

    And if, for extra credit, you wanted to match a capitalized Wilma as well, /wilma|Wilma/ would do the job. Or, more simply, you could have written /(w|W)ilma/. People who have used other regular expression implementations and already know about character classes, which we'll discuss in the next chapter, could make that last one even shorter (and more efficient).[391]

    [391] If you made the whole pattern case-insensitive, shame on you. We haven't learned that yet. Besides, that would match WILMA, which shouldn't match, according to the exercise description.
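
    To compare the two extra-credit patterns without typing input by hand, you might use a sketch like the following. The helper matching_lines and the input lines are made up for illustration; the helper simply collects what the while loop above would have printed.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps only the lines that match the pattern,
# mimicking what the while (<>) loop prints.
sub matching_lines {
    my ($pattern, @lines) = @_;
    my @kept;
    foreach my $line (@lines) {
        push @kept, $line if $line =~ $pattern;
    }
    return @kept;
}

# Made-up input lines for illustration only.
my @lines = ("wilma flintstone\n", "Wilma Flintstone\n", "WILMA!\n", "fred\n");

print matching_lines(qr/wilma|Wilma/, @lines);
print "--\n";
print matching_lines(qr/(w|W)ilma/, @lines);
```

    Both patterns should keep the first two lines and skip the WILMA! line.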

  5. Here's one way to do it:

    while (<>) {
      if (/wilma/) {
        if (/fred/) {
          print;
        }
      }
    }

    This tests /fred/ only after we find /wilma/ matches, but fred could appear before or after wilma in the line; each test is independent of the other.

    If you wanted to avoid the extra nested if test, you might have written something like this:[392]

    while (<>) {
      if (/wilma.*fred|fred.*wilma/) {
        print;
      }
    }

    [392] Folks who know about the logical-and operator, which we'll see in Chapter 10, "More Control Structures", could do both tests /fred/ and /wilma/ in the same if conditional. That's more efficient, more scalable, and an all-around better way than the ones given here. But we haven't seen logical-and yet.

    This works because we'll either have wilma before fred, or fred before wilma. If we had written just /wilma.*fred/, that wouldn't have matched a line like fred and wilma flintstone, even though that line mentions both of them.
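
    If you want to convince yourself that the nested tests and the single pattern agree, a sketch like this one compares them side by side. The subroutine names and the sample lines are our own, chosen for illustration.

```perl
use strict;
use warnings;

# Two independent tests, as in the nested-if version.
sub mentions_both_nested {
    my ($line) = @_;
    if ($line =~ /wilma/) {
        if ($line =~ /fred/) {
            return 1;
        }
    }
    return 0;
}

# One pattern that covers both orders.
sub mentions_both_single {
    my ($line) = @_;
    return $line =~ /wilma.*fred|fred.*wilma/ ? 1 : 0;
}

# Made-up lines for illustration only.
foreach my $line ('fred and wilma flintstone', 'wilma and fred',
                  'wilma alone', 'barney') {
    my $one_order = $line =~ /wilma.*fred/ ? 1 : 0;
    printf "%-26s nested=%d single=%d wilma-first-only=%d\n",
        $line, mentions_both_nested($line), mentions_both_single($line),
        $one_order;
}
```

    The first line is the interesting one: both full tests accept it, but the one-order pattern /wilma.*fred/ does not.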

    We made this an extra-credit exercise because many folks have a mental block here. We showed you an "or" operation (with the vertical bar, "|"), but we never showed you an "and" operation. That's because there isn't one in regular expressions.[393] If you want to know whether one pattern and another are both successful, just test both of them.

    [393] But there are some tricky and advanced ways of doing what some folks would call an "and" operation. These are generally less efficient than using Perl's logical-and, though, depending upon what optimizations Perl and its regular expression engine can make.



Copyright © 2002 O'Reilly & Associates. All rights reserved.