You could also use File::Find to find out some other things about files, such as their size. For the callback's convenience, the current working directory is the item's containing directory, and the item's name within that directory is found in $_.
Maybe you have noticed that, in the previous code, $File::Find::name was used for the item's name. So which name is real, $_ or $File::Find::name?
$File::Find::name gives the name relative to the starting directory, but during the callback, the working directory is the one that holds the item just found. For example, suppose that you want find to look for files in the current working directory, so you give it (".") as the list of directories to search. If you call find when the current working directory is /usr, find looks below that directory. When find has located /usr/bin/perl, the current working directory (during the callback) is /usr/bin. $_ holds "perl"; $File::Find::name holds "./bin/perl", which is the name relative to the directory in which you started the search.
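The relationship between the two names can be checked with a small experiment. The following is only a sketch: it builds a throwaway tree under a temporary directory (a hypothetical stand-in for /usr, with an empty file named perl inside bin), starts the search at ".", and records what the callback sees. The temporary layout and the @seen array are assumptions made for illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Basename qw(basename);
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for /usr: <tmp>/bin/perl, where "perl" is an empty file.
my $top = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($top, 'bin'));
open my $fh, '>', File::Spec->catfile($top, 'bin', 'perl') or die "create: $!";
close $fh or die "close: $!";

my $start = getcwd();
chdir $top or die "chdir: $!";

my @seen;
find(sub {
    return unless -f;    # with no operand, -f tests $_ in the current directory
    # Record the short name, the name relative to the starting directory,
    # and the last component of the working directory during the callback.
    push @seen, [ $_, $File::Find::name, basename(getcwd()) ];
}, '.');

chdir $start or die "chdir: $!";    # step out so the temp tree can be cleaned up

printf "\$_ = %s, name = %s, cwd ends in %s\n", @$_ for @seen;
```

On a Unix-like system, the single line printed should show perl for $_, ./bin/perl for $File::Find::name, and bin as the last component of the working directory.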
All of this means that the file tests, such as -s, automatically report on the just-found item. Although this is convenient, the current directory inside the callback is different from the search's starting directory.
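One practical consequence is that a relative $File::Find::name is not a usable path inside the callback, because it is resolved against the item's own directory rather than the starting directory. The sketch below illustrates this under assumed conditions: a temporary directory containing sub/data.txt with five bytes in it, searched from ".". The file names and the two result variables are hypothetical.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical tree: <tmp>/sub/data.txt holding the five bytes "hello".
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir: $!";
open my $fh, '>', "$top/sub/data.txt" or die "create: $!";
print {$fh} "hello";
close $fh or die "close: $!";

my $start = getcwd();
chdir $top or die "chdir: $!";

my ($size_via_short, $size_via_full);
find(sub {
    return unless -f;
    # $_ is relative to the current directory, so the test finds the file.
    $size_via_short = -s $_;
    # "./sub/data.txt" is looked up from inside sub/, where no such path
    # exists, so -s returns undef.
    $size_via_full = -s $File::Find::name;
}, '.');

chdir $start or die "chdir: $!";

printf "via \$_: %s, via \$File::Find::name: %s\n",
    $size_via_short,
    defined $size_via_full ? $size_via_full : 'undef';
```

If the search had been started from an absolute path instead of ".", $File::Find::name would be absolute as well and both tests would succeed.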
What if you want to use File::Find to accumulate the total size of all files seen? The callback subroutine cannot be handed parameters of your choosing, and find ignores whatever the subroutine returns. But that doesn't matter. When called, a subroutine reference can still "see" every lexical variable that was visible at the point where the reference to the subroutine was taken. For example:
use File::Find;
my $total_size = 0;
find(sub { $total_size += -s if -f }, ".");
print $total_size, "\n";
As before, the find routine is called with two parameters: a reference to an anonymous subroutine and the starting directory. When names are found within that directory (and its subdirectories), the subroutine is called.
Note that the subroutine accesses the $total_size variable. This variable is declared outside the scope of the subroutine but still visible to the subroutine. Thus, even though find invokes the callback subroutine (and would not have direct access to $total_size), the callback subroutine accesses and updates the variable.
The kind of subroutine that can access all lexical variables that existed at the time it was declared is called a closure (a term borrowed from the world of mathematics).
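The same mechanism can be seen without File::Find at all. In this minimal sketch, call_for_each is a hypothetical stand-in for find: it is defined before $total is declared, so it cannot name that variable, yet the anonymous subroutine passed to it updates the variable on every call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for find: takes a coderef and calls it once per item.
# It is compiled before $total exists, so it cannot refer to $total by name.
sub call_for_each {
    my ($code, @items) = @_;
    $code->($_) for @items;
    return;
}

my $total = 0;

# The anonymous sub is a closure: it captures the lexical $total.
call_for_each(sub { $total += $_[0] }, 10, 20, 12);

print "$total\n";    # prints 42
```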
Furthermore, the access to the variable from within the closure ensures that the variable remains alive as long as the subroutine reference is alive. For example, let's number the output files:[27]
[27]This code seems to have an extra semicolon at the end of the line that assigns to $callback, doesn't it? But remember, the construct sub { ... } is an expression. Its value (a coderef) is assigned to $callback, and there's a semicolon at the end of that statement. It's easy to forget to put the proper punctuation after the closing curly brace of an anonymous subroutine declaration.
use File::Find;
my $callback;
{
  my $count = 0;
  $callback = sub { print ++$count, ": $File::Find::name\n" };
}
find($callback, ".");
Here, you declare a variable to hold the callback. This variable cannot be declared within the naked block (a block that is not part of a larger Perl syntax construct), or it would be recycled at the end of that block. Next, the lexical $count variable is initialized to 0. An anonymous subroutine is then declared, and a reference to it is placed into $callback. This subroutine is a closure because it refers to the lexical $count variable.
At the end of the naked block, the $count variable goes out of scope. However, because it is still referenced by the subroutine in $callback, it stays alive, now as an anonymous scalar variable.[28]
[28]To be more accurate, the closure declaration increases the reference count of the referent, as if another reference had been taken explicitly. Just before the end of the naked block, the reference count of $count is two, but after the block has exited, the value still has a reference count of one. Although no other code may access $count, it will still be kept in memory as long as the reference to the sub is available in $callback or elsewhere.
Each time the callback is invoked from find, the value of the variable formerly known as $count is incremented before it is printed, so the output lines are numbered 1, 2, 3, and so on.
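The persistence of the captured variable can also be observed by calling the coderef directly rather than through find. This is a sketch only: the closure here returns the counter instead of printing a file name, and @values and $next are names introduced for illustration.

```perl
use strict;
use warnings;

my $callback;
{
    my $count = 0;
    # The closure captures $count; this variant returns the new value.
    $callback = sub { return ++$count };
}
# $count can no longer be named here, but the closure still holds it.

my @values = map { $callback->() } 1 .. 3;
print "@values\n";    # prints 1 2 3

# A later call picks up where the earlier ones left off.
my $next = $callback->();
print "$next\n";      # prints 4
```

Because the counter lives for as long as the coderef does, passing the same $callback to find a second time would continue the numbering rather than restart it.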
Copyright © 2003 O'Reilly & Associates. All rights reserved.