You may find the modules discussed in this section very useful while debugging your code. They will allow you to learn a lot about Perl internals.
Perl optimizes many things away at compile time, which explains why Perl is so fast under mod_perl. If you want to see what code is actually executed at runtime, use the -MO=Deparse Perl option.
For example, if you aren't sure whether Perl will do what you expect it to, it will show you what Perl is really going to do. Consider this trap we discussed earlier:
open IN, "filename" || die $!;
This looks like perfectly valid code, and indeed it compiles without any errors, but let's see what Perl is executing:
panic% perl -MO=Deparse -e 'open IN, "filename" || die $!' open IN, 'filename';
As you can see, the die( ) part was optimized away. open( ) is a list operator (since it accepts a list of arguments), and list operators have lower precedence than the || operator. Therefore, Perl executes the following:
open IN, ("filename" || die $!);
Since in our example we have used "filename", which is a true value, the rest of the expression in the parentheses above is discarded. The code is reduced to:
open IN, "filename";
at compile time. So if the file cannot be opened for some reason, the program will never call die( ), since Perl has removed this part of the statement.
To do the right thing you should either use parentheses explicitly to specify the order of execution or use the low-precedence or operator. Both examples below will do the right thing:
panic% perl -MO=Deparse -e 'open(IN, "filename") || die $!' die $! unless open IN, 'filename'; panic% perl -MO=Deparse -e 'open IN, "filename" or die $!' die $! unless open IN, 'filename';
As you can see, Perl compiles both sources into exactly the same code.
Notice that if the "filename" argument is not true, the code gets compiled to this:
panic% perl -MO=Deparse,-p -e 'open IN, "" || die $!' open(IN, die($!));
which causes the program to die($!) without any reason in $!:
panic% perl -e 'open IN, "" || die $!' Died at -e line 1.
while if we do the right thing, we should see the reason for the open( ) failure:
panic% perl -e 'open IN, "" or die $!' No such file or directory at -e line 1.
Also consider:
panic% perl -MO=Deparse,-p -e 'select MYSTD || die $!' select(MYSTD);
Since select( ) always returns a true value, the right part of the expression will never be executed. Therefore, Perl optimizes it away. In the case of select( ), it always returns the currently selected file handle, and there always is one.
We have used this cool -MO=Deparse technique without explaining it so far. B::Deparse is a backend module for the Perl compiler that generates Perl source code, based on the internal compiled structure that Perl itself creates after parsing a program. Therefore, you may find it useful while developing and debugging your code. We will show here one more useful thing it does. See its manpage for an extensive usage manual.
When you use the -p option, the output also includes parentheses (even when they are not required by precedence), which can make it easy to see if Perl is parsing your expressions the way you intended. If we repeat the last example:
panic% perl -MO=Deparse,-p -e 'open IN, "filename" or die $!' (open(IN, 'filename') or die($!));
we can see the exact execution precedence. For example, if you are writing constructor code that can serve as a class method and an instance method, so you can instantiate objects in both ways:
my $cool1 = PerlCool->new( ); my $cool2 = $cool1->new( );
and you are unsure whether you can write this:
package PerlCool; sub new { my $self = shift; my $type = ref $self || $self; return bless { }, type; }
or whether you have to put in parentheses:
my $type = ref ($self) || $self;
you can use B::Deparse to verify your assumptions:
panic% perl -MO=Deparse,-p -e 'ref $self || $self' (ref($self) or $self);
Indeed, ref( ) has a higher precedence than ||, and therefore this code will do the right thing:
my $type = ref $self || $self;
On the other hand, it might confuse other readers of your code, or even yourself some time in the future, so if you are unsure about code readability, use the parentheses.
Of course, if you forget the simple mathematical operations precedences, you can ask the backend compiler to help you. This one is obvious:
panic% perl -MO=Deparse,-p -e 'print $a + $b * $c % $d' print(($a + (($b * $c) % $d)));
This one is not so obvious:
panic% perl -MO=Deparse,-p -e 'print $a ** -$b ** $c' print(($a ** (-($b ** $c))));
B::Deparse tells it all, but you probably shouldn't leave such a thing in your code without explicit parentheses.
Finally, let's use B::Deparse to help resolve the confusion regarding the statement we saw earlier:
$c = $a > $b and $a < $b ? 1 : 0; panic% perl -MO=Deparse -e '$c = $a > $b and $a < $b ? 1 : 0;' $a < $b ? '???' : '???' if $c = $a > $b; -e syntax OK
Just as we explained earlier, the and operator has a lower precendence than the = operator. We can explicitly see this in the output of B::Deparse, which rewrites the statement in a less obscure way.
Of course, it's worth learning the precedences of the Perl operators from the perlop manpage so you don't have to resort to using B::Deparse.
You can watch your code as it's being compiled and executed by Perl via the -D runtime option. Perl will generate different output according to the extra options (letters or numbers) specified after -D. You can supply one or more options at the same time. Here are the available options for Perl Version 5.6.0 (reproduced from the perlrun manpage):
1 p Tokenizing and parsing 2 s Stack snapshots 4 l Context (loop) stack processing 8 t Trace execution 16 o Method and overloading resolution 32 c String/numeric conversions 64 P Print preprocessor command for -P 128 m Memory allocation 256 f Format processing 512 r Regular expression parsing and execution 1024 x Syntax tree dump 2048 u Tainting checks 4096 L Memory leaks (needs -DLEAKTEST when compiling Perl) 8192 H Hash dump -- usurps values( ) 16384 X Scratchpad allocation 32768 D Cleaning up 65536 S Thread synchronization
Let's look at some of these options. Consider this one-line example:
panic% perl -le '$_="yafoo"; s/foo/bar/; print' yabar
which simply substitutes the string "foo" with the string "bar" in the variable $_ and prints out its value. Now let's see how Perl compiles and executes the regular expression substitution part of the code. We will use Perl's -Dr (or -D512) option:
panic% perl -Dr -le '$_="yafoo"; s/foo/bar/; print' Compiling REx 'foo' size 3 first at 1 rarest char f at 0 1: EXACT <foo>(3) 3: END(0) anchored 'foo' at 0 (checking anchored isall) minlen 3 Omitting $` $& $' support. EXECUTING... Guessing start of match, REx 'foo' against 'yafoo'... Found anchored substr 'foo' at offset 2... Starting position does not contradict /^/m... Guessed: match at offset 2 Matching REx 'foo' against 'foo' Setting an EVAL scope, savestack=3 2 <ya> <foo> | 1: EXACT <foo> 5 <yafoo> <> | 3: END Match successful! yabar Freeing REx: 'foo'
As you can see, there are two stages: compilation and execution. During the compilation stage, Perl records the stages it should go through when matching the string, notes what length it should match for, and notes whether one of the $', $&, or $' special variables will be used.[57] During the execution we can see the actual process of matching. In our example the match was successful.
[57]You should avoid using these at all, since they add a performance hit, and once used in any regular expression they will be set in every other regular expression, even if you didn't ask for them.
The trace doesn't mention the replace segment of the s/// construct, since it's not a part of the regular expression per se.
The -Dx (or -D1024) option tells Perl to print the syntax tree dump. We'll use some very simple code so the execution tree will not be too long to be presented here:
panic% perl -Dx -le 'print 12*60*60' { 6 TYPE = leave = ==> DONE FLAGS = (VOID,KIDS,PARENS) REFCNT = 0 { 1 TYPE = enter = ==> 2 } { 2 TYPE = nextstate = ==> 3 FLAGS = (VOID) LINE = 1 PACKAGE = "main" } { 5 TYPE = print = ==> 6 FLAGS = (VOID,KIDS) { 3 TYPE = pushmark = ==> 4 FLAGS = (SCALAR) } { 4 TYPE = const = ==> 5 FLAGS = (SCALAR) SV = IV(43200) } } }
This code shows us the tree of opcodes after the compilation process. Each opcode is prefixed with a number, which then is used to determine the order of execution. You can see that each opcode is linked to some other opcode (a number following the = = => tag). If you start from the opcode numbered 1, jump to the opcode it's linked to (2, in this example), and continue this way, you will see the execution pass of the code. Since the code might have conditional branches, Perl cannot predetermine a definite order at compile time; therefore, when you follow the execution, the numbers will not necessarily be in sequence.
Of course, internally Perl uses opcode pointers in memory, not numbers. Numbers are used in the debug printout only for our convenience.
Another interesting fact that we learn from this output is that Perl optimizes everything it can at compile time. For example, when you need to know how many seconds are in 12 hours, you could calculate it manually and use the resulting number. But, as we see from:
SV = IV(43200)
Perl has already done the calculation at compile time, so no runtime overhead occurs if you say 12*60*60 and not 43200. The former is also more self-explanatory, while the latter may require an explicit comment to tell us what it is.
Now let's bundle a few other options together and see a subroutine argument stack snapshot via s, context stack processing via l, and trace execution via t all at once:
panic% perl -Dtls -le 'print 12*60*60' => (-e:1) const(IV(12)) => IV(12) (-e:1) const(IV(60)) => IV(12) IV(60) (-e:1) multiply => (-e:1) const(IV(720)) => IV(720) (-e:1) const(IV(60)) => IV(720) IV(60) (-e:1) multiply (-e:1) ENTER scope 2 at op.c:6501 (-e:1) LEAVE scope 2 at op.c:6811 (-e:0) LEAVE scope 1 at perl.c:1319 (-e:0) ENTER scope 1 at perl.c:1327 (-e:0) Setting up jumplevel 0xbffff8cc, was 0x40129f40
You can see how Perl pushes constants 12 and 60 onto an argument stack, executes multiply( ), gets a result of 720, pushes it back onto the stack, pushes 60 again, and executes another multiplication. The tracing and argument stack options show us this information. All this happens at compile time.
In addition, we see a number of scope entering and leaving messages, which come from the context stack status report. These options might be helpful when you want to see Perl entering and leaving block scopes (loops, subroutines, files, etc.). As you can see, bundling a few options together gives very useful reports.
Since we have been using command-line execution rather than code placed in the file, Perl uses -e as the code's filename. Line 0 doesn't exist; it's used for special purposes.
Having finished the compilation, now we proceed to the execution part:
EXECUTING... => (-e:0) enter (-e:0) ENTER scope 2 at pp_hot.c:1535 Entering block 0, type BLOCK => (-e:0) nextstate => (-e:1) pushmark => * (-e:1) const(IV(43200)) => * IV(43200) (-e:1) print 43200 => SV_YES (-e:1) leave Leaving block 0, type BLOCK (-e:0) LEAVE scope 2 at pp_hot.c:1657 (-e:0) LEAVE scope 1 at perl.c:395
Here you can see what Perl does on each line of your source code. So basically the gist of this code (bolded in the example) is pushing the constant integer scalar (const(IV)) onto the execution stack, and then calling print( ). The SV_YESsymbol indicates that print( ) returns a scalar value. The rest of the output consists of controlling messages, where Perl changes scopes.
Of course, as the code gets more complicated, the traces will get longer and trickier to understand. But sometimes these traces can be as indispensable as interactive debugging.
You can use the -D[letter|number] techniques from within mod_perl as well by setting the PERL5OPT environment variable. For example, using the bashshell to see the compilation and execution traces, you can start the server in this way:
panic% PERL5OPT=-Dt ./httpd_perl -X
You will see a lot of output while the server starts. Once it finishes the tracing, open the error_log file and issue a request to your code. The tracing output will show up in this file.
Devel::Peek is a very useful module for looking at the Perl internals. It's especially useful for debugging XS code. With Devel::Peek we can look at Perl variables' data structures. This code:
use Devel::Peek; my $x = 'mod_perl rules'; Dump $x;
prints:
SV = PV(0x804c674) at 0x80571fc REFCNT = 1 FLAGS = (PADBUSY,PADMY,POK,pPOK) PV = 0x805ce78 "mod_perl rules"\0 CUR = 14 LEN = 15
We can see that this variable is a scalar, whose reference count is 1 (there are no other variables pointing to it). Its value is the string "mod_perl rules", terminated by \0 (one more character is used for the string-terminating \0 character, which is handled behind the scenes, transparently to the user), whose length is 15 characters including the terminating \0 character. The data structure starts at 0x80571fc, and its string value is stored starting from the address 0x805ce78.
If you want to look at more complicated structures, such as a hash or an array, you should create references to them and pass the references to the Dump( ) function.
The Apache::Peek module is built for use with mod_perl's Devel::Peek, so you can use it to peek at mod_perl's code internals.
In Chapter 10 we showed a few examples where Devel::Peek and Apache::Peek have been found very useful. To learn about Perl variables' internals, refer to the perlguts manpage.
Devel::Symdump allows us to access Perl's symbol table. This package is object oriented. To instantiate an object, you should provide the name of the package to traverse. If no package is provided as an argument, the main package is used. If the object is created with new( ), Devel::Symdump analyzes only the packages that are given as arguments; if rnew( ) is used, nested modules are analyzed recursively.
Once the object is instantiated, the methods packages( ), scalars( ), arrays( ), hashes( ), functions( ), ios( ), and unknowns( ) can be used. Each method returns an array of fully qualified symbols of the specified type in all packages that are held within a Devel::Symdump object, but without the leading "$", "@", or "%". In a scalar context, they will return the number of such symbols. Unknown symbols are usually either formats or variables that don't yet have defined values.
For example:
require Devel::Symdump; @packs = qw(Devel::Symdump); $obj = Devel::Symdump->new(@packs); print join "\n", $obj->scalars; Devel::Symdump::rnew Devel::Symdump::inh_tree Devel::Symdump::_partdump Devel::Symdump::DESTROY ...more symbols stripped
You may find this package useful to see what symbols are defined, traverse trees of symbols from inherited packages, and more. See the package's manpage for more information.
Apache::Symdump uses Devel::Symdump to record snapshots of the Perl symbol table in ServerRoot/logs/symdump.$$.$n. Here $$ is the process ID and $n is incremented each time the handler is run.
To enable this module, add the following to httpd.conf:
PerlLogHandler Apache::Symdump
This module is useful for watching the growth of the processes and hopefully, by taking steps against the growth, reducing it. One of the reasons for process growth is the definition of new symbols. You can use the diff utility to compare snapshots and get an idea of what might be making a process grow. For example:
panic% diff -u symdump.1892.0 symdump.1892.1
where 1892 is PID. Normally, new symbols come from modules or scripts that were not preloaded, the Perl method cache, and so on. Let's write a simple script that uses DB_File, which wasn't preloaded (see Example 21-13).
use strict; require DB_File; my $r = shift; $r->send_http_header("text/plain"); $r->print("Hello $$\n");
If we issue a few requests and then compare two consecutive request dumps for the same process, nothing happens. That's because the module is loaded on the first request, and therefore from now on the symbol table will be the same. So in order to help Apache::Symdump to help us detect the load, we will require the module only on the second reload (see Example 21-14).
use strict; use vars qw($loaded); require DB_File if defined $loaded; $loaded = 1; my $r = shift; $r->send_http_header("text/plain"); $r->print("Hello $$\n");
Running the diff:
panic% diff symdump.9909.1 symdump.9909.2 |wc -l 301
reveals that there were 301 symbols defined, mostly from the DB_File and Fcntl packages. We can also see what new files were loaded, by applying diff on the incdump.$$.$n files, which dump the contents of %INC after each request:
panic% diff incdump.9909.1 incdump.9909.2 1a2 > /usr/lib/perl5/5.6.1/i386-linux/auto/DB_File/autosplit.ix = /usr/lib/perl5/5.6.1/i386-linux/auto/DB_File/autosplit.ix 21a23 > DB_File.pm = /usr/lib/perl5/5.6.1/i386-linux/DB_File.pm
Remember that Apache::Symdump does not clean up its snapshot files, so you have to do it yourself:
panic% rm logs/symdump.* logs/incdump.*
Apache::Status also uses Devel::Symdump to allow you to inspect symbol tables through your browser.
This module sends what may be helpful debugging information to the client, rather than to the error_log file.
This module specifies only the dump( ) method:
use Apache::Debug ( ); my $r = shift; Apache::Debug::dump($r, "some comment", "another comment", ...);
For example, if we take this simple script:
use Apache::Debug ( ); use Apache::Constants qw(SERVER_ERROR); my $r = shift; Apache::Debug::dump($r, SERVER_ERROR, "Uh Oh!");
it prints out the HTTP headers as received by server and various request data:
SERVER_ERROR Uh Oh! cwd=/home/httpd/perl $r->method : GET $r->uri : /perl/test.pl $r->protocol : HTTP/1.0 $r->path_info : $r->filename : /home/httpd/perl/test.pl $r->allow_options : 8 $s->server_admin : root@localhost $s->server_hostname : localhost $s->port : 8000 $c->remote_host : $c->remote_ip : 127.0.0.1 $c->remote_logname : $c->user : $c->auth_type : scalar $r->args : $r->args: $r->content: $r->headers_in: Accept = image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */* Accept-Charset = iso-8859-1,*,utf-8 Accept-Encoding = gzip Accept-Language = en Connection = Keep-Alive Host = localhost:8000 Pragma = no-cache User-Agent = Mozilla/4.76 [en] (X11; U; Linux 2.2.17-21mdk i686)
The following are a few other modules that you may find of use, but in this book we won't delve deeply into their details:
Apache::DumpHeaders is used to watch an HTTP transaction, looking at the client and server headers.
Apache::DebugInfo offers the ability to monitor various bits of per-request data. Similar to Apache::DumpHeaders.
Devel::StackTrace encapsulates the information that can be found through using the caller( ) function and provides a simple interface to this data.
Apache::Symbol provides XS tricks to avoid a mandatory "Subroutine redefined" warning when reloading a module that contains a subroutine that is eligible for inlining. Useful during development when using Apache::Reload or Apache::StatINC to reload modules.
Copyright © 2003 O'Reilly & Associates. All rights reserved.