Outlines are a simple (and thus popular) way of structuring data. The hierarchy of detail implied by an outline maps naturally to our top-down way of thinking about the world. The only problem is that it's not obvious how to represent outlined data as a Perl data structure.
Take, for example, this simple outline of some musical genres:
    Alternative
    .Punk
    ..Emo
    ..Folk Punk
    .Goth
    ..Goth Rock
    ..Glam Goth
    Country
    .Old Time
    .Bluegrass
    .Big Hats
    Rock
    .80s
    ..Big Hair
    ..New Wave
    .60s
    ..British
    ..American
Here each leading period marks one level of nesting, so .Punk is a subgroup of Alternative and ..Emo is a subgroup of Punk. There are many formats in which that outline could be output. For example, you might write the genres out in full:
    Alternative
    Alternative - Punk
    Alternative - Punk - Emo
    Alternative - Punk - Folk Punk
    Alternative - Goth
    ...
You might number the sections:
    1 Alternative
    1.1 Punk
    1.1.1 Emo
    1.1.2 Folk Punk
    1.2 Goth
    ...
or alphabetize:
    Alternative
    Alternative - Goth
    Alternative - Goth - Glam Goth
    Alternative - Goth - Goth Rock
    Alternative - Punk
    Alternative - Punk - Emo
    ...
or show inheritance:
    Alternative
    Punk - Alternative
    Emo - Punk - Alternative
    Folk Punk - Punk - Alternative
    Goth - Alternative
    Goth Rock - Goth - Alternative
    ...
These transformations are all much easier than they might seem. The trick is to represent the levels of the hierarchy as elements in an array. For example, you'd represent the Glam Goth entry in the sample outline as:
    @array = ("Alternative", "Goth", "Glam Goth");
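To make the benefit concrete, here is a minimal sketch of two of the formats above produced from that one array. The variable names @entry, $full, and $inherit are illustrative only and do not come from the original text.

```perl
use strict;
use warnings;

# One outline entry, stored as its path from the top level down.
my @entry = ("Alternative", "Goth", "Glam Goth");

# Full form: ancestors first.
my $full = join(" - ", @entry);

# Inheritance form: the entry itself first, then its ancestors.
my $inherit = join(" - ", reverse @entry);

print "$full\n";     # Alternative - Goth - Glam Goth
print "$inherit\n";  # Glam Goth - Goth - Alternative
```

Both formats are a single join over the same list; only the order of the elements changes.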
Now reformatting the entry is trivial. There's an elegant way to parse the input file to get this array representation:
    while (<FH>) {
        chomp;
        $tag[$in = s/\G\.//g] = $_;
        # do something with @tag[0..$in]
    }
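The substitution deletes the leading periods from the current entry and returns how many it deleted; the \G anchor keeps it from touching periods that appear later in the name. That count is the indentation level of the current entry, and it serves as the index at which the entry's name is stored in @tag. Elements of @tag beyond $in are left over from earlier, deeper entries, which is why only @tag[0..$in] is meaningful.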
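The parsing step can also be wrapped in a subroutine, which makes it easier to try out on a few lines. The following is a sketch under some assumptions: parse_outline is a hypothetical helper name, the lines are passed in as a list rather than read from a filehandle, and the outline never skips a level (for example, going from no periods straight to two).

```perl
use strict;
use warnings;

# Hypothetical helper: turn outline lines into a list of paths,
# one array reference per entry.
sub parse_outline {
    my @lines = @_;
    my (@tag, @paths);
    for my $line (@lines) {
        chomp(my $name = $line);
        # Strip the leading periods and count them (0 if there are none).
        my $depth = ($name =~ s/\G\.//g) || 0;
        $tag[$depth] = $name;
        # Copy only the live part of @tag; deeper slots may be stale.
        push @paths, [ @tag[0 .. $depth] ];
    }
    return @paths;
}

my @paths = parse_outline("Alternative\n", ".Punk\n", "..Emo\n", ".Goth\n");
print join(" - ", @$_), "\n" for @paths;
```

With that input the loop prints the four entries in full form, ending with "Alternative - Goth": the stale "Emo" in the third slot is ignored because the slice stops at the current depth.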
Alphabetizing is now simple using the Unix sort program:
    $ISA = "-";
    open(STDOUT, "|sort -b -t'$ISA' -df") or die "Can't run sort: $!";
    while (<DATA>) {
        chomp;
        $tag[$in = s/\G\.//g] = $_;
        print join(" $ISA ", @tag[0 .. $in]), "\n";   # sort needs one entry per line
    }
    close STDOUT;
    __END__
    Alternative
    .Punk
    ..Emo
    ..Folk Punk
    .Goth
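Where an external sort program is not available, a similar result can be had inside Perl. This is a swapped-in technique, not the article's: it is a sketch that assumes the whole outline fits in memory and that a plain case-insensitive string comparison is close enough to what sort -df does (the two can differ for names containing punctuation). The name sorted_paths is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return every entry's full path, sorted
# case-insensitively, without calling an external sort program.
sub sorted_paths {
    my ($sep, @lines) = @_;
    my (@tag, @out);
    for my $line (@lines) {
        chomp(my $name = $line);
        my $depth = ($name =~ s/\G\.//g) || 0;   # count leading periods
        $tag[$depth] = $name;
        push @out, join(" $sep ", @tag[0 .. $depth]);
    }
    # Fold case first; fall back to an exact comparison to break ties.
    return sort { lc($a) cmp lc($b) or $a cmp $b } @out;
}

my @lines = ("Alternative\n", ".Punk\n", "..Emo\n", "..Folk Punk\n", ".Goth\n");
print "$_\n" for sorted_paths("-", @lines);
```

Because every path begins with its ancestors, sorting the joined strings keeps each subgroup directly under its parent.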
Numbering the outline is equally simple:
    while (<DATA>) {
        chomp;
        $count[$in = s/\G\.//g]++;
        delete @count[($in+1) .. $#count];   # drop counters for deeper levels
        print join(".", @count), " $_\n";
    }
    __END__
    Alternative
    .Punk
    ..Emo
    ..Folk Punk
    .Goth
    ..Goth Rock
Notice that numbering is the only application in which we delete elements from the array. That's because the array no longer holds the names of the hierarchy levels; it holds counts. When we go up a level (e.g., from three levels down to a new second-level heading), we discard the counters for the deeper levels so that they start again from 1.
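The counter reset is easiest to see on input that climbs back up more than once. The sketch below makes a few assumptions of its own: number_outline is a hypothetical helper, it takes the lines as a list, and it truncates the array by assigning to $#count, which should have the same effect here as deleting the trailing elements.

```perl
use strict;
use warnings;

# Hypothetical helper: return the numbered form of each outline line.
sub number_outline {
    my @lines = @_;
    my (@count, @out);
    for my $line (@lines) {
        chomp(my $name = $line);
        my $depth = ($name =~ s/\G\.//g) || 0;   # count leading periods
        $count[$depth]++;      # next number at this level
        $#count = $depth;      # forget counters for deeper levels
        push @out, join(".", @count) . " $name";
    }
    return @out;
}

print "$_\n" for number_outline(
    "Alternative\n", ".Punk\n", "..Emo\n", "..Folk Punk\n", ".Goth\n", "Country\n",
);
```

The fifth line comes out as "1.2 Goth" and the sixth as "2 Country": each time the outline moves up, the deeper counters vanish, so the next subgroup at that depth starts again at 1.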
Copyright © 2003 O'Reilly & Associates. All rights reserved.