Learning Perl, 3rd Edition

13.3. Links and Files

To understand more about what's going on with files and directories, it helps to understand the Unix model of files and directories, even if your non-Unix system doesn't work in exactly this way. As usual, there's more to the story than we're able to explain here, so check any good book on Unix internal details if you need the full story.

A mounted volume is a hard disk drive (or something else that works more-or-less like that, such as a disk partition, a floppy disk, a CD-ROM, or a DVD-ROM). It may contain any number of files and directories. Each file is stored in a numbered inode, which we can think of as a particular piece of disk real estate. One file might be stored in inode 613, while another is in inode 7033.

To locate a particular file, though, we'll have to look it up in a directory. A directory is a special kind of file, maintained by the system. Essentially, it is a table of filenames and their inode numbers.[286] Along with the other things in the directory, there are always two special directory entries. One is . (called "dot"), which is the name of that very directory; and the other is .. ("dot-dot"), which is the directory one step higher in the hierarchy (i.e., the directory's parent directory).[287]

[286]On Unix systems (others don't generally have inodes, hard links, and such), you can use the ls command's -i option to see files' inode numbers. Try a command like ls -ail. When two or more inode numbers are the same for multiple items on a given filesystem, there's really just one file involved, one piece of the disk.

[287]The Unix system root directory has no parent. In that directory, .. is the same directory as ., which is the system root directory itself.
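From inside Perl, we can look at the same table of names and inode numbers that ls -i shows. The following is only a sketch, assuming a Unix-like filesystem; the scratch directory and the names chicken and dodgson are made up for the illustration. It reads the directory with readdir and asks lstat for each entry's inode number:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory so nothing real is touched.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(chicken dodgson)) {
    open my $fh, '>', "$dir/$name" or die "can't create $name: $!";
    close $fh;
}

# Read the directory's table of names, then ask for each name's inode.
opendir my $dh, $dir or die "can't opendir $dir: $!";
my %inode_of;
for my $name (readdir $dh) {
    my @info = lstat "$dir/$name" or die "can't lstat $name: $!";
    $inode_of{$name} = $info[1];    # element 1 of the stat list is the inode number
}
closedir $dh;

printf "%s\t%s\n", $inode_of{$_}, $_ for sort keys %inode_of;
```

On a typical Unix system the listing should include . and .. alongside the two files, and the inode number shown for . should match the one that stat reports for the directory itself.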

Figure 13-1 provides an illustration of two inodes. One is for a file called chicken, and the other is Barney's directory of poems, /home/barney/poems, which contains that file. The file is stored in inode 613, while the directory is stored in inode 919. (The directory's own name, poems, doesn't appear in the illustration, because that's stored in another directory.) The directory contains entries for three files (including chicken) and two directories (one of which is the reference back to the directory itself, in inode 919), along with each item's inode number.


Figure 13-1. The chicken before the egg

When it's time to make a new file in a given directory, the system adds an entry with the file's name and the number of a new inode. How can the system tell that a particular inode is available, though? Each inode holds a number called its link count. The link count is always zero if the inode isn't listed in any directory, so any inode with a link count of zero is available for new file storage. When the inode is added to a directory, the link count is incremented; when the listing is removed, the link count is decremented. For the file chicken as illustrated above, the link count of 1 is shown in the box above the inode's data.

But some inodes have more than one listing. For example, we've already seen that each directory includes an entry for ., which points back to that directory's own inode. So the link count for a directory should always be at least two: its listing in its parent directory and its listing in itself. In addition, if it has subdirectories, each of those will add a link, since each will contain a .. entry.[288] In Figure 13-1, the directory's link count of 2 is shown in the box above its data. A link count is the number of true names for the inode.[289]

[288]This implies that the link count of a directory is always equal to two plus the number of directories it contains. On some systems that's true, in fact, but some other systems work differently.

[289]In the traditional output of ls -l, the number of hard links to the item appears just to the right of the permission flags (like "-rwxr-xr-x"). Now you know why this number is more than one for directories and nearly always 1 for ordinary files.
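Perl can report the same number that ls -l shows. As a rough sketch (the scratch directory and the names poems and chicken are set up just for this example), element 3 of the list that stat returns is the link count:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/poems" or die "can't mkdir poems: $!";
open my $fh, '>', "$dir/poems/chicken" or die "can't create chicken: $!";
close $fh;

# Element 3 of the list returned by stat is the link count.
my $file_links = (stat "$dir/poems/chicken")[3];
my $dir_links  = (stat "$dir/poems")[3];

print "chicken has $file_links link(s)\n";
print "poems has $dir_links link(s)\n";    # often 2, but this varies by filesystem
```

The count for the ordinary file should be 1. The count for the directory depends on the filesystem, as the footnote above warns, so don't build a program that relies on it.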

Could an ordinary file inode have more than one listing in the directory? It certainly could. Suppose that, working in the directory shown above, Barney uses Perl's link function to create a new link:

link "chicken", "egg"
  or warn "can't link chicken to egg: $!";

This is similar to typing "ln chicken egg" at the Unix shell prompt. If link succeeds, it returns true. If it fails, it returns false and sets $!, which Barney is checking in the error message. After this runs, the name egg is another name for the file chicken, and vice versa; neither name is "more real" than the other, and (as you may have guessed) it would take some detective work to find out which came first. Figure 13-2 shows a picture of the new situation, where there are two links to inode 613.


Figure 13-2. The egg is linked to the chicken

These two filenames are thus talking about the same place on the disk. If the file chicken holds 200 bytes of data, egg holds the same 200 bytes, for a total of 200 bytes (since it's really just one file with two names). If Barney appends a new line of text to file egg, that line will also appear at the end of chicken.[290]

[290]If you experiment with making links and changing text files, be aware that most text editors don't edit the file "in place" but instead save a modified copy. If Barney were to edit egg with a text editor, he'd most likely end up with a new file called egg and the old file called chicken -- two separate files, rather than two links to the same file.
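To see that the two names really share one inode, we can try something like the following sketch. It assumes a filesystem that supports hard links, and it works in a scratch directory rather than in Barney's poems directory. The program links chicken to egg, appends a line in place through one name, and reads the file back through the other:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

open my $out, '>', "$dir/chicken" or die "can't create chicken: $!";
print $out "first line\n";
close $out or die "can't close chicken: $!";

link "$dir/chicken", "$dir/egg" or die "can't link chicken to egg: $!";

# Both names should now report the same inode and a link count of 2.
my ($chicken_inode, $chicken_links) = (stat "$dir/chicken")[1, 3];
my ($egg_inode,     $egg_links)     = (stat "$dir/egg")[1, 3];

# Append in place through one name...
open my $app, '>>', "$dir/egg" or die "can't append to egg: $!";
print $app "second line\n";
close $app or die "can't close egg: $!";

# ...and read the result through the other.
open my $in, '<', "$dir/chicken" or die "can't read chicken: $!";
my @lines = <$in>;
close $in;

print "chicken: inode $chicken_inode, $chicken_links links\n";
print "egg:     inode $egg_inode, $egg_links links\n";
print "chicken now holds ", scalar(@lines), " lines\n";
```

Opening the file for appending changes the existing inode's data, which is why the new line shows up under both names. A text editor that saves a fresh copy would behave differently, as the footnote explains.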

Now, if Barney were to accidentally (or intentionally) delete chicken, that data would not be lost -- it's still available under the name egg. And vice versa: if he were to delete egg, he'd still have chicken. Of course, if he deletes both of them, the data will be lost.[291]

[291]Although the system won't necessarily overwrite this inode right away, there's no easy way in general to get the data back once the link count has gone to zero. Have you made a backup recently?
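Here is a small sketch of that situation, again in a scratch directory and assuming hard links are supported. After one of the two names is removed, the link count drops back to 1 and the data is still readable under the remaining name:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

open my $out, '>', "$dir/chicken" or die "can't create chicken: $!";
print $out "cluck\n";
close $out or die "can't close chicken: $!";

link "$dir/chicken", "$dir/egg" or die "can't link chicken to egg: $!";

# Remove one of the two names; the inode still has the other.
unlink "$dir/chicken" or die "can't unlink chicken: $!";

my $links_left = (stat "$dir/egg")[3];

open my $in, '<', "$dir/egg" or die "can't read egg: $!";
my $data = <$in>;
close $in;

print "egg has $links_left link(s) and still says: $data";
```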

There's another rule about the links in directory listings: the inode numbers in a given directory listing all refer to inodes on that same mounted volume.[292] This rule ensures that if the physical medium (the diskette, perhaps) is moved to another machine, all of the directories stick together with their files. That's why you can use rename to move a file from one directory to another, but only if both directories are on the same filesystem (mounted volume). If they were on different disks, the inode's data would have to be relocated, which is too complex an operation for a simple system call.

[292]The one exception is the special .. entry in the volume's root directory, which refers to the directory in which that volume is mounted.
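One way a program might cope with this rule is to check whether two directories are on the same mounted volume before calling rename, and to copy the file otherwise. The sketch below is only an illustration: same_volume is a hypothetical helper (not a Perl built-in) that compares the device numbers in element 0 of the stat list, and the fallback uses move from the standard File::Copy module. Device numbers are a reasonable hint on most Unix systems, but some filesystems may not follow this convention.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Temp qw(tempdir);

# Hypothetical helper: true if both paths appear to be on the same
# mounted volume, judged by the device number (element 0 of stat).
sub same_volume {
    my ($path_a, $path_b) = @_;
    my $dev_a = (stat $path_a)[0];
    my $dev_b = (stat $path_b)[0];
    return defined $dev_a && defined $dev_b && $dev_a == $dev_b;
}

my $top = tempdir(CLEANUP => 1);
mkdir "$top/$_" or die "can't mkdir $_: $!" for qw(from to);

open my $fh, '>', "$top/from/chicken" or die "can't create chicken: $!";
close $fh;

if (same_volume("$top/from", "$top/to")) {
    # Same volume: only directory entries change, the inode stays put.
    rename "$top/from/chicken", "$top/to/chicken"
        or die "can't rename chicken: $!";
} else {
    # Different volumes: the data has to be copied to a new inode.
    move("$top/from/chicken", "$top/to/chicken")
        or die "can't move chicken: $!";
}
```

In practice, many programs simply call move from File::Copy, which tries rename first and copies only if that fails.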

And yet another restriction on links is that they can't make new names for directories. That's because the directories are arranged in a hierarchy. If you were able to change that, utility programs like find and pwd could easily become lost trying to find their way around the filesystem.

So, links can't be added to directories, and they can't cross from one mounted volume to another. Fortunately, there's a way to get around these restrictions on links, by using a new and different kind of link: a symbolic link.[293] A symbolic link (also called a soft link to distinguish it from the true or hard links that we've been talking about up to now) is a special entry in a directory that tells the system to look elsewhere. Let's say that Barney (working in the same directory of poems as before) creates a symbolic link with Perl's symlink function, like this:

[293]Some very old Unix systems don't support symlinks, but those are pretty rare nowadays.

symlink "dodgson", "carroll"
  or warn "can't symlink dodgson to carroll: $!";

This is similar to what would happen if Barney used the command "ln -s dodgson carroll" from the shell. Figure 13-3 shows a picture of the result, including the poem in inode 7033.


Figure 13-3. A symlink to inode 7033

Now if Barney chooses to read /home/barney/poems/carroll, he gets the same data as if he had opened /home/barney/poems/dodgson, because the system follows the symbolic link automatically. But that new name isn't the "real" name of the file, because (as you can see in the diagram) the link count on inode 7033 is still just one. That's because the symbolic link simply tells the system, "If you got here looking for carroll, now you want to go off to find something called dodgson instead."
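We can check the claim about the link count with a sketch like this one, which assumes a system that supports symlinks and works in a scratch directory. It also uses lstat, which reports on the symlink itself, where stat follows the link to its target:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

open my $out, '>', "$dir/dodgson" or die "can't create dodgson: $!";
print $out "'Twas brillig\n";
close $out or die "can't close dodgson: $!";

# The stored target "dodgson" is resolved relative to the directory
# that holds the symlink.
symlink 'dodgson', "$dir/carroll"
    or die "can't symlink dodgson to carroll: $!";

my ($real_inode, $target_links) = (stat "$dir/dodgson")[1, 3];
my $via_link  = (stat  "$dir/carroll")[1];   # stat follows the symlink
my $link_self = (lstat "$dir/carroll")[1];   # lstat looks at the symlink itself

print "dodgson: inode $real_inode, $target_links link(s)\n";
print "stat carroll reports inode $via_link\n";
print "lstat carroll reports inode $link_self\n";
```

The link count on dodgson should still be 1, and the symlink should turn out to have an inode of its own, separate from the file it points to.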

A symbolic link can freely cross mounted filesystems or provide a new name for a directory, unlike a hard link. In fact, a symbolic link could point to any filename, one in this directory or in another one -- or even to a file that doesn't exist! But that also means that a soft link can't keep data from being lost as a hard link can, since the symlink doesn't contribute to the link count. If Barney were to delete dodgson, the system would no longer be able to follow the soft link.[294] Even though there would still be an entry called carroll, trying to read from it would give an error like file not found. The file test -l 'carroll' would report true, but -e 'carroll' would be false: it's a symlink, but it doesn't exist.

[294]Deleting carroll would merely remove the symlink, of course.
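The dangling-symlink case is easy to reproduce. This sketch (scratch directory, symlink support assumed) deletes the target and then applies both file tests to the leftover link:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

open my $out, '>', "$dir/dodgson" or die "can't create dodgson: $!";
close $out;
symlink 'dodgson', "$dir/carroll"
    or die "can't symlink dodgson to carroll: $!";

# Remove the real file; the symlink is left pointing at nothing.
unlink "$dir/dodgson" or die "can't unlink dodgson: $!";

my $is_symlink = -l "$dir/carroll" ? 1 : 0;   # tests the link itself
my $exists     = -e "$dir/carroll" ? 1 : 0;   # follows the link to its target

my $opened = open my $in, '<', "$dir/carroll";
my $error  = $opened ? '' : "$!";

print "is a symlink: $is_symlink, exists: $exists\n";
print "open failed: $error\n" unless $opened;
```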

Since a soft link could point to a file that doesn't yet exist, it could be used when creating a file as well. Barney has most of his files in his home directory, /home/barney, but he also needs frequent access to a directory with a long name that is difficult to type: /usr/local/opt/system/httpd/root-dev/users/staging/barney/cgi-bin. So he sets up a symlink named /home/barney/my_stuff, which points to that long name, and now it's easy for him to get to it. If he creates a file (from his home directory) called my_stuff/bowling, that file's real name is /usr/local/opt/system/httpd/root-dev/users/staging/barney/cgi-bin/bowling. Next week, when the system administrator moves these files of Barney's to /usr/local/opt/internal/httpd/www-dev/users/staging/barney/cgi-bin, Barney just repoints the one symlink, and now he and all of his programs can still find his files with ease.
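Repointing a symlink amounts to removing the old one and making a new one with the same name. The sketch below imitates Barney's situation in a scratch directory; the short names old-location and new-location are stand-ins for the long paths in the text, and symlink support is assumed:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $top = tempdir(CLEANUP => 1);
mkdir "$top/old-location" or die "can't mkdir old-location: $!";

symlink "$top/old-location", "$top/my_stuff"
    or die "can't create my_stuff: $!";

# Creating a file through the symlink puts it in the real directory.
open my $fh, '>', "$top/my_stuff/bowling" or die "can't create bowling: $!";
close $fh;
my $created_in_real_dir = -e "$top/old-location/bowling" ? 1 : 0;

# The administrator moves the directory...
rename "$top/old-location", "$top/new-location"
    or die "can't move directory: $!";

# ...so Barney repoints the one symlink.
unlink "$top/my_stuff" or die "can't remove old symlink: $!";
symlink "$top/new-location", "$top/my_stuff"
    or die "can't repoint my_stuff: $!";

print "my_stuff now leads to ", readlink("$top/my_stuff"), "\n";
```

Between the unlink and the second symlink there's a brief moment when my_stuff doesn't exist. That's fine for a personal shortcut, but it's something to keep in mind if other programs are using the link at the same time.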

It's normal for either /usr/bin/perl or /usr/local/bin/perl (or both) to be symbolic links to the true Perl binary on your system. This makes it easy to switch to a new version of Perl. Say you're the system administrator, and you've built the new Perl. Of course, your older version is still running, and you don't want to disrupt anything. When you're ready for the switch, you simply move a symlink or two, and now every program that begins with #!/usr/bin/perl will automatically use the new version. In the unlikely case that there's some problem, it's a simple thing to replace the old symlinks and have the older Perl running the show again. (But, like any good admin, you notified your users to test their code with the new /usr/bin/perl-7.2 well in advance of the switch, and you told them that they can keep using the older one during the next month's grace period by changing their programs' first lines to #!/usr/bin/perl-6.1, if they need to.)

Perhaps surprisingly, both hard and soft links are very useful. Many non-Unix operating systems have neither, and the lack is sorely felt. On some non-Unix systems, symbolic links may be implemented as a "shortcut" or an "alias" -- check the perlport manpage for the latest details.

To find out where a symbolic link is pointing, use the readlink function. This will tell you where the symlink leads, or it will return undef if its argument wasn't a symlink:

my $where = readlink "carroll";             # Gives "dodgson"

my $perl = readlink "/usr/local/bin/perl";  # Maybe tells where perl is
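Since readlink returns undef for anything that isn't a symlink, it's worth testing the result with defined before using it. Here's one way that might look; describe is a hypothetical helper written for this example, and the files live in a scratch directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: say where a symlink leads, or that the path
# isn't a symlink (readlink returns undef in that case).
sub describe {
    my ($path) = @_;
    my $where = readlink $path;
    return defined $where
        ? "$path -> $where"
        : "$path is not a symlink";
}

my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/dodgson" or die "can't create dodgson: $!";
close $out;
symlink 'dodgson', "$dir/carroll"
    or die "can't symlink dodgson to carroll: $!";

print describe("$dir/carroll"), "\n";
print describe("$dir/dodgson"), "\n";
```

Note that readlink hands back exactly what was stored in the symlink, so a relative target such as dodgson comes back as a relative name.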

You can remove either kind of link with unlink -- and now you see where that operation gets its name. unlink simply removes the directory entry associated with the given filename, decrementing the link count and thus possibly freeing the inode.
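As a final sketch (scratch directory and symlink support assumed, as before), unlink takes a list of names and returns the number of entries it managed to remove. Removing a symlink takes away only the link, not the file it points to:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

for my $name (qw(chicken dodgson)) {
    open my $fh, '>', "$dir/$name" or die "can't create $name: $!";
    close $fh;
}
symlink 'dodgson', "$dir/carroll"
    or die "can't symlink dodgson to carroll: $!";

# One symlink, one ordinary file, and one name that doesn't exist.
my @doomed  = ("$dir/carroll", "$dir/chicken", "$dir/no-such-file");
my $removed = unlink @doomed;

print "removed $removed of ", scalar(@doomed), " names\n";
print "dodgson is still there\n" if -e "$dir/dodgson";
```

Because the return value is a count, a result smaller than the number of names means at least one removal failed. To learn which one, and why, unlink the names one at a time and check $! after each.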




Copyright © 2002 O'Reilly & Associates. All rights reserved.