File Handles

For input and output to files, rather than the terminal screen or keyboard, a filehandle must first be associated with the file name; all subsequent reads and writes are then done through this filehandle. The association is made with a call to open:

open(my $fh, 'filename.txt');

Here $fh is the filehandle associated with filename.txt. Since no mode is given, the file is opened for reading; more generally, one can specify the mode explicitly, as in

open(my $rfh, '<in.txt');  # open in read mode 
open(my $wfh, '>out.txt');  # open in write mode 
open(my $afh, '>>add.txt');  # open in append mode

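You must have the appropriate permissions for the requested operation on these files. Note also that in read mode the file must already exist. In write mode the file is created if necessary, and any existing file of the same name is overwritten (its old contents are lost). In append mode the file is likewise created if necessary, but new output is added to the end of any existing contents.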
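The mode can also be given as a separate second argument, the three-argument form of open, which keeps the mode from being confused with characters in the file name itself. The following sketch exercises all three modes on a scratch file; the name demo.txt is a hypothetical choice, and each handle is released with close once it is no longer needed.

```perl
use strict;
use warnings;

my $file = 'demo.txt';    # hypothetical scratch file in the current directory

# Write mode: creates the file, or truncates it if it already exists
open(my $wfh, '>', $file) or die "Cannot open $file for writing: $!";
print $wfh "first line\n";
close($wfh) or die "Cannot close $file: $!";

# Append mode: new output goes after the existing contents
open(my $afh, '>>', $file) or die "Cannot open $file for appending: $!";
print $afh "second line\n";
close($afh) or die "Cannot close $file: $!";

# Read mode: the file must already exist
open(my $rfh, '<', $file) or die "Cannot open $file for reading: $!";
my @lines = <$rfh>;    # list context: one element per line
close($rfh);

print "Read ", scalar(@lines), " lines\n";   # prints "Read 2 lines"
```

Because write mode truncates the file first, running the script repeatedly always leaves exactly two lines in demo.txt.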

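For binary files, such as images, you should call

   binmode($fh);

after opening the filehandle, whether it is used for reading or writing. This matters most on Win32, where by default line endings are translated between \r\n and \n, which would corrupt binary data; on Unix-like systems no such translation is done, but calling binmode is harmless and keeps the code portable.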
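As a sketch of why binmode matters, the following writes all 256 possible byte values (including the carriage-return and newline bytes that text-mode translation would alter on Win32) to a file and reads them back with read, which fills a scalar with up to a given number of bytes. The file name bytes.bin is hypothetical.

```perl
use strict;
use warnings;

my $file  = 'bytes.bin';                          # hypothetical file name
my $bytes = join '', map { chr($_) } 0 .. 255;    # every byte value, including "\r" and "\n"

open(my $out, '>', $file) or die "Cannot open $file for writing: $!";
binmode($out);                  # write the bytes exactly, with no newline translation
print $out $bytes;
close($out) or die "Cannot close $file: $!";

open(my $in, '<', $file) or die "Cannot open $file for reading: $!";
binmode($in);                   # read the bytes back exactly as stored
my $got = '';
my $n = read($in, $got, 1024);  # read up to 1024 bytes into $got; returns the count
close($in);

print "Read $n bytes; data ", ($got eq $bytes ? "unchanged" : "changed"), "\n";
```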

Opening a file can fail: for example, you may not have sufficient permission, or, in read mode, the file may not exist. Since open returns a false value on failure, it is good practice to check it and, where appropriate, abort the program with die if the call fails. The syntax of this is

open(my $fh, 'filename.txt') 
  or die qq{Cannot open filename.txt for reading: $!};

Then, if the open call fails, the program stops and prints the specified error message to standard error. Within this message, the special Perl variable $! holds the system error message (e.g., No such file or directory, in the case of a file not existing). Because the message does not end in a newline, die also appends the script name and line number at which it was called.
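A sketch of inspecting such a failure instead of ending the program: eval traps the die, so the message ends up in the special variable $@. The file name is hypothetical and assumed not to exist (the unlink makes sure of it). Ending the die message with \n, as here, suppresses the script-name and line-number suffix. The exact wording of $! comes from the operating system.

```perl
use strict;
use warnings;

my $file = 'no_such_file.txt';   # hypothetical name, assumed not to exist
unlink $file;                     # make sure of that for this demonstration

# eval { } traps the die, so the message can be inspected instead of ending the program
my $ok = eval {
    open(my $fh, '<', $file) or die "Cannot open $file for reading: $!\n";
    1;    # only reached if open succeeded
};
my $error = $@;    # copy $@ right away, before anything else can reset it

print "open failed: $error" unless $ok;
```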

After opening a file in write or append mode, you can print to it by giving the filehandle in the print statement, with no comma between the filehandle and the items to be printed:

 print $fh qq{This is some text\n};

The printf statement also accepts a filehandle:

my $pi = 3.14159265358979;
printf $fh qq{%.5f\n}, $pi;
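The following sketch combines print and printf on one filehandle and then reads the file back; the file name report.txt is hypothetical. Closing the handle flushes any buffered output, and checking the return value of close catches write errors that only show up at that point.

```perl
use strict;
use warnings;

my $file = 'report.txt';   # hypothetical output file
open(my $fh, '>', $file) or die "Cannot open $file for writing: $!";

my $pi = 3.14159265358979;
print $fh "This is some text\n";           # no comma after the filehandle
printf $fh "pi to 5 places: %.5f\n", $pi;  # format string first, then the values
printf $fh "%-6s|%6d|\n", 'left', 42;      # field widths: left- and right-justified

# close flushes buffered output; checking it catches errors such as a full disk
close($fh) or die "Cannot close $file: $!";

open(my $in, '<', $file) or die "Cannot open $file for reading: $!";
my @out = <$in>;
close($in);
print @out;
```

The third line of report.txt is "left  |    42|": the string is padded on the right to six characters and the number on the left.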

To open a file and loop over all lines in the file, the following construction can be used:

  open(my $fh, '<data.txt') or die qq{Cannot open data.txt: $!};
  while (my $line = <$fh>) {
    print qq{The line read in is $line};
  }

The while loop reads the file one line at a time, assigning each line in turn to the variable $line; when the end of the file is reached, <$fh> returns undef and the loop finishes. Note that each line still contains its trailing newline character; to remove it, call chomp($line).
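A minimal sketch of the line loop, using a small sample file that the script creates itself (lines.txt is a hypothetical name). The last line is deliberately "0" with no trailing newline: although "0" is false in Perl, the loop still processes it, because Perl implicitly tests while (my $line = <$fh>) with defined. The special variable $. holds the number of the line just read.

```perl
use strict;
use warnings;

# Create a small sample file (hypothetical name and contents)
my $file = 'lines.txt';
open(my $out, '>', $file) or die "Cannot open $file for writing: $!";
print $out "alpha\n", "beta\n", "0";    # last line: "0" with no newline
close($out) or die "Cannot close $file: $!";

open(my $fh, '<', $file) or die "Cannot open $file: $!";
my @seen;
while (my $line = <$fh>) {   # implicitly defined(...), so the final "0" is still read
    chomp($line);             # strip the trailing newline, if there is one
    push @seen, "$.: $line";  # $. holds the current input line number
}
close($fh);
print "$_\n" for @seen;
```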

Often data files contain, on each line, various types of information in different columns. For example, suppose we have a file music.txt with some information arranged as:

Spears Britney female pop
Lightfoot Gordon male folk

representing the last name, first name, gender, and category of some singers. We can extract the information in each line as follows, using the split function:

  open(my $fh, '<music.txt') or die qq{Cannot open music.txt: $!};
  while( my $line = <$fh>) {
    # each line lists the last name first, then the first name
    my ($last_name, $first_name, $gender, $category) =
        split ' ', $line;
    print qq{$first_name $last_name is a $gender who's into $category\n};
  }

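The call split ' ', $line takes $line and splits it into a list (of however many elements there turn out to be), using runs of whitespace as the separator between elements. The single-space string ' ' is a special case: leading whitespace is ignored and trailing empty fields are dropped, so the newline at the end of $line does not end up in the last field. More generally,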

  my @array = split /$separator/, $line, $number;

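will split $line at each match of the pattern contained in $separator, producing at most $number fields; the last field then holds the rest of the line, separators included. If $number is omitted, the number of fields is not limited, and trailing empty fields are dropped.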
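A few split calls illustrating these rules, with made-up input strings: the special ' ' separator, a limit that keeps the rest of the line in the last field, and a separator pattern held in a variable.

```perl
use strict;
use warnings;

my $line = "  Lightfoot Gordon   male folk\n";

# ' ' is special: leading whitespace is ignored and runs of whitespace separate fields
my @words = split ' ', $line;            # ('Lightfoot', 'Gordon', 'male', 'folk')

# A limit keeps the remainder of the line, separators included, in the last field
my $record = "2003-04-04,lena.jpeg,12244,a comment, with a comma";
my ($date, $name, $size, $comment) = split /,/, $record, 4;
# $comment is 'a comment, with a comma'

# The separator can be any pattern held in a variable
my $separator = '\s*;\s*';
my @parts = split /$separator/, "a ; b;c";   # ('a', 'b', 'c')

print join('|', @words), "\n";
print "$comment\n";
print join('|', @parts), "\n";
```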

The split function is a powerful tool for extracting wanted information from text with a regular structure, and regular expression matching can also play a role here. For example, suppose we have the following directory listing:

04/04/2003  11:56a      <DIR>          .
04/04/2003  11:56a      <DIR>          ..
04/04/2003  11:57a               2,718 ifsa.aux
04/04/2003  11:57a              42,712 ifsa.dvi
04/04/2003  11:57a               5,976 ifsa.log
04/04/2003  11:57a              25,717 ifsa.tex
04/04/2003  11:57a              25,481 ifsa.tex~
04/04/2003  11:57a              13,631 ifsa1.tex
04/04/2003  11:57a             484,374 lena.eps
04/04/2003  11:57a              12,244 lena.jpeg
04/04/2003  11:57a               1,511 lena.pl
04/04/2003  11:57a               1,529 lena.pl~
04/04/2003  11:57a           1,972,444 lenalinear.eps
04/04/2003  11:57a           1,841,056 lenaquadratic.eps
04/04/2003  11:57a           3,369,583 maglin3.eps
04/04/2003  11:57a             136,714 maglin3.jpg
04/04/2003  11:57a           3,369,584 magquad3.eps
04/04/2003  11:57a              92,857 magquad3.jpg
04/04/2003  11:57a               1,919 save.txt
05/16/2003  09:57a               7,852 Resize.pm~
05/16/2003  09:57a                 318 draw.pl~
05/16/2003  09:57a                 413 test.pl
05/16/2003  04:08p               8,271 Resize.pm
05/16/2003  04:11p                 318 draw.pl
05/16/2003  04:19p              20,502 t.jpeg

stored in a file data.txt, and we want to extract from it the sizes of all JPEG images (files ending in .jpg or .jpeg). One way to do this is with split:

use strict;
use warnings;
open(my $fh, "<data.txt") or die "Could not open data.txt: $!";
while (my $line = <$fh>) {
  chomp $line;
  # fields: date, time, size (or <DIR>), file name
  my @entries = split ' ', $line;
  next unless defined $entries[3];
  if ($entries[3] =~ /\.(jpg|jpeg)$/) {
      print "Image $entries[3] has size $entries[2] bytes\n";
  }
}
close($fh);

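(The next unless line skips any line with fewer than four fields, such as a blank line, so that the pattern match never sees an undefined value, which would trigger a warning under use warnings. Directory entries such as . and .. do have a fourth field; they are rejected by the pattern match instead.) Alternatively, one could match $line against a regular expression, and capture the appropriate information if the match succeeds: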

use strict;
use warnings;
open(my $fh, "<data.txt") or die "Could not open data.txt: $!";
while (my $line = <$fh>) {
  chomp $line;
  # $1: the size (digits and commas); $2: a .jpeg or .jpg name at the end of the line
  if ( $line =~ /([,0-9]+)\s+(\w+\.jpeg|\w+\.jpg)$/ ) {
    my $file = $2;
    my $size = $1;
    print "File $file has size $size bytes\n";
  }
}
close($fh);

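(Here the character class [,0-9]+, which matches a run of digits and commas, is captured to extract the file size. Note that \w+ matches only letters, digits, and underscores, so a file name containing, say, a hyphen or a space would not be found.) Whether to use split or a regular expression in cases like this is partly a matter of taste, but often one approach is clearly more natural than the other.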
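As a further sketch, a limit of 4 on split keeps a file name that contains spaces in one piece, and deleting the commas with tr turns each size into a number that can be added up. For brevity the sample lines are hard-coded rather than read from data.txt; the last one, with a space in its name, is made up for illustration. The pattern /\.jpe?g$/i also accepts upper-case extensions.

```perl
use strict;
use warnings;

# Lines in the same layout as the listing above; the last entry is invented
my @listing = (
    "04/04/2003  11:57a             136,714 maglin3.jpg",
    "04/04/2003  11:57a               1,919 save.txt",
    "05/16/2003  04:19p              20,502 t.jpeg",
    "05/16/2003  04:20p               1,000 my photo.JPG",
);

my $total = 0;
for my $line (@listing) {
    # Limit of 4: everything after the size, spaces included, stays in $name
    my ($date, $time, $size, $name) = split ' ', $line, 4;
    next unless defined($name) && $name =~ /\.jpe?g$/i;   # .jpg or .jpeg, any case
    (my $bytes = $size) =~ tr/,//d;                        # "136,714" becomes "136714"
    $total += $bytes;
    print "Image $name has size $bytes bytes\n";
}
print "Total: $total bytes\n";
```

With these sample lines, three images are reported and the last line printed is Total: 158216 bytes.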
