Complex data structures

References to scalars, arrays, and hashes are useful in their own right, but the real power of references comes in building more complex data structures, for which they are required. A typical example of this is a database of some sort, where one has a number of records (rows) containing various bits of information in certain fields (columns):

 George   Washington  American  President  1234
 Ludwig   Beethoven   German    Composer   5678
 Joan     Arc         French    Saint      2435

An array or hash on its own can hold only scalar values, so neither can represent this row-and-column structure directly. A reference is itself a scalar, however, so an array or hash of references can.

As a simple example of a more complex data structure, consider the following table of numbers:

   0    0
   2    4
   5    9
   7   32

If we had just one number in each record (row), rather than two, then a simple array would be natural - the array index would represent the record (row) number, and the value would be the number in that row. However, here we have two numbers in each row. What we can do instead is associate each row not with a simple number but with an array of two elements. This can be done explicitly through references as

  my $aref = [
              [ 0, 0 ],
              [ 2, 4 ],
              [ 5, 9 ],
              [ 7, 32 ],
             ];

Accessing particular elements uses the arrow operator; for example, $aref->[2]->[0] has a value of 5, and $aref->[3]->[1] has a value of 32.
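
As a minimal sketch of element access (the variable names $with_arrow, $without_arrow, and @last_row are chosen here only for illustration), note that Perl allows the arrow between adjacent subscripts to be omitted, so $aref->[2]->[0] and $aref->[2][0] refer to the same element:

```perl
use strict;
use warnings;

my $aref = [
    [ 0, 0 ],
    [ 2, 4 ],
    [ 5, 9 ],
    [ 7, 32 ],
];

# The arrow is optional between adjacent subscripts.
my $with_arrow    = $aref->[2]->[0];
my $without_arrow = $aref->[2][0];

# Each row is itself an array reference; dereference it to get its elements.
my @last_row = @{ $aref->[3] };

print "$with_arrow $without_arrow @last_row\n";   # prints "5 5 7 32"
```

The first arrow, directly after $aref, is still required: $aref[2] would refer to an element of a different variable, the array @aref.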

Looping over all elements of a multidimensional array can be tricky; in the above example, one can use

for my $row_ref ( @$aref) {
  for my $element ( @$row_ref ) {
      print "$element\n";
  }
}

Note that each element of @$aref is itself an array reference. Alternatively, one can use indices:

my $rows = scalar @$aref;
my $columns = scalar @{$aref->[0]};

for (my $i=0; $i<$rows; $i++) {
  for (my $j=0; $j<$columns; $j++) {
      print "Row $i and Column $j has value $aref->[$i]->[$j]\n";
  }
}

Note the manner in which the number of rows and columns of this structure is obtained - scalar @$aref for the rows, and scalar @{$aref->[0]} for the columns (here, the index "0" was chosen arbitrarily, but any valid index would do, assuming each row has the same number of columns).
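
If the rows cannot be assumed to have the same number of columns, the column count can be taken per row instead. The following is only a sketch under that assumption; the data in $ragged and the @lengths array are hypothetical and not part of the example above:

```perl
use strict;
use warnings;

# Hypothetical data: rows of different lengths.
my $ragged = [
    [ 1, 2, 3 ],
    [ 4 ],
    [ 5, 6 ],
];

my @lengths;
for my $i ( 0 .. $#{$ragged} ) {            # $#{$ragged} is the last row index
    my $columns = scalar @{ $ragged->[$i] }; # column count of this row only
    push @lengths, $columns;
    for my $j ( 0 .. $columns - 1 ) {
        print "Row $i and Column $j has value $ragged->[$i][$j]\n";
    }
}
```

Here the inner loop bound is recomputed for every row, so no row is read past its end.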

If one had this data in a file data.txt of the form

   0    0
   2    4
   5    9
   7   32

and wanted to read it in and populate a data structure with it, the following could be used:

my $aref = [ ];   # make $aref a reference to an empty array
open (my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";
while (my $line = <$fh>) {
  chomp $line;
  my @a = split ' ', $line;   # split on whitespace, ignoring leading spaces
  push @$aref, [ @a ];        # add a reference to a copy of @a
}
close $fh;

Note how the array @a is added to $aref via the push @$aref, [ @a ]; call - @$aref dereferences $aref to give the array it refers to, and onto that array we push a single array reference, [ @a ], which refers to a new anonymous array holding a copy of the elements of @a.
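
The difference between [ @a ], which copies the elements, and \@a, which refers to @a itself, can be seen in a small sketch (the names $copied and $aliased are illustrative only):

```perl
use strict;
use warnings;

my @a = ( 1, 2 );

my $copied  = [ @a ];   # reference to a new anonymous array with a copy of @a
my $aliased = \@a;      # reference to @a itself

$a[0] = 99;             # change the original array afterwards

print "copied:  @$copied\n";    # prints "copied:  1 2"
print "aliased: @$aliased\n";   # prints "aliased: 99 2"
```

In the loop above, @a is declared with my inside the loop body, so each iteration gets a fresh array and either form would give distinct rows; the copying form [ @a ] stays safe even if @a were declared once outside the loop.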

An analogous procedure can be used for constructing and manipulating higher-dimensional arrays.
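
As a hedged illustration of the higher-dimensional case, the sketch below builds a hypothetical 2 x 2 x 3 structure (an array of arrays of arrays); the name $cube and the values stored in it are made up for the example:

```perl
use strict;
use warnings;

my $cube = [ ];
for my $i ( 0 .. 1 ) {
    for my $j ( 0 .. 1 ) {
        for my $k ( 0 .. 2 ) {
            # The intermediate array references are created automatically
            # (autovivification) the first time they are assigned through.
            $cube->[$i][$j][$k] = 100 * $i + 10 * $j + $k;
        }
    }
}

print "$cube->[1][0][2]\n";              # prints 102
print scalar @{ $cube->[0][1] }, "\n";   # prints 3, the size of the last dimension
```

Each additional dimension simply adds one more level of array references and one more subscript.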

In some cases, particularly in the case of a database or spreadsheet, it may be more natural to associate the columns with a hash, rather than an array. For example, suppose we have the data

 George   Washington  American  President  1234
 Ludwig   Beethoven   German    Composer   5678
 Joan     Arc         French    Saint      2435

which holds information on people's first name, last name, nationality, occupation, and social insurance number (SIN). Here it's natural to use an array to represent the rows, but for the columns, it would be more convenient to use a hash with appropriately named keys. This data can be used to populate an array of hashes in the following manner:

my $aref = [ ];   # reference to an empty array of records
open (my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";
while (my $line = <$fh>) {
  chomp $line;
  my @a = split ' ', $line;
  push @$aref, {first_name  => $a[0],
                last_name   => $a[1],
                nationality => $a[2],
                occupation  => $a[3],
                sin         => $a[4],
               };
}
close $fh;

where here we push onto @$aref a hash reference, rather than an array reference as was done above. This data can then be printed out using

my $count = 0;
for my $row_ref ( @$aref ) {
  print "For row $count:\n";
  for my $key ( keys %$row_ref ) {
      print "\tKey $key has value $row_ref->{$key}\n";
  }
  $count++;
}
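
Since keys returns the fields of each hash in no particular order, it can help to keep the field names in a list and use that list both to build and to print the records. The following is a sketch only: it uses an inline copy of the sample data instead of data.txt, a hash slice instead of the explicit key list above, and the names @lines, @fields, and $people are assumptions made for the example.

```perl
use strict;
use warnings;

# Inline copy of the sample data, so that no data.txt is needed.
my @lines = (
    'George   Washington  American  President  1234',
    'Ludwig   Beethoven   German    Composer   5678',
    'Joan     Arc         French    Saint      2435',
);

# Field names, in column order.
my @fields = qw(first_name last_name nationality occupation sin);

my $people = [ ];
for my $line (@lines) {
    my %record;
    @record{@fields} = split ' ', $line;   # hash slice: fill all fields at once
    push @$people, \%record;               # %record is new on every iteration
}

# Print each record with its fields in column order.
for my $person (@$people) {
    print join( ', ', map { "$_=$person->{$_}" } @fields ), "\n";
}
```

The first line printed is first_name=George, last_name=Washington, nationality=American, occupation=President, sin=1234.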

Depending on the context of the problem, however, it may be more convenient to use a hash of hashes:

my $href = { };   # make $href a reference to an empty hash
open (my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";
while (my $line = <$fh>) {
  chomp $line;
  my @a = split ' ', $line;
  $href->{$a[4]} = { first_name  => $a[0],
                     last_name   => $a[1],
                     nationality => $a[2],
                     occupation  => $a[3],
                   };
}
close $fh;

where the value $a[4] (the SIN), assumed to be unique for each person, is used as the key identifying the hash reference that contains the remaining information. This information may be printed out as:

for my $sin ( keys %$href ) {
  print "For sin $sin:\n";
  for my $key ( keys %{$href->{$sin}} ) {
      print "\tKey $key has value $href->{$sin}->{$key}\n";
  }
}
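
The main convenience of the hash of hashes is direct lookup by key. The sketch below assumes the same three records, written out inline instead of being read from data.txt; the variables $wanted, $name, and @sorted are illustrative names only.

```perl
use strict;
use warnings;

my $href = {
    1234 => { first_name  => 'George',   last_name  => 'Washington',
              nationality => 'American', occupation => 'President' },
    5678 => { first_name  => 'Ludwig',   last_name  => 'Beethoven',
              nationality => 'German',   occupation => 'Composer' },
    2435 => { first_name  => 'Joan',     last_name  => 'Arc',
              nationality => 'French',   occupation => 'Saint' },
};

# Look up one record directly by its SIN. Testing with exists first avoids
# autovivifying an empty record when a nested field of an unknown SIN is read.
my $wanted = 5678;
my $name   = '';
if ( exists $href->{$wanted} ) {
    $name = "$href->{$wanted}{first_name} $href->{$wanted}{last_name}";
    print "SIN $wanted belongs to $name\n";
}

# keys returns the SINs in no particular order; sort them numerically
# when a stable order is wanted.
my @sorted = sort { $a <=> $b } keys %$href;
print "@sorted\n";   # prints "1234 2435 5678"
```

With the array of hashes, finding the same record would require looping over all rows and comparing the sin field of each.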

Other arbitrarily complicated data structures can be constructed similarly. It is important to become familiar with both constructing and manipulating such structures, as approaching a problem with a well-suited data structure will often make attacking the rest of the problem that much easier.
