

Regular Expressions

At times you may want to phrase a condition not in terms of an exact match, but rather a partial match, in some sense - for example, do something if a file name ends with a txt extension, irrespective of the rest of the name. For this one can use a regular expression.

We won't get into a comprehensive discussion of regular expressions - whole books can be, and have been, written on them - but give just enough of the basics to whet one's appetite and to show their power. A condition using a regular expression can be phrased as follows:

if ($some_string =~ /some_regular_expression/) {
  do something;
}
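where ``some_regular_expression'' is an expression defining the pattern to match. This can be composed out of ordinary characters, which match themselves, together with special constructs such as . (any character except a newline), \w (a word character), \d (a digit), \s (a whitespace character), and the anchors ^ and $ (the start and the end of the string). This list is not exhaustive. You can also use quantifiers after these expressions: * (zero or more times), + (one or more times), and ? (zero or one time). Finally, you can put an i modifier after the expression to perform a case-insensitive test (for example, /ball/i will match Baseball and also FooTBALL).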
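
A minimal sketch of such a condition, using the file extension test mentioned at the start of this section. The subroutine name has_txt_extension and the sample file names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $name ends in ".txt", ignoring case.
sub has_txt_extension {
    my ($name) = @_;
    # \. matches a literal dot, $ anchors the match at the end of the
    # string (or just before a trailing newline), and the i modifier
    # makes the test case-insensitive.
    return $name =~ /\.txt$/i ? 1 : 0;
}

for my $name ('notes.txt', 'README.TXT', 'txt_notes.doc') {
    if (has_txt_extension($name)) {
        print "$name looks like a text file\n";
    }
    else {
        print "$name does not\n";
    }
}
```

Without the $ anchor the pattern would also accept a name such as notes.txt.bak, since the match could then occur anywhere in the string.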

Note that if you want to test for the literal presence of characters that have a special meaning within a regular expression (like *, ., \, $, ...), you have to escape them with a backslash (\). For example, to match a literal *, use /\*/, and to match a literal /, use /\//. To aid readability, Perl allows you to write a regular expression /some_regular_expression/ using a delimiter other than /, provided you put an m in front of it; for example, /\// can be written as m!/!.
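
The following sketch tries out these escapes on two made-up strings. The variable names are only for illustration. The last part uses Perl's built-in quotemeta function, which is one way (not the only one) to escape text held in a variable.

```perl
use strict;
use warnings;

my $expr = '3 * 4';
my $path = '/usr/local/bin';

# \* matches a literal asterisk rather than acting as a quantifier.
my $has_star = ($expr =~ /\*/) ? 1 : 0;

# These two tests are equivalent; the second avoids escaping the
# slash by using ! as the delimiter.
my $slash_escaped = ($path =~ /\//) ? 1 : 0;
my $slash_bang    = ($path =~ m!/!) ? 1 : 0;

# quotemeta puts a backslash in front of every non-word character,
# so the dot below is matched literally and 'axb' does not match.
my $literal        = quotemeta('a.b');
my $dot_is_literal = ('axb' =~ /$literal/) ? 0 : 1;

print "star: $has_star, slash: $slash_escaped and $slash_bang, ",
      "literal dot: $dot_is_literal\n";
```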

As well as their use in conditionals, regular expressions are very powerful when used in string substitutions:

   $string =~ s/old_pattern/new_pattern/;
which will replace the first match of old_pattern in $string with new_pattern. For example,
  my $string = 'My favorite color is red';
  $string =~ s/color/colour/;
will leave $string as My favorite colour is red. As with matching, the i modifier can be used to do case-insensitive substitutions, and the g modifier can be used to specify that the substitution should be done as many times as possible:
  my $string = 'My favorite color is red';
  $string =~ s/or/our/g;
will leave $string as My favourite colour is red. Round brackets can be used in old_pattern to capture the enclosed characters for use in new_pattern, where $1 refers to the contents of the first pair of brackets, $2 to the second, and so on. For example,
  my $name = 'Margaret Smith';
  $name =~ s/^(\w+) (\w+)/$2, $1/;
will leave $name as Smith, Margaret.
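
Captured text is not limited to substitutions: after a successful match in a conditional, $1 and $2 can be read directly. The sketch below shows this, and also that a substitution with the g modifier reports how many replacements it made. The variable names are chosen only for the example.

```perl
use strict;
use warnings;

my $name = 'Margaret Smith';

# After a successful match, $1 and $2 hold the text captured by the
# first and second pair of round brackets.
my ($first, $last) = ('', '');
if ($name =~ /^(\w+) (\w+)/) {
    ($first, $last) = ($1, $2);
}

# In scalar context s///g returns the number of substitutions made
# (or a false value if there were none).
my $string = 'My favorite color is red';
my $count  = ($string =~ s/or/our/g);

print "first=$first last=$last\n";
print "$count substitutions: $string\n";
```

Copying $1 and $2 into ordinary variables straight away is a cautious habit, since the next successful match will overwrite them.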

It takes lots of practice (and frustration) to become proficient with regular expressions. One thing to be aware of is that, in Perl, quantifiers are greedy by default - they match as much as possible while still allowing the overall pattern to match. However, once you become somewhat familiar with regular expressions, you will quickly see their power - imagine, with a 10-line Perl script, being able to go through a list of files in a directory and, for each file, replace all occurrences of 2002 with 2003.
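
To make the effect of greediness concrete, here is a small sketch that extracts quoted text from a made-up string, first with an ordinary quantifier and then with its non-greedy form (a ? placed after the quantifier). The sample string and variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $text = 'say "one" and "two"';

# Greedy: .* grabs as much as it can, so the captured text runs from
# just after the first quote to just before the last one.
my ($greedy) = $text =~ /"(.*)"/;

# Non-greedy: .*? matches as little as possible, so the captured
# text stops at the next quote.
my ($lazy) = $text =~ /"(.*?)"/;

print "greedy: $greedy\n";   # greedy: one" and "two
print "lazy:   $lazy\n";     # lazy:   one
```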

