When it comes to time-space tradeoffs, Perl nearly always prefers to throw memory at a problem. Scalars in Perl use more memory than strings in C, arrays take more than that, and hashes use even more. While there's still a lot to be done, recent releases have been addressing these issues. For example, as of 5.004, duplicate hash keys are shared amongst all hashes using them, so they need be allocated only once.
In some cases, using substr() or vec() to simulate arrays can be highly beneficial. For example, an array of a thousand booleans will take at least 20,000 bytes of space, but it can be turned into one 125-byte bit vector--a considerable memory savings. The standard Tie::SubstrHash module can also help for certain types of data structure. If you're working with specialized data structures (matrices, for instance), modules that implement them in C may use less memory than equivalent modules written in pure Perl.
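As a rough illustration (the variable name here is made up), a bit vector built with vec() stores each boolean in a single bit:

    my $flags = '';               # grows to about 125 bytes for 1,000 one-bit flags
    vec($flags, 42, 1) = 1;       # set boolean number 42
    print "bit 42 is set\n" if vec($flags, 42, 1);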
Another thing to try is learning whether your Perl was compiled with the system malloc or with Perl's builtin malloc. Whichever one it is, try using the other one and see whether this makes a difference. Information about malloc is in the INSTALL file in the source distribution. You can find out whether you are using perl's malloc by typing perl -V:usemymalloc.
Of course, the best way to save memory is to not do anything to waste it in the first place. Good programming practices can go a long way toward this:

* Don't slurp!
Don't read an entire file into memory if you can process it line by line. Or more concretely, use a loop like this:
    #
    # Good Idea
    #
    while (<FILE>) {
        # ...
    }

instead of this:

    #
    # Bad Idea
    #
    @data = <FILE>;
    foreach (@data) {
        # ...
    }

When the files you're processing are small, it doesn't much matter which way you do it, but it makes a huge difference when they start getting larger.
* Use map and grep selectively
Remember that both map and grep expect a LIST argument, so doing this:
    @wanted = grep {/pattern/} <FILE>;

will cause the entire file to be slurped. For large files, it's better to loop:
    while (<FILE>) {
        push(@wanted, $_) if /pattern/;
    }
* Avoid unnecessary quotes and stringification
Don't quote large strings unless absolutely necessary:
my $copy = "$large_string";makes 2 copies of $large_string (one for $copy and another for the quotes), whereas
my $copy = $large_string;only makes one copy.
Ditto for stringifying large arrays:
{ local $, = "\n"; print @big_array; }is much more memory-efficient than either
print join "\n", @big_array;or
{ local $" = "\n"; print "@big_array"; }
* Pass by reference
Pass arrays and hashes by reference, not by value. For one thing, it's the only way to pass multiple lists or hashes (or both) in a single call/return. It also avoids creating a copy of all the contents. This requires some judgment, however, because any changes will be propagated back to the original data. If you really want to mangle (er, modify) a copy, you'll have to sacrifice the memory needed to make one.
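As a rough sketch (the sub and variable names here are only illustrative), passing a reference lets the subroutine work on the caller's array rather than a copy of its contents:

    sub total {
        my ($aref) = @_;              # a reference to the caller's array, not a copy
        my $sum = 0;
        $sum += $_ for @$aref;
        return $sum;
    }

    my $sum = total(\@big_array);     # pass \@big_array rather than @big_array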
* Tie large variables to disk.
For "big" data stores (i.e. ones that exceed available memory) consider using one of the DB modules to store it on disk instead of in RAM. This will incur a penalty in access time, but that's probably better than causing your hard disk to thrash due to massive swapping.