Access Substrings

 

Overview

In Perl, strings are a basic data type rather than arrays of characters, so you use functions such as substr or unpack to access individual characters or a portion of a string.

 

substr

To access or modify just a portion of a string, use the substr function:

$value = substr($string, $offset, $count);
$value = substr($string, $offset);
    
substr($string, $offset, $count) = $newstring;
substr($string, $offset)         = $newtail;

The offset argument to substr indicates the start of the substring you're interested in, counting from the front if positive and from the end if negative. If offset is 0, the substring starts at the beginning. The count argument is the length of the substring; if it is omitted, the substring runs to the end of the string.

$string = "The quick brown fox";

### +0123456789012345678   Indexing forwards  (left to right)
###  9876543210987654321-  Indexing backwards (right to left)
### (a 0 in either ruler stands for 10)

$first  = substr($string, 0, 1);  # "T"
$start  = substr($string, 4, 5);  # "quick"
$rest   = substr($string, 10);    # "brown fox"
$last   = substr($string, -1);    # "x"
$end    = substr($string, -3);    # "fox"
$piece  = substr($string, -9, 5); # "brown"
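
The relationship between negative and positive offsets can be checked directly. The following is a minimal sketch, not part of the original recipe, that assumes the same sample string and compares substr with a negative offset against the equivalent offset computed from length:

```perl
use strict;
use warnings;

my $string = "The quick brown fox";
my $len    = length $string;    # 19 characters

# A negative offset -$n names the same position as $len - $n.
my @tails;
for my $n (1, 3, 9) {
    my $from_end   = substr($string, -$n);
    my $from_front = substr($string, $len - $n);
    die "offsets disagree for $n" unless $from_end eq $from_front;
    push @tails, $from_end;
}
print join("|", @tails), "\n";    # x|fox|brown fox
```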

You can do more than just look at parts of the string with substr; you can actually change them. That's because substr is a particularly odd kind of function - an lvaluable one, that is, a function that may itself be assigned a value. (For the record, the others are vec, pos, and as of the 5.004 release, keys. If you squint, local and my can also be viewed as lvaluable functions.)

$string = "The quick brown fox";
substr($string, 4, 5) = "lazy";   # "The lazy brown fox"
substr($string, -3)   = "dog";    # "The lazy brown dog"
substr($string, 0, 4) = "";       # "lazy brown dog"
substr($string, -10)  = "";       # "lazy"
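
Core Perl's substr also accepts a replacement string as a fourth argument, which has the same effect as assigning to substr and returns the text that was replaced. The text above does not cover this form, so treat the following as a small supplementary sketch that reuses the sample string:

```perl
use strict;
use warnings;

my $string = "The quick brown fox";

# Four-argument form: replace 5 characters starting at offset 4.
# The return value is the text that was there before the replacement.
my $old = substr($string, 4, 5, "lazy");

print "$old\n";       # quick
print "$string\n";    # The lazy brown fox
```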

You can test substrings with the =~ operator:

if (substr($string, -10) =~ /pattern/) 
{
    print "Pattern matches in last 10 characters\n";
}

### Replace "The" with "My", restricted to the first five characters

substr($string, 0, 5) =~ s/The/My/g;
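
To see that the substitution really is confined to the chosen range, here is a short sketch with a hypothetical sample string that contains "The" twice. Only the occurrence inside the first five characters should change:

```perl
use strict;
use warnings;

my $string = "The cat and The dog";   # hypothetical sample text

# Only the first five characters ("The c") are visible to s///,
# so the second "The" is left alone even with the /g flag.
substr($string, 0, 5) =~ s/The/My/g;

print "$string\n";    # My cat and The dog
```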

You can swap values by using several substrs on each side of an assignment:

### Exchange the first and last letters in a string

$a = "make a hat";

(substr($a,0,1), substr($a,-1)) = (substr($a,-1), substr($a,0,1));
print $a;
take a ham

 

unpack

The unpack function can be considerably faster than substr when you extract many values at once, but it is not lvaluable, so it cannot be used to modify substrings directly.

### get a 20-byte string, skip 30, then grab 2 8-byte strings, then the rest

($leading, $s1, $s2, $trailing) = unpack("A20 x30 A8 A8 A*", $data);

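### split into 20-byte chunks (a trailing partial chunk is dropped)
@chunks = unpack("A20" x (length($string)/20), $string);

# chop string into individual characters
# ("A" strips trailing spaces, so a space comes back as ""; use "a1" to keep it)
@chars  = unpack("A1" x length($string), $string);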
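
One detail of the chunking idiom is worth illustrating: the x operator truncates a fractional repetition count, so a short final chunk is silently dropped. The sketch below uses a hypothetical 26-character string and 10-byte chunks to keep the output small. It contrasts that idiom with a group template, "(A10)*", which assumes Perl 5.8 or later, where groups in pack templates are available:

```perl
use strict;
use warnings;

my $string = "abcdefghijklmnopqrstuvwxyz";   # 26 characters, sample data

# The repetition count 2.6 is truncated to 2, so only full chunks come back.
my @full = unpack("A10" x (length($string) / 10), $string);

# A group followed by * repeats until the data runs out, keeping the short tail.
my @all  = unpack("(A10)*", $string);

print scalar(@full), " vs ", scalar(@all), "\n";   # 2 vs 3
print "$all[-1]\n";                                # uvwxyz
```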

The unpack function uses a lowercase "x" with a count to skip forward some number of bytes and an uppercase "X" with a count to skip backward some number of bytes.

# extract column with unpack
$a = "To be or not to be";
$b = unpack("x6 A6", $a);  # skip 6, grab 6
print $b;
or not

($b, $c) = unpack("x6 A2 X5 A2", $a); # forward 6, grab 2; backward 5, grab 2
print "$b\n$c\n";
or
be

Sometimes you prefer to think of your data as being cut up at specific columns. For example, you might want to place cuts right before positions 8, 14, 20, 26, and 30. Those are the column numbers where each field begins. Although you could calculate that the proper unpack format is "A7 A6 A6 A6 A4 A*", this is too much mental strain for the virtuously lazy Perl programmer. Let Perl figure it out for you. Use the cut2fmt function below:

sub cut2fmt {
    my(@positions) = @_;
    my $template   = '';
    my $lastpos    = 1;
    foreach my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

$fmt = cut2fmt(8, 14, 20, 26, 30);
print "$fmt\n";
A7 A6 A6 A6 A4 A*
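
The generated template can be handed straight to unpack. As a minimal sketch, the following repeats the cut2fmt logic so that it runs on its own and applies it to a hypothetical fixed-width record; the record contents and column positions are assumptions made for illustration:

```perl
use strict;
use warnings;

# Same logic as cut2fmt above, repeated so this sketch is self-contained.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    foreach my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    return $template . "A*";
}

# Hypothetical record: fields begin at columns 1, 8, and 14.
my $line   = "alice  admin staff-only";
my $fmt    = cut2fmt(8, 14);             # "A7 A6 A*"
my @fields = unpack($fmt, $line);

print join("|", @fields), "\n";          # alice|admin|staff-only
```

Because the "A" format strips trailing spaces, the padding between columns does not end up in the extracted fields.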