See Also: Home Links Personal Site Blogroll  FriendFeed CV

Tags:

Topic Image

Understanding Greediness

This talk was presented by Jacinta Richardson of Perl training Australia and looked at getting around the natural greediness of regular expressions. I use regexes a lot and was aware of the need to override greediness but never really understood what it was about greediness that made it inneficient.

Some times you really do want to match greedily, ie the largest possible match in a string, but occasionally you really only want the 1st possible match in a pattern, not from the 1st to the last which is the default greedy behaviour.

Its tempting to use the '.' metacharacter in regexes which matches any character (except newline) to find any text between to other marker sections of text. For example to find BLADE between 954 and '2003' in the string 954BLADE2003 you could use a regex like /4(.*)2/ which would return BLADE, this is a simple example and a very short string so performance wise is fine, and there is only one match.

What the regex engine actually does in this search is the following..

  • look for 1st occurance of '4'
  • swallow up the rest of the string because of '.*'
  • work backwards one char at a time till we find '2'

As mentioned this works find in this short and contrived example, but what if the string was many KB or MB long. An obvious improvment is to use something like /4(\w*)2/ which will stop the greedy swallow after the 1st non word character, ie the '2' in this case.

Another improvment if we actually know the length of the target string ('BLADE' in this case which is '5' chars long) is to use something like /4(.{5})/ which fetches exaclty 5 word-chars following the '4'. This is faster apparently coz we're not checking type in the regex where just fetching raw characters of a fixed length.

This jump-to-line-end and step back approach that regex engines (apparently they all behave similarly) was interesting to understand a little better, Jacinta called it the "Are we there yet" syndrome making reference to the way kids ask that all the time in the car as the journey grinds along.


See Also: OSDC 2006 | Web Development | Notes Index