PHP preg_match performance and overcoming maximum limit

Recently, I ran into a PHP PCRE regex that failed to match for no apparent reason:

preg_match("@ <b>(.*?)</b> .* <p>(.*?)</p> @isx", $d, $m);
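
When preg_match hits one of PCRE's internal limits, it returns false instead of 0. Unless you compare the return value with ===, that failure looks exactly like "no match". Below is a minimal sketch of a wrapper that makes such failures visible while you debug a pattern like the one above. checked_preg_match() is a hypothetical name, the sample $d is made up, and PREG_JIT_STACKLIMIT_ERROR requires PHP 7.0 or later. If the limit itself is what you need to overcome, you can raise it at runtime with something like ini_set('pcre.backtrack_limit', '5000000') (that value is only an example). The cost is that runaway patterns are allowed to run longer.

```php
<?php
// Sketch: a preg_match() wrapper that turns PCRE failures into exceptions
// instead of letting them pass for "no match". The name is hypothetical.
function checked_preg_match(string $pattern, string $subject, ?array &$matches = null): int
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        $names = [
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
            PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
            PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
        ];
        $code = preg_last_error();
        throw new RuntimeException('preg_match failed: ' . ($names[$code] ?? "error code $code"));
    }
    return $result; // 1 = matched, 0 = genuinely no match
}

// Made-up sample input; a real page would be much longer.
$d = 'intro <b>Heading</b> some text <p>Text</p>';
if (checked_preg_match('@ <b>(.*?)</b> .* <p>(.*?)</p> @isx', $d, $m) === 1) {
    echo $m[1], ' / ', $m[2], "\n"; // Heading / Text
}
```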

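After some debugging, it turned out that the problem was the amount of text the match had to cover. When a couple of hundred lines separate the closing </b> from the <p>, preg_match gives up without completing and you end up with no matches. The greedy .* is the likely culprit. It first runs to the end of the subject and then backtracks one character at a time looking for a <p>. As a side effect, it settles on the last <p> in the document rather than the one right after </b>. The limit being hit is most likely PCRE's backtracking limit, which is set by the pcre.backtrack_limit ini directive. When that limit is exceeded, preg_match returns false rather than 0, and preg_last_error() returns PREG_BACKTRACK_LIMIT_ERROR. One workaround is to use preg_match_all and replace .* with a | (logical OR):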

preg_match_all("@ <b>(.*?)</b> | <p>(.*?)</p> @isx", $d, $m);
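
With the alternation, each match fills only one of the two groups, so the results need to be sorted out afterwards. Below is a minimal sketch. It assumes PHP 7.4+ for the arrow functions (PREG_UNMATCHED_AS_NULL itself needs 7.2) and uses a made-up sample document in $d.

```php
<?php
// Made-up sample document.
$d = '<b>Title</b> lots of text <p>First para</p> <p>Second</p>';

// PREG_UNMATCHED_AS_NULL reports the alternative that did not match as null
// instead of '', so a genuinely empty tag can still be told apart.
preg_match_all('@ <b>(.*?)</b> | <p>(.*?)</p> @isx', $d, $m, PREG_UNMATCHED_AS_NULL);

// $m[1] holds the <b> contents and $m[2] the <p> contents; drop the nulls.
$bold  = array_values(array_filter($m[1], fn($s) => $s !== null));
$paras = array_values(array_filter($m[2], fn($s) => $s !== null));

echo implode(', ', $bold), "\n";  // Title
echo implode(', ', $paras), "\n"; // First para, Second
```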

However, the preg_match_all version is considerably slower (5x slower in my case). It turns out that the simplest approach, two separate preg_match calls, is also the fastest and beats both of the above:

preg_match("@ <b>(.*?)</b> @isx", $d, $m1);
preg_match("@ <p>(.*?)</p> @isx", $d, $m2);
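
If you want to check the speed difference on your own pages, a rough timing harness like the one below is enough. It is only a sketch. The filler document is synthetic, avg_us() is a hypothetical helper, hrtime() needs PHP 7.3+, and the numbers it prints depend entirely on your PHP/PCRE build and input.

```php
<?php
// Synthetic document: a few hundred filler lines between </b> and <p>.
$d = '<b>Title</b>' . str_repeat("filler line of text\n", 300) . '<p>Body</p>';

// Runs $fn $n times and returns the average time per call in microseconds.
function avg_us(callable $fn, int $n = 1000): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $n; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $n / 1000;
}

// The alternation variant versus two independent preg_match calls.
$combined = fn() => preg_match_all('@ <b>(.*?)</b> | <p>(.*?)</p> @isx', $d, $m);
$separate = function () use ($d) {
    preg_match('@ <b>(.*?)</b> @isx', $d, $m1);
    preg_match('@ <p>(.*?)</p> @isx', $d, $m2);
    return [$m1[1] ?? null, $m2[1] ?? null];
};

printf("preg_match_all with |: %.2f us\n", avg_us($combined));
printf("two preg_match calls:  %.2f us\n", avg_us($separate));
```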

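So when a single pattern would have to span a long stretch of text to capture several separate pieces, it's better to run one small preg_match per piece. Keep in mind that the calls are independent: the second one finds the first <p> in the whole document, not necessarily one that follows the <b>. Avoid preg_match_all unless you really need every match, such as when collecting all the links in a document.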
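
Collecting every link is the kind of job where preg_match_all is the right tool. Below is a minimal sketch. The sample markup is made up, and regexes are a fragile way to parse HTML in general, so DOMDocument is the more robust choice when the markup is messy.

```php
<?php
// Made-up sample markup with one double-quoted and one single-quoted href.
$html = '<p>See <a href="https://example.com/a">A</a> and '
      . "<A HREF='/b'>B</A>.</p>";

// Capture the href value of every <a> tag. The opening quote is captured in
// group 1 and matched again with \1, so both "..." and '...' work.
preg_match_all('@<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1@i', $html, $m);
$links = $m[2];

echo implode("\n", $links), "\n"; // https://example.com/a, then /b
```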
