PHP preg_match performance and overcoming maximum limit
Recently, I ran into a problem where PHP PCRE regex matching failed for no apparent reason:
preg_match("@ <b>(.*?)</b> .* <p>(.*?)</p> @isx", $d, $m);
After some debugging, it turned out that PHP gives up on regex matches that have to span more than a couple of kilobytes. In this case, if there are a couple of hundred lines between the closing </b> and the next <p>, preg_match returns without completing and reports no matches. (This is most likely PCRE's backtracking limit, the pcre.backtrack_limit setting, being exceeded, in which case preg_match returns false rather than 0.) The fix is to use preg_match_all and replace .* with a | (logical OR):
preg_match_all("@ <b>(.*?)</b> | <p>(.*?)</p> @isx", $d, $m);
However, the preg_match_all version is considerably slower (5x slower in my case). It turns out that the simplest approach, a separate preg_match call for each element, is also the fastest (faster than both of the above):
preg_match("@ <b>(.*?)</b> @isx", $d, $m1);
preg_match("@ <p>(.*?)</p> @isx", $d, $m2);
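To compare the three approaches on your own data, a rough timing harness along the following lines may help. This is only a sketch: the input document is synthetic, time_it is a hypothetical helper rather than anything from PHP itself, and the numbers it prints will vary with the PHP version, the PCRE build, and the size of the input, so they may not reproduce the 5x figure mentioned above.

```php
<?php
// Synthetic input (an assumption, not the original data): one <b> and one
// <p> element separated by a few hundred lines of filler.
$d = "<b>title</b>\n" . str_repeat("filler line\n", 300) . "<p>body</p>\n";

// Hypothetical helper: average wall-clock seconds per call of $fn.
function time_it(callable $fn, int $runs = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

// Each closure returns the two captured strings so the results can be compared.
$approaches = [
    'single preg_match with .*' => function () use ($d) {
        preg_match("@ <b>(.*?)</b> .* <p>(.*?)</p> @isx", $d, $m);
        return [$m[1] ?? null, $m[2] ?? null];
    },
    'preg_match_all with |' => function () use ($d) {
        preg_match_all("@ <b>(.*?)</b> | <p>(.*?)</p> @isx", $d, $m);
        // Group 1 is filled by the first match (<b>), group 2 by the second (<p>).
        return [$m[1][0] ?? null, $m[2][1] ?? null];
    },
    'two preg_match calls' => function () use ($d) {
        preg_match("@ <b>(.*?)</b> @isx", $d, $m1);
        preg_match("@ <p>(.*?)</p> @isx", $d, $m2);
        return [$m1[1] ?? null, $m2[1] ?? null];
    },
];

foreach ($approaches as $label => $fn) {
    printf("%-28s %.2f us per call\n", $label, time_it($fn) * 1e6);
}
```

Increasing the repeat count in str_repeat is an easy way to see how each variant scales as the gap between the two elements grows.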
So the last approach is the better choice when the regex has to span a very long string and captures several separate pieces. Avoid preg_match_all unless you really need it, for example to match all links in a document.
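If a match silently comes back empty, it can be worth checking whether PCRE hit one of its limits. preg_match returns false (not 0) when it fails, and preg_last_error() reports the reason. The wrapper below is a minimal sketch of that check; checked_preg_match is a hypothetical name, and the assumption that the failure described above was a backtrack limit error is a guess rather than something the original debugging confirmed.

```php
<?php
// Hypothetical helper (not part of PHP): behaves like preg_match, but throws
// when the PCRE engine reports an error instead of failing silently.
function checked_preg_match(string $pattern, string $subject, ?array &$matches = null): int
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        // Map the error codes documented for preg_last_error() to readable text.
        $names = [
            PREG_INTERNAL_ERROR        => 'internal PCRE error',
            PREG_BACKTRACK_LIMIT_ERROR => 'backtrack limit exceeded (pcre.backtrack_limit)',
            PREG_RECURSION_LIMIT_ERROR => 'recursion limit exceeded (pcre.recursion_limit)',
            PREG_BAD_UTF8_ERROR        => 'malformed UTF-8 in subject',
            PREG_BAD_UTF8_OFFSET_ERROR => 'offset is not at a UTF-8 character boundary',
            PREG_JIT_STACKLIMIT_ERROR  => 'JIT stack limit exceeded',
        ];
        $code = preg_last_error();
        throw new RuntimeException('preg_match failed: ' . ($names[$code] ?? "error code $code"));
    }
    return $result; // 1 if the pattern matched, 0 if it did not
}

// Usage with a synthetic document (an assumption, not the original data).
$d = "<b>title</b>\n" . str_repeat("filler line\n", 300) . "<p>body</p>\n";
try {
    if (checked_preg_match("@ <b>(.*?)</b> .* <p>(.*?)</p> @isx", $d, $m) === 1) {
        echo $m[1], ' / ', $m[2], "\n";
    }
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

If the error does turn out to be the backtrack limit, raising it with ini_set('pcre.backtrack_limit', ...) is another possible workaround, although splitting the pattern into separate preg_match calls avoids the heavy backtracking in the first place.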