Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It seems like there should be a way to determine whether a regex can be compiled using the classic O(n) DFA algorithm or with whatever madness PCREs use to support backtracking and so on.

Anybody know if any regex engines attempt this?

Obviously you can still shoot yourself in the foot, but it's somewhat more difficult to do so in a situation like this where the regex in question "looks" cheap.




> It seems like there should be a way to determine whether a regex can be compiled using the classic O(n) DFA algorithm or with whatever madness PCREs use to support backtracking and so on.

Obviously. If the "regex" includes a backreference, it requires backtracking. If it includes only regular operations (I can't call any other nonregular operations that people might expect to mind), it doesn't. This is information that can be trivially surfaced by a parser.


I wonder why regex libraries don't do this then? Isn't /\s+$/ backreference free? Or does including the anchor change that?


It's backreference free. Backtracking to process it is just bad design.

Here's a state machine that recognizes /^.*\s+$/:

1: initial state (non-matching)

2: matching state

all states transition to 2 on \s, and to 1 on \S. Done.

Taking this a little further, we can observe that since the transition table is identical for every state, there's no real need to have states at all -- you get the same effect by just checking the last character.


As an alternative, in Java, there is FindBugs plugin "find-sec-bugs" that flags potentially-long-running regexes.

In my current project, it found two of those in Google Zxing library.


For people who already care enough about this there is Ragel.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: