Hacker News new | past | comments | ask | show | jobs | submit login

> It seems like there should be a way to determine whether a regex can be compiled using the classic O(n) DFA algorithm or with whatever madness PCREs use to support backtracking and so on.

Obviously. If the "regex" includes a backreference, it requires backtracking. If it includes only regular operations (I can't call any other nonregular operations that people might expect to mind), it doesn't. This is information that can be trivially surfaced by a parser.




I wonder why regex libraries don't do this then? Isn't /\s+$/ backreference free? Or does including the anchor change that?


It's backreference free. Backtracking to process it is just bad design.

Here's a state machine that recognizes /^.*\s+$/:

1: initial state (non-matching)

2: matching state

all states transition to 2 on \s, and to 1 on \S. Done.

Taking this a little further, we can observe that since the transition table is identical for every state, there's no real need to have states at all -- you get the same effect by just checking the last character.




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: