Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Most simple regular expression evaluators are basically state machines. They hold a few variables:

a) what part of the regex am I currently trying to match

b) what point in the string am I currently starting at

c) how much of the string has this piece of the regex consumed so far

Then the state machine basically has three transitions:

* if (b+c) terminally matches (a), increment both (a) and (b) and reset (c)

* if (b+c) matches (a), but more could be consumed, increment (c) and check again

* if (b+c) doesn't match (a), increment (b) and reset (c)

So yeah, you could put in short cuts for things like "well this didn't match because the next piece of the regex is matching on a line ending", but keeping it simple is straightforward and generally smiled upon.



If you are really using a state machine and every match in the regex is at the end of the haystack, then you can reverse the state machine and use the same algorithm you described in reverse. :-)

(Rust's regex engine does this.)




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: