I've fixed so many bugs using regex, only to have to fix several bugs later. My ...

mattmanser · on July 20, 2016

Regexes are almost always a massive code smell. They should almost never be used: bad idea, bad implementation, hard to grok, hard to debug, hard to test, hard to spot.

Whoever came up with them has surely been given the same honorary place in hell as Jon Postel, who invented the utterly disastrous "be liberal in what you accept", which has plagued all web developers for the last 20 years.

gnuvince · on July 21, 2016

> Regexes are almost always a massive code smell. They should almost never be used

Regular expressions are just a notation for expressing a finite-state automaton that recognizes a regular language: if that's the problem that you have, regular expressions are definitely the tool you want to be using. For instance, I recently made myself a small tool to compute the min/max/expected value of a dice roll; the notation for such a thing forms a regular language that can be expressed as follows:

    non_zero_digit ::= '1' | ... | '9'
    digit ::= '0' | non_zero_digit
    integer ::= non_zero_digit digit*
    modifier ::= '+' integer | '-' integer
    roll ::= integer 'd' integer modifier?

Converting this grammar to a regular expression is straightforward, and a regular expression was the correct tool to use.
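A minimal sketch of that conversion in Python, not the original tool: the names `ROLL_RE` and `roll_stats` are made up for illustration, and the statistics assume fair dice numbered from 1 to the number of sides.

```python
import re
from fractions import Fraction

# Direct transcription of the grammar:
#   integer  ::= non_zero_digit digit*       ->  [1-9][0-9]*
#   modifier ::= '+' integer | '-' integer   ->  [+-][1-9][0-9]*
#   roll     ::= integer 'd' integer modifier?
# [0-9] is used instead of \d so that only ASCII digits are accepted.
ROLL_RE = re.compile(r"([1-9][0-9]*)d([1-9][0-9]*)([+-][1-9][0-9]*)?")


def roll_stats(notation):
    """Return (min, max, expected) for a roll such as '3d6+2', or None if invalid."""
    m = ROLL_RE.fullmatch(notation)  # fullmatch: the whole string must be a roll
    if m is None:
        return None
    count, sides = int(m.group(1)), int(m.group(2))
    modifier = int(m.group(3)) if m.group(3) else 0  # int() understands '+2' and '-1'
    lowest = count + modifier
    highest = count * sides + modifier
    # A fair die with `sides` faces averages (sides + 1) / 2.
    expected = Fraction(count * (sides + 1), 2) + modifier
    return lowest, highest, expected
```

Under those assumptions, `roll_stats("3d6+2")` returns `(5, 20, Fraction(25, 2))`, and inputs outside the grammar such as `"0d6"` return `None`.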

I agree that regular expressions are often used in contexts where they should not be, especially when people start using back-references to recognize non-regular languages, but don't throw out a perfectly good tool because sometimes it is ill-suited.

DasIch · on July 21, 2016

Regexes are perfectly fine. Awful implementations that accept patterns that aren't regular expressions without complaint and provide little to no tools to look at underlying automata or to step through them during execution are the problem.

It's quite an amazing problem to have, honestly, because it really shouldn't be a problem to create a proper implementation. You can learn the theory behind regular expressions in a few hours and know pretty much everything there is to know.

flukus · on July 21, 2016

How are they hard to test? Given input x, expect output y. It's one of the easiest things in the world to test.

Niksko · on July 21, 2016

The point of this post is that even though the regex behaves correctly (input x produces expected output y), you also need to consider performance constraints.

Without fuzzing, it's going to be pretty difficult to come up with enough test cases to thoroughly test a regex.
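A rough sketch of what such fuzzing could look like in Python, using only the standard library. It compares a regex against a hand-written oracle on random strings. Everything here is hypothetical (the pattern, the helper names, the alphabet), and it only probes correctness edge cases; catching slow inputs would additionally need a time budget per match.

```python
import random
import re


def is_integer_reference(s):
    """Hand-written oracle: a positive integer without leading zeros."""
    return len(s) > 0 and s[0] in "123456789" and all(c in "0123456789" for c in s)


def strict(s):
    """Regex under test, anchored with fullmatch."""
    return re.fullmatch(r"[1-9][0-9]*", s) is not None


def sloppy(s):
    """Looks equivalent, but '$' also matches just before a trailing newline."""
    return re.match(r"^[1-9][0-9]*$", s) is not None


def fuzz(matches, oracle=is_integer_reference, trials=10_000, max_len=8,
         alphabet="0123456789+-d \n", seed=0):
    """Return the random inputs on which `matches` and `oracle` disagree."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    mismatches = []
    for _ in range(trials):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice(alphabet) for _ in range(length))
        if matches(s) != oracle(s):
            mismatches.append(s)
    return mismatches
```

Here `fuzz(strict)` should come back empty, while `fuzz(sloppy)` should report inputs that end in a newline, the kind of edge case that hand-picked tests tend to miss.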

qw · on July 22, 2016

Wouldn't the same apply to a custom method? Why would code using a combination of indexOf(), substring() and trim() be any less foolproof on arbitrary data?

mattmanser · on July 21, 2016

Yeah, testing what you expect in regexes is super easy. Edge case testing is not, though. That is exactly what happened in the post this discussion is about.

hinkley · on July 21, 2016

Where do you think you are, right now?

wpears · on July 21, 2016

Where x can be any combination of letters of any length?

pmarreck · on July 21, 2016

This is a load of BS. They are quite capable (and fast) tools in the right hands. And they are easily (and SHOULD be) tested, in any test suite.

As proof, I submit some email header parsing code which I rewrote as a well-commented Regex which was something like 300x faster than using the Mail gem: https://github.com/pmarreck/ruby-snippets/blob/master/header...

Once you know what to look for re: catastrophic backtracking, you know how to avoid it. This is called programmer skill.
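One textbook shape to look for is a quantifier nested inside another quantifier. The sketch below is illustrative only (the patterns and helper names are assumptions, not taken from the linked code) and targets a backtracking engine such as Python's `re`.

```python
import re
import time

# Nested quantifiers: on a failing input, a backtracking engine may try
# exponentially many ways to split a run of 'a's between the inner and outer '+'.
RISKY = re.compile(r"(a+)+$")

# Accepts the same strings, but there is only one way to consume each 'a'.
SAFE = re.compile(r"a+$")


def timed_match(pattern, text):
    """Return (matched, seconds) for one match attempt anchored at the start."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


def compare(n):
    """Time both patterns on n 'a's followed by 'b', which neither accepts."""
    text = "a" * n + "b"
    return timed_match(RISKY, text), timed_match(SAFE, text)
```

Calling `compare(n)` for `n` from about 10 up to 20 should show the `RISKY` time roughly doubling with each extra character while `SAFE` stays negligible; exact numbers depend on the machine and the engine version, so keep `n` small.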

wruza · on July 22, 2016

>"be liberal in what you accept"

... is part of the so-called Unix philosophy, the CAUSE of the internet. The fact that many web developers forgot to become programmers is just a fun fact that changes nothing.

zamalek · on July 21, 2016

I'll write a trivial state machine over using a regex any day of the week. They are easier to write and read (especially if complex), aren't a black box in terms of profiling and don't have surprising performance characteristics like regexes do. Unless we're talking about a Thompson NFA (which isn't used by many modern stdlibs) or JS, regexes simply do not appear in my belt of solutions.
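For comparison, a hand-written state machine for the dice notation discussed earlier in the thread might look like the sketch below. The state names and the `accepts` helper are invented for illustration; it is one possible layout, not a recommended one.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()       # nothing read yet
    COUNT = auto()       # reading the number of dice
    AFTER_D = auto()     # just read 'd'
    SIDES = auto()       # reading the number of sides
    AFTER_SIGN = auto()  # just read '+' or '-'
    MODIFIER = auto()    # reading the modifier value


NON_ZERO = "123456789"
DIGITS = "0123456789"
ACCEPTING = {State.SIDES, State.MODIFIER}


def step(state, ch):
    """Return the next state, or None if `ch` is not allowed in `state`."""
    if state is State.START:
        return State.COUNT if ch in NON_ZERO else None
    if state is State.COUNT:
        if ch in DIGITS:
            return State.COUNT
        return State.AFTER_D if ch == "d" else None
    if state is State.AFTER_D:
        return State.SIDES if ch in NON_ZERO else None
    if state is State.SIDES:
        if ch in DIGITS:
            return State.SIDES
        return State.AFTER_SIGN if ch in "+-" else None
    if state is State.AFTER_SIGN:
        return State.MODIFIER if ch in NON_ZERO else None
    if state is State.MODIFIER:
        return State.MODIFIER if ch in DIGITS else None
    return None


def accepts(text):
    """Single left-to-right pass: one transition per character, no backtracking."""
    state = State.START
    for ch in text:
        state = step(state, ch)
        if state is None:
            return False
    return state in ACCEPTING
```

Each character triggers exactly one transition, so the running time is linear in the input length regardless of the input.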

lmm · on July 21, 2016

And when you actually need complex parsing, parser combinators can express it in a much more maintainable way.
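To make that concrete, here is a toy set of parser combinators in Python, applied to the dice notation from earlier in the thread. It is a sketch written for this example, not any particular library's API: a parser is assumed to be a function `(text, pos)` that returns `(value, new_pos)` on success or `None` on failure.

```python
def char_in(chars):
    """Parser that consumes one character if it is in `chars`."""
    def parse(text, pos):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def many(parser):
    """Zero or more repetitions. `parser` must consume input when it succeeds."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def optional(parser, default=None):
    """Try `parser`; on failure succeed with `default` without consuming input."""
    def parse(text, pos):
        result = parser(text, pos)
        return result if result is not None else (default, pos)
    return parse


def fmap(fn, parser):
    """Apply `fn` to the value produced by `parser`."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse


# The grammar, written almost one rule per line.
integer = fmap(lambda v: int(v[0] + "".join(v[1])),
               seq(char_in("123456789"), many(char_in("0123456789"))))
modifier = fmap(lambda v: v[1] if v[0] == "+" else -v[1],
                seq(char_in("+-"), integer))
roll = fmap(lambda v: (v[0], v[2], v[3]),
            seq(integer, char_in("d"), integer, optional(modifier, 0)))


def parse_roll(text):
    """Return (count, sides, modifier), or None unless the whole input is a roll."""
    result = roll(text, 0)
    if result is None or result[1] != len(text):
        return None
    return result[0]
```

With these definitions `parse_roll("3d6+2")` returns `(3, 6, 2)`. The point of the style is that the grammar rules stay visible as named values that can be tested and reused individually.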