
> If you're OK with the occasional catastrophically slow regex (https://swtch.com/~rsc/regexp/regexp1.html), sure.

I already addressed that in my original comment. Approaches based on NFAs and Brzozowski derivatives don't have these flaws, and you don't need to know anything about how they work to use them.

You just need to read one blog post that tells you to avoid regular expression matchers that use backtracking, and you are good to go. You don't even need to understand why matching via backtracking is bad.

> But you do need to understand the abstraction of strings, code points and so on, if you want to do regexes on unicode that doesn't stop at the ASCII level.

Why?



>You just need to read one blog post that tells you to avoid regular expression matchers that use backtracking, and you are good to go. You don't even need to understand why matching via backtracking is bad.

Yeah, no.

You might not be able to avoid using your standard lib's regex, or your project's chosen regex dependency, because of team/company policy. So it's not as simple as "use a regex engine that doesn't have this flaw".

Then, if you want to avoid the cost, you need to know what backtracking is, at least well enough to understand which kinds of expressions can give you those performance issues.
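
To make "which kinds of expressions" concrete, here is a minimal sketch using Python's standard `re` module, which is a backtracking matcher. The pattern names `RISKY` and `SAFE` and the helper `time_failed_match` are made up for illustration; the nested-quantifier pattern is the textbook example of the problem, not the only shape that triggers it.

```python
import re
import time

# Nested quantifiers: a backtracking engine may try exponentially many ways
# to split a run of "a"s between the inner and the outer "+" before giving up.
RISKY = re.compile(r"^(a+)+$")

# Accepts the same strings, but there is only one way to consume each "a".
SAFE = re.compile(r"^a+$")


def time_failed_match(pattern, n):
    """Return the seconds spent failing to match n "a"s followed by one "b"."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "b" guarantees a failed match
    return elapsed


if __name__ == "__main__":
    # Timings depend on the machine; the point is how the RISKY column grows.
    for n in (14, 16, 18, 20):
        print(n, time_failed_match(RISKY, n), time_failed_match(SAFE, n))
```

On a backtracking engine the time for the first pattern should grow roughly exponentially with the input length, while the second stays flat. Spotting and rewriting patterns like the first one is the kind of knowledge you need when you can't swap the engine.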

>Why?

Because there are tons of factors that can affect your regex experience with Unicode: normalization, different lower/upper case treatment, composite characters that don't match even though it looks like you typed the same character in your query, handling newly added Unicode characters (7/8-bit ASCII has been fixed for decades), and so on.
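
A small sketch of the composite-character and case points, using Python's standard `re` and `unicodedata` modules. The helper names `nfc` and `search_normalized` are made up for illustration, and normalizing the pattern this way is only reasonable for literal search text, not for arbitrary regex syntax.

```python
import re
import unicodedata

composed = "caf\u00e9"      # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent (U+0301)


def nfc(text):
    """Normalize text to NFC so canonically equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


def search_normalized(literal, text):
    """Search for a literal string after normalizing both sides to NFC."""
    return re.search(re.escape(nfc(literal)), nfc(text))


# The two strings render identically but are different code point sequences.
assert composed != decomposed
assert re.search(composed, decomposed) is None

# After normalization the same query matches.
assert search_normalized(composed, decomposed) is not None

# Case handling has similar traps: lower() keeps "ß", casefold() maps it to "ss".
assert "straße".lower() != "strasse"
assert "straße".casefold() == "strasse"
```

None of this is visible if you only think of a string as "some text"; you have to know that there are code points underneath and that several sequences can stand for what looks like the same character.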


> You might not be able to avoid using your standard lib's regex, or your project's chosen regex dependency, because of team/company policy. So it's not as simple as "use a regex engine that doesn't have this flaw".

Well, yes, if someone forces you to use tools that have flaws, you need to learn about the flaws so you can work around them. It's like using a shoe as a hammer.

I'm not sure that proves anything about abstractions?

See also https://blog.codinghorror.com/the-php-singularity/

> Because there are tons of factors that can affect your regex experience with Unicode: normalization, different lower/upper case treatment, composite characters that don't match even though it looks like you typed the same character in your query, handling newly added Unicode characters (7/8-bit ASCII has been fixed for decades), and so on.

Thanks.



