I (and most people I talk to about this stuff) tend to use Regex101 (https://regex101.com) for this purpose. It will be interesting to spend some time with RegExr and compare the two tools.
I am also aware of RegexBuddy (https://www.regular-expressions.info/regexbuddy.html), whose author also publishes very good regex learning content on their site. It looks great, but it's a closed-source Windows-only application, which means it's something I'll never be able to benefit from.
It has many more features than regex101 AND it keeps all data local. Maybe regex101 keeps things local too, but I have to run it in a browser, and I'm not about to put sensitive test data there.
That being said, regex101 is very well done. But I paid $39 for RegexBuddy 14 years ago (and the price is still the same), and last year I paid $19 for the optional upgrade from v3 to v4... but I really only did that to support the developer. v3 still met all my needs.
https://www.regular-expressions.info/ is the best place to learn, though. Not because of the tools, but because the text is so good and so clear that you can learn without helper tools, just by reading, and become a regex master in one day.
Last time a regex conversation came up on HN someone turned me on to https://regexcrossword.com - which is good fun if you're someone who enjoys regex :)
I’ve thought for a while that regex could make for a good competition. Given an input, write a regex that will get some output. Multiple competitors get the same test, first one to produce the output wins. In a tie, shortest regex wins :)
Would make for an entertaining show as the various malformed regexii produce partially correct outputs.
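The scoring rule could be sketched in Python like this. It is a hypothetical judge, not an existing tool, and it assumes two things: that the "output" is the list `re.findall` returns, and that all submissions arrived together (the tie case), so the shortest correct pattern wins. Patterns that don't even compile are simply disqualified.

```python
import re

def judge(text, expected, submissions):
    """Hypothetical judge: return the shortest submitted pattern whose
    re.findall() output on `text` equals `expected`, or None if none do."""
    correct = []
    for pattern in submissions:
        try:
            if re.findall(pattern, text) == expected:
                correct.append(pattern)
        except re.error:
            # A malformed regex that fails to compile is out of the running.
            continue
    # Shortest wins; min() keeps the earliest submission among equal lengths.
    return min(correct, key=len) if correct else None

print(judge("a1 b22 c333", ["1", "22", "333"], [r"[0-9]+", r"\d+", r"\d*", "("]))
```

Here `\d*` is "partially correct" (it also yields empty matches between the digits), `(` fails to compile, and `\d+` beats `[0-9]+` on length.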
This one is my go-to. But it would be a lot better if it didn't display an alert when you try to leave the site. I've started to try and find alternatives.
I'd like a tool similar to this but for sed or awk, to see interactively what the output would be for a given input and command, particularly for the toolchain distributed with Ubuntu. I believe demand for such a tool goes beyond just me. Let me know if you know of one!
<this is a repost, but I like to spread the good word when the opportunity avails itself>
I think one reason why most people have a hard time reading regex is because they don't use any indentation or linebreaks.
Honestly, if a buddy came to you and asked you to help him debug a javascript method and all 15 statements were on the same line, would you offer to help him, or tell him to fix his shit first so you can read it?
What if it was all on one line and his variable names were all "v1", "v2", etc. Would you help him then? fuck no. And yet, this is standard operating procedure with regex, except you don't even get "v1", "v2" because nothing is labeled at all. v1/v2/... would be an improvement!
This is how most people write a simple date regex:
\d{1,2}/\d{1,2}/(\d{4}|\d{2})
And mind you, this is a very simple scenario. Here is how you would write it if you treated it like actual code (with the engine's ignore-whitespace option turned on):
(?<month>\d{1,2})
/
(?<day>\d{1,2})
/
(?<year>\d{4}|\d{2})
First off, you know what my intent is when I'm capturing each group. Maybe this code gets used by a European, where the month and day switch places. They can figure out how to fix it in like two seconds. Secondly, the forward slashes are no longer lost in a sea of characters, because we use whitespace like a civilized developer, not a regex savage.
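The same pattern works in Python with two adjustments: the `re` module spells named groups `(?P<name>...)`, and the `re.VERBOSE` flag is what makes it ignore the whitespace (and allow `#` comments). A minimal sketch with a made-up input string:

```python
import re

# re.VERBOSE: whitespace and newlines in the pattern are ignored, # starts a comment.
DATE = re.compile(
    r"""
    (?P<month>\d{1,2})      # month: 1 or 2 digits
    /
    (?P<day>\d{1,2})        # day: 1 or 2 digits
    /
    (?P<year>\d{4}|\d{2})   # year: try 4 digits first, then 2
    """,
    re.VERBOSE,
)

m = DATE.search("Due 12/25/2023")
print(m.group("month"), m.group("day"), m.group("year"))  # 12 25 2023
```

Swapping month and day for a European date then means swapping two labeled lines, not re-reading a one-liner.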
If you want to keep things simple with regular expressions:
* Be liberal with what your pattern matches and use a normal programming language for your complicated conditional logic to filter out crap you don't want
* Don't be afraid to break up the search with multiple regular expressions
* Turn on ignore-pattern-whitespace (the `x` flag in many engines) and use whitespace to visually break up your pattern. Nobody would agree to debug minified JavaScript, yet people do the equivalent with regex all the time
* For the love of all that is holy, USE NAMED GROUPS. It is a fantastic way to document your intent.
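A sketch of the first tip in Python: the regex only finds things shaped like dates, and `datetime.date` throws out the impossible ones. The function name `find_dates` is illustrative, and the pattern assumes month/day/four-digit-year input to sidestep two-digit-year ambiguity.

```python
import re
from datetime import date

# Deliberately loose: this happily matches impossible dates like 13/45/2023.
LOOSE_DATE = re.compile(r"\b(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})\b")

def find_dates(text):
    """Collect real calendar dates; the range checks live in Python, not the regex."""
    found = []
    for m in LOOSE_DATE.finditer(text):
        try:
            found.append(date(int(m["year"]), int(m["month"]), int(m["day"])))
        except ValueError:
            # date() rejects month 13, Feb 30, etc., so the pattern doesn't have to.
            continue
    return found

print(find_dates("ok 2/29/2024, bad 2/30/2024, bad 13/1/2023"))
```

Encoding leap years in the pattern itself would be possible, but nobody would want to read it.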
Part of the reason for this is that many languages don’t support these features. For example, JavaScript has no flag for ignoring extra whitespace in a pattern, and it only gained named capture groups in ES2018.
I love Rubular and use it often. The "Regex Quick Reference" block is so incredibly useful too when you don't completely remember that one thing.
However, it is closed source, and it sends your input to the server for processing. This is (likely) not nefarious, but it does give me a little pause, and it means you should never use it if the data you are putting into it is sensitive! For example, don't test your API-key-parsing regex on it with real, active keys! Also, for corporate dev work you are (probably) violating company policy by using it.
Rubulex[1] is a neat open source clone that I use a lot. The only downside is that I have to start it locally. One of these days I'll stand up a permanent instance, but I don't want to do that without auditing the code, and I simply haven't had time. If anyone has done so, I'd love to hear about it. Scriptular[2] is an open source clone that uses JavaScript.
Side note: If anyone knows of or wants to build an Elixir regex tool in Phoenix/LiveView, I'd be willing to collaborate a bit and to host/maintain it (and I'll pay for the VM/domain). I've already got some Phoenix apps running in prod, so if you get it to work with `mix phx.server` I can take it from there (I know that operationalizing and devops aren't what usually interests most people). A non-LiveView version would be cool too, but this seems like such a great project to build with LiveView: super responsive, a showcase for what it can do, and a way to learn Elixir/Phoenix/LiveView with a simple app. Would be really neat to release an elixir-desktop[3] version too! My email is in my profile if you are interested.
I’m sure that these resources are great. And it’s not their fault that the regex family of languages evolved the way it did. The simpler regex languages (without backreferences and other stuff… that I might not even know about) seem simple at first glance, and in a perfect world I'd spend an hour internalizing them and be done forever. But in practice doubt always seems to grip me, mostly because of the meta-syntax problem: did I unintentionally use some metacharacter in this part of the string which I meant to be “fixed”? So then I feel I have to “validate” it with some external tool. And suddenly this seemingly terse and agile language is just making me second-guess myself.
> did I unintentionally use some metacharacter in this part of the string which I meant to be “fixed”?
When in doubt, just throw a backslash in front of it, which always means "the next character is to be interpreted literally," even in cases where it's not necessary.
(Well, not always; the backslash will invoke a special character when thrown before some letters, e.g., "\t" means the tab character. But normal letters never need to be escaped; only punctuation does.)
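When the "fixed" part comes from a variable, Python's `re.escape` does the backslashing for you, which removes the doubt entirely. A small sketch with made-up input:

```python
import re

prefix = "price: $3.50 (approx.)"
# re.escape backslash-escapes every character that could be special in a regex,
# so "$", ".", "(" and ")" in the fixed part are matched literally.
pattern = re.compile(re.escape(prefix) + r"\s*(?P<note>.*)")

m = pattern.search("price: $3.50 (approx.) before tax")
print(m["note"])  # before tax
```

Only the part you wrote yourself (`\s*(?P<note>.*)`) is treated as regex syntax.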
Fun fact: Rust's regex crate won't let you do this. If you try to escape a character that isn't a meta character, you get an error. So in cases like this, it will erase your doubt.
(There is ongoing discussion about relaxing this rule for some characters, since escaping them is so common. For example, escaping / is so common that folks try to do it with the regex crate and are surprised when it returns an error. / is rarely a regex metacharacter; rather, it tends to delimit regexes, e.g., in JavaScript or sed.)
I think regexr does a better job of helping you to understand exactly what is happening in the regex. I typically reach for debuggex when I already have a decent understanding of how to accomplish what I want and just want a simple way to edit & test, I think the interface is less busy for that case.
Regexr was great. It's a shame development has stalled. I've started using regex101 instead, mostly because it supports more than JavaScript and PHP. There is plenty of opportunity to add more languages through WebAssembly, but sadly I don't think that's going to happen.
Is there a list of regex patterns for common use cases like IMEI numbers, geo coordinates, etc. somewhere? My Google searches lead me either to regex tutorial sites or to regex libraries. There are a handful of results for emails, URLs, etc., but I'm not finding an exhaustive list.
Do geographical coordinates have a standard layout? I'd expect that you have to look at your particular data for cases like this.
Something like IMEI should be pretty easy, if Wikipedia [0] is to be trusted (e.g. in Python):
import re
# Matches IMEI and IMEISV
imei_pattern = re.compile(r"\d{2}-\d{6}-\d{6}-\d\d?")
You could write a big monster pattern that sets up capture groups for all the different TAC and Check Digit variants, but why bother? Just slice off what you need from the result after matching.
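A sketch of the match-then-slice approach in Python, assuming the AA-BBBBBB-CCCCCC-D layout from that Wikipedia article: the first 8 digits are the Type Allocation Code, and the last digit of a 15-digit IMEI is a Luhn check digit (an IMEISV ends in a 2-digit software version instead, so there is nothing to check). The helper names are hypothetical, and the `\b` anchors are an addition to keep the pattern from matching inside longer digit runs.

```python
import re

# The pattern above, plus \b anchors so it can't match inside a longer run of digits.
IMEI = re.compile(r"\b\d{2}-\d{6}-\d{6}-\d\d?\b")

def luhn_ok(digits):
    """Luhn checksum: from the right, double every second digit (subtracting 9
    if the result exceeds 9), sum everything, and require a multiple of 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def parse_imei(text):
    """Match once, then slice the fields out of the digits instead of using groups."""
    m = IMEI.search(text)
    if m is None:
        return None
    digits = m.group().replace("-", "")
    return {
        "tac": digits[:8],        # Type Allocation Code (AA + BBBBBB)
        "serial": digits[8:14],   # CCCCCC
        "tail": digits[14:],      # check digit (IMEI) or software version (IMEISV)
        # Only a 15-digit IMEI carries a Luhn check digit.
        "luhn_ok": luhn_ok(digits) if len(digits) == 15 else None,
    }

print(parse_imei("IMEI: 49-015420-323751-8"))
```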
RegExr is my go-to tool for testing regex. It’s always met my needs, and I’ve never felt any need to change. That said, I’ve always wondered whether RegExr is missing something that other regex build/test tools have.
I typically use Regex101. It's been good enough for my needs and I'm just used to it at this point. Looking at RegExr, it seems like the only big difference is that Regex101 supports substitution, though I think I've only used it once.
Ah, I see it now. Usually when I do replace/substitution it's in a larger project so it's not a feature I typically use on the site. Most of my Regex101 use is just figuring out why a regex I wrote isn't executing like I expected.
I normally use the replace/substitution in situations such as:
I have a list of, say, 1000+ things/rows in a single column, and I want to put them on a single line, separated by commas. It is pretty easy to go there and replace \n with ", ".
That said, that's also doable in a text editor, but I like that I can visualize the pattern and results more easily.
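For reference, the same substitution in Python is a one-liner. A sketch, assuming the column is already in a string; the `(?:\r?\n)+` pattern also swallows Windows line endings and blank lines so they don't produce empty entries:

```python
import re

column = "apple\nbanana\r\ncherry\n\n"
# Collapse every run of line breaks (\n or \r\n, including blank lines) into ", ".
# strip() first so trailing newlines don't leave a dangling ", " at the end.
one_line = re.sub(r"(?:\r?\n)+", ", ", column.strip())
print(one_line)  # apple, banana, cherry
```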
Regex101 also offers:
- a code generator, with support for a lot of languages
- a complete quick reference with examples
- an extensive regex library
- a regex quiz for "golfing" and learning purposes
Perhaps most importantly, it runs entirely client-side and does not submit any information to the server unless you hit save (which returns a delete link to remove all data). You can even run the website and most of its features offline.
Regexr submits all input to the server for processing.
I use regex101.com, which looks a lot like RegExr. I remember using RegExr at some point, and I think I default to regex101 just because the name is a bit easier to remember. I'll try to remember to switch to RegExr since it's open source.
Just thinking out loud... can't I just ask the google "Hey Google, in PERL, show me a regex to find the first occurrence of a semicolon to the next period."?
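For what it's worth, the pattern itself is short. A sketch in Python, assuming "to the next period" means including both the semicolon and that period; the pattern is plain enough to work unchanged in Perl:

```python
import re

text = "first clause; the part we want. Then more; and more."
# ";" then any run of non-period characters, then the first "." after it.
# [^.]* can't run past a period, so the match stops at the *next* one, and
# unlike ".*?" it also crosses newlines without needing re.DOTALL.
m = re.search(r";[^.]*\.", text)
print(m.group())  # ; the part we want.
```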
Regex seems like a good use case for GPT-3. Most people I've seen use regex do so rarely enough that they end up having to relearn the syntax each time.
Validating regex with some visualization is cool and all...but I actually value my time. I want a WYSIWYG regex builder, not a tool that helps me learn it.