You will learn about regular expressions in 55 minutes. To learn how to use regular expressions, in however long that takes you, it might be better to:
And remember: writing regular expressions can be very difficult, reasoning about your regular expressions can be even more so, and defining your problem can be the most difficult of all. Think before you regex.
The most difficult part of regex isn't defining the problem. I'd say that one's easy. The hardest part is figuring out your regex months or years after you've written it!
Hopefully you at least wrap the regex in a function, then it's easier to reason about given the function name. Regexes randomly placed in long functions are harder to grasp.
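A minimal sketch of that idea in Python. The pattern and the function name are made up for illustration; the point is only that the call site reads as a plain question while the regex stays in one named place.

```python
import re

# Hypothetical example: checks the *shape* YYYY-MM-DD only, not whether
# the date actually exists on a calendar.
_ISO_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_iso_date(text: str) -> bool:
    """Return True if the whole string has the form YYYY-MM-DD."""
    return _ISO_DATE_RE.fullmatch(text) is not None
```

Anyone reading `if looks_like_iso_date(value):` months later can follow the logic without decoding the pattern first.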
One trick that has been a lifesaver for me when designing and reviewing regexes, and that makes them readable even for later programmers who know nothing about regex, is to chunk them into smaller pieces with meaningful names.
For example, using file-local C preprocessor macros:
/* Backslashes are doubled because these are C string literals. */
#define MATCH_NAME_RE "[a-zA-Z]+"
#define MATCH_SPACES_RE "\\s+"
#define CAPTURE_NUMBER_RE "(\\d+)"
And then:
re = QRegex( MATCH_NAME_RE MATCH_SPACES_RE CAPTURE_NUMBER_RE );
This documents what each part does, makes the pieces easier to debug and change later on, and makes the whole expression grokkable. It only breaks down when you're doing very complex matching, but most of the time an RE can be broken into smaller pieces.
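The same chunking works in languages without a preprocessor. Here is a rough Python equivalent, assuming the same three pieces; the constant names and the `number_after_name` helper are illustrative, not from any library.

```python
import re

# Each piece gets a name that says what it is for.
NAME = r"[a-zA-Z]+"
SPACES = r"\s+"
NUMBER = r"(\d+)"  # the only capturing group

# Plain string concatenation plays the role of the adjacent C macros.
NAME_THEN_NUMBER_RE = re.compile(NAME + SPACES + NUMBER)


def number_after_name(text):
    """Return the captured number as an int, or None if nothing matches."""
    match = NAME_THEN_NUMBER_RE.search(text)
    return int(match.group(1)) if match else None
```

For example, `number_after_name("port 8080")` gives back the integer after the word, and a string without a word-then-number sequence gives `None`.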
I had a look through the regex tuesday list. It looks like fun for people who know their way around regexes, but if you're just learning I'd recommend starting with something simpler.
The tutorial I learned regular expressions from, which is longer but more detailed than this one, was http://www.regular-expressions.info/. It’s free, thorough and well-organized. Its only flaw is that its section on support in various languages is out of date. It was written to sell a Windows-only regex tool, but it’s very non-pushy with the advertisements.
You can see a list of online regular expression testers for various languages at the Stack Overflow “regex” tag wiki, in the “Online sandboxes” section: http://stackoverflow.com/tags/regex/info. For JavaScript regexes, http://regexpal.com/ is easy to use.
regular-expressions.info has been a good reference site for me. Another useful tool is an online regex checker. Of the handful or so of options, I find this one the best: http://regex101.com/?
Regular expressions are useful, but on a much smaller set of problems than people typically use them for.
For example, Chrome's script skipper in the debug tools uses regexes, completely unnecessarily, rather than a simple list of 'contains' checks.
It's definitely one of the worst "everything's a nail" hammers a lot of programmers wield. And quite often they've been holding it upside down for years without even realizing it.
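To make the "list of contains" alternative concrete, here is a small sketch. It is not how Chrome implements anything; the fragment list and the `should_skip` name are invented for the example.

```python
# Hypothetical skip list: plain substrings, no regex syntax to escape or debug.
SKIP_FRAGMENTS = ["jquery", "/vendor/", "analytics"]


def should_skip(script_url: str) -> bool:
    """Return True if the URL contains any fragment from the skip list."""
    return any(fragment in script_url for fragment in SKIP_FRAGMENTS)
```

A user adding an entry to such a list never has to wonder whether a dot or a slash needs escaping.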
Another situation where regexes are possibly abused - syntax highlighting. Yes, there are a lot of languages where you can sanely highlight /most/ code with regexes, but there's a non-zero number where you need a little more context, such as that provided by an AST (which would also help with autocompletion anyway if the highlighting is being done in the context of an editor).
Maybe the next fight on HN about the interview process should be whether we ought to expect the candidate to know about regexps.
I know them, so of course every good developer should know them. (My proof: I am a good developer, and I know them, therefore good developers should know them!)
(NB: In case it's not obvious this comment is slightly tongue-in-cheek.)
For the author: I've been learning regex on and off for years. I always come back to them, and I'm always annoyed that I forget them when I don't use them for a long time.
I mostly use cheat sheets now,
but still, your tutorial was the best I've ever read, and I actually read it. It reads better than a "learnXinYminutes.com" and I actually learned some new stuff.
Notes:
validating email addresses: most MVC frameworks have a function that does that. I know that PHP has that natively as well.
Handy. This does gloss over some of the notable differences between implementations (not everything has non-greedy matches or identical {m,n} or {m,} syntax), but it's still by far the best tutorial introduction I've seen for regular expressions.
In my comment above, I said this is the best tutorial introduction I've seen, even with its few limitations regarding differences between regex implementations.
I completely misread "Handy" as "Hardly" and then saw "Hardly. This does gloss over some of the notable differences . . ." which changed the entire tone of your comment for me.
Well, remember that it also has to fit in the row/column. If there are three spaces available, the string has to have three characters. So (O|RM|HHM)* can only really match ORM, RMO, and HHM.
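One way to check a clue like this is to brute-force it. The sketch below enumerates every string of a given length over the letters that appear in the pattern and keeps the ones that match in full. Note that on length alone the pattern also accepts OOO (three repetitions of O), so a crossing clue in the puzzle would have to rule that one out. The `candidates` helper is made up for this example.

```python
import re
from itertools import product

PATTERN = re.compile(r"(O|RM|HHM)*")
LETTERS = "HMOR"  # only letters that occur in the pattern can ever match


def candidates(length):
    """Return every string of the given length that the pattern matches in full."""
    words = ("".join(chars) for chars in product(LETTERS, repeat=length))
    return [word for word in words if PATTERN.fullmatch(word)]


print(candidates(3))  # ['HHM', 'OOO', 'ORM', 'RMO']
```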
The time I finally _really_ learned regex was the time I _really_ needed them for a project that was beyond something I could stack overflow. I think having some problem in front of oneself, and testing over a ton of input data is a great way to learn.
I find regular expressions intensely annoying. I come across situations where they are a perfect fit but, as I've never got around to learning the syntax, I spend ages looking for a perfect, or sometimes not so perfect, answer on SO that gives me the arrangement. More often than not, in order to tweak the answer to make it fit, I find myself having to learn the syntax anyway. Catch 22.
It seems there are no shortcuts and you can't practice cut and paste.
Yes, you are right. Learning needs interaction with the material.
Any experts here want to set an exercise for each section? We can argue about the best solution in true HN style, and the result will be a good resource.
matches the entire string, while the article says that ? should force it to match as little as possible. (All of "test1", test2 and "test3" is highlighted in red.) It works correctly on http://regexpal.com/. What am I doing wrong?
This is when you learn that not all implementations are the same, and realize it's better to read the documentation for your particular implementation.
I'm guessing your grep defaults to POSIX EREs, but always check the manual.
The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.
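If the lazy `*?` form is not available in your tool, a negated character class usually gets the same result using only syntax that POSIX EREs also have. The sketch below shows the idea in Python (the sample text and the `quoted_strings` helper are made up); with grep you would still want to confirm the behavior against its manual.

```python
import re

# "[^"]*" means: a quote, any run of non-quote characters, a quote.
# No lazy quantifier is needed because the class itself cannot cross a quote.
QUOTED_RE = re.compile(r'"[^"]*"')


def quoted_strings(text):
    """Return each double-quoted chunk of text, quotes included."""
    return QUOTED_RE.findall(text)


print(quoted_strings('"test1", test2 and "test3"'))  # ['"test1"', '"test3"']
```

One difference to keep in mind: unlike `.`, the class `[^"]` also matches newlines, so a quoted chunk may span lines.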
I don't particularly need a regex course, but I have to say, this site is so nicely designed, so easy for the eye.
A good way to practice regex is to use one of the many online regex tools to validate your understanding. My current favorite is http://www.gethifi.com/tools/regex# because it shows the group.
Why is it that I find it much easier to just build a lexer/parser than I do using regex? Something about my brain, every time I see them I usually do everything in my power to not have to try to understand them.
On the other hand, I've worked with some regex ninjas who appear to just innately get it.
Maybe it's like cilantro, separating us by genetics.
Well, I think you're on the "good" side of the discussion. Sometimes when people learn regular expressions, they try to solve everything with them, but for any medium-sized problem or DSL, writing a custom parser is a better option.
IIRC yacc's documentation advocates its use for tasks where people would go regex only. I guess in the old days yacc got more light, now it's an obscure beast for compiler writers ... not F5 to refresh the webpage things.
The most important things to learn about regex are greed and boundaries.
Nobody cares about matching 'cat'; for that you use a basic replace function in your preferred language. So when you learn about masks and patterns, you need to understand when to start and when to stop.
Finding everything between quotes with ".*" is not gonna give you what you expect.
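A quick illustration of the greed problem, using Python's `re` module and a made-up sample string. The greedy pattern runs from the first quote to the last one; adding `?` after the `*` makes it stop at the nearest closing quote instead.

```python
import re

GREEDY_RE = re.compile(r'".*"')   # grabs as much as it can
LAZY_RE = re.compile(r'".*?"')    # grabs as little as it can

sample = 'a "b" c "d" e'

print(GREEDY_RE.findall(sample))  # ['"b" c "d"']
print(LAZY_RE.findall(sample))    # ['"b"', '"d"']
```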
This is not reddit so you get downvoted for comments like this because you are not contributing anything to the discussion. In the future you can upvote the story and then click on your profile and there is a link to saved submissions. Any submission you upvote will be saved in the saved submission list. Furthermore on reddit you can just click the "save" link to save things. I never understood why it was acceptable to post comments like this at reddit but that is just one of many why does reddit put up with X questions.
Read the Wikipedia page, which has a great description of what regular expressions are and why you would use them: https://en.wikipedia.org/wiki/Regular_expression
Read the JS docs: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guid...
Test your regexes: http://regexpal.com/
Visualize your regexes: http://www.regexper.com/
Try some challenges: http://callumacrae.github.io/regex-tuesday/