I worked on a tool that infers a regex from sample strings, by building and then...

captn3m0 · on Nov 21, 2019

If anyone is looking, https://github.com/noprompt/frak does the same.

> frak transforms collections of strings into regular expressions for matching those strings. The primary goal of this library is to generate regular expressions from a known set of inputs which avoid backtracking as much as possible. It is available as a command line utility and for the browser as a JavaScript library.
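
For a rough feel of the general idea (strings in, one regex out), here is a minimal Python sketch that puts the inputs in a trie and then folds the trie into nested non-capturing groups. This is not frak's actual algorithm or API; the function names `build_trie` and `trie_to_regex` are made up for illustration.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return root

def trie_to_regex(node):
    """Fold a trie into a regex source string using non-capturing groups."""
    ends_here = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a word may end at this node, everything after it is optional.
    return body + "?" if ends_here else body

words = ["foo", "bar", "baz"]
pattern = re.compile(trie_to_regex(build_trie(words)))
```

Because shared prefixes are factored out, the matcher never has to retry the same prefix for a different alternative, which is roughly the "avoid backtracking" goal the quote mentions.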

_8huj · on Nov 21, 2019

This sounds exactly like what I was hoping to learn more about.

I wanted to create an IRC client where I would add/remove modules at runtime. These modules might respond to things said in-chat. As an example, you might have a greeter module that responds to common things like "hi" and "hello". The module would register expressions for things it triggers on. In parallel, all these modules would have registered expressions. Instead of testing all these expressions in series, I wanted to feed these to some code that would decompose and reduce them into a simplified expression. It'd be like if you arranged the regexps in a nested series of if-conditions.
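
The simplest approximation of this that I know of is to join every registered pattern into one alternation with a named group per module, and rebuild it whenever a module is added or removed. A minimal Python sketch follows; the `Dispatcher` class and its method names are hypothetical, module names are assumed to be valid Python identifiers, and patterns are assumed not to define named groups of their own. Note that Python's `re` is a backtracking engine, so this only gives a single entry point, not the decomposed and reduced automaton described above.

```python
import re

class Dispatcher:
    """Hypothetical registry that merges module patterns into one regex."""

    def __init__(self):
        self._patterns = {}    # module name -> pattern source
        self._combined = None  # compiled alternation, or None when empty

    def register(self, name, pattern):
        re.compile(pattern)    # fail early on an invalid pattern
        self._patterns[name] = pattern
        self._rebuild()

    def unregister(self, name):
        self._patterns.pop(name, None)
        self._rebuild()

    def _rebuild(self):
        if not self._patterns:
            self._combined = None
            return
        # One named group per module, tried left to right.
        parts = [f"(?P<{name}>{pat})" for name, pat in self._patterns.items()]
        self._combined = re.compile("|".join(parts))

    def match(self, line):
        """Return the name of the first module whose pattern matches, else None."""
        if self._combined is None:
            return None
        m = self._combined.search(line)
        return m.lastgroup if m else None

d = Dispatcher()
d.register("greeter", r"\b(?:hi|hello)\b")
d.register("dice", r"^!roll\b")
```

With that setup, `d.match("hello everyone")` reports `greeter` and `d.match("!roll 2d6")` reports `dice`. One limitation of this sketch is that it only reports a single module per line, even if several would have triggered.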

Finding an appropriate middle ground for testing expensive regular expressions was the objective of all that, but I didn't get very far as I lost myself in Lua's LPeg project.

What a beautiful goddamn project. So unique and amazing.

jbapple · on Nov 21, 2019

> Has anyone explored regexp minimization?

There has been significant work on https://en.wikipedia.org/wiki/DFA_minimization
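
As a small illustration of what that page covers, here is a sketch of Moore-style partition refinement in Python (not the faster Hopcroft algorithm). The function name, argument layout, and the example DFA are my own assumptions, and the transition table must be total. Keep in mind this minimizes the automaton, which is not the same thing as producing the shortest regex text.

```python
def minimize_dfa(alphabet, delta, start, accepting):
    """Merge equivalent states of a DFA.

    delta maps (state, symbol) -> state and must be defined for every
    reachable state and every symbol.
    """
    symbols = sorted(alphabet)

    # Drop states that cannot be reached from the start state.
    reachable, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for a in symbols:
            t = delta[(s, a)]
            if t not in reachable:
                reachable.add(t)
                stack.append(t)

    # Start with two blocks (accepting or not), then split blocks until
    # states in the same block agree on the block of every successor.
    block = {s: (s in accepting) for s in reachable}
    while True:
        ids = {}
        new_block = {}
        for s in reachable:
            sig = (block[s],) + tuple(block[delta[(s, a)]] for a in symbols)
            new_block[s] = ids.setdefault(sig, len(ids))
        stable = len(ids) == len(set(block.values()))
        block = new_block
        if stable:
            break

    min_delta = {(block[s], a): block[delta[(s, a)]]
                 for s in reachable for a in symbols}
    min_accepting = {block[s] for s in reachable if s in accepting}
    return set(block.values()), min_delta, block[start], min_accepting

# Example: strings over {a, b} that end in "a". States 0 and 2 behave
# identically, and state 3 is unreachable.
delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 2,
    (3, "a"): 3, (3, "b"): 3,
}
states, min_delta, min_start, min_accepting = minimize_dfa(
    {"a", "b"}, delta, start=0, accepting={1}
)
```

In this example the three reachable states collapse to two, since states 0 and 2 cannot be told apart by any input.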

bane · on Nov 21, 2019

Very cool. Do you know of any tool that does something similar (provide examples, get regex)?

jraph · on Nov 21, 2019

I am not sure this is what you are looking for, but I have been working on Aude (AUtomata DEmystifier), an open source pedagogical application targeted at CS teachers and students that works in browsers without installation (but one can download and run it offline too).

The aim is to visualise and manipulate automata, including conversions between them and regular expressions. Happy to take feedback!

https://aude.imag.fr

AlchemistCamp · on Nov 21, 2019

That's a neat project! What inspired you to start it?

jraph · on Nov 21, 2019

Thanks for the kind words.

Well, I was a student and needed to practice the related course, "Languages and Automata", before the final exam. So I implemented the algorithms from the course and used Graphviz to render the results. The thing was used from a browser but ran on the server (using D!). I figured my fellow students or the teacher might find it useful, so I sent the link to the teacher (in the postscript of a very long email in which I was asking for help, haha).

"Your tool seems really interesting, how about you continue developing it during an internship this summer?"

Hell, yes. And Aude started.

Since then, I have been lucky to mentor several interns working on Aude with this teacher.

AlchemistCamp · on Nov 21, 2019

That is very cool and I'll bet having a steady flow of later cohorts of students helped a ton!

jraph · on Nov 21, 2019

I would not call it a steady flow: there were two to three students for two months each summer over the last three years, but they did help a lot :-)

And it was fun to work with them. The first "cohort" of three students called themselves the Aude Team and they put that in their internship report.

Gys · on Nov 21, 2019

Maybe this: http://regex.inginf.units.it/

pottertheotter · on Nov 21, 2019

Microsoft PROSE kind of does. I haven't looked at it for a while so I can't quite remember, but it might be of interest to you.