If that is the only incompatibility, it would be easy to make a patch that checks if it is called as "grep" and defaults to "not -rHInE" - so one could ship ripgrep as default and yet have backwards compatibility. Some busy boxes already do that iirc.
EDIT: So I've quickly looked into it and it seems nobody did an extensive comparison to the grep feature set or the POSIX specification. If I have some time later this week I might do this and check whether something like this would be viable.
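The name check suggested above could look roughly like the following. This is only a sketch: the function name defaults_for and the settings table are hypothetical and not taken from ripgrep or busybox; the single-letter keys simply mirror the GNU grep flags mentioned (-r, -H, -I, -n, -E).

```python
import os

# Behaviours assumed to be on by default in the "modern" mode, keyed by the
# GNU grep flag that would enable each one. Hypothetical table for illustration.
RG_STYLE_DEFAULTS = {
    "r": True,  # recurse into directories
    "H": True,  # print the file name with each match
    "I": True,  # skip binary files
    "n": True,  # print line numbers
    "E": True,  # extended regex syntax
}


def defaults_for(argv0):
    """Pick default settings from the name the program was invoked as."""
    name = os.path.basename(argv0)
    if name == "grep":
        # Invoked as "grep": switch every modern default off ("not -rHInE").
        return {flag: False for flag in RG_STYLE_DEFAULTS}
    return dict(RG_STYLE_DEFAULTS)
```

A real program would call defaults_for(sys.argv[0]) before parsing its flags, so that a symlink named grep gets the conservative defaults.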
It's not even close to the only incompatibility. :-) That's a nominal one. If that were really the only thing, then sure, I would provide a way to make ripgrep work in a POSIX compatible manner.
There are lots of incompatibilities. The regex engine itself is probably one of the hardest to fix. The incompatibilities range from surface level syntactic differences all the way down to how match regions themselves are determined, or even the feature sets themselves (BREs for example allow the use of backreferences).
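To make the "match regions" point concrete, here is a small sketch using Python's re module as a stand-in for a backtracking, leftmost-first engine (it is not ripgrep's engine). The helper names are made up for illustration, and the leftmost-longest function only handles plain literal alternatives, which is enough to show how the POSIX rule gives a different answer.

```python
import re


def leftmost_first(alternatives, text):
    """What a backtracking engine reports: the first alternative that matches."""
    pattern = "|".join(re.escape(alt) for alt in alternatives)
    match = re.search(pattern, text)
    return match.group() if match else None


def leftmost_longest(alternatives, text):
    """POSIX-style rule for literal alternatives: leftmost start, then longest."""
    best = None
    for alt in alternatives:
        pos = text.find(alt)
        if pos == -1:
            continue
        key = (pos, -len(alt))  # earlier start wins, then greater length
        if best is None or key < best[0]:
            best = (key, alt)
    return best[1] if best else None


# Backreferences are written (ab)\1 here; in a POSIX BRE the same
# pattern would be spelled \(ab\)\1.
doubled = re.search(r"(ab)\1", "xabab")
```

For the alternatives a and ab against the text xab, the first function reports a and the second reports ab, so the same pattern yields a different match region under each rule.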
Then of course there's locale support. ripgrep takes a more "modern" approach: it ignores locale support and instead just provides what level 1 of UTS#18 specifies. (Unicode aware case insensitive matches, Unicode aware character classes, lots of Unicode properties available via \p{..}, and so on.)
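As a rough illustration of what "Unicode aware" means in practice, the sketch below uses Python's re module rather than ripgrep. The helper names are hypothetical. Python's standard re module does not support \p{..}, so only character classes and case insensitivity are shown.

```python
import re


def is_word(text, ascii_only=False):
    """Check whether text consists solely of \\w characters."""
    # With re.ASCII, \w only covers [a-zA-Z0-9_]; without it, \w is Unicode aware.
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\w+", text, flags) is not None


def contains_ignoring_case(needle, haystack):
    """Case-insensitive literal search using Unicode case mappings."""
    return re.search(re.escape(needle), haystack, re.IGNORECASE) is not None
```

Here is_word("naïve") holds while is_word("naïve", ascii_only=True) does not, and contains_ignoring_case("ΔΕΛΤΑ", "το δελτα") holds without any locale being consulted.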
Pity! I did look; only "-E" and "-s" diverge from the POSIX standard parameter-wise. But making significant changes to the pattern engine is probably not worth it.
It's worth noting that the implementation of ripgrep has been split up into a whole bunch of modular components. So it wouldn't be out of the question for someone to piece those together into a GNU-compatible grep implementation.
To the extent that you want to get a POSIX compatible regex engine working with ripgrep, you could patch it to use a POSIX compatible regex engine. The simplest way might be to implement the requisite interfaces by using, say, the regex engine that gets shipped with libc. This might end up being quite slow, but it is very doable.
But still, that only solves the incompatibilities with the regex engine. There are many others. The extent to which ripgrep is compatible with POSIX grep is that I used flag names similar to GNU grep where I could. I have never taken a fine-toothed comb over the POSIX grep spec and tried to emulate the parts that I thought were reasonable. Since POSIX wasn't and won't be a part of ripgrep's core design, it's likely there are many other things that are incompatible.
A POSIX grep can theoretically be built with a pretty small amount of code. Check out busybox's grep implementation, for example.
While building a POSIX grep in Rust sounds like fun, I do think you'd have a difficult time with adoption. GNU grep isn't a great source of critical CVEs, works pretty well as-is, and is actively maintained. So there just isn't a lot of reason to switch. Driving adoption is much easier when you can offer something new to users, and in order to do that, you either need to break with POSIX or make it completely opt-in. (I do think a good reason to build a POSIX grep in Rust is if you want to provide a complete user-land for an OS in Rust, perhaps if only for development purposes.)
Well, the reasons I see for being POSIX-compatible would be:
1. Distributions could adopt rg as default and ship with it only, adding features at nearly no cost
2. The performance advantage over "traditional" grep
Number 1 is basically how bash became the default; since it is a superset of sh (or close enough at least), distributions could offer the feature set at no disadvantage. Shipping it by default would allow scripts on that distribution to take advantage of rg and, arguably, improve the situation for most users at no cost.
If one builds two programs in one with a switch, you're effectively shipping optional software, but in a single binary, which makes point 1 pretty moot. If you then also fall back on another engine, point 2 is moot as well - so the only point where this would actually be useful is if rg could become a good enough superset of grep that it would provide sufficient advantages (most greps _already_ provide a larger superset of POSIX, though). Everything else would just add unnecessary complexity, in my opinion.
Ah I see. Yeah, that's a good point. But it's a very very steep hill to climb. In theory it would be nice though. There's just a ton of work to do to hit POSIX compatibility and simultaneously be as good as GNU grep at other things. For example, the simplest way to get the regex engine to be POSIX compatible would be to use an existing POSIX compatible regex engine, like the one found in libc. But that regex engine is generally regarded as quite slow AIUI, and is presumably why GNU grep bundles its own entire regex engine just to speed things up in a lot of common cases. So to climb this hill, you'd either need to follow in GNU grep's footsteps _or_ build a faster POSIX compatible regex engine. Either way, you're committing yourself to writing a regex engine.
I didn't look closely, but Oniguruma is pretty dang fast and has drop-in POSIX syntax + ABI compatibility as a compile-time option. Could maybe use that.
Ahh, very interesting, thanks for sharing! Do you have any thoughts around why that is? I presume that's due to Oniguruma supporting a much broader feature set, and that something like fancy-regex's approach of mixing a backtracking VM with an NFA implementation for simple queries would be needed for better perf? (I am aware you played a role in that) [1]
I have been playing around recently with regex parsing by building parsers out of parser combinators at runtime. No clue how it will perform in practice yet (structuring parser generators at runtime is challenging in general in low-level languages), but maybe that could pan out and lead to an interesting way to support broader sets of regex syntaxes like POSIX in a relatively straightforward and performant way.
No idea. I've never done an analysis of onig. Different "feature sets" tends to be what people jump to first, but it's rarely correct in my experience. For example, PCRE2 has a large feature set, but it is quite fast. Especially its JIT.
The regex crate does a lot of literal optimizations to speed up searches. More than most regex engines in my experience.
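One simple flavour of literal optimization can be sketched like this. It is a deliberately simplified illustration in Python, not how the regex crate actually implements it: the caller supplies a literal that every match must contain (the required_literal parameter is an assumption of this sketch, since a real engine would extract it from the pattern itself), and the regex only runs on lines that contain it.

```python
import re


def search_lines(pattern, required_literal, lines):
    """Yield lines matching pattern, skipping the regex when the literal is absent."""
    regex = re.compile(pattern)
    for line in lines:
        # Cheap substring test first: a line without the literal cannot match.
        if required_literal not in line:
            continue
        # Only lines that pass the prefilter pay for a full regex search.
        if regex.search(line):
            yield line
```

The benefit comes from substring search being much cheaper than running a regex matcher, so most non-matching lines are rejected without touching the regex at all.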