Hacker News

burntsushi · 2025-03-19T19:07:20 1742411240

That isn't true. ripgrep will happily pass through data it gets as-is. It just doesn't support locales itself. Commands do not need locale support to be used in shell pipelines. `rg foo | rg -v bar` is an example that has nothing to do with locales but demonstrates that ripgrep can be used in shell pipelines.

oguz-ismail · 2025-03-19T19:18:06 1742411886

It is. Consider

    foo | grep -E '^.{4}$' | bar

If foo prints $'O\u011Fuz' and LC_ALL is set to a valid UTF-8 locale, bar will receive what foo printed. If foo prints $'O\xF0uz' and LC_ALL is set to a valid ISO-8859 locale, again, bar will receive what foo printed, because foo, bar, and grep are compatible. You can't have that with ripgrep; you'll have to modify the shell script in between or, worse, manually parse LC_ALL and hope ripgrep supports the encoding.
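
A minimal Python sketch of the point above, with Python's `re` standing in for a locale-aware grep: `.` counts characters in whatever encoding the input is decoded with, so `^.{4}$` matches both strings only when each is decoded with its own encoding. The helper name `matches_four_chars` is hypothetical.

```python
import re

# Hypothetical helper: decode raw bytes with the given encoding (roughly what a
# locale-aware grep does), then match the pattern against characters, not bytes.
def matches_four_chars(data: bytes, encoding: str) -> bool:
    text = data.decode(encoding)
    return re.fullmatch(r".{4}", text) is not None

utf8_bytes = "O\u011Fuz".encode("utf-8")  # 5 bytes: U+011F is 0xC4 0x9F
latin1_bytes = b"O\xF0uz"                 # 4 bytes: 0xF0 is one character

print(len(utf8_bytes), matches_four_chars(utf8_bytes, "utf-8"))          # 5 True
print(len(latin1_bytes), matches_four_chars(latin1_bytes, "iso-8859-1")) # 4 True
# Decoding with the wrong encoding changes the character count:
print(matches_four_chars(utf8_bytes, "iso-8859-1"))                      # False
```

The last line is the failure mode when the locale and the data disagree: the UTF-8 bytes read as ISO-8859-1 are five characters, not four.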

burntsushi · 2025-03-20T02:09:40 1742436580

First of all, that doesn't invalidate literally anything I said. I have never claimed that ripgrep supported locales. I only said that it could be used in shell pipelines. That doesn't mean it can be used in all shell pipelines in exactly the same way that grep can. I even clarified this explicitly. Which is why you are a troll.

Second of all, you're just wrong. ripgrep uses UTF-8 by default, but you don't have to use it. And when you switch your locale to ISO-8859 (who uses that!?!? why does it matter!?!?), you are no longer emitting UTF-8, as you obviously know, or else you wouldn't have been able to come up with the example.

As has been pointed out elsewhere in this thread, you can disable Unicode mode in ripgrep:

    $ echo $'O\xF0uz' | LC_ALL=en_US.ISO-8859-1 grep -E '^.{4}$'
    Ouz
    $ echo $'O\xF0uz' | rg '^(?-u:.){4}$'
    Ouz

Or just specify the encoding you want:

    $ echo $'O\xF0uz' | rg -E iso-8859-1 '^.{4}$'
    Oðuz

ripgrep doesn't do this the same way that grep does. But you can achieve it, which is perfectly in line with what I said.
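
A rough Python analogue of the two workarounds above, assuming Python's `re` behaves like ripgrep's regex engine for these simple patterns: a bytes pattern corresponds to `(?-u:.)`, where each `.` matches one byte, while decoding as ISO-8859-1 first mirrors `-E iso-8859-1`, which transcodes the input to UTF-8 before searching.

```python
import re

data = b"O\xF0uz"  # ISO-8859-1 bytes; 0xF0 on its own is not valid UTF-8

# Byte mode, like (?-u:.): each "." consumes exactly one byte.
byte_match = re.fullmatch(rb".{4}", data)

# Transcoding, like -E iso-8859-1: decode to text, then match characters.
text = data.decode("iso-8859-1")
text_match = re.fullmatch(r".{4}", text)

print(byte_match is not None)             # True
print(len(text), text_match is not None)  # 4 True
# Re-encoded as UTF-8, 0xF0 becomes the two bytes of U+00F0 (ð).
print(text.encode("utf-8"))               # b'O\xc3\xb0uz'
```

The byte-mode pattern needs no knowledge of the encoding but cannot tell letters from other bytes; transcoding keeps character semantics but requires naming the encoding.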

oguz-ismail · 2025-03-20T02:57:12 1742439432

> you can disable Unicode mode in ripgrep

And everything becomes a byte, which is not always what you want:

    $ export LC_ALL=tr_TR.ISO-8859-9
    $ echo $'O\xD0UZ' | grep -E '^[[:upper:]]{4}$'
    O�UZ
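
A small Python sketch of why byte mode falls short in this example. It assumes an ASCII-only class such as `[A-Z]` approximates what a byte-oriented `[[:upper:]]` sees; in the Rust regex crate, POSIX classes like `[[:upper:]]` are ASCII-only. In ISO-8859-9, byte 0xD0 is Ğ (U+011E), an uppercase letter, but only a decoder that knows the encoding can tell.

```python
import re

data = b"O\xD0UZ"  # "OĞUZ" encoded in ISO-8859-9 (Turkish)

# Byte-oriented view: an ASCII-only uppercase class never matches 0xD0.
ascii_upper = re.fullmatch(rb"[A-Z]{4}", data)

# Encoding-aware view: decode first, then ask whether each character is uppercase.
text = data.decode("iso-8859-9")
all_upper = len(text) == 4 and all(ch.isupper() for ch in text)

print(ascii_upper is not None)       # False
print(hex(ord(text[1])), all_upper)  # 0x11e True
```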

> Or just specify the encoding you want

And what if I want to support more than one encoding? Am I supposed to modify my shell script every time I run it?

burntsushi · 2025-03-20T03:04:57 1742439897

I'm sure you can get creative. :-) You can set an environment variable to control the encoding, expose a flag or any one of a number of other things to control the encoding. Either way, now you're just shifting the goalposts.

You've also continued to ignore my most substantive rebuttal: that a specific example where ripgrep is not compatible with grep or doesn't behave the same doesn't mean it can't be used in shell pipelines. I use ripgrep in shell pipelines all of the time. As do many others. Literally nothing you've said has invalidated anything I've said. All you're doing is finding things that some implementations of grep can do that ripgrep (intentionally) cannot do in exactly the same way. But that's fine, because ripgrep was never, isn't and will never be compatible with grep: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...

So if you need grep compatibility get a fucking clue and just use grep.

oguz-ismail · 2025-03-20T03:30:30 1742441430

> You can set an environment variable to control the encoding

Yes, that's what LC_ALL is. Every other tool understands it except ripgrep. Even if you parse it by hand, there's no guarantee that ripgrep will support the encoding.
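
The "parse it by hand" approach can be sketched in Python, assuming locale names follow the common `language_territory.codeset@modifier` form. The helpers `codeset_from_locale` and `python_knows` are hypothetical, and `codecs.lookup` only consults Python's own codec registry, not the set of encodings ripgrep accepts, which is the gap this comment points at.

```python
import codecs
from typing import Optional

def codeset_from_locale(name: str) -> Optional[str]:
    """Hypothetical helper: extract the codeset from e.g. 'tr_TR.ISO-8859-9'."""
    name = name.split("@", 1)[0]  # drop any @modifier
    if "." not in name:
        return None               # e.g. "C" or "POSIX": no explicit codeset
    return name.split(".", 1)[1]

def python_knows(codeset: Optional[str]) -> bool:
    """Check Python's codec registry only; says nothing about other tools."""
    if codeset is None:
        return False
    try:
        codecs.lookup(codeset)
        return True
    except LookupError:
        return False

for loc in ["tr_TR.ISO-8859-9", "en_US.UTF-8", "de_DE.ISO-8859-15@euro", "C"]:
    cs = codeset_from_locale(loc)
    print(loc, cs, python_knows(cs))
```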

burntsushi · 2025-03-20T03:42:59 1742442179

ripgrep intentionally doesn't understand it. It is far from the only tool that doesn't. For example, busybox doesn't either:

    $ echo $'O\xF0uz' | LC_ALL=en_US.ISO-8859-1 busybox grep -E '^.{4}$'

So that's yet another lie from you.

If you want or need LC_ALL support, then don't use ripgrep. It's right there in the FAQ entry as a reason to use grep instead:

> Do you care about POSIX compatibility? If so, then you can't use ripgrep because it never was, isn't and never will be POSIX compatible.

Maybe learn how to read. Your complaint boils down to "ripgrep doesn't do this thing that it says it doesn't do." What a fucking revelation.

oguz-ismail · 2025-03-20T04:43:10 1742445790

>busybox

Apples vs. oranges. Busybox is a suite of minimal tools for low-resource systems; the lack of QoL features you wouldn't need in such an environment is its selling point. Besides, you won't see anyone claiming individual busybox tools can be used in shell scripts just fine.

It's easier to admit that ripgrep is a supplementary tool, one that is incompatible with the tooling your typical Unix-like operating system provides out of the box, rather than a "modern replacement" for grep, and move on.

burntsushi · 2025-03-20T12:29:49 1742473789

So you lied. Now you've shifted the goalposts after being called out on it. And ripgrep has no problems being used in shell pipelines. What ripgrep has a problem doing is being drop-in compatible for grep, because that isn't its goal.

> "modern replacement" for grep

I have literally never called ripgrep this. So more lies and straw-manning from you. ripgrep's repo positions itself neither as a replacement nor as "modern." And in the FAQ about "replacing" grep, it makes a nuanced claim that accounts for all of this. It specifically says you should use grep when you need the features of grep that aren't in ripgrep.

whytevuhuni · 2025-03-19T20:17:57 1742415477

I found this interesting, so I tried to test it:

On LC_ALL=en_US.UTF-8:

    $ echo $'O\u011Fuz' | grep -E '^.{4}$'
    Oğuz
    $ echo $'O\u011Fuz' | rg '^.{4}$'
    Oğuz

On LC_ALL=en_US.ISO-8859-1:

    $ echo $'O\xF0uz' | grep -E '^.{4}$'
    O�uz
    $ echo $'O\xF0uz' | rg '^.{4}$'

It strangely doesn't find anything at all:

    $ echo $'O\xF0uz' | rg '^.*$' | wc -c
    0

It only does once the $ anchor is removed:

    $ echo $'O\xF0uz' | rg '^.*' | wc -c
    5

burntsushi · 2025-03-20T02:19:27 1742437167

It's not strange because ripgrep doesn't understand non-UTF-8 data (unless there's a UTF-16 BOM, in which case, ripgrep will automatically understand it). But you can tell it to:

    $ echo $'O\xF0uz' | rg -E iso-8859-1 '^.{4}$'
    Oðuz

The person you're responding to has been trolling in this thread (and others) by twisting words and claiming multiple false things. When I've fixed their errors, they don't acknowledge them as mistakes and just keep on twisting words.
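
A minimal Python check of the explanation above. The byte 0xF0 is a UTF-8 lead byte for a four-byte sequence, and the following `u` (0x75) is not a continuation byte, so `O\xF0uz` is not valid UTF-8 and a Unicode-aware `.` has nothing it can match at that position. Python's strict decoder stands in for ripgrep's regex engine here; ripgrep itself does not raise an error, it simply reports no match.

```python
data = b"O\xF0uz"

try:
    data.decode("utf-8")  # strict decoding is Python's default
    valid_utf8 = True
except UnicodeDecodeError as err:
    valid_utf8 = False
    print(err.start)      # 1: the offending byte is 0xF0 at index 1

print(valid_utf8)         # False
```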