Yeah, it's actually an interesting case study in performance optimization - GNU ...

dist1ll · on July 24, 2022

That's not really what's happening in the article. If you read through the single file benchmark, you'll see several clever algorithmic improvements (like rarest byte guessing, building a set of variants for Unicode-aware multiple pattern matching, etc.).
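For anyone wondering what "rarest byte guessing" looks like in practice, here is a minimal sketch of the general idea, not ripgrep's actual code. The frequency ordering in `COMMON_FIRST` and the function names are assumptions made up for illustration; a real implementation would use a frequency table derived from a corpus and a vectorized byte scan.

```python
# Assumed ordering of bytes from most to least common in English-like text.
# This table is hypothetical and only serves the illustration.
COMMON_FIRST = b" etaoinshrdlcumwfgypbvkjxqz"


def rarity(byte: int) -> int:
    """Higher score means the byte is assumed to be rarer."""
    idx = COMMON_FIRST.find(bytes([byte]))
    # Bytes missing from the table (uppercase, digits, punctuation, ...)
    # are treated as rarest of all.
    return idx if idx >= 0 else len(COMMON_FIRST)


def rarest_offset(needle: bytes) -> int:
    """Offset of the needle byte we guess will occur least often."""
    return max(range(len(needle)), key=lambda i: rarity(needle[i]))


def find_with_rare_byte(haystack: bytes, needle: bytes) -> int:
    """Return the first index of needle in haystack, or -1 if absent."""
    if not needle:
        return 0
    off = rarest_offset(needle)
    rare = needle[off:off + 1]
    # Fast scan for the rare byte only; a match cannot start before index 0,
    # so the rare byte cannot appear before index `off`.
    pos = haystack.find(rare, off)
    while pos != -1:
        start = pos - off
        # Verify the whole needle only at candidate positions.
        if haystack.startswith(needle, start):
            return start
        pos = haystack.find(rare, pos + 1)
    return -1
```

The point of the heuristic is that the cheap single-byte scan hits few false candidates when the chosen byte really is rare, so the more expensive full comparison runs only occasionally.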

The author literally concedes that the .gitignore feature was not done for performance, and actually carries a significant overhead in large directory trees. For the sake of comparability, the study was controlled for the .gitignore overhead.

ducktective · on July 24, 2022

> simple trick of "completely ignore a lot of files by

The author of rg wrote a blog post about this. As far as I recall, he ran the performance comparisons under the same limitations and scope, so the difference in that benchmark can't be explained by something as obvious as this.

burntsushi · on July 25, 2022

This is very very very wrong. GNU grep is not doing any optimizations based on "deep kernel knowledge" that ripgrep doesn't do. I'm honestly not even sure what you're referring to. GNU grep uses standard 'read' syscalls. ripgrep does that too (but also uses memory maps in some cases). There is some buffer size tuning, but otherwise, nothing particularly interesting there.
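To make the two I/O strategies mentioned above concrete, here is a rough sketch that counts matching lines once with plain `read` calls into a fixed-size buffer and once through a memory map. It is not taken from GNU grep or ripgrep; the function names and the 64 KiB default buffer size are assumptions for illustration, and the needle is assumed to be non-empty and free of newlines.

```python
import mmap
import os


def count_matching_lines_read(path, needle, bufsize=64 * 1024):
    """Count lines containing needle using unbuffered read() calls."""
    count = 0
    carry = b""  # partial last line left over from the previous chunk
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            data = carry + chunk
            cut = data.rfind(b"\n")
            if cut == -1:
                # No complete line yet; keep accumulating.
                carry = data
                continue
            complete, carry = data[:cut + 1], data[cut + 1:]
            count += sum(1 for line in complete.split(b"\n") if needle in line)
    # A final line without a trailing newline still counts.
    if carry and needle in carry:
        count += 1
    return count


def count_matching_lines_mmap(path, needle):
    """Count lines containing needle by searching a memory map."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be mapped
    count = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(needle)
        while pos != -1:
            count += 1
            # Skip to the end of this line so it is counted only once.
            eol = mm.find(b"\n", pos)
            if eol == -1:
                break
            pos = mm.find(needle, eol + 1)
    return count
```

Both functions return the same count for the same file. The read-based version needs the carry-over logic for lines that straddle a buffer boundary, while the mmap version lets the OS page the file in and treats it as one contiguous buffer.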

In any given use case, ripgrep's speed might come from ignoring files, and that might even be the biggest reason why a search completes faster. But in my linked blog post, I control for all of that. Yes, ripgrep might be faster in some cases because of its "smart" filtering, but it's also faster in cases where "smart" filtering isn't enabled.

ape4 · on July 24, 2022

Sounds like both techniques could be used together.