It's fast indeed. And I can't help but keep promoting the combination with fzf :) For those who want to try it out, this is a Powershell function but the same principle applies in any shell. It does a ripgrep, then puts fuzzy searching over the resulting files+text on top, while showing context in bat:
There are other ways to approach this, but for me this is a very fast way of nailing down 'I know something exists in this multi-repo project but I don't know where exactly, nor the exact name'.
Edit: this comes out of https://github.com/junegunn/fzf/blob/master/ADVANCED.md and even though you might not want to use most of what is in there, it's still worth glancing over to get ideas of what you could do with it.
In fact I would recommend going a step further and integrating ripgrep-all (rga) with fzf, which can do a fuzzy search not just on text files but on all types of files, including PDFs and zip archives. More details here [1]
That's really nice, thanks. Long ago there was Google Desktop Search, where you could 'Google' your local documents. The difference is that it worked with an index, so I imagine it's faster if you have thousands of PDFs and EPUBs.
Even longer ago, there was `glimpse`: https://www.linuxjournal.com/article/1164 which is still available. [1] glimpse's index builds are something like 10X slower than `qgrep` mentioned elsethread. `qgrep` also seems to have faster search (though I only tried a few patterns), but `qgrep` does not allow spelling errors the way `glimpse` does.
Neither `glimpse` nor `qgrep`, to my knowledge, directly supports pre-processing / document conversion (like `pdftotext`), though I imagine this would be easy to add to either, replicating Desktop Search. (Indirectly, at some space cost, you could always dump conversions into a shadow file hierarchy, index that, and then translate path names.)
function frg --description "rg tui built with fzf and bat"
    # search with rg, fuzzy-filter the matches in fzf, preview each hit in bat,
    # and open the selected file at the matching line in $EDITOR on enter
    rg --ignore-case --color=always --line-number --no-heading "$argv" |
        fzf --ansi \
            --color 'hl:-1:underline,hl+:-1:underline:reverse' \
            --delimiter ':' \
            --preview "bat --color=always {1} --theme='Solarized (light)' --highlight-line {2}" \
            --preview-window 'up,60%,border-bottom,+{2}+3/3,~3' \
            --bind "enter:become($EDITOR +{2} {1})"
end
Still not a fan of the string-based injections based on the colon and newline characters, but all versions suffer from it. (also: nice that fzf does the right thing and prevents space and quote injection by default).
I've never really seen PowerShell beyond minimal commands, but after seeing the parent, I definitely think it has the superior syntax of the shells. Especially for scripts.
I expected to like PowerShell when I began working somewhere with a lot of Windows (after decades of mostly Linux). On paper it sounds like it has learned many important lessons that Unix shells could have learned but (at least the popular ones) didn't; it's been given a blank canvas, the principles it's working from make sense, and it has good people behind it. So I even undertook to write a modest new piece of glue code in PowerShell; after all, if it had been on Linux I'd definitely consider the Bourne shell as well as Python for the work...
Then I tried it and I strongly dislike it. The syntax is clunky, it's really no better than popular Unix shells at being a "real" programming language, and yet it's not as good as they are at being just a shell either.
It also just doesn't feel like a quality product. On my work Windows laptop, Powershell will sometimes not quite bother flushing after it starts, so I get the banner text and then... I have to hit "enter" to get it to finish up and write a prompt. The provided JSON parser has some arbitrary limits... which vary from one version to another. So code which worked fine on machine #1 just silently doesn't work on machine #2, since the JSON parsers were changed and nobody apparently thought that was worth calling out. If you told me this was the beta of Microsoft's new product I'd be excited but feel I needed to provide lots of feedback. Knowing this is the finished product, I am underwhelmed.
I find the built-in commands rough. "curl https://jrock.us" to see if my website is up used to involve opening Internet Explorer to accept some sort of agreement. Now it just flashes the terminal, moves the cursor to the far right hand side of the screen, and blinks for a while. I like the Linux version of curl better...
As it turns out, the reason that "curl ..." doesn't work is because it pops up a window below all of my other windows saying that certificate revocation information is unavailable, and would I like to proceed. After that it does download my web page!
> It also just doesn't feel like a quality product. On my work Windows laptop, Powershell will sometimes not quite bother flushing after it starts, so I get the banner text and then..
That’s independent of the shell, and is I believe a bug in the terminal emulator. There is an open source Windows Terminal you can separately install and that is so much better.
Nope, windows terminal definitely does this too. Just last week I was trying to install WSL, and thought it had frozen at 20% and was trying to figure out what went wrong ... turns out it had already booted but powershell had stopped flushing output.
A lot of people sleep on PowerShell, possibly because some of the syntax is a little clunky (and quite slow compared to some other shells, I will freely admit). That being said, I'd argue object oriented programming is a massive improvement over text oriented programming. I never want to touch awk again!
The only thing making it less than a total win is its handling of piped errors and `set -e`. The programming model itself is far superior to 'stringly-typed' sh.
PowerShell is the worst of all worlds. It's a terrible shell compared to bash/zsh/whateversh, and for anything complex enough to need a long script you're far better off in Python.
This comment got me to get out my French press and manually grind some beans. It wasn't as meditative and calming as I remember, and the coffee tastes a little... dusty. I guess it's time for me to update my vimrc.
I use my Aeropress every day but I wouldn't say it's "better" than French press, it's just different. Using less coffee but finer ground changes the characteristics of the brew quite a bit (probably some technical reason about extraction level or something).
Add one or a few drops of water to your roasted coffee beans with your hand and shake well after weighing them out; it stops the grounds from sticking to the walls of your grinder due to static.
I love that a ripgrep article has such a deeply nerdy coffee thread…
> Add one or a few drops of water to your roasted coffee beans
Ah, RDT (Ross Droplet Technique)[0].
A little atomizer (“spritz” bottle) of plain water serves well here. NB: this is for single-dose grinding - e.g. measuring out a small amount of beans, loaded into the grinder to grind immediately. If you have a grinder with a “big” hopper on top holding (e.g.) the week's worth of coffee (even though you grind on-demand for each espresso/french press/aeropress/pourover/drip/…), this isn't for you.
With such an alias in the [alias] section of a gitconfig file (sketched below), running `git fza` brings up a list of modified and not-yet-added files; space toggles each entry and moves to the next entry.
That alias as well as fzf+fd really speed up some parts of my workflow.
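A minimal sketch of such an alias (stock fzf toggles entries with Tab; add something like --bind 'space:toggle+down' if you want space to do it):

    [alias]
        # interactively pick modified/untracked files and stage them
        fza = "!git ls-files --modified --others --exclude-standard | fzf --multi --print0 | xargs -0 git add"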
And that's just the start: by binding a key to the fzf reload command to display the diff in its finder, and in turn a key to stage the selected line, you could probably turn this into an interactive git staging tool.
This is pretty much my exact use of ripgrep, too. I use it as a starting point to zero in on files/projects in a several-hundred repo codebase, and then go from there....
> How do you scroll the preview window with keyboard ?
alias pf="fzf --preview='less {}' --bind shift-up:preview-page-up,shift-down:preview-page-down"
That will let you run `pf` to preview files in less and lets you use shift + arrow keys to scroll the preview window. No dependencies are needed except for fzf. If you want to use ripgrep with fzf you can set FZF_DEFAULT_COMMAND to run rg such as `export FZF_DEFAULT_COMMAND="rg ..."` where ... are your preferred rg flags. This full setup is in my dotfiles at https://github.com/nickjj/dotfiles.
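For instance, a common choice (just an illustration, not necessarily what those dotfiles use) is to have fzf list candidate files via rg so that .gitignore is respected:

    export FZF_DEFAULT_COMMAND="rg --files --hidden --glob '!.git/*'"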
Oh. Thanks for the tip. This might make me finally embrace powershell. I’ve been using WSL+zsh+fzf as a Windows CLI for continuity with day job Mac tools, but git CLI performance is only usable inside the WSL file system.
You can also add a small script to your WSL under `/usr/local/bin/git`:
#!/bin/sh
# Pick Windows git for repos on Windows drives (/mnt/c, ...) and Linux git otherwise.
GIT_WINDOWS="/mnt/c/Program Files/Git/bin/git.exe"
GIT_LINUX="/usr/bin/git"

case "$(pwd -P)" in
    /mnt/?/*)
        case "$@" in
            # Needed to fix prompt, but it breaks things like paging, colours, etc
            rev-parse*)
                # running linux git for rev-parse seems faster, even without translating paths
                exec "$GIT_LINUX" "$@"
                ;;
            *)
                exec "$GIT_WINDOWS" -c color.ui=always "$@"
                ;;
        esac
        ;;
    *)
        exec "$GIT_LINUX" "$@"
        ;;
esac
This allows you to use `git` in your WSL shell but it'll pick whichever executable is suitable for the filesystem that the repo is in :)
Yeah, I have a bit of a love-hate relationship with it. But I actually have that with all shells out there. I don't know if it's just me or the shells, or (the most likely I think): a bit of both. But PS is available out of the box and using objects vs plain text is a major win in my book, and even though I still don't know half of the syntax by heart it feels less of an endless fight than other shells. And since I use the shell itself for rather basic things and for the rest only for tools (like shown here), we get along just fine.
I use ripgrep with the Emacs packages project.el (comes out of the box) and dumb-jump (needs to be installed). This may not be the most popular way of using rg but I have been very pleased with the overall experience. All it takes is running package-install to install the dumb-jump package and configuring the following hook:
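(This is the standard activation from the dumb-jump README; adjust to taste if your setup differs.)

    (add-hook 'xref-backend-functions #'dumb-jump-xref-activate)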
The Xref key sequences and commands work fine with it. If I type M-. (or C-u M-.) to find definitions of an identifier in a Python project, dumb-jump runs a command like the following, processes the results, and displays the results in an Xref buffer.
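For an identifier like foo in a Python project, it is roughly of this shape (illustrative only; the real pattern dumb-jump builds is longer and language-specific):

    rg --color never --no-heading --line-number -U --pcre2 --type py \
       -e 'def\s*\bfoo\b\s*\(|class\s*\bfoo\b\s*\(|\bfoo\b\s*=[^=\n]+' /home/me/my-project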
The above command shows how dumb-jump automatically restricts the search to the current file type within the current project directory. If no project directory is found, it defaults to the home directory.
By the way, dumb-jump supports the silver searcher tool ag too which happens to be quite fast as well. If neither ag nor rg is found, it defaults to grep which as one would expect can be quite slow while searching the whole home directory.
Ripgrep can be used quite easily with the project.el package too that comes out of the box in Emacs. So it is not really necessary to install an external package to make use of ripgrep within Emacs. We first need to configure xref-search-program to ripgrep as shown below, otherwise it defaults to grep which can be quite slow on large directories:
(setq xref-search-program 'ripgrep)
Then a project search with C-x p g foo RET ends up executing a command like the following on the current project directory:
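Roughly something like this, with the project's file list fed in via xargs (the exact template lives in xref-search-program-alist and varies by Emacs version):

    xargs -0 rg --null -nH --no-heading --no-messages -g '!*/' -e foo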
The results are displayed in an Xref buffer again which in my opinion is the best thing about using external search tools within Emacs. The Xref key sequences like n (next match), p (previous match), RET (jump to source of match), C-o (show the source of the match in a split window), etc. make navigating the results a breeze!
Looking at your regex---just by inspection, I haven't tried it, so I could be wrong---I think you can drop the --pcre2 flag. I also think you can drop the second and third \b assertion. You might need the first one though.
The example I have posted in my comment is not a command I am typing myself. The dumb-jump package generates this command for us automatically. It is possible to customize the command it generates though. Indeed while running ripgrep manually, I do not use the --pcre2 option. Thank you for developing and maintaining this excellent tool!
This is a good option, but I still use rg.el for the occasions when I want to search several projects at once or a subfolder within a project, where I would otherwise use 'rgrep'.
I've been using ripgrep for about 2 years now and I find it indispensable. The main reason I switched from grep was ease of use. From the README: "By default, ripgrep will respect gitignore rules and automatically skip hidden files/directories and binary files." Typing `rg search_term directory` is much better than the corresponding grep command, but the speed improvement is also a nice bonus.
Random other helpful flag I use often is -M if any of the matches are way too long to read through and cause a lot of terminal chaos. Just add `-M 1000` or adjust the number for your needs and the really long matches will omit the text context in the results.
Yeah, the -M flag is wonderful (super handy for ignoring minified files that you don't want to see results from, etc), and also great is the -g flag (e.g. `-g '*.cs'` and you'll just search in files that have the .cs extension).
Also the fact that it is a standalone portable executable can be super handy. Often when working on a new machine, I'll drop in the executable and an alias for grep that points to rg, so if muscle memory kicks in and I type grep it will still use rg.
If you're a fan of the -g flag to ripgrep then I also recommend checking out the -t flag, short for --type, which lets you search specific file types. You can see the full list with `rg --type-list`. For example, you could just search .cs files with `rg -tcs`.
This flag is especially convenient if you want to search e.g. .yml and .yaml in one go, or .c and .h in one go, etc.
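For example (patterns made up):

    rg -tyaml 'replicas:'     # searches *.yml and *.yaml together
    rg -tc 'EXPORT_SYMBOL'    # searches *.c and *.h (and a few other C-ish extensions)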
Tbh I've always just typed -t followed by something that feels intuitive and it's always worked. Never really bothered looking in help until I made the above comment.
And this may still be true in 2023, but the problem is that most of the parallelized grep replacements (e.g. ripgrep, ag, etc.) are SO much faster than grep that the much smaller speed differences between them don't provide much of a basis for differentiating them.
I use ag (typically from inside Emacs) on a 900k LOC codebase and it is effectively instantaneous (on a 16 core Ryzen Threadripper 2950X). I just don't have a need to go from less than 1 second to "a bit less than less than 1 second".
Speed is not the defining attribute of the "new greps" - they need to be assessed and compared in other ways.
In 2016, I'd say speed was definitely a defining attribute. ag has very significant performance cliffs. You can see them in the blog post.
But as I mentioned in my comparison to qgrep elsewhere in the thread, everyone has different workloads. And for some workloads, perf differences might not matter. It really just depends. 900 KLOC isn't that big, and indeed, for simple queries pretty much any non-naive grep is going to chew through it very very quickly.
The blog post also compares Unicode support, and contextualizes its performance. ag essentially has zero Unicode support. Unicode support isn't universally applicable of course---you may not care about it---but it satisfies your non-perf comparison criteria. :-)
In my experience they are all horribly i/o bound and the search takes as long as files load from disk and that's quite long, after that the difference can't possibly be meaningful. When files are in cache search time is dominated by time it takes me to navigate the file system and write the command, and again performance difference can't possibly be meaningful.
> When files are in cache search time is dominated by time it takes me to navigate the file system and write the command
This suggests your corpora are small. If you have small corpora, then it should be absolutely no surprise that one tool taking 40ms and another taking 20ms won't matter for standard interactive usage.
Well... it is not faster than qgrep :) even though the two work very differently, and even though qgrep is based on RE2 - the speed comes from the presence of an index. But then I wonder why people forget the qgrep option, since with large file stores it makes much more sense to use qgrep AND indices, rather than always going through all the files.
this above all true UNLESS you need multi-line matches with UTF8, where ripgrep is not so fast, because it needs to fall back to the other PCRE2 lib
Yes, qgrep uses indexing, which will always give it a leg up over other tools that don't use indexing. But of course, now you need to setup and maintain an index. The UX isn't quite as simple as "just run a search."
But there isn't much of a mystery here. Someone might neglect to use qgrep for exactly the same reason that "grep is fast enough for me" might prevent someone from using ripgrep. And indeed, "grep is fast enough" is very much true in some non-trivial fraction of cases. There are many many searches in which you won't be able to perceive the speed difference between ripgrep and grep, if any exists. And, analogously, the difference between qgrep and ripgrep. The cases I'm thinking of tend to be small haystacks. If you have only a small thing to search, then perhaps even the speed of a "naive" grep is fast enough.
So if ripgrep, say, completes a search of the Linux kernel in under 100ms, is that annoying enough to push you towards a different kind of tool that uses indexing? Maybe, depends on what you're doing. But probably not for standard interactive usage.
This is my interpretation anyway of your wonderment of (in your words) "why people forget the qgrep option." YMMV.
> this above all true UNLESS you need multi-line matches with UTF8, where ripgrep is not so fast, because it needs to fall back to the other PCRE2 lib
That's not true. Multiline searches certainly do not require PCRE2. I don't know what you mean by "with UTF8," but the default regex engine has Unicode support.
PCRE2 is a fully optional dependency of ripgrep. You can build ripgrep without PCRE2 and it will still have multiline search support.
Does `build.rs` build the project? One of my favorite (Big Corp) code-bases just had a single C file (build.c) that did all the dependency tracking, like Make, but in some nicely written (easy to understand) C code. The C file started with a shebang: a self-building-and-executing line, so we'd do this:
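Roughly (from memory, so details may be off) - the first line is simultaneously something the shell can execute and a C comment, so running the file directly builds it and then runs the result:

    $ head -1 build.c
    //usr/bin/cc -o build build.c && exec ./build "$@"; exit
    $ ./build.c target      # the shell runs line 1, which compiles build.c and execs it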
No. Cargo does. The `build.rs` is basically a Cargo hook that gets compiled as a Rust program and executed just before the ripgrep binary is compiled. It lets you do things like set linker flags[1] so that you can embed an XML manifest into the binary on Windows to enable "long path support."
ripgrep's build.rs used to do more, like build shell completions and the man page. But that's now part of ripgrep proper. e.g., `rg --generate man` writes roff to stdout.
I switched from ripgrep to ugrep and never looked back. It's just as fast, but also comes with fuzzy matching (which is super useful), a TUI (useful for code reviews), and can also search in PDFs, archives, etc.
The optional Google search syntax is also very convenient.
I’m a die-hard ripgrep fan, but just recently found ugrep looking for one feature that ripgrep lacks: searching in zip archives (without decompressing them to disk).
Ugrep has that. In my case, I’m working with zipped corpora of millions of small text files, so I can skip unpacking the whole thing to the filesystem (certain filesystems have trouble at this scale).
I’m grateful for both tools. Thanks to the respective authors!
So I was casually searching for "ugrep vs ripgrep" articles, when I stumbled upon a couple of reddit posts where the authors of ugrep and ripgrep apparently have had a multi-year feud, e.g. https://www.reddit.com/r/programming/comments/120wqvr/ripgre...
So weird. I mean, it's just about some open source tool, right? :-/
I came across ugrep recently and I immediately recognized the organization as one that I had dealt with starting about 15 years ago. The author is brilliant‡, but extremely prickly (sometimes even to paying customers). The author of ripgrep, on the other hand, has always seemed like someone who just wants to get on with the business of writing software that people use.
‡ The main commercial product of the ugrep author's company at the time was the gSOAP code generator (it may still be), and that it not only works but makes a reasonably good C and C++ API from WSDL is proof that it is the product of a genius madman. It also allowed you to create both the API and WSDL from a C++-ish header, and both .NET and Java WSDL tools worked perfectly with it. We needed it to work and work it did.
At the time, the generated API was just difficult enough to use that I generated another ~1k lines of code for that project. IIRC, the generated API is sort of handle-based, which requires a slightly different approach than the strict RAII approach we were using. Generating that code was a minor adventure (generating the gSOAP code from the header-ish file, generating doxygen XML from the generated gSOAP code, then generating the wrapper C++ from the doxygen XML).
There are feuds about open source tools all the time. Text editors, Linux distros, shells, programming languages, desktop environments, etc... And ugrep vs ripgrep may be a poster child for C++ vs Rust.
It is not all bad, it drives progress, and it usually stays at a technical level. I've yet to see people killing each other over their choice of command line search tool.
There's like a whole host of things you could use to explain it. Inertia. Compatibility. Resistance to change. Innovator's dilemma. And so on. (I do not say any of these things pejoratively! All of those things apply to me too.)
For the same reason the 40yo chair I currently sit in is not being replaced with Razer UltraSeat XR3000-A. It's comfortable, fits the workplace around it, and there's no reason for getting a replacement and rebuilding everything. (Partially because a Razer-like chair already stands nearby taking care of my clothes, but that's where the analogy ends.)
Someone designed unix based on the idea that some system functions are both core OS functions AND tools for human use. That leads to some bizarre outcomes decades later like "there must be a program called xyz that accepts these arguments and works exactly like this".
There are multiple alternatives you can already use, like ripgrep. What are you proposing, switching out the command `grep` for another utility?
Sounds like that could introduce a ton of breakage, for little value. People who want a faster grep will use a different thing, while people who use grep can continue to use it. Sounds like an ideal situation already.
grep is a general purpose tool for searching for text in all types of files, baked into the standards for UNIX. Some programmers use it to search source code. Other people use it for other types of text searches that have nothing to do with source code, they rely on it in scripts, they don't use it as part of a text-based programmer UI, they rely on it to never crash, etc.
ripgrep is a specialist, opinionated tool, designed primarily to search through source code repositories.
There's not much you can add to general purpose text search to make it faster; you can make it use mmap() at the risk of it crashing on truncated files, you can reduce the expressiveness of regular expressions so they can be computed faster. You could throw out general support for all locales and charsets and hardcode support for only UTF-8 / UTF-16, but you shouldn't.
> There's not much you can add to general purpose text search to make it faster
Oh I beg to differ! The blog post goes into this. Here's a simple demonstration using ripgrep 14:
$ ls -l full.txt
-rw-rw-r-- 1 andrew users 13113340782 Sep 29 12:30 full.txt
$ time rg -c --no-mmap 'Clipton' full.txt
294
real 1.419
user 0.539
sys 0.879
maxmem 15 MB
faults 0
$ time LC_ALL=C grep -c 'Clipton' full.txt
294
real 6.911
user 6.078
sys 0.829
maxmem 15 MB
faults 0
$ time rg -c --no-mmap 'DMZ|Clipton' full.txt
1070
real 1.643
user 0.747
sys 0.894
maxmem 15 MB
faults 0
$ time LC_ALL=C grep -E -c 'DMZ|Clipton' full.txt
1070
real 8.317
user 7.384
sys 0.930
maxmem 15 MB
faults 0
No memory maps. No multi-threading. No filtering. No fancy regex engine features or reducing expressiveness. No locales. No UTF-8. No UTF-16. Just a simple literal and a simple alternation of literals. It's just better algorithms.
Also, you can disable ripgrep's opinions with `-uuu`. It's not designed to just be for code searching. You can use it for normal grepping too. It will even automatically revert to the standard grep line format in shell pipelines.
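For example (path made up):

    rg -uuu 'needle' /some/dir    # -uuu = --no-ignore --hidden --binary; roughly a plain recursive grep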
These benchmarking results are seven years old, so perhaps it has been.
My entirely anecdotal and unscientific impression is that rg and grep perform similarly on Linux (though rg has nicer defaults for searching through source code). The old version of grep that Apple preinstalls on the Mac was slower last time I checked though.
Complete guess: it works just fine for 99.9999% of users, but there's a greater than 0.0001% chance that changing it would break compatibility or introduce a bug. Anyone who would need the performance gain would know about specialized alternatives.
I guess it is because after decades of use, grep has probably been fixed to handle lots of use cases that the new tools don't handle because they haven't found them yet.
Like automatic encoding detection and transparently searching UTF-16?
Or simple ways for composing character classes, e.g., `[\pL&&\p{Greek}]` for all codepoints in the Greek script that are letters. Another favorite of mine is `\P{ascii}`, which will search for any codepoint that isn't in the ASCII subset.
Or more sophisticated filtering features that let you automatically respect things like gitignore rules.
Those are all things that ripgrep does that grep does not. So I do not favor this explanation personally.
ripgrep has just about all of the functionality that GNU grep does. I would say the two biggest missing pieces at this point are:
* POSIX locale support. (But this might be a feature[1].)
* Support for "basic" regexes or some equivalent that flips the escaping rules around. i.e., You need to write `\+` to match 1 or more things, where as `+` will just match `+ literally.
Otherwise, ripgrep has unfortunately grown just about as many flags as GNU grep.
Honestly: I rely on my shell scripts working and any "grep replacement" has to work with all the old crusty shell scripts out there, likely including ones that use odd "quirks" and GNU options.
If you want to innovate in this space, why sign up for all that? Invent a better wheel, and if people like it, they'll migrate over time.
I remember using ag in the old days, and I use rg now. But there's things rg does by default that I don't like at times... so I go back to old fashioned grep.
rg is at the point where many programmers use it. I think it is on its way to becoming one of those "standard tools". It needs... another 5 years?
When POSIX has a rg standard... we'll know ripgrep "succeeded" and teargrep will soon come into existence ;)
Using Ripgrep via Consult [1] in Emacs is bliss. It's like the rg+fzf thing that some have made, but all inside Emacs. I use the `consult-ripgrep` command all the time, and sometimes I use it to make project-wide edits too! Workflow is search with `consult-ripgrep` -> export results to buffer -> edit buffer -> commit edits back to files. Details at [2] (includes video of me working it)
Very often I don't want to look at files that aren't tracked under Git version control, and I'm not looking for matches in binary files; ripgrep skips both by default, which can cut search time by 99%.
I used to grep in small dirs; now I can ripgrep my whole home directory. Not that I do it, but I can. That + Sourcegraph on the master branch, and it makes searching anything other than plain text feel sooo slow (Atlassian Confluence and Jira, Google Docs, etc.).
Maybe I'm missing something but I only use it with AND conditions (usually in the form of 'foo 'bar and it only matches lines with foo AND bar both present)
A search for `rg -e foo -e bar` will return lines that match either foo or bar. Some lines may have both, but it isn't required.
The standard way to run "AND" queries is through shell pipelines. That is, `rg foo | rg bar` will only print lines containing both. But composition usually comes with costs. The output reverts to the standard grep format and it doesn't interact nicely with contextual options like -C/--context.
Semi off-topic, I've coded a ncurses-frontend to navigate and filter grep-like results which might be of interest to some of you: https://github.com/gquere/ngp2
Not exactly but yes, it ultimately uses the `memchr` crate [1] which provides SIMD-optimized character and string search routines. But it uses `_mm256_cmpeq_epi8` instead of `_mm256_sad_epu8`.
I think Wojciech Muła, who devised the original SIMD-oriented Rabin-Karp algorithm, also measured MPSADBW approaches and found that they are not a good fit for general string-in-string searches [1]. Maybe not today though.
Would be nice if ripgrep was drop in compatible with grep. I'd feel like a dick writing a shell script for other people to use and forcing them to install a new grep
Even if the answer is instant, you have a 50% performance improvement in your search just from typing "rg" instead of "grep"!
From my perspective it's a no brainer. I don't HAVE a grep (because I don't have a Unix) so when I install a grep, any grep, reaching for rg is natural. It's modern and maintained. I have no scripts anywhere that might expect grep to be called "grep".
Of course if you already have a grep (e.g. you run Unix/Linux) then the story is different. Your system probably already has a grep. Replacing it takes effort and that effort needs to have some return.
Well, a cmd script for msys64 grep in my \CmdTools is named `gr`. It feels more natural, because index-then-middle finger also does. Thinking of it, I actually hate starting anything with a middle finger (no pun). Also learning new things that do the same thing as the old one.
I am amused by this comment, because it shows a dramatically different type of thinking. I have probably thought "I wish this was faster" for nearly everything I do on a computer :)
DB logs can get HUGE. logrotate for them is currently daily. I briefly wanted to tune it to help alleviate the issue, but honestly it didn’t and doesn’t matter, given the infrequency with which they’re directly accessed. No risk of running out of disk space, and the DBAs like them how they are, so meh. There are other things to worry about.
A few years ago I worked on a Solaris box that would lock the whole machine up whenever I grepped through the log files. Like it wouldn't just be slow, the web server that was running on it would literally stop serving requests while it was grepping.
My best guess is your grep search was saturating I/O bandwidth, which slowed everything else to a crawl.
Another possibility is that your grep search was hogging up your system's memory. That might make it swap. On my systems which do not have swap enabled but do have overcommit enabled, I experience out-of-memory conditions as my system essentially freezing for some period of time until Linux's OOM-killer kicks in and kills the offending process.
I would say the first is more likely than the second. In order for grep to hog up memory, you need to be searching some pretty specific kinds of files. A simple log file probably won't do it. But... a big binary file? Sure:
grep -a burntsushi /proc/self/pagemap
Don't try that one at home kids. You've been warned. (ripgrep should suffer the same fate.)
(There are other reasons for a system to lock up, but the above two are the ones that are pretty common for me. Well, in the past anyway. Now my machines have oodles of RAM and lots of I/O bandwidth.)
I tested this on a large repo in 2016 when I installed several tools (including rg and ag) to compare speed. I don't have the metrics anymore, but the results were pretty clear then. According to the benchmarks from the OP, git grep is pretty comparable to rg in a large git repo. I guess different benchmarks give slightly different results, but the OP acknowledges that git grep is very fast. Bonus is that it comes preinstalled with git and can search through commit history.
It really just depends. The way I like to characterize `git grep` (at present) is that it has sharp performance cliffs. ripgrep has them too, to be sure, but I think it has fewer of them.
If you're just searching for a simple literal, `git grep` is decently fast:
But if you switch it up and start adding regex things to your pattern, there can be substantial slowdowns:
$ time LC_ALL=C git grep -c -E '\w{5,}\s+PM_RESUME'
Documentation/dev-tools/sparse.rst:1
Documentation/translations/zh_CN/dev-tools/sparse.rst:1
Documentation/translations/zh_TW/sparse.txt:1
real 5.704
user 55.671
sys 0.585
maxmem 207 MB
faults 0
$ time LC_ALL=en_US.UTF-8 git grep -c -E '\w{5,}\s+PM_RESUME'
Documentation/dev-tools/sparse.rst:1
Documentation/translations/zh_CN/dev-tools/sparse.rst:1
Documentation/translations/zh_TW/sparse.txt:1
real 24.529
user 4:34.42
sys 0.753
maxmem 211 MB
faults 0
$ time LC_ALL=en_US.UTF-8 git grep -c -P '\w{5,}\s+PM_RESUME'
Documentation/dev-tools/sparse.rst:1
Documentation/translations/zh_CN/dev-tools/sparse.rst:1
Documentation/translations/zh_TW/sparse.txt:1
real 1.372
user 16.980
sys 0.647
maxmem 211 MB
faults 1
$ time rg -c '\w{5,}\s+PM_RESUME'
Documentation/translations/zh_CN/dev-tools/sparse.rst:1
Documentation/dev-tools/sparse.rst:1
Documentation/translations/zh_TW/sparse.txt:1
real 0.082
user 0.226
sys 0.612
maxmem 18 MB
faults 0
In the above cases, ripgrep has Unicode enabled. (It's enabled by default irrespective of locale settings. ripgrep doesn't interact with POSIX locales at all.)
Thanks for clarifying! I use `git grep -IPn --color=always --recurse-submodules` many times a day, every day. It hasn't let me down yet, but I don't search for Unicode when working on source code. I do use regexes though, via the -P switch.
I don’t think there’s been a point to using `git grep` since ack started parsing gitignore. As far as I’m concerned the use case of `git grep` is to search into non-checked-out trees (by giving it a tree-ish). And it’s not super great at that, because it searches a static tree-ish, so pickaxe filters are generally more useful (though they’re slow).
Once again Mercurial has/had more useful defaults: `hg grep` searches through the history by default, that's its job.
Small clarification: ack did not and does not respect your gitignore files. I just tried it myself, and indeed it doesn't. And this is consistent with the feature chart maintained by the author of ack: https://beyondgrep.com/feature-comparison/
One practical result of this is that it will mean `ack` will be quite slow when searching typical checkouts of Node.js or Rust projects, because it won't automatically ignore the `node_modules` or `target` directories. In both cases, those directories can become enormous.
`ack` will ignore things like `.git` by default though.
I believe `ag` was the first widely used grep-like tool that attempted to respect your .gitignore files automatically. (Besides, of course, `git grep`. But `git grep` behaves a little differently. It only searches what is tracked in the repo, and that may or may not be in sync with the rules in your gitignores.)
So? My distro doesn't come with almost any of the tools I need for day to day work. It has never been a problem for me to install a new editor or compiler on a new machine, I don't see why ripgrep would be any different. Especially since it's usually a single command to install anyway.