Hacker News new | past | comments | ask | show | jobs | submit login
Git as a debugging tool (lucasoshiro.github.io)
211 points by lucasoshiro on March 30, 2024 | hide | past | favorite | 76 comments



Remember that the more you treat commit history as something to be read, the better these tools get. If you’re just jamming “-m” on `git commit` with nonsense like “fix bug”, “pr comments”, “ci”, then you’re going to have a bad time debugging with Git. Also, if you have mandatory squash and merge, congrats, you’re destroying history all the time that can help you debug things.


This is one of the biggest issues I face at work. I always see "added X", which git can already tell me.

> Also, if you have mandatory squash and merge, congrats, you’re destroying history all the time that can help you debug things.

Well the alternative is 100s of commits like this:

fix bug

added file

fix typo

fix typo

update test

test

test

test

fix typo

added file


Ours isn’t much better. Each commit message has a mandatory ticket number you have to enter. So for details you need to jump to the ticket.

The tickets often don’t have meaningful information in them apart from ‚for details see the attachments‘

Attachments are often either not uploaded or nowadays we do not have read access as they were created by a different team working for a different customer but forwarded to the ‚central backend team’ and for know how protection purposes we are not even allowed to see the attachments.

Effectively resulting in the information that is available telling us: ‚we did stuff‘


I don't know about at your workplace, but in our (non-git) repo the intent of the ticket number isn't so you can enter details about the commit in the ticket. It's to link tickets to particular work items being tracked in the ticket system. You should still add as much detail as possible about what the commit is doing in the commit itself. If anyone really wants to, they can always aggregate all commits against a ticket to get an overall idea of the flow for that task, which might span branches or repos, and of course the tickets can be linked to higher ones tying that commit into the larger plan. But none of that is as important as explaining exactly what the commit's code changes did. So yeah, I would never use the ticket system as a replacement for commit comments.


We've migrated our ticket management system to a few vendors. So a ticket number might come from the system 3 iterations ago... and of course migration wasn't perfect. Because that's the one ticket that didn't get migrated.


> Ours isn’t much better. Each commit message has a mandatory ticket number you have to enter. So for details you need to jump to the ticket.

In my perfect world people would write self-contained commit messages. Links to non-essential elaborations is fine.

But what we have is often (ticket + some bare-bones explanation of what the code change does without explaining why). Okay, so that’s annoying that I have to click through to a slow issue tracker instance for every commit.

But maybe people just want an easily changeable (unlike immutable commits) place to write down what this is about. Is the issue tracker that? Ideally (or second-to-ideally; see first paragraph) the title and description explain what the problem is. If this was a long back and forth issue then hopefully someone has updated the description to point to exactly what the PR/commits are supposed to do. More likely though is that the issue is a stream of consciousness:

- Naive title

- Naive description

- Back and forth troubleshooting in the comments

- The tech lead shoots in with “so, i guess <discard everything> and do X”

And that’s the average good case. I’ve been complaining recently (maybe on two occasions) that I as a secondary (to the tech lead) PR reviewer can’t even easily figure out what the PR is supposed to do based on the issue.

So there is no curation or editing. Even though the issue tracker is elevated to be the source of code change truth by mandating that commits need to have the issue id in the title but almost nothing else is demanded of the commit message.

Now compare all of that to—no matter what is in the issue tracker, no matter if it has good or bad information—taking five minutes in order to write a commit message (or just a PR description which you can use in the merge commit) on a change which took in total three hours to work on with all the back and forth and testing and debugging. Now the snapshot of your understanding of the issue at the time of writing the commit/PR stays there forever, without any need for hyperlinks or external tools.


Same here and tickets most of the time come from bugs so the title and discussion (if any) there are about the problem. The commit message should be about the cause and the solution. Instead people just copy and paste bug title into the message.


Do we work at the same place


That's one alternative. Another alternative is cleaning that up in a rebase into a series of (hopefully) easy to follow individual commits that do one thing, and then a merge commit pulling in the branch with a description of the change as a whole (and a reference to the pr and any relevant tickets). There are other alternatives as well that make various tradeoffs between effort for the author, effort for the reviewer, ease of reading the git history, applicability of various tools, etc.


Yes, I'll never understand the "fix typo" commits when `git commit --amend` is so straightforward.


Better-still is `git commit --fixup {rev}` and `git rebase -i --autosquash`, since then the fix(up) can be placed into the most-appropriate commit, which isn't necessarily the most-recent one.

Of course, that assumes a you're already got a workforce that is able to do resolve minor conflicts when something else near the typo got tweaked in a separate commit, so I'll grant that `--amend` is easier when starting out.


Better still, git-absorb figures out in which recent commit your fixups should go on its own (based on which file it touches)


As a reviewer, —amend means I need to reread the whole commit to see what changed. A tiny commit means it takes me seconds.

My opinion is that a PR should be small enough that it’s desirable to have it squashed into a single commit at the end anyway.


> As a reviewer, —amend means I need to reread the whole commit to see what changed. A tiny commit means it takes me seconds.

A `git commit --fixup` commit can be made which does that. Then when the review is done `git rebase -i --autosquash`. Like squashing but with fine-grained control.


Yeah, fixup commits are the answer when it comes to responses to reviewers, who can easily see just what changed since the last time.

It's even better if there's some system that ensures authors remember to do the squashing, such as by prohibiting actions that would bring fixup commits into the main development branch.

Personally, I use `--fixup` even before making a PR, particularly if there's some work that I want to split into "refactoring prep" versus "the new feature."


> As a reviewer, —amend means I need to reread the whole commit to see what changed. A tiny commit means it takes me seconds.

If the change is in response to PR feedback, this is definitely a meaningful concern. The history in the comment I was responding to seemed to imply it was presented for review in the form it was created with lots of little commits along the way; fixing up the history into a small number of meaningful commits before requesting review should be easier on the reviewer, not harder.

Whether to fix up commits before requesting review, during review, and after approval are three separate questions somewhat separate from the question of whether to squash everything at the end.

> My opinion is that a PR should be small enough that it’s desirable to have it squashed into a single commit at the end anyway.

I mostly agree directionally, or for a sufficiently weak "should", but as we get stricter we get into tradeoffs and it gets more subjective and/or context sensitive. If the code base and desired change are in a state such that making the change you need to make is easily split into several conceptual pieces, but they do not make sense on their own, then either you combine them into one commit and wind up with a big PR that's harder to follow, you keep them separate but clean, or you merge them in separate PRs. Some downsides of that last are that it's less clear to the reviewer what motivates the early changes, it may require redundant work to keep everything working (and the code clear) with some of the changes and not others of the changes, and related work moves further apart in the commit history. On the other hand, it should mean smaller merge conflicts as incompatible changes are addressed sooner.

I'd also caveat "one single commit" in that sometimes one commit changing behavior and then a separate commit formatting (or one commit with some preparatory reorganizing and then another changing behavior) can make it clearer what the real changes are, and the intermediate step might not pass linting/format checks if you try to do it in separate PRs.


If you use a good tool like Gerrit, then you can see a diff of changes made in a review since an arbitrary previous version of the change.


The idea is that --amend would be used before pushing the commit that it amends, and so before a reviewer ever sees the original commit.


The idea of PR review is that the submission is imperfect and that there will be fixes. Like, "typo fix" fixes.


Yes, but the history presented upthread looks much more like someone working before submitting a PR for review, rather than changes made in response to review. How much to amend the history at each step is several separate questions.


And then you have to go through the sysadmin team to temporarily enable push --force on your branch


I think you start from a different opinion of what a PR looks like. You say “commits that do one thing”, but in my workplace PRs already are supposed to be small and do one thing: I _want_ them to be squashed into a single atomic commit that’s easy to revert if needed


A PR should do one thing at a higher level than a commit should do one thing (both at a different level than a function should do one thing).

I expect we're agreed that most PRs should be a single commit with a small number of changes in a small number of places, but IME it's not rare to have situations where dividing the changes into groups makes things clearer but where elevating those groups to the level of PR would make things less clear.


IMO PRs are supposed to do one thing. But they might end up doing a few more things like refactor, clean up whitespace, or even add a new function in order to facilitate the change. And all of these can be put into their own commits.

Now you can make like five PRs for each of those commits. But that seems similar to making five <issue tracker> issues for those commits. You’re already there in the PR. You might not need the overhead of N external items for N commits.


Why would you write such commit messages instead of describing what changed and, if needed, why?

I find it quite rude to change a codebase and not leave an explanation in the version control metadata. Over the lifetime of the application that's the source of truth. You can type in whatever in Jira or Trello, what's in version control will be built and shipped anyway, and they're unreliable. Sooner or later someone will think it's a good idea to 'clean up' and delete stuff, or someone decides to migrate to another project management supplier and issue-tags in commit messages become dead links.


We can do rebase and squash on our side while also developing with silly commits but not pushing them.


The point of squash is that the "history" is nonsense like "fix" etc. There are two types of commit: versions and checkpoints. The latter are just to help you develop and can include stuff like "end of day" that should never end up on master. Squashing is a way to turn those into versions. Blindly squashing every branch down into one commit is stupid, though.


> The point of squash is that the "history" is nonsense like "fix" etc. […]

Using squash when appropriate is good. But then it should be generalized to “rebase” since that doesn’t ever imply a certain strategy like “squash everything”.

> Blindly squashing every branch down into one commit is stupid, though.

Yep, which is what OP is complaining about (mandatory squash).


I usually keep all my work in a stash until I'm ready to create the PR. I used to preface my intermediate commits with "WIP -" and then reset the branch and re-commit everything when I was ready to create the PR, but that was just too much effort


You don't need to reset the branch. Using rebase -i, you can squash all your WIP commits that way.

You can also use it to split and individually pick changes if you want to clean up your commits.

You can automatically fix up commits after review commits using git absorb [1].

[1] https://github.com/tummychow/git-absorb


As a big fan of interactive rebase, when cleanint up a chunk of commits I do a reset as reset the OP described (smetimes mid rebase). It's convenient for inspecting all your changes together, and you can pick and choose changes into different commits easily too... Pretty much equivalent, but I find it more straightforward for some things.


Agreed. A good policy is to include ticket number and a short description.


I save that and one or more “Reviewed-by” trailers for the merge commit message (Yes! You can write commit messages there too!)


Meh, you can easily twist that into a bad policy, too (and yes, people do that). At work, some of my colleagues litter many repos (I don't usually have to work with, thankfully) with dozens of commits having the same ticket ID and the title of the ticket as their subject. Usually pushed straight to trunk (full of foxtrot merges if two people work at the same time). And in the message body, then, there's usually the unhelpful "fix typo".

My opinion: Write readable messages, scope your commits to simple changes, put them on a branch, use `--autosquash`, put the ticket ID in the merge commit's message. It really isn't hard.


git blame shows only who made the last change. Maybe that person only changed the name of a variable, applied a codestyle change, moved a function declaration to another file or many other things that the change was almost irrelevant to the code behavior.

There are a few options that help you out with this:

-M Detect moved or copied lines within a file

-C In addition to -M, detect lines moved or copied from other files that were modified in the same commit.

--ignore-rev <rev> Ignore changes made by the revision when assigning blame, as if the change never happened

--ignore-revs-file <file> Ignore revisions listed in file

https://git-scm.com/docs/git-blame


I wish they would make long option names for everything, including -C and -M. (Perhaps I should contribute that..)

I use short options interactively on the command line, but in scripts and when communicating with other people, I prefer longer options because they are self-documenting and can catch most typos. (For a long option, typos are more likely to result in an invalid option, and thus an error message. For one-letter options, a typo could result in anything..)


FWIW, -C has precedents: `make` and `tar` have the same option with the same meaning.


Yes, sticking to precedent is a good idea in general, too. But that's independent of long options.


I recommend just skipping blame and going to git log -L to see the full evolution of a range of lines, I set up a little keybind in vim which does this for the current visual selection and it works much better than blame.


And for those using Magit in Emacs, you can do this by selecting a region of lines and hitting `C-c M-g l`.

https://magit.vc/manual/magit/Commands-for-Buffers-Visiting-...


This feature is built into Emacs, no Magit needed. It's the vc-region-history command, bound to `C-x v h` by default. It works across all version control systems Emacs supports, not just git.


There's also git time machine to quickly rollback a buffer in place


Nice. Can you share your vim configuration for that?


My config is kind of cluttered so this is a simplified version without dependencies. Glogr is for range history, GLogf for file history and <leader>gc for showing a commit based on hash:

    nnoremap <leader>gc :Gshow <C-R><C-W><cr>
    command! -nargs=1 Gshow enew | set ft=diff buftype=nofile | 0r!git log -p -n 1 "<args>"
    command! -nargs=0 Glogf tabnew | set ft=diff buftype=nofile | 0r!git -C "#:h" log -p --follow "#:t"
    command! -nargs=0 -range Glogr tabnew | set ft=diff buftype=nofile | 0r!git -C "#:h" log -L "<line1>,<line2>:#:t"


These are the things that can be a chore to find in man pages. There needs to be a way to "sort" man page elements to show most commonly used switches first.


I recently found out that git rebase has a --exec option that allows you to execute a shell command for each commit. If the command fails, the rebase pauses and allows you to make changes to make the command pass. I use this to make sure that there is no commit that leaves my project in an unbuildable state.


You can also use `--interactive` to do a lot of complicated stuff.


FWIW, you can use `git rebase --edit-todo` in the middle of an interactive rebase, just in case you changed your mind about what comes next.


Another thing you can do when you have two versions, one which fails and one which doesn't, but where the commits are of the messy work in progress nature and not atomic changes, is gradually stage the changes from the bad version onto the good version until it starts failing

Basically git checkout -p until things fail


That's a smart way to do it. When I was in this situation without having thought of that, I ended up chunking the monster commit into as small changes as possible with `git add -p`, and then used git rebase as my bisect (since I didn't know how bisect worked either) by rewinding through time until i had found the micro-commit that contained the error.


Thanks for the comment! It looks to be a very useful tip for doing after bisect on repositories where people insist to squash pull requests.


Didn't know Git had a bisect feature. I'll keep that in mind for later.

Other than bisect though, I do think a lot of the practices outlined in the article (checking the blame, logs, search, etc) is way easier to do in a web UI, at least for someone like me who hasn't tailored their workflow for the command line. The tooling is so ubiquitous that it's easily available. I personally think GitHub does okay in those regards.


Personally, I've tried some graphical tools (e.g. magit on Emacs and the git tools on JetBrains IDEs), but still prefer CLI as I think those UIs hide too much from the user, specially in more complex repositories (e.g. repos with submodules).

These features (except blame) are more advanced features, that may be hidden in those interfaces. Instead of remember where they are behind some menus or shortcuts, I prefer to remember their CLI as it is faster.


Git Bisect has saved me multiple times. It's such a simple concept, but being able to hone into the commit where something is broken is almost always useful.

I've never tried automating it. That'll be fun to try next time I'm in trouble.


FWIW, there's `git bisect run` and `git bisect replay`.


I never knew about the -s flag with git log. That would be super helpful in many situations for me / my team.


Didn't know about all these (especially some flags) so thank you! I've personally used the bisect feature a lot to track down WTF bugs and git blame is essentially embedded into VSCode for me with the git extension, which also helps a ton.


For interactive blaming from command line, I can recommend `tig blame`: use the `,` binding for blaming the parent that introduced a change, and `<` to "go back".


There are two tools I often wonder why so few developers take the time to understand:

- git

- regex

The return on investment is huge.

I wouldn't call myself an expert on neither, but just having a good understanding has often felt like a superpower, and made me the one who "fixed the difficult problem".


This is the only post in my recent memory that talks well of Git. Most others I've seen, gave Git some heat and hatred.

To me, Git is like vim: you've to "get it" to love it. I say that as long-time Vimmer and ex-Git hater.


TIL about `git log -S` (and `git log -G`) from this article (list commits that increased or decreased the number of times a given identifier is found in the source code). Looking forward to needing to use it, that's a nice trick


> A nice way to see the pathspecs in action is using git ls-files. That command lists all the files in the staging area,

This should read "That command lists all the files in the _working directory_"


I don't think it's the working directory. AFAIK, ls-files will only list files that are known to Git (i.e., files that have been `git add`). ls-files won't list files that are not known to Git, even if they're present in the working directory.


On git commands that accept pathspec, other git adjacent tools also accept path spec. e.g. I tend to use gitk or tig with a pathspec when looking at a bug in a particular file or directory.


Not sure why the article left out `git ls-tree -r`: this is as good as doing ls-files at any commit (given as argument), not just at what's currently checked out.


git diff-tree [1] is also super helpful. I've use it a lot to ensure the codebase after a complex rebase is in an identical state as before.

[1] https://git-scm.com/docs/git-diff-tree


Another way to use git to debug - albeit more tedious - is to manually check out branches (using a binary search) until you find the exact change that caused an issue. This is helpful if you have no idea what part of the code is causing a problem, but you know the problem didn't exist a month ago. Performance issues are a good example.


I don't know about branches, but if you're just traversing history that's a textbook use for git bisect, right?


Yes, failure to rtfa on my part.


Yep. Git bisect automatically does a binary search to find the earliest breaking commit. You can either test each version manually or, if you have a script that will pass or fail, like a unit test, you can tell git and it will do everything automatically. I love it when I get to use git bisect.


A handy trick I've developed is using`git bisect` to solve otherwise intractable rebase conflicts:

https://codeconstruct.com.au/docs/bisect-intractable-rebase-...


You might want to combine this with enabling rerere.


Yeah, I have rerere turned on However, it's not directly beneficial for the process outlined. With what I wrote about you only solve a given conflict at the end of a complete bisect run, not at each bisection point inside a run. The bisect/cherry-pick process is only used to determine whether conflicts do or don't happen at a given upstream commit. Usually you will solve a specific conflict only once, regardless of whether rerere is enabled.


In jetbrain products, you can select any bloc of code and show the git history for this particular bloc. I don’t know exactly which command is behind it but I always thought it would be tedious to do it manually like you described


Probably they use libgit2 instead of git. But you can use git log -L for that.


Thats called git bisect




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: