Remember that the more you treat commit history as something to be read, the better these tools get. If you’re just jamming “-m” on `git commit` with nonsense like “fix bug”, “pr comments”, “ci”, then you’re going to have a bad time debugging with Git. Also, if you have mandatory squash and merge, congrats, you’re destroying history all the time that can help you debug things.
Ours isn’t much better.
Each commit message has a mandatory ticket number you have to enter.
So for details you need to jump to the ticket.
The tickets often don’t have meaningful information in them apart from ‚for details see the attachments‘
Attachments are often either not uploaded or nowadays we do not have read access as they were created by a different team working for a different customer but forwarded to the ‚central backend team’ and for know how protection purposes we are not even allowed to see the attachments.
Effectively resulting in the information that is available telling us: ‚we did stuff‘
I don't know about at your workplace, but in our (non-git) repo the intent of the ticket number isn't so you can enter details about the commit in the ticket. It's to link tickets to particular work items being tracked in the ticket system. You should still add as much detail as possible about what the commit is doing in the commit itself. If anyone really wants to, they can always aggregate all commits against a ticket to get an overall idea of the flow for that task, which might span branches or repos, and of course the tickets can be linked to higher ones tying that commit into the larger plan.
But none of that is as important as explaining exactly what the commit's code changes did. So yeah, I would never use the ticket system as a replacement for commit comments.
We've migrated our ticket management system to a few vendors. So a ticket number might come from the system 3 iterations ago... and of course migration wasn't perfect. Because that's the one ticket that didn't get migrated.
> Ours isn’t much better. Each commit message has a mandatory ticket number you have to enter. So for details you need to jump to the ticket.
In my perfect world people would write self-contained commit messages. Links to non-essential elaborations is fine.
But what we have is often (ticket + some bare-bones explanation of what the code change does without explaining why). Okay, so that’s annoying that I have to click through to a slow issue tracker instance for every commit.
But maybe people just want an easily changeable (unlike immutable commits) place to write down what this is about. Is the issue tracker that? Ideally (or second-to-ideally; see first paragraph) the title and description explain what the problem is. If this was a long back and forth issue then hopefully someone has updated the description to point to exactly what the PR/commits are supposed to do. More likely though is that the issue is a stream of consciousness:
- Naive title
- Naive description
- Back and forth troubleshooting in the comments
- The tech lead shoots in with “so, i guess <discard everything> and do X”
And that’s the average good case. I’ve been complaining recently (maybe on two occasions) that I as a secondary (to the tech lead) PR reviewer can’t even easily figure out what the PR is supposed to do based on the issue.
So there is no curation or editing. Even though the issue tracker is elevated to be the source of code change truth by mandating that commits need to have the issue id in the title but almost nothing else is demanded of the commit message.
Now compare all of that to—no matter what is in the issue tracker, no matter if it has good or bad information—taking five minutes in order to write a commit message (or just a PR description which you can use in the merge commit) on a change which took in total three hours to work on with all the back and forth and testing and debugging. Now the snapshot of your understanding of the issue at the time of writing the commit/PR stays there forever, without any need for hyperlinks or external tools.
Same here and tickets most of the time come from bugs so the title and discussion (if any) there are about the problem. The commit message should be about the cause and the solution. Instead people just copy and paste bug title into the message.
That's one alternative. Another alternative is cleaning that up in a rebase into a series of (hopefully) easy to follow individual commits that do one thing, and then a merge commit pulling in the branch with a description of the change as a whole (and a reference to the pr and any relevant tickets). There are other alternatives as well that make various tradeoffs between effort for the author, effort for the reviewer, ease of reading the git history, applicability of various tools, etc.
Better-still is `git commit --fixup {rev}` and `git rebase -i --autosquash`, since then the fix(up) can be placed into the most-appropriate commit, which isn't necessarily the most-recent one.
Of course, that assumes a you're already got a workforce that is able to do resolve minor conflicts when something else near the typo got tweaked in a separate commit, so I'll grant that `--amend` is easier when starting out.
> As a reviewer, —amend means I need to reread the whole commit to see what changed. A tiny commit means it takes me seconds.
A `git commit --fixup` commit can be made which does that. Then when the review is done `git rebase -i --autosquash`. Like squashing but with fine-grained control.
Yeah, fixup commits are the answer when it comes to responses to reviewers, who can easily see just what changed since the last time.
It's even better if there's some system that ensures authors remember to do the squashing, such as by prohibiting actions that would bring fixup commits into the main development branch.
Personally, I use `--fixup` even before making a PR, particularly if there's some work that I want to split into "refactoring prep" versus "the new feature."
> As a reviewer, —amend means I need to reread the whole commit to see what changed. A tiny commit means it takes me seconds.
If the change is in response to PR feedback, this is definitely a meaningful concern. The history in the comment I was responding to seemed to imply it was presented for review in the form it was created with lots of little commits along the way; fixing up the history into a small number of meaningful commits before requesting review should be easier on the reviewer, not harder.
Whether to fix up commits before requesting review, during review, and after approval are three separate questions somewhat separate from the question of whether to squash everything at the end.
> My opinion is that a PR should be small enough that it’s desirable to have it squashed into a single commit at the end anyway.
I mostly agree directionally, or for a sufficiently weak "should", but as we get stricter we get into tradeoffs and it gets more subjective and/or context sensitive. If the code base and desired change are in a state such that making the change you need to make is easily split into several conceptual pieces, but they do not make sense on their own, then either you combine them into one commit and wind up with a big PR that's harder to follow, you keep them separate but clean, or you merge them in separate PRs. Some downsides of that last are that it's less clear to the reviewer what motivates the early changes, it may require redundant work to keep everything working (and the code clear) with some of the changes and not others of the changes, and related work moves further apart in the commit history. On the other hand, it should mean smaller merge conflicts as incompatible changes are addressed sooner.
I'd also caveat "one single commit" in that sometimes one commit changing behavior and then a separate commit formatting (or one commit with some preparatory reorganizing and then another changing behavior) can make it clearer what the real changes are, and the intermediate step might not pass linting/format checks if you try to do it in separate PRs.
Yes, but the history presented upthread looks much more like someone working before submitting a PR for review, rather than changes made in response to review. How much to amend the history at each step is several separate questions.
I think you start from a different opinion of what a PR looks like. You say “commits that do one thing”, but in my workplace PRs already are supposed to be small and do one thing: I _want_ them to be squashed into a single atomic commit that’s easy to revert if needed
A PR should do one thing at a higher level than a commit should do one thing (both at a different level than a function should do one thing).
I expect we're agreed that most PRs should be a single commit with a small number of changes in a small number of places, but IME it's not rare to have situations where dividing the changes into groups makes things clearer but where elevating those groups to the level of PR would make things less clear.
IMO PRs are supposed to do one thing. But they might end up doing a few more things like refactor, clean up whitespace, or even add a new function in order to facilitate the change. And all of these can be put into their own commits.
Now you can make like five PRs for each of those commits. But that seems similar to making five <issue tracker> issues for those commits. You’re already there in the PR. You might not need the overhead of N external items for N commits.
Why would you write such commit messages instead of describing what changed and, if needed, why?
I find it quite rude to change a codebase and not leave an explanation in the version control metadata. Over the lifetime of the application that's the source of truth. You can type in whatever in Jira or Trello, what's in version control will be built and shipped anyway, and they're unreliable. Sooner or later someone will think it's a good idea to 'clean up' and delete stuff, or someone decides to migrate to another project management supplier and issue-tags in commit messages become dead links.
The point of squash is that the "history" is nonsense like "fix" etc. There are two types of commit: versions and checkpoints. The latter are just to help you develop and can include stuff like "end of day" that should never end up on master. Squashing is a way to turn those into versions. Blindly squashing every branch down into one commit is stupid, though.
> The point of squash is that the "history" is nonsense like "fix" etc. […]
Using squash when appropriate is good. But then it should be generalized to “rebase” since that doesn’t ever imply a certain strategy like “squash everything”.
> Blindly squashing every branch down into one commit is stupid, though.
Yep, which is what OP is complaining about (mandatory squash).
I usually keep all my work in a stash until I'm ready to create the PR. I used to preface my intermediate commits with "WIP -" and then reset the branch and re-commit everything when I was ready to create the PR, but that was just too much effort
As a big fan of interactive rebase, when cleanint up a chunk of commits I do a reset as reset the OP described (smetimes mid rebase). It's convenient for inspecting all your changes together, and you can pick and choose changes into different commits easily too... Pretty much equivalent, but I find it more straightforward for some things.
Meh, you can easily twist that into a bad policy, too (and yes, people do that). At work, some of my colleagues litter many repos (I don't usually have to work with, thankfully) with dozens of commits having the same ticket ID and the title of the ticket as their subject. Usually pushed straight to trunk (full of foxtrot merges if two people work at the same time). And in the message body, then, there's usually the unhelpful "fix typo".
My opinion: Write readable messages, scope your commits to simple changes, put them on a branch, use `--autosquash`, put the ticket ID in the merge commit's message. It really isn't hard.
git blame shows only who made the last change. Maybe that person only changed the name of a variable, applied a codestyle change, moved a function declaration to another file or many other things that the change was almost irrelevant to the code behavior.
There are a few options that help you out with this:
-M Detect moved or copied lines within a file
-C In addition to -M, detect lines moved or copied from other files that were modified in the same commit.
--ignore-rev <rev> Ignore changes made by the revision when assigning blame, as if the change never happened
--ignore-revs-file <file> Ignore revisions listed in file
I wish they would make long option names for everything, including -C and -M. (Perhaps I should contribute that..)
I use short options interactively on the command line, but in scripts and when communicating with other people, I prefer longer options because they are self-documenting and can catch most typos. (For a long option, typos are more likely to result in an invalid option, and thus an error message. For one-letter options, a typo could result in anything..)
I recommend just skipping blame and going to git log -L to see the full evolution of a range of lines, I set up a little keybind in vim which does this for the current visual selection and it works much better than blame.
This feature is built into Emacs, no Magit needed. It's the vc-region-history command, bound to `C-x v h` by default. It works across all version control systems Emacs supports, not just git.
My config is kind of cluttered so this is a simplified version without dependencies. Glogr is for range history, GLogf for file history and <leader>gc for showing a commit based on hash:
These are the things that can be a chore to find in man pages. There needs to be a way to "sort" man page elements to show most commonly used switches first.
I recently found out that git rebase has a --exec option that allows you to execute a shell command for each commit. If the command fails, the rebase pauses and allows you to make changes to make the command pass. I use this to make sure that there is no commit that leaves my project in an unbuildable state.
Another thing you can do when you have two versions, one which fails and one which doesn't, but where the commits are of the messy work in progress nature and not atomic changes, is gradually stage the changes from the bad version onto the good version until it starts failing
That's a smart way to do it. When I was in this situation without having thought of that, I ended up chunking the monster commit into as small changes as possible with `git add -p`, and then used git rebase as my bisect (since I didn't know how bisect worked either) by rewinding through time until i had found the micro-commit that contained the error.
Didn't know Git had a bisect feature. I'll keep that in mind for later.
Other than bisect though, I do think a lot of the practices outlined in the article (checking the blame, logs, search, etc) is way easier to do in a web UI, at least for someone like me who hasn't tailored their workflow for the command line. The tooling is so ubiquitous that it's easily available. I personally think GitHub does okay in those regards.
Personally, I've tried some graphical tools (e.g. magit on Emacs and the git tools on JetBrains IDEs), but still prefer CLI as I think those UIs hide too much from the user, specially in more complex repositories (e.g. repos with submodules).
These features (except blame) are more advanced features, that may be hidden in those interfaces. Instead of remember where they are behind some menus or shortcuts, I prefer to remember their CLI as it is faster.
Git Bisect has saved me multiple times. It's such a simple concept, but being able to hone into the commit where something is broken is almost always useful.
I've never tried automating it. That'll be fun to try next time I'm in trouble.
Didn't know about all these (especially some flags) so thank you! I've personally used the bisect feature a lot to track down WTF bugs and git blame is essentially embedded into VSCode for me with the git extension, which also helps a ton.
For interactive blaming from command line, I can recommend `tig blame`: use the `,` binding for blaming the parent that introduced a change, and `<` to "go back".
There are two tools I often wonder why so few developers take the time to understand:
- git
- regex
The return on investment is huge.
I wouldn't call myself an expert on neither, but just having a good understanding has often felt like a superpower, and made me the one who "fixed the difficult problem".
TIL about `git log -S` (and `git log -G`) from this article (list commits that increased or decreased the number of times a given identifier is found in the source code). Looking forward to needing to use it, that's a nice trick
I don't think it's the working directory. AFAIK, ls-files will only list files that are known to Git (i.e., files that have been `git add`). ls-files won't list files that are not known to Git, even if they're present in the working directory.
On git commands that accept pathspec, other git adjacent tools also accept path spec. e.g. I tend to use gitk or tig with a pathspec when looking at a bug in a particular file or directory.
Not sure why the article left out `git ls-tree -r`: this is as good as doing ls-files at any commit (given as argument), not just at what's currently checked out.
Another way to use git to debug - albeit more tedious - is to manually check out branches (using a binary search) until you find the exact change that caused an issue. This is helpful if you have no idea what part of the code is causing a problem, but you know the problem didn't exist a month ago. Performance issues are a good example.
Yep. Git bisect automatically does a binary search to find the earliest breaking commit. You can either test each version manually or, if you have a script that will pass or fail, like a unit test, you can tell git and it will do everything automatically. I love it when I get to use git bisect.
Yeah, I have rerere turned on However, it's not directly beneficial for the process outlined. With what I wrote about you only solve a given conflict at the end of a complete bisect run, not at each bisection point inside a run. The bisect/cherry-pick process is only used to determine whether conflicts do or don't happen at a given upstream commit. Usually you will solve a specific conflict only once, regardless of whether rerere is enabled.
In jetbrain products, you can select any bloc of code and show the git history for this particular bloc.
I don’t know exactly which command is behind it but I always thought it would be tedious to do it manually like you described