Squash and rebase are doing it wrong.

dtech · on Sept 12, 2022

why do you care about my 10 "wip" commits on a feature branch? Why am I not allowed to package a change in a nice set of commits once it's done/reviewable?

rovr138 · on Sept 12, 2022

Not them, but I would hope the messages are better than "WIP" and expose context vs a commit that simply says 'Implemented X feature'. I usually leave those for a tag.

kjeetgill · on Sept 12, 2022

If you don't have commits called WIP you're not committing enough. You're doing it wrong.

I kid, I kid! But I do commit just to push a backup before getting on a train, or to have a convenient point to diff against as part of a refactor, etc.

They should all be a single commit by the end.

sanitycheck · on Sept 12, 2022

I like seeing small commits, mine and other people's. It means when something's broken I can easily narrow down the cause to a single small change and either unbreak it myself or point the appropriate team/person at it, making it a quick and easy fix.

If squashing happens then all I can immediately say is "something in this 600 line changeset over 12 files for Feature X broke it". The bug report for that one is going to be more vague, get allocated more story points, and maybe stay in the backlog for several sprints (or forever).

If people are pushing their feature branches and not deleting them that makes life a bit better, but generally people who want their git history to be "clean" and squash commits also want to get rid of old branches.

Someone might say code review should have caught it. Code review almost never catches actual bugs. Or tests? Tests only test stuff that the author expected to break.

The main things I try to ensure are that every commit is small, every commit has a useful message, and no commits should break the build.

int_19h · on Sept 12, 2022

Squashing and rebasing doesn't have to produce one gigantic commit. It can take, say, 20 "WIP" commits (many of which might not even build), and organize them into 5 commits that constitute logical units with descriptive messages. But raw WIP commits themselves are rarely so organized, unless you invest a lot of time and effort into that while coding.

Personally, I prefer my coding to be free of such distractions, and to use SCM facilities as a scratchpad for fast iteration on the code (the ability to easily reverse changes etc); this results in many commits that don't make sense in the final pull request / code review.

sanitycheck · on Sept 12, 2022

I've heard of people doing this, but never seen it in real life. Do each of your "logical unit" commits build and run properly on their own? I'm interested in the concept, if there's a tool/process which makes that easy to achieve - but it sounds like possibly more work than just making good small commits to begin with.

dllthomas · on Sept 12, 2022

I also do this. I prefer the various checks all pass, although I am likely to temporarily break lint when "make the change" and "make it pass lint" are independently obviously correct but look messy when composed. It is very rare (and I'll loudly call it out in the commit message) that a similar thing applies to types or tests and I'm more likely to actually disable the relevant test than suppress lint I'm fixing in the next commit in the same PR (although on reflection maybe I should be doing that).

The bulk of my PRs are still one small commit, but when a feature gets larger I find that "what's the next thing to try to solve the problem" isn't always the same question as "what will make this most readable for a reviewer." I tend to address presentation of the commits in the PR at about the same time I'm addressing readability of the resulting code, as they wind up being related concerns (though not exactly the same thing).

int_19h · on Sept 12, 2022

It really depends on how you choose to define the granularity. I prefer my commits to all be buildable at least, and ideally also pass the tests (so new code + new tests for it should be in the same commit); but that is subjective and there are valid arguments for smaller commits that are not necessarily isolated like that.

I can't think of any special tools you'd need for this? A git history visualizer with the ability to easily diff and squash ranges helps, but this is nothing special for pretty much any modern IDE. At the end of the day, you just look through the WIP branch history and identify chunks that span multiple commits but logically represent a single change towards some goal.
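A minimal sketch of that last step, assuming the topic labels were assigned by hand while reading the WIP history. The `Wip` type, the short ids, and `plan_squash` are all hypothetical names for illustration; nothing here talks to git, it only produces a plan you would then apply yourself with an interactive rebase.

```python
from typing import Dict, List, NamedTuple


class Wip(NamedTuple):
    sha: str    # hypothetical short commit id
    topic: str  # label assigned by hand while reading the history
    note: str   # the original throwaway message


def plan_squash(history: List[Wip]) -> Dict[str, List[str]]:
    """Group WIP commits by topic, keeping topics in order of first appearance."""
    plan: Dict[str, List[str]] = {}
    for commit in history:
        plan.setdefault(commit.topic, []).append(commit.sha)
    return plan


# Hypothetical WIP branch, oldest commit first.
history = [
    Wip("a1", "parser", "wip"),
    Wip("b2", "parser", "fix typo"),
    Wip("c3", "cli", "wip flag"),
    Wip("d4", "parser", "tests"),
    Wip("e5", "cli", "help text"),
]

plan = plan_squash(history)
for topic, shas in plan.items():
    print(f"{topic}: squash {', '.join(shas)}")
```

Note that pulling "d4" back next to "a1" and "b2" means reordering commits, which can conflict in a real repository; the sketch only shows the bookkeeping, not the rebase itself.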

RockingGoodNite · on Sept 12, 2022

"I prefer my commits to all be buildable at least, and ideally also pass the tests (so new code + new tests for it should be in the same commit); but that is subjective"

In my view, it is not subjective. All commits locally must pass unit, integration and e2e tests, and be locally tested by the developer, every...single...one.

sanitycheck · on Sept 13, 2022

Right, OK - pretty manual then. I was envisaging some sort of n-way diffing tool which simulates several staging areas and changes could be moved around line by line, maybe a button to run a build + tests for each in parallel. Gets difficult as soon as code in one of the middle commits needs a bit of extra tweaking to work though, I suppose.

I'll have to try it sometime, I have a feeling it wouldn't normally turn out too differently to what I end up with anyway. I do think all commits should build and run properly - otherwise git bisect and similar processes stop being useful.

rovr138 · on Sept 12, 2022

I think usually my issue is with context. I know the benefit of squashing, but I feel it gets abused and you lose a lot of context.

WIP commits for a function/method could be squashed into a good commit that explains what and why. But some people implement a full feature and squash all of it into a single commit that simply says 'implemented Y'. Then you lose context on why they modified the different functions/methods.

Yes, you can try to find it, but there's no context so you're just dealing with a huge block of code that modifies other things rather than just adding new code.

ajross · on Sept 12, 2022

I second the "you're doing it wrong" note. The use of a tool that permits easy squashing and rebasing opens up new ways of doing development where you can commit and checkpoint at any moment, cheaply. So e.g. when you inevitably break something, you can bisect to where it happened, or when you find you've added a mismatched assumption you can see where the thought process went wrong.

You don't need to have things tidy, you know you can recover anything you do. So likewise when you're "done" you can squash and split them into changes that make more sense logically before pushing to some shared tree that someone else is going to have to reason about.

People who develop without tools like this tend to do it in a big flat directory and think about "commits" as something done every day or so. Once you get beyond that style, it feels really clumsy.

int_19h · on Sept 12, 2022

I think that preferences along these lines are inherently subjective, and some people may well really be better served by the more traditional workflow where every commit is something you commit to. I just wish this wasn't presented as the one and only right way to do things, to the point where software actively resists any other.

ajross · on Sept 12, 2022

Isn't that backwards though? Git supports "rare commits that you commit to" just fine. It's Fossil that refuses to support the "commit rapidly and frequently on a whim and fix up a submission later" idiom.

int_19h · on Sept 12, 2022

That's exactly my problem with Fossil.

rovr138 · on Sept 12, 2022

> So likewise when you're "done" you can squash and split them into changes that make more sense logically before pushing to some shared tree that someone else is going to have to reason about.

Which keeps context which is my point.

The other case that I see is squashing it all into one that simply says 'Implemented feature Y' which doesn't provide any context into why something was changed.

dolmen · on Sept 12, 2022

--amend

bayindirh · on Sept 12, 2022

You can't --amend remote, sorry.

If you don't back up your code to remote even if it's not finished, you're taking a serious risk.

ajross · on Sept 12, 2022

What does remote storage have to do with branch management? Push to your own branch! That's the core idea behind the git development paradigm, and on sites like github it's as easy as clicking a "fork" button.

bayindirh · on Sept 12, 2022

Because remote storage is either my fork of the code I work on, or my private repo in which I develop my own software.

Of course it's a branch other than master/main, even if it's on "remote".

The thing is, I do not always continue developing on a single/same system, so I do "transfer" commits. I push unfinished code to its own branch on remote, pull from other system and continue developing.

When the code is complete and passes all the tests, I merge to either development to prepare for the next version, or to master if the utility is small enough.

The place where the development branch lives doesn't matter, and using --amend on a single commit throughout feature development is not the most correct way either.

Oh, and I don't use GitHub for my own software. That part is over.

sshine · on Sept 12, 2022

Many companies like the idea of working with a single remote where individual developers prefix their branches with their username. The main reason is that the workflow is much simpler with only one remote endpoint. Having only one remote makes continuous synchronisation much easier (you don't need both "upstream" and "origin"). There are other benefits to this: If someone gets sick, you can find their work in the same place, and you can more easily pass over a branch to someone, since everyone shares remote workspace.

The drawback of one remote endpoint is that it becomes less obvious whose rules apply. You could argue that other people can mess up your remotes, but it isn't until you get to very large organisations or public development (e.g. FOSS) that you should need to deal with adversarial behavior.

I've had bosses who got angry that I force-pushed to my own remote branches because they liked to review code without being requested by pulling the branch, and after a force-push that doesn't work. But my defense was always: Let's establish a protocol on how to cooperate; there is surely some way we both get what we want, and that git supports.

cestith · on Sept 12, 2022

Well... you can force push to a remote with sufficient privs, but you really don't want to do that if you can help it.

sshine · on Sept 12, 2022

A remote can be your personal remote endpoint, or it can be on a shared remote endpoint, but prefixed with your username. As long as you `--force-with-lease` to a feature branch you control and agree on a protocol with potential collaborators, the harm is minimal.
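To make the difference concrete, here is a toy model of the lease check, assuming the usual behaviour where the push is refused unless the remote branch still points at the commit you last saw. `ToyRemote` and its methods are hypothetical names, not git's implementation, and the commit ids are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyRemote:
    """Hypothetical stand-in for a remote: maps branch name to tip commit id."""
    refs: Dict[str, str] = field(default_factory=dict)

    def force_push(self, branch: str, new_tip: str) -> bool:
        # Plain force: overwrite whatever is there, no questions asked.
        self.refs[branch] = new_tip
        return True

    def force_push_with_lease(
        self, branch: str, new_tip: str, expected_tip: Optional[str]
    ) -> bool:
        # Lease: only overwrite if the remote tip is still the one we last saw.
        if self.refs.get(branch) != expected_tip:
            return False
        self.refs[branch] = new_tip
        return True


remote = ToyRemote({"sshine/feature": "a1"})
last_seen = "a1"  # what my last fetch recorded

# A colleague pushes to the same branch in the meantime.
remote.force_push("sshine/feature", "b2")

# My rewritten history is refused, so their commit is not silently lost.
rejected = remote.force_push_with_lease("sshine/feature", "c3", last_seen)

# After fetching (and seeing b2), the lease matches and the push goes through.
accepted = remote.force_push_with_lease("sshine/feature", "c3", "b2")
```

The model leaves out everything else a real push does; it only shows why the lease protects a collaborator's work where a plain force push would not.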

I mostly avoid creating these kinds of conflicts when people's limited git experience would cause stress or unnecessary use of people's time fiddling with the history. Or if more than one person is doing stuff independent of one another.

I've had colleagues that frown on rebase, and people who can't not.

One of the beauties (and complexities) of git is that it allows for this diversity of workflow.

bayindirh · on Sept 12, 2022

Why should I abuse git while I can commit my developments part by part with nice descriptions?

sshine · on Sept 12, 2022

Because you may want to back up your work remotely before you have the time to write something nice. I had to leave halfway through writing a unit test today. My colleague asked me to back up the code by pushing it. And now I'm back, fixing that test, writing that commit message. Most likely we'll merge right after, so my colleague probably won't need to add more, so the branch is in my control.

bayindirh · on Sept 12, 2022

I generally work on a single feature over many commits. Sometimes I need to leave the thing half finished, so commit as is with a nice commit message detailing the state.

At other times I make "transfer commits" since I'll continue development on another machine and need the latest snapshot of the code to continue.

Not all features fit into a single new function, we need to move mountains to rearrange stuff, and to keep code tidy.

As long as the commit messages are clear, and the code works at the end, it's alright.

rovr138 · on Sept 12, 2022

>Not all features fit into a single new function, we need to move mountains to rearrange stuff, and to keep code tidy.

Of course. But when modifying another, you can commit and say,

Implemented function X. Modified Y to accommodate this new argument that's used on X... and so on

>As long as the commit messages are clear, and the code works at the end, it's alright.

I'm fine if the commit messages are clear. The problem is that when squashing, some people don't keep the commit messages clear, or some even squash into one commit that doesn't provide any insight or clarity, 'Implemented feature Y', and you get a diff of thousands of lines that touches everything.

bayindirh · on Sept 12, 2022

> Implemented function X. Modified Y to accommodate this new argument that's used on X... and so on

Sometimes I need to write "Implemented function X, but evaluates the result wrong possibly because of this. Fix this first, then continue".

> The problem is when squashing, some people squash...

A single commit that touches the whole codebase and only says "Bug fix" (or similar) is bad. I concur.

Also, commit history should have enough granularity to allow bisection and partial rewind to understand problems and other side effects.

craggyjaggy · on Sept 12, 2022

What message do you use for the commit you make after finishing for the day? Me going home usually does not coincide with any feature or change being completed.

rovr138 · on Sept 12, 2022

Completed function <A>. Started function <B>

Code for function <B> has the skeleton. Still need to add <...>

pizza234 · on Sept 12, 2022

> why do you care about my 10 "wip" commits on a feature branch?

If that's how a team develops PRs, then the commits are structurally suboptimal, and the problem is in the development practices.

Such a team won't be able to bisect efficiently, regardless of the repository structure.

I personally work with granular, self-standing commits, and with this workflow, non-squashed commits make sense.

Obviously it's not possible to always have self-standing commits, but it is possible the vast majority of the time.

It takes a lot of practice and discipline, though, and if a team is not willing to put them in, then of course squashing is the only way that makes sense.

samus · on Sept 12, 2022

Why would I want to bisect on a feature branch?

Edit: I strive to keep my feature branches both short (# of commits) and light (# of lines changed).

pizza234 · on Sept 12, 2022

After a feature branch is merged, its content is a candidate for any bisect.

If the branch was squashed, bisecting will be less effective than on the same branch merged without squashing (provided the commits are self-standing).
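A rough sketch of why, assuming a linear history where every commit before the culprit is good and every commit from the culprit onward is bad (the same assumption a bisect relies on). The commit ids, line counts, and the `has_bug` flag are made up for illustration; a real bisect would build and test each commit instead of reading a flag.

```python
from typing import Callable, Dict, Sequence, Tuple

Commit = Dict[str, object]


def first_bad(
    commits: Sequence[Commit], is_bad: Callable[[Commit], bool]
) -> Tuple[Commit, int]:
    """Binary-search a linear history for the first bad commit.

    The last commit is assumed to be bad. Returns the culprit and the
    number of commits that had to be tested.
    """
    lo, hi = 0, len(commits) - 1
    tested = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return commits[lo], tested


# Hypothetical feature branch merged as 12 self-standing commits of 50 lines;
# the bug appears in c7 and is present in everything after it.
small = [{"id": f"c{i}", "lines": 50, "has_bug": i >= 7} for i in range(12)]

# The same work squashed into one 600-line commit.
squashed = [{"id": "feature-x", "lines": 600, "has_bug": True}]

culprit_small, steps_small = first_bad(small, lambda c: bool(c["has_bug"]))
culprit_squashed, steps_squashed = first_bad(squashed, lambda c: bool(c["has_bug"]))

print(culprit_small["id"], culprit_small["lines"], steps_small)
print(culprit_squashed["id"], culprit_squashed["lines"], steps_squashed)
```

With the unsquashed history the search ends on a 50-line commit after a handful of tests; with the squashed one it ends immediately, but all it can say is that the whole 600-line change is to blame.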

MichaelCollins · on Sept 12, 2022

Do you find that you often have authoritarian takes, or just in matters pertaining to software development? If the latter, I suggest you pause and reflect on that. There is no one right way to do things; different workflows work for different people. Tools which accommodate the largest number of people will be the most popular.