Unfortunately, git rebase has a very annoying limitation that git merge doesn't. If you have a branch based on masterX with, say, 10 commits on it, and commit 1 from your branch conflicts with masterX+1, then when you rebase your branch onto masterX+1 you will have to resolve the conflict 10 times (assuming all 10 commits touch the same area that had the original conflict). If instead you merge masterX+1 into your branch, you only have to resolve the conflict once.
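The scenario above can be reproduced in a throwaway repo. This is a toy sketch (repo, file name, and the three-commit branch are all invented for illustration): every commit on the branch rewrites the same line, master also rewrites it, and each resolution differs from the branch's original content, so the rebase stops at every commit while the merge conflicts only once.

```shell
# Sketch: count conflict stops for rebase vs. merge in a contrived repo.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.com
git config user.name demo

echo base > file.txt
git add file.txt && git commit -qm base
git branch -M master

git checkout -qb feature
for i in 1 2 3; do
    echo "feature change $i" > file.txt
    git commit -qam "feature $i"
done

git checkout -q master
echo "master change" > file.txt
git commit -qam "conflicting master change"

git branch feature-for-merge feature   # keep a copy for the merge test

# Rebase: stops once per conflicting commit; resolve each stop by hand.
git checkout -q feature
rebase_conflicts=0
git rebase master >/dev/null 2>&1 || true
while [ -d .git/rebase-merge ] || [ -d .git/rebase-apply ]; do
    rebase_conflicts=$((rebase_conflicts + 1))
    echo "hand-merged resolution $rebase_conflicts" > file.txt
    git add file.txt
    GIT_EDITOR=true git rebase --continue >/dev/null 2>&1 || true
done

# Merge: a single conflict, resolved once.
git checkout -q feature-for-merge
merge_conflicts=0
if ! git merge master >/dev/null 2>&1; then
    merge_conflicts=1
    echo "hand-merged resolution" > file.txt
    git add file.txt
    git commit -qm "merge master"
fi

echo "rebase stopped $rebase_conflicts times, merge stopped $merge_conflicts time(s)"
```

Note that the repeated conflicts only happen because each resolution differs from what the branch's commits originally contained; if you resolve by taking the branch's version verbatim, the later commits apply cleanly.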
Even though I much prefer a linear history, losing 1h or more to the tedious work of re-resolving the same conflict over and over is not worth it, in my opinion.
In your example, you pretty much have to change the same line, or a neighbouring line, in all 10 commits to end up in that scenario. If the changes are just somewhere else in the file, git's auto-merging will handle them just fine.
It seems like a very contrived example to me. We have been running rebase/fast-forward only for close to 10 years now, and I have never experienced anything that unfortunate.
I run into this quite frequently, even on projects where I'm the only one working on them (I tend to have a lot of things going on in parallel). Once branches diverge and commits accumulate it can become a right pain. Usually my solution is to merge master into the branch just to keep up to date, then later undo everything, make one new commit on top of master, and rebase that. But in some more difficult cases it was "just merge and fuck it because life's too short". I've also just manually "copy/paste merged" things to a new branch, because that seemed quicker than dealing with all the merges/conflicts.
Maybe there are better ways of doing this, and arguably I shouldn't have all these long-lived branches in the first place (but it works well for me, so...), but it's not that much of a contrived edge case.
> arguably I shouldn't have all these long-lived branches in the first place
This is the problem here. If you have multiple long-lived branches, there's no technical solution to preventing rot -- you must actively keep them in sync.
Regularly merging in main is the opposite of the proper solution. Constantly rebasing on top of main is the proper solution.
A rebase and a merge result in the same code. A rebase is more error prone though. Just because someone "feels" a merge isn't as safe doesn't make it so.
Doesn't rebase use the exact same automatic merge algorithm as a merge? They are equally likely to introduce a production bug, especially if you add a tool like rerere into the mix to do even more auto-magic merging when you hit the differences between rebase and merge.
There are two failure modes. 1) The merge auto-applies cleanly, but the merged code is wrong. This is pretty niche, usually, but it happens with certain edit patterns. I've never seen this produce a syntactically valid, semantically invalid construct (but I suppose it's possible), so generally these are caught by the compiler.
2) The merge does not auto-apply, so you get into manual resolution. This is where things get hairy.
The merge commit really ought not have any changes of its own, but lots of people consider minor conflict resolution legal. So you end up with a bunch of code changes that logically belong to another commit, and are grouped together for purposes of expediency.
Rebase applies your changes to another branch as though they had been made there originally. If a conflict comes up, you already have all the context needed for how to resolve it, because you just wrote that code. The fix goes where it belongs.
All I can tell you is that I've been bit by merge-induced production bugs enough times that I now work to avoid that particular failure mode.
> The merge commit really ought not have any changes of its own, but lots of people consider minor conflict resolution legal.
I'm not sure where this rule comes from. For code review, I for one normally review all of the changes that are going into master, and only look commit-by-commit if it becomes overwhelming - so, unless this is a huge merge (which should generally be avoided anyway), I wouldn't really see how this is a problem.
The only real problem I have with merging into your local branch to keep it in sync with master is the way it pollutes history when it is finally merged back into master. This is enough of a problem that I and my team always rebase unless we end up in one of these rare cases that I was highlighting.
> This is the problem here. If you have multiple long-lived branches, there's no technical solution to preventing rot -- you must actively keep them in sync.
Well, merge actually works much smoother and rebase gives a lot more grief, so the problem is with rebase.
> Regularly merging in main is the opposite of the proper solution. Constantly rebasing on top of main is the proper solution.
The "proper" solution is the one that allows me to get stuff done. The only thing that matters is how the main branch ends up looking in the end, and what I do before that isn't really all that important.
Another problem with rebase is when multiple people are working on the branch; it requires careful coordination if you don't want to lose work. Overall, just merge in main is usually the best strategy here.
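The "careful coordination" problem above can be made concrete. Below is a toy sketch (all repo and user names are invented): two people share a branch, one rewrites history on a stale copy, and `git push --force-with-lease` refuses to overwrite the commit it hasn't seen, where a plain `--force` would have silently discarded it.

```shell
# Sketch: why shared-branch rebases need coordination, and how
# --force-with-lease limits the damage.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q --bare origin.git

git clone -q origin.git seed 2>/dev/null
(
    cd seed
    git config user.email seed@example.com && git config user.name seed
    echo a > f && git add f && git commit -qm init
    git branch -M feature
    git push -q origin feature
)

git clone -q origin.git alice 2>/dev/null
git clone -q origin.git bob 2>/dev/null

# Bob adds a commit to the shared branch and pushes it.
(
    cd bob
    git config user.email bob@example.com && git config user.name bob
    git checkout -q feature
    echo b > g && git add g && git commit -qm bob
    git push -q origin feature
)

# Alice, unaware of Bob's push, rewrites her own history on a stale copy.
cd alice
git config user.email alice@example.com && git config user.name alice
git checkout -q feature
echo a2 > h && git add h && git commit -qm alice
echo a3 > h && git add h && git commit -q --amend -m "alice v2"

# A plain --force would discard Bob's commit; the lease rejects the push
# because the remote no longer matches Alice's remote-tracking ref.
if git push --force-with-lease origin feature >/dev/null 2>&1; then
    lease=accepted
else
    lease=rejected
fi
echo "force-with-lease on stale branch: $lease"

# After fetching Bob's work and rebasing onto it, the lease is fresh again.
git fetch -q origin
git rebase -q origin/feature >/dev/null
git push -q --force-with-lease origin feature && pushed=yes
echo "after fetch+rebase: pushed=$pushed"
```

Even with the lease, everyone on the branch still has to reconcile their local copy after each rewrite, which is the coordination cost the comment above is pointing at.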
Always surprising when folks are confused about how to collaborate on git branches... I'd expect the recursive solution to be more obvious!
> The "proper" solution is the one that allows me to get stuff done.
Yeah, but the stuff that needs to get done doesn't end with your commit, it starts there. Merge commits are prone to introduce unexpected and uncaught bugs: rebases just don't.
> Merge commits are prone to introduce unexpected and uncaught bugs: rebases just don't.
How so? If I make an error with a rebase then I risk losing my changes. You can fetch it from the local reflog, but that's not so easy. With a merge I have a merge commit which records what was merged.
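For what it's worth, fetching from the reflog is less painful than it sounds. A toy sketch (repo and branch names invented) of recovering a branch tip after a history rewrite threw it away:

```shell
# Sketch: undo a botched history rewrite via the reflog.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo

echo 1 > f && git add f && git commit -qm one
git branch -M master
git checkout -qb topic
echo 2 >> f && git commit -qam two
good=$(git rev-parse HEAD)

# Simulate a rebase gone wrong: topic's tip gets thrown away.
git reset -q --hard master

# The old tip is still reachable from the reflog...
git reflog -n 3
# ...so one reset brings it back (topic@{1} would work too).
git reset -q --hard HEAD@{1}
[ "$(git rev-parse HEAD)" = "$good" ] && echo recovered
```

The merge-commit-as-record point still stands, though: a merge leaves the evidence in the permanent history, while reflog entries are local and eventually expire.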
That doesn't really solve much: if both you and I rebase our personal feature branches onto master at different places, when we both try to push to the shared feature branch, we'll have a REALLY bad time - especially if we actually had to do conflict resolution.
> arguably I shouldn't have all these long-lived branches in the first place (but it works well for me, so...)
Given that this scenario is common for you but sounds contrived to others, I would argue that this doesn't work well for you. It's just familiar enough that you're willing to deal with some pain.
Short-lived feature branches sidestep this hell. Longer-lived projects can almost always be partitioned into a series of shorter mergeable steps. You may need support/buy-in from your manager, I hope you get it.
It's not an organisational/manager problem; it's just how I like to work. I often work on something and then either get bored with it or am not quite sure what the best way to proceed is, so I work on something else and come back to it later (sometimes hours later, sometimes days or weeks; sometimes I keep working on it until I get it right). I do this with my personal projects as well, where I can do whatever I want.
I know some people think this is crazy, but it works well for me and I'm fairly productive like this, usually producing fairly good code (although I'm not an unbiased source for that claim).
In the end I don't want to radically change my workflow to git or other tooling; I want the tooling to adjust to the workflow that works well for me.
Sounds like you've never worked on a project with a file everyone wants to append to :)
If every error in your system needs a separate entry in the error enum, or every change needs an entry in the changelog - loads of changes will try to modify the last line of the file.
Even multiple appends are not that bad for rebasing - if you put the remote changes before your own then after the first commit the context for your remaining commits will be the same.
If order actually matters then yeah, git can't magically know where each new line should go.
I'm not saying these situations are impossible. But you can work towards reducing when they arise. If everyone needs to change the same file, then it sounds like something should be refactored (it's probably quite a big file as well?).
If every error needs to go to the same error enum, that sounds like an error enum that might benefit from being split up.
And if every change needs to write to a common changelog file, I would personally find a new way to produce that changelog.
If it's that big a painpoint, then I would look into different ways to get around it.
It happens pretty often when two different people are adding a new function in the same area of a file. It's likely that as you're working on that function, you'll be modifying the surrounding lines a few times (say, you have a first pass for the happy path, then start adding error handling in various passes; or, handling one case of an algorithm in each commit).
Rebase is still by far the most common case in our repo, as yes, these cases appear very rarely. But when they do happen, it's often worth it to do a merge and mess up the history a little bit (or squash, which messes with history in another way) rather than resolving conflicts over and over.
Someone else was also suggesting rerere for this use case, but I've never used it myself and I don't know how well it actually handles these types of use cases.
That alone helps a lot. Other formatting rules that avoid dense lines and instead split code over multiple lines also have a huge impact on merge conflicts.
It's not as contrived as you may think. I, along with what I imagine are many others, do a lot of frequent micro-commits as time goes on and the feature becomes more complete, with a lot of commits in the same area of any given file. Rebasing a development branch in this state is pretty gnarly when a conflict arises.
Sadly, my current approach is to just reset my development branch to the merge base and make one huge commit, and then rebase.
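That reset-to-merge-base-and-squash approach can be sketched concretely. A toy repo (all names invented) where a three-commit branch is collapsed to one commit before rebasing, so the rebase can stop at most once:

```shell
# Sketch: squash a branch to one commit, then rebase it.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo

echo base > f && git add f && git commit -qm base
git branch -M master
git checkout -qb topic
for i in 1 2 3; do echo "topic $i" > f; git commit -qam "topic $i"; done
git checkout -q master && echo master > f && git commit -qam master

# Squash topic down to a single commit before rebasing:
git checkout -q topic
git reset -q --soft "$(git merge-base master topic)"   # keeps changes staged
git commit -qm "topic, squashed"

# Now the rebase can conflict at most once:
if ! git rebase master >/dev/null 2>&1; then
    echo "topic final" > f && git add f
    GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
fi
ahead=$(git rev-list --count master..topic)
echo "commits ahead of master after rebase: $ahead"
```

The trade-off is exactly the one described above: you pay for the single conflict resolution by losing the individual commits.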
I do a lot of micro-commits as well, though I rarely find that other members of my team are doing the same, to the same files, at the same time.
When that happens, we look into if it's possible to do more frequent merges (fast-forward rebases through Gerrit, to be specific) of our smaller commits to master, so we don't accumulate too much in isolation.
I find it helps reduce bugs as well: if two or more members are doing active work in the same area in that way, it's not good to be working in complete isolation, as that just opens the door to bugs from incompatibility with the work going on in parallel.
Yeah, that scenario only ever happens if you have an extremely large branch that hasn't been merged into the target branch for a long time (like a feature branch that takes months to develop), which btw isn't really something that should be done anyway (always aim for more frequent merges of small side branches).
How would Sapling avoid this? As I understand it it uses the same data model as Mercurial which is really the same as Git's. I think you would need something like Pijul to solve it nicely. At least as far as I can tell.
I might actually try this in Pijul because I too encounter this semi-regularly (it's not a freak occurrence at all) and my solution is basically to give up and squash my branch before rebasing.
That has its own problems. Separating whitespace-only reformatting commits from substantive commits makes it much easier to inspect the real changes, for instance.
Also, more fine-grain commits can help you trace down a bug, perhaps with the help of git bisect. Once you've tracked down the commit that introduced the bug, things will be easier if that commit is small.
Fortunately you can just merge from master, bringing your code back in sync with master without touching master itself. I see Beltalowda has mentioned this.
Reviewing a squashed branch is much harder than reviewing one set of closely related deltas, and then reviewing a different set of closely related deltas that happen to overlap.
To be fair, if you have 10 commits that all change the same file: squash with respect to your first commit, _then_ rebase. If you have lots of commits, always first squash-rebase to your own first commit, and only rebase to current main once that's done.
Rebase is being annoying here mostly because it's doing exactly what you want it to do: warn you about merge conflicts for every commit in the chain that might have any.
If you have ten different commits all touching the same part(s) of the same file(s), dial down your granularity a little: you've over-committed.
Either that, or you lobbed 10 different issues into the same branch, which is a whole different barrel of "no one benefits from this, you're just making it harder to generate a changelog, can you please not" fish.
It often amuses me that some people will say "git is actually easy, you just need to know git commit, git pull, git push, and git branch", but when you go into the details, you find out you have to learn a hundred other rarer tools to actually fix the 5% or 1% use cases that everyone eventually hits.
For what it's worth, I had heard of git rerere before, and have looked at the man page, but haven't understood how it's supposed to work, and haven't had time to play with it to see how well it actually works in practice. `git merge` or `git squash` and accepting a little bit of a mess in history seems much easier than spending time to learn another git tool for some use case, but I fully admit I may be missing out.
When you hit a merge conflict, rerere (re)members how you (re)solved it and (re)applies the same fix when the same conflict happens again. But using it can create a new problem/annoyance: if you make a mistake with the initial resolution, and revert the merge/rebase to try again, it'll remember the wrong one next time. So you have to find it and tell rerere to forget that resolution.
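In practice it looks like this (toy repo, all names invented): the second merge of the same branches gets the recorded resolution replayed automatically, and `git rerere forget <path>` drops a recording you got wrong.

```shell
# Sketch: rerere records a conflict resolution and replays it.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo
git config rerere.enabled true

echo base > f && git add f && git commit -qm base
git branch -M master
git checkout -qb topic && echo topic > f && git commit -qam topic
git checkout -q master && echo master > f && git commit -qam master

# First merge: conflict, resolved by hand; committing records it in rerere.
git merge topic >/dev/null 2>&1 || true
echo merged > f && git add f && git commit -qm "merge topic"

# Undo the merge and redo it: rerere replays the recorded resolution.
git reset -q --hard HEAD~1
git merge topic >/dev/null 2>&1 || true
rerere_ok=$(grep -qx merged f && echo yes || echo no)
echo "resolution replayed: $rerere_ok"

# If the recording was wrong, drop it while the conflict is active:
git rerere forget f
```

Note that rerere only fills in the working tree; the merge still stops so you can inspect (and stage) the replayed resolution, unless you also set `rerere.autoUpdate`.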
Yes. Usually I just squash merge to main and then `git checkout my-branch; git reset --hard main`. Sure it squashes all the commits, but keeping them all is nearly never needed.
Can I ask how they converted you (or do you mean by dictate, as opposed to you becoming convinced it was better)? I find myself loving merges and never using rebases. It's not that I can't describe technically what's happening; I just don't understand the love.
(Not the person you replied to, but a passionate rebase-preferred) For me there are two reasons - one aesthetic, one practical.
The aesthetic reason is that it tells a more coherent story. The codebase is a single entity, with a linear history. If I asked you "how old were you last year", and you asked "which me are you asking about?", I'd be confused. Similarly, if I want the answer to the question "what was the codebase like at this point in time // immediately prior to some point?", you shouldn't need to ask clarifying questions. `HEAD^` should only ever point to a single commit.
The practical reason is that it discourages a bad-practice - long-lived branches. The only vaguely compelling reason I have heard for merge commits is that they preserve the history of the change, so that when you look at a change you can see how it was developed. But that's only the case if you're developing it (in isolation) for a long-enough time that `main` will get ahead of you. You should be pushing every time you have a not-incorrect change that moves you closer towards the goal, not waiting until you have a complete feature! If you make it difficult to do the wrong thing _while also_ making it easy to do the right thing (too many zealots forget the second part!), you will incentivize better behaviour.
(Disclaimer - I've been lucky enough to work in environments where feature flagging, CI/CD, etc. were robust enough that this was a practical approach. I recognize this might not be the case in other situations)
And yeah, I'm kinda intentionally invoking Cunningham's Law here, hoping that Merge-aficionados can tell me what I'm missing!
> "what was the codebase like at this point in time // immediately prior to some point?", you shouldn't need to ask clarifying questions
I would assume that such a question would talk only about the main branch. However, I will point out that "what was the state of feature X" is only answerable with a non-linear story.
> The practical reason is that it discourages a bad-practice - long-lived branches.
Wait, long-lived branches are bad? Merging in partially done features is good? That seems insane.
First, if the feature is small enough to knock out in an hour, that's great. But sometimes it can take a couple of days. I should hope you have enough activity that the main branch will move in that time.
But committing partial features is crazy. Sometimes you realize the way you are implementing it (or the whole feature) is a bad idea and all the work should be orphaned. Other times, a feature requires changing something (e.g. an API) where a partial change cannot really work, and sometimes where you need to have a meeting before you do it. Consider the feature to be "update dependency X", which means you now have some number of bugs to track down due to the new version.
Heck, sometimes a feature might need to be mothballed. Sometimes you have to wait for an external dependency to be fixed. You can then either chuck your work, or commit something broken, mothball it, and come back when the external dependency is fixed, or switch your dependency.
> long-lived branches are _bad_? Merging in partially-done features is _good_?
...uhhh, yes? I've never heard anything to the contrary. Can you explain why you think the opposite?
For long-lived branches: The longer a branch exists separately and diverges from main, the more pain you'll create when you try to merge it back in - both because of changes that someone else has made in the meantime (and so, conflicts you'll (possibly) have to resolve), and because you are introducing changes that someone else will have to resolve. The pain of resolving conflicts scales super-linearly - it's much better to resolve lots of small conflicts (ideally, so small that they can be algorithmically resolved) than to resolve one large one. Plus all the arguments from the point below...
For checking-in early and often: flip it around - what is _better_ about having the change only on your local repo, as opposed to pushed into the main codebase? If the code's checked in (but not operational - hidden behind a feature flag), then:
* your coworkers can _see_ that it exists and will not accidentally introduce incompatible changes, and will not start working on conflicting or overlapping work (yes, your work-planning system should also account for that - but extra safety never hurts!)
* if you have introduced a broken dependency, or a performance black-hole (which might only be possible if you're running your code in "shadow mode", executing but not affecting the output until it's fully ready - which, again, is only possible if you check in early-and-often!), you can discover that breakage _early_ and start work on finding an alternative (or, if necessary, abandon the whole project if it's intractable) earlier than otherwise
In fact, to take your example - "sometimes you realize the way you are implementing it (or the whole feature) is a bad idea and all the work should be orphaned" - yep! This happens! This is not a counter-example to my claim! Orphaning an inactive "feature" that has been pushed to (but not fully activated in) production has no more impact than abandoning a local branch. Even orphaning a feature that has been partially activated is still fine, so long as it didn't result in irreversible long-term state-updates to application entities (e.g. if it added a "fooFeatureStatus" to all the users in your database, rolling it back will be tricky. But not impossible!). So there are very few (or no) downsides, and all the advantages I described above.
I do agree that API changes are the one exception to this rule - you should have those reasonably nailed down and certain before you make changes, since those affect your clients. But any purely-internal change which can be put behind a feature flag, on an inactive code path, in shadow mode, in canary/onebox testing, or any other kind of "safe to deploy in prod, but not _really_ affecting all of prod" - do it!
I'm not advocating branches should be made longer for no reason, but I see no reason to avoid them. I do think they should be made long if they need to be to encapsulate a feature. I don't think that the pain of resolving conflicts scales super-linearly and that idea doesn't make sense to me. In fact, I think the opposite is true. I admit, that could be a taste issue.
I mistyped at one point by saying to avoid a partial-feature commit when I meant partial-feature merge onto the main branch. Yes, commit to the feature branch often. Hopefully clarifying that resolves most of the issues that you raised as advantages.
Meanwhile, managing partially built features with feature flags seems worse. It lets orphaned code migrate into the main codebase and stay there. You brought up a broken dependency: what happens if a dependency is broken and not likely to get fixed for a month? Just leave that code orphaned in the main codebase for a month? Further, having multiple partial-feature commits complicates bisecting or simply reading a feature's history.
I concede feature flags for deployment have some advantages, especially for feature-specific elevation through testing.
> I don't think that the pain of resolving conflicts scales super-linearly and that idea doesn't make sense to me. In fact, I think the opposite is true. I admit, that could be a taste issue.
Then we'll have to agree to disagree, as this is pretty fundamental to my argument - everything else ("Your coworkers get to see what you're working on and will notice clashes of intention earlier", "You can run incomplete features in shadow-mode to ensure they don't affect performance in production", etc.) is just sugar.
I really appreciate your well-reasoned and civil discussion!
Maintaining orphaned code has a cost. Keeping a change to a function (and its callers) that's no longer needed obscures both the history and, likely, what the function does.
Not saying trunk-based is wrong, but saying that abandoning a feature is as cheap as in branch-based development fails to account for everything.
In my case, I switched rapidly to git-rebase because it produces history that is much cleaner and easier to understand. I only do merge if there is a good reason to preserve history (e.g. some other branches depend on it, or some test reports refer to a given commit).
You could do a reverse rebase (if that makes sense, lol) for this: instead of rebasing the branch onto master, rebase master onto the branch. The only downside is that it requires many force pushes on the branch.
Yeah, force pushing master is a huge no-no. I can't even remember the number of times I've force pushed wrong shit in a hurry; I can't imagine an entire team dealing with this.
I would first say that I would sooner re-code the whole feature by hand from memory than ever rebase master onto anything for any serious project.
Even if we were to do that, rebasing master is likely to lead to the same issue.
My preferred solution is rebase featureB onto master for the 99% or 99.9% of use cases where this is smooth, and in the rare case that you have too many conflicts to resolve, merge master into featureB (and/or squash featureB then rebase onto master, depending on use case).