Unfortunately, git rebase has a very annoying limitation that git merge doesn't. If you have a branch based on masterX with, say, 10 commits on it, and commit 1 from your branch conflicts with masterX+1, then when you rebase your branch onto masterX+1 you will have to resolve the conflict 10 times (assuming all 10 commits touch the same area that had the original conflict). If instead you merge masterX+1 into your branch, you only have to resolve the conflict once.
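The scenario above can be reproduced in a throwaway repo. This is a toy sketch (repo, file name, and the three-commit branch are all invented for illustration): every commit on the branch rewrites the same line, master also rewrites it, and each resolution differs from the branch's original content, so the rebase stops at every commit while the merge conflicts only once.

```shell
# Sketch: count conflict stops for rebase vs. merge in a contrived repo.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.com
git config user.name demo

echo base > file.txt
git add file.txt && git commit -qm base
git branch -M master

git checkout -qb feature
for i in 1 2 3; do
    echo "feature change $i" > file.txt
    git commit -qam "feature $i"
done

git checkout -q master
echo "master change" > file.txt
git commit -qam "conflicting master change"

git branch feature-for-merge feature   # keep a copy for the merge test

# Rebase: stops once per conflicting commit; resolve each stop by hand.
git checkout -q feature
rebase_conflicts=0
git rebase master >/dev/null 2>&1 || true
while [ -d .git/rebase-merge ] || [ -d .git/rebase-apply ]; do
    rebase_conflicts=$((rebase_conflicts + 1))
    echo "hand-merged resolution $rebase_conflicts" > file.txt
    git add file.txt
    GIT_EDITOR=true git rebase --continue >/dev/null 2>&1 || true
done

# Merge: a single conflict, resolved once.
git checkout -q feature-for-merge
merge_conflicts=0
if ! git merge master >/dev/null 2>&1; then
    merge_conflicts=1
    echo "hand-merged resolution" > file.txt
    git add file.txt
    git commit -qm "merge master"
fi

echo "rebase stopped $rebase_conflicts times, merge stopped $merge_conflicts time(s)"
```

Note that the repeated conflicts only happen because each resolution differs from what the branch's commits originally contained; if you resolve by taking the branch's version verbatim, the later commits apply cleanly.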
Even though I much prefer a linear history, losing 1h or more to the tedious work of re-resolving the same conflict over and over is not worth it, in my opinion.
In your example, you pretty much have to change the same line, or a neighbouring line, in all 10 commits to end up in that scenario. If the changes are just somewhere else in the file, git's auto-merging will handle them just fine.
It seems like a very contrived example to me. We have been running rebase/fast-forward only for close to 10 years now, and I have never experienced anything that unfortunate.
I run into this quite frequently, even on projects where I'm the only one working on them (I tend to have a lot of things going on in parallel). Once branches diverge and commits accumulate it can become a right pain. Usually my solution is to merge master into the branch just to keep up to date, then later undo everything, make one new commit on top of master, and rebase that. But in some more difficult cases it was "just merge and fuck it because life's too short". I've also just manually "copy/paste merged" things to a new branch, because that seemed quicker than dealing with all the merges/conflicts.
Maybe there are better ways of doing this, and arguably I shouldn't have all these long-lived branches in the first place (but it works well for me, so...), but it's not that much of a contrived edge case.
> arguably I shouldn't have all these long-lived branches in the first place
This is the problem here. If you have multiple long-lived branches, there's no technical solution to preventing rot -- you must actively keep them in sync.
Regularly merging in main is the opposite of the proper solution. Constantly rebasing on top of main is the proper solution.
A rebase and a merge result in the same code. A rebase is more error prone though. Just because someone "feels" a merge isn't as safe doesn't make it so.
Doesn't rebase use the exact same automatic merge algorithm as a merge? They are equally likely to introduce a production bug, especially if you add a tool like rerere into the mix to do even more auto-magic merging when you hit the differences between rebase and merge.
There are two failure modes. 1) The merge auto-applies cleanly, but the merged code is wrong. This is pretty niche, usually, but it happens with certain edit patterns. I've never seen this produce a syntactically valid, semantically invalid construct (but I suppose it's possible), so generally these are caught by the compiler.
2) The merge does not auto-apply, so you get into manual resolution. This is where things get hairy.
The merge commit really ought not have any changes of its own, but lots of people consider minor conflict resolution legal. So you end up with a bunch of code changes that logically belong to another commit, and are grouped together for purposes of expediency.
Rebase applies your changes to another branch as though they had been made there originally. If a conflict comes up, you already have all the context needed for how to resolve it, because you just wrote that code. The fix goes where it belongs.
All I can tell you is that I've been bit by merge-induced production bugs enough times that I now work to avoid that particular failure mode.
> The merge commit really ought not have any changes of its own, but lots of people consider minor conflict resolution legal.
I'm not sure where this rule comes from. For code review, I for one normally review all of the changes that are going into master, and only look commit-by-commit if it becomes overwhelming - so, unless this is a huge merge (which should generally be avoided anyway), I wouldn't really see how this is a problem.
The only real problem I have with merging into your local branch to keep it in sync with master is the way it pollutes history when it is finally merged back into master. This is enough of a problem that I and my team always rebase unless we end up in one of these rare cases that I was highlighting.
> This is the problem here. If you have multiple long-lived branches, there's no technical solution to preventing rot -- you must actively keep them in sync.
Well, merge actually works much smoother and rebase gives a lot more grief, so the problem is with rebase.
> Regularly merging in main is the opposite of the proper solution. Constantly rebasing on top of main is the proper solution.
The "proper" solution is the one that allows me to get stuff done. The only thing that matters is how the main branch ends up looking in the end, and what I do before that isn't really all that important.
Another problem with rebase is when multiple people are working on the branch; it requires careful coordination if you don't want to lose work. Overall, just merge in main is usually the best strategy here.
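The "careful coordination" problem above can be made concrete. Below is a toy sketch (all repo and user names are invented): two people share a branch, one rewrites history on a stale copy, and `git push --force-with-lease` refuses to overwrite the commit it hasn't seen, where a plain `--force` would have silently discarded it.

```shell
# Sketch: why shared-branch rebases need coordination, and how
# --force-with-lease limits the damage.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q --bare origin.git

git clone -q origin.git seed 2>/dev/null
(
    cd seed
    git config user.email seed@example.com && git config user.name seed
    echo a > f && git add f && git commit -qm init
    git branch -M feature
    git push -q origin feature
)

git clone -q origin.git alice 2>/dev/null
git clone -q origin.git bob 2>/dev/null

# Bob adds a commit to the shared branch and pushes it.
(
    cd bob
    git config user.email bob@example.com && git config user.name bob
    git checkout -q feature
    echo b > g && git add g && git commit -qm bob
    git push -q origin feature
)

# Alice, unaware of Bob's push, rewrites her own history on a stale copy.
cd alice
git config user.email alice@example.com && git config user.name alice
git checkout -q feature
echo a2 > h && git add h && git commit -qm alice
echo a3 > h && git add h && git commit -q --amend -m "alice v2"

# A plain --force would discard Bob's commit; the lease rejects the push
# because the remote no longer matches Alice's remote-tracking ref.
if git push --force-with-lease origin feature >/dev/null 2>&1; then
    lease=accepted
else
    lease=rejected
fi
echo "force-with-lease on stale branch: $lease"

# After fetching Bob's work and rebasing onto it, the lease is fresh again.
git fetch -q origin
git rebase -q origin/feature >/dev/null
git push -q --force-with-lease origin feature && pushed=yes
echo "after fetch+rebase: pushed=$pushed"
```

Even with the lease, everyone on the branch still has to reconcile their local copy after each rewrite, which is the coordination cost the comment above is pointing at.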
Always surprising when folks are confused about how to collaborate on git branches... I'd expect the recursive solution to be more obvious!
> The "proper" solution is the one that allows me to get stuff done.
Yeah, but the stuff that needs to get done doesn't end with your commit, it starts there. Merge commits are prone to introduce unexpected and uncaught bugs: rebases just don't.
> Merge commits are prone to introduce unexpected and uncaught bugs: rebases just don't.
How so? If I make an error with a rebase then I risk losing my changes. You can fetch it from the local reflog, but that's not so easy. With a merge I have a merge commit which records what was merged.
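For what it's worth, fetching from the reflog is less painful than it sounds. A toy sketch (repo and branch names invented) of recovering a branch tip after a history rewrite threw it away:

```shell
# Sketch: undo a botched history rewrite via the reflog.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo

echo 1 > f && git add f && git commit -qm one
git branch -M master
git checkout -qb topic
echo 2 >> f && git commit -qam two
good=$(git rev-parse HEAD)

# Simulate a rebase gone wrong: topic's tip gets thrown away.
git reset -q --hard master

# The old tip is still reachable from the reflog...
git reflog -n 3
# ...so one reset brings it back (topic@{1} would work too).
git reset -q --hard HEAD@{1}
[ "$(git rev-parse HEAD)" = "$good" ] && echo recovered
```

The merge-commit-as-record point still stands, though: a merge leaves the evidence in the permanent history, while reflog entries are local and eventually expire.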
That doesn't really solve much: if both you and I rebase our personal feature branches onto master at different places, when we both try to push to the shared feature branch, we'll have a REALLY bad time - especially if we actually had to do conflict resolution.
> arguably I shouldn't have all these long-lived branches in the first place (but it works well for me, so...)
Given that this scenario is common for you but sounds contrived to others, I would argue that this doesn't work well for you. It's just familiar enough that you're willing to deal with some pain.
Short-lived feature branches sidestep this hell. Longer-lived projects can almost always be partitioned into a series of shorter mergeable steps. You may need support/buy-in from your manager, I hope you get it.
It's not an organisational/manager problem; it's just how I like to work. I often work on something and then either get bored with it or am not quite sure what the best way to proceed is, so I work on something else and come back to it later (sometimes hours later, sometimes days or weeks; sometimes I keep working on it until I get it right). I do this with my personal projects as well, where I can do whatever I want.
I know some people think this is crazy, but it works well for me and I'm fairly productive like this, usually producing fairly good code (although I'm not an unbiased source for that claim).
In the end I don't want to radically change my workflow to git or other tooling; I want the tooling to adjust to the workflow that works well for me.
Sounds like you've never worked on a project with a file everyone wants to append to :)
If every error in your system needs a separate entry in the error enum, or every change needs an entry in the changelog - loads of changes will try to modify the last line of the file.
Even multiple appends are not that bad for rebasing - if you put the remote changes before your own then after the first commit the context for your remaining commits will be the same.
If order actually matters then yeah, git can't magically know where each new line should go.
I'm not saying these situations are impossible. But you can work towards reducing when they arise. If everyone needs to change the same file, then it sounds like something should be refactored (it's probably quite a big file as well?).
If every error needs to go to the same error enum, that sounds like an error enum that might benefit from being split up.
And if every change needs to write to a common changelog file, I would personally find a new way to produce that changelog.
If it's that big a painpoint, then I would look into different ways to get around it.
It happens pretty often when two different people are adding a new function in the same area of a file. It's likely that as you're working on that function, you'll be modifying the surrounding lines a few times (say, you have a first pass for the happy path, then start adding error handling in various passes; or, handling one case of an algorithm in each commit).
Rebase is still by far the most common case in our repo, as yes, these cases appear very rarely. But when they do happen, it's often worth it to do a merge and mess up the history a little bit (or squash, which messes with history in another way) rather than resolving conflicts over and over.
Someone else was also suggesting rerere for this use case, but I've never used it myself and I don't know how well it actually handles these types of use cases.
That alone helps a lot. Other formatting rules that avoid dense lines and instead split code over multiple lines also have a huge impact on merge conflicts.
It's not as contrived as you may think. I, along with what I imagine are many others, do a lot of frequent micro-commits as time goes on and the feature becomes more complete, with a lot of commits in the same area of any given file. Rebasing a development branch in this state is pretty gnarly when a conflict arises.
Sadly, my current approach is to just reset my development branch to the merge base and make one huge commit, and then rebase.
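That reset-to-merge-base-and-squash approach can be sketched concretely. A toy repo (all names invented) where a three-commit branch is collapsed to one commit before rebasing, so the rebase can stop at most once:

```shell
# Sketch: squash a branch to one commit, then rebase it.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo

echo base > f && git add f && git commit -qm base
git branch -M master
git checkout -qb topic
for i in 1 2 3; do echo "topic $i" > f; git commit -qam "topic $i"; done
git checkout -q master && echo master > f && git commit -qam master

# Squash topic down to a single commit before rebasing:
git checkout -q topic
git reset -q --soft "$(git merge-base master topic)"   # keeps changes staged
git commit -qm "topic, squashed"

# Now the rebase can conflict at most once:
if ! git rebase master >/dev/null 2>&1; then
    echo "topic final" > f && git add f
    GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
fi
ahead=$(git rev-list --count master..topic)
echo "commits ahead of master after rebase: $ahead"
```

The trade-off is exactly the one described above: you pay for the single conflict resolution by losing the individual commits.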
I do a lot of micro-commits as well, though I rarely find that other members of my team are doing the same, to the same files, at the same time.
When that happens, we look into if it's possible to do more frequent merges (fast-forward rebases through Gerrit, to be specific) of our smaller commits to master, so we don't accumulate too much in isolation.
I find it helps reduce bugs as well: if two or more members are doing active work in the same area in that way, it's not good to be working in complete isolation, as that just opens the door to bugs from incompatibility with the work going on in parallel.
Yeah, that scenario only ever happens if you have an extremely large branch that hasn't been merged into the target branch for a long time (like a feature branch that takes months to develop), which btw isn't really something that should be done anyway (always aim for more frequent merges of small side branches).
How would Sapling avoid this? As I understand it it uses the same data model as Mercurial which is really the same as Git's. I think you would need something like Pijul to solve it nicely. At least as far as I can tell.
I might actually try this in Pijul because I too encounter this semi-regularly (it's not a freak occurrence at all) and my solution is basically to give up and squash my branch before rebasing.
That has its own problems. Separating whitespace-only reformatting commits from substantive commits makes it much easier to inspect the real changes, for instance.
Also, more fine-grain commits can help you trace down a bug, perhaps with the help of git bisect. Once you've tracked down the commit that introduced the bug, things will be easier if that commit is small.
Fortunately you can just merge from master, bringing your code back in sync with master without touching master itself. I see Beltalowda has mentioned this.
Reviewing a squashed branch is much harder than reviewing one set of closely related deltas, and then reviewing a different set of closely related deltas that happen to overlap.
To be fair, if you have 10 commits that all change the same file: squash with respect to your first commit, _then_ rebase. If you have lots of commits, always first squash-rebase to your own first commit, and only rebase to current main once that's done.
Rebase is being annoying here mostly because it's doing exactly what you want it to do: warn you about merge conflicts for every commit in the chain that might have any.
If you have ten different commits all touching the same part(s) of the same file(s), dial down your granularity a little: you've over-committed.
Either that, or you lobbed 10 different issues into the same branch, which is a whole different barrel of "no one benefits from this, you're just making it harder to generate a changelog, can you please not" fish.
It often amuses me that some people will say "git is actually easy, you just need to know git commit, git pull, git push, and git branch", but when you go into the details, you find out you have to learn a hundred other rarer tools to actually fix the 5% or 1% use cases that everyone eventually hits.
For what it's worth, I had heard of git rerere before, and have looked at the man page, but haven't understood how it's supposed to work, and haven't had time to play with it to see how well it actually works in practice. `git merge` or `git squash` and accepting a little bit of a mess in history seems much easier than spending time to learn another git tool for some use case, but I fully admit I may be missing out.
When you hit a merge conflict, rerere (re)members how you (re)solved it and (re)applies the same fix when the same conflict happens again. But using it can create a new problem/annoyance: if you make a mistake with the initial resolution, and revert the merge/rebase to try again, it'll remember the wrong one next time. So you have to find it and tell rerere to forget that resolution.
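In practice it looks like this (toy repo, all names invented): the second merge of the same branches gets the recorded resolution replayed automatically, and `git rerere forget <path>` drops a recording you got wrong.

```shell
# Sketch: rerere records a conflict resolution and replays it.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo
git config rerere.enabled true

echo base > f && git add f && git commit -qm base
git branch -M master
git checkout -qb topic && echo topic > f && git commit -qam topic
git checkout -q master && echo master > f && git commit -qam master

# First merge: conflict, resolved by hand; committing records it in rerere.
git merge topic >/dev/null 2>&1 || true
echo merged > f && git add f && git commit -qm "merge topic"

# Undo the merge and redo it: rerere replays the recorded resolution.
git reset -q --hard HEAD~1
git merge topic >/dev/null 2>&1 || true
rerere_ok=$(grep -qx merged f && echo yes || echo no)
echo "resolution replayed: $rerere_ok"

# If the recording was wrong, drop it while the conflict is active:
git rerere forget f
```

Note that rerere only fills in the working tree; the merge still stops so you can inspect (and stage) the replayed resolution, unless you also set `rerere.autoUpdate`.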
Yes. Usually I just squash merge to main and then `git checkout my-branch; git reset --hard main`. Sure it squashes all the commits, but keeping them all is nearly never needed.
Can I ask how they converted you (or do you mean by dictate, as opposed to you becoming convinced it was better)? I find myself loving merges and never using rebases. It's not that I can't describe technically what's happening; I just don't understand the love.
(Not the person you replied to, but a passionate rebase-preferred) For me there are two reasons - one aesthetic, one practical.
The aesthetic reason is that it tells a more coherent story. The codebase is a single entity, with a linear history. If I asked you "how old were you last year", and you asked "which me are you asking about?", I'd be confused. Similarly, if I want the answer to the question "what was the codebase like at this point in time // immediately prior to some point?", you shouldn't need to ask clarifying questions. `HEAD^` should only ever point to a single commit.
The practical reason is that it discourages a bad-practice - long-lived branches. The only vaguely compelling reason I have heard for merge commits is that they preserve the history of the change, so that when you look at a change you can see how it was developed. But that's only the case if you're developing it (in isolation) for a long-enough time that `main` will get ahead of you. You should be pushing every time you have a not-incorrect change that moves you closer towards the goal, not waiting until you have a complete feature! If you make it difficult to do the wrong thing _while also_ making it easy to do the right thing (too many zealots forget the second part!), you will incentivize better behaviour.
(Disclaimer - I've been lucky enough to work in environments where feature flagging, CI/CD, etc. were robust enough that this was a practical approach. I recognize this might not be the case in other situations)
And yeah, I'm kinda intentionally invoking Cunningham's Law here, hoping that Merge-aficionados can tell me what I'm missing!
> "what was the codebase like at this point in time // immediately prior to some point?", you shouldn't need to ask clarifying questions
I would assume that such a question would talk only about the main branch. However, I will point out that "what was the state of feature X" is only answerable with a non-linear story.
> The practical reason is that it discourages a bad-practice - long-lived branches.
Wait, long-lived branches are bad? Merging in partially done features is good? That seems insane.
First, if the feature is small enough to knock out in an hour, that's great. But sometimes it can take a couple of days. I should hope you have enough activity that the main branch will move in that time.
But committing partial features is crazy. Sometimes you realize the way you are implementing it (or the whole feature) is a bad idea and all the work should be orphaned. Other times, a feature requires changing something (e.g. an API) where a partial change cannot really work, and sometimes where you need to have a meeting before you do it. Consider the feature to be "update dependency X", which means you now have some number of bugs to track down due to the new version.
Heck, sometimes a feature might need to be mothballed. Sometimes you have to wait for an external dependency to be fixed. You can then either chuck your work, or commit something broken, mothball it, and come back when the external dependency is fixed, or switch your dependency.
> long-lived branches are _bad_? Merging in partially-done features is _good_?
...uhhh, yes? I've never heard anything to the contrary. Can you explain why you think the opposite?
For long-lived branches: The longer a branch exists separately and diverges from main, the more pain you'll create when you try to merge it back in - both because of changes that someone else has made in the meantime (and so, conflicts you'll (possibly) have to resolve), and because you are introducing changes that someone else will have to resolve. The pain of resolving conflicts scales super-linearly - it's much better to resolve lots of small conflicts (ideally, so small that they can be algorithmically resolved) than to resolve one large one. Plus all the arguments from the point below...
For checking-in early and often: flip it around - what is _better_ about having the change only on your local repo, as opposed to pushed into the main codebase? If the code's checked in (but not operational - hidden behind a feature flag), then:
* your coworkers can _see_ that it exists and will not accidentally introduce incompatible changes, and will not start working on conflicting or overlapping work (yes, your work-planning system should also account for that - but extra safety never hurts!)
* if you have introduced a broken dependency, or a performance black-hole (which might only be possible if you're running your code in "shadow mode", executing but not affecting the output until it's fully ready - which, again, is only possible if you check in early-and-often!), you can discover that breakage _early_ and start work on finding an alternative (or, if necessary, abandon the whole project if it's intractable) earlier than otherwise
In fact, to take your example - "sometimes you realize the way you are implementing it (or the whole feature) is a bad idea and all the work should be orphaned" - yep! This happens! This is not a counter-example to my claim! Orphaning an inactive "feature" that has been pushed to (but not fully activated in) production has no more impact than abandoning a local branch. Even orphaning a feature that has been partially activated is still fine, so long as it didn't result in irreversible long-term state-updates to application entities (e.g. if it added a "fooFeatureStatus" to all the users in your database, rolling it back will be tricky. But not impossible!). So there are very few (or no) downsides, and all the advantages I described above.
I do agree that API changes are the one exception to this rule - you should have those reasonably nailed down and certain before you make changes, since those affect your clients. But any purely-internal change which can be put behind a feature flag, on an inactive code path, in shadow mode, in canary/onebox testing, or any other kind of "safe to deploy in prod, but not _really_ affecting all of prod" - do it!
I'm not advocating branches should be made longer for no reason, but I see no reason to avoid them. I do think they should be made long if they need to be to encapsulate a feature. I don't think that the pain of resolving conflicts scales super-linearly and that idea doesn't make sense to me. In fact, I think the opposite is true. I admit, that could be a taste issue.
I mistyped at one point by saying to avoid a partial-feature commit when I meant partial-feature merge onto the main branch. Yes, commit to the feature branch often. Hopefully clarifying that resolves most of the issues that you raised as advantages.
Meanwhile, managing partially built features with feature flags seems worse. It lets orphaned code migrate into the main codebase and stay there. You brought up a broken dependency: what happens if a dependency is broken and not likely to get fixed for a month? Just leave that code orphaned in the main codebase for a month? Further, having multiple partial-feature commits complicates bisecting or simply reading a feature's history.
I concede feature flags for deployment have some advantages, especially for feature-specific elevation through testing.
> I don't think that the pain of resolving conflicts scales super-linearly and that idea doesn't make sense to me. In fact, I think the opposite is true. I admit, that could be a taste issue.
Then we'll have to agree to disagree, as this is pretty fundamental to my argument - everything else ("Your coworkers get to see what you're working on and will notice clashes of intention earlier", "You can run incomplete features in shadow-mode to ensure they don't affect performance in production", etc.) is just sugar.
I really appreciate your well-reasoned and civil discussion!
Maintaining orphaned code has a cost. Keeping a change to a function (and its callers) that's no longer needed obscures both the history and, likely, what the function does.
Not saying trunk-based is wrong, but saying that abandoning a feature is as cheap as in branch-based development fails to account for everything.
In my case, I switched rapidly to git-rebase because it produces history that is much cleaner and easier to understand. I only do merge if there is a good reason to preserve history (e.g. some other branches depend on it, or some test reports refer to a given commit).
You could do a reverse rebase (if that makes sense, lol) for this: instead of rebasing the branch onto master, rebase master onto the branch. The only downside is that it requires many force pushes on the branch.
Yeah, force pushing master is a huge no-no. I can't even remember the number of times I've force pushed wrong shit in a hurry; I can't imagine an entire team dealing with this.
I would first say that I would sooner re-code the whole feature by hand from memory than ever rebase master onto anything for any serious project.
Even if we were to do that, rebasing master is likely to lead to the same issue.
My preferred solution is rebase featureB onto master for the 99% or 99.9% of use cases where this is smooth, and in the rare case that you have too many conflicts to resolve, merge master into featureB (and/or squash featureB then rebase onto master, depending on use case).