Hacker News
Hitting every branch on the way down (rachelbythebay.com)
160 points by zdw on April 30, 2024 | 140 comments



So, someone at some point in some commit that we will never see because it got squashed with other commits thought it would be cooler to use absl::StrCat() instead of the "+" operator, and in the process of doing that, they went "what's this useless code using angled brackets instead of quotes?! It works with quotes too, let's delete it!". Or maybe that part was difficult to test, so they simply deleted it to increase test coverage? Guess we will never know, but still, open source is now a bit shittier because of it. Thanks, anonymous clueless developer!

    --  std::string left = "\"";
    --  std::string right = "\"";
    --  if (use_system_include) {
    --    left = "<";
    --    right = ">";
    --  }
    --  return left + name + right;
    ++  return absl::StrCat("\"", basename, "\"");


In the chromium repository, the use of angle brackets in #include statements is banned -- they only use double quotes. They also don't use any system headers per se since their flavor of clang comes with their flavor of libcpp vendored in.

So if the chromium repo is representative of the state of C++ in rest of Google, they ditched it silently like this probably because it's so natural to them :)


So, instead of using

> <> means system headers and "" means project headers (my naive understanding of the difference)

they just convert everything to a project header? That seems bonkers to me. It is intentionally removing useful information.


First, here's the actual guide: https://google.github.io/styleguide/cppguide.html#Names_and_... . Of course they still have to use stuff like #include <windows.h> but it's very very limited.

They also are not removing any info -- most of it IS project headers. To me that's the actual bonkers bit :)

With GCC/Clang, headers found in paths passed with -isystem are headers that are immune to compiler arguments like -Werror because they are, by definition, out of your control. In Google's case, ALL code is already checked in the project repo, including language stdlib. So none of them are system headers "per se".
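
A minimal sketch of what that immunity looks like in practice (the paths and build layout here are made up, not Google's setup):

    # Headers found via -isystem are treated as system headers, so warnings
    # coming out of them are suppressed and -Werror doesn't trip on them:
    g++ -Wall -Werror -isystem third_party/include -I src/include -c src/main.cc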


The main difference is how the search for the file works. “” prefers local directory before the search path, <> goes to the search path first.
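
A throwaway sketch that shows the difference with GCC/Clang (cc stands in for either):

    mkdir -p demo && cd demo
    printf '#define LOCAL 1\n' > config.h
    printf '#include "config.h"\nint main(void){return LOCAL;}\n' > quoted.c
    printf '#include <config.h>\nint main(void){return LOCAL;}\n' > angled.c
    cc -c quoted.c        # works: "" looks next to the including file first
    cc -c angled.c        # fails: <> skips the local directory entirely
    cc -I. -c angled.c    # works once the directory is on the search path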


Hierarchy is problematic.


Why do you think it was squashed? This article is clear about this being a merge commit. This person got a merge conflict and decided that this was the best way to fix it. It probably worked on their machine. Perhaps not necessarily the optimal fix when you have thousands of users depending on this code, but what do I know?


But according to the article, both of the parents contained the same old code. Could be that it was just collateral damage in a larger conflict, but still it seems that someone used the merge commit to make an off-topic, ad-hoc change that was not only superfluous but code-breaking.


Ok, maybe I'm misunderstanding the line "There's no explanation or other context. Presumably that all got squashed out when it was exported from whatever they use internally." - I was thinking about git squash, but it could have also been some other step in the process of transferring the code from Google's internal systems to GitHub.


It's such a spectacular Chesterton's Fence. Always so frustrating dealing with people who're like "from where I'm standing…" and then they go and make things better, meaning 'more abstract and/or fewer keystrokes'.


Am I misunderstanding, or is this not on whoever abused the merge commit, whether they made the change personally or not?


It's a shared responsibility.


> I told it to install "protobuf" since I use that library in my build tool. That actually installed "protobuf-24.4,1" which is some insane version number I'd never seen before. All of my other systems are all running 3.x.x type versions.

I was curious about this, so I took a look at the list of protobuf releases[0] and they're...confusing, to say the least. Chronologically, the most recent tags at the time I write this comment are:

v5.27.0-rc1 v3.27.0-rc1 v27.0-rc1 v27-dev v26.1 v5.26.1 v3.26.1 v26.1 v26.0 v5.26.0 v3.26.0

Does anyone here happen to know what's going on here? As best I can tell, they're simultaneously supporting 3 major versions while keeping their minor and patch versions in lockstep, and then having one major version be implicit?

I can almost imagine a scenario where they started out with just major and minor versions and then realized they wanted to make breaking changes, which led them down the path of adding a third separate number to the versions, but if they already were going down the path of assuming "wider" versions are newer, why not just stop using version numbers with only major-minor and instead just add a 0 or 1 to the front of all of the continuations of that branch? Also, why synchronize every single minor and patch version between all three major versions? I can understand why it might be useful to continue providing support for multiple major versions at the same time, but I'd expect that _sometimes_ there might be a bug or something in only one of them, and pushing out a release of the other two that don't contain any changes would be pretty strange.

[0]: https://github.com/protocolbuffers/protobuf/tags


I think protobuf might just be a performance art piece, exploring the question of how much complexity it's possible to insert into a simple concept.


Indeed. Having used it and examined the resulting binary formats it uses, I don’t get it. Google makes some nice things, but protobuf isn’t one of them.

IIRC (5 yrs ago), it was somewhat self describing with respect to struct offsets but had no (data format) versioning or type information. Never understood that design choice, even in mixed endian+architecture environments.

Ends up being horribly inefficient for small/one-time instances, and not all that great for client/server use.

We were looking at it to get away from C structs and handmade serialization — only to unexpectedly realize that our methods were still better. (We genuinely did not want this outcome!). JSON/yaml were not options for other reasons.

Protobuf is the modern realization of “the emperor has no clothes”.


That depends. In low latency environments you will see some protobuf but not as much as you'd think. SBE is quite common with the odd exception of some bespoke serialization protocols with delta compression and such.


It's because they changed the versioning format: https://github.com/protocolbuffers/protobuf/releases?page=5 / https://protobuf.dev/news/2022-05-06/

But I suppose old versions still receive bugfixes.


I don't understand how either of those links clarify anything; the former shows that they had a version 3.y.z and a version 20.y, and the latter shows a change from 3.20.x to 4.21.x, but none of it gives any explanation for why a new major version should "inherit" the old minor version instead of restarting from 0, or why they kept around `x.y` when updating to use `x.y.z` instead of just continuing the existing branch as `0.x+1.y` or `1.x+1.y`. If anything, the fact that they already seem to assume that new major versions should always inherit the minor version would make it _more_ consistent than having one "special" branch that has "narrower" versions.


If only they had a language agnostic way of encoding version information…

Perhaps someone wrote a library for that…


I won't claim to understand C and the reason why <> is better than “”. I assume it is.

But the fact that a merge can have arbitrary changes in it always bothers me!

This is a case for rebase over merge if there are conflicts.

You could have a merge of 2 empty repo parents where the result is the complete source of the latest version of Kubernetes!


Yep. Stuff like this is part of why I'm a rebaser.

Rebase is simple. Always. The end result is obvious and clear and can only be interpreted in one way.

Merge has lots of little sharp edges and surprises if you don't know every single tiniest detail.

Almost nobody knows it in that level of detail, so it's a terrible choice for interacting with anyone else. If you're on your own, sure, do whatever - many things are not built solo though.


My personal preference is merge but using the --no-ff flag. That way you get all the advantages of a rebase (since all your original commits are rebased into the target branch) but you also get a merge commit to confirm that all those changes were a part of the same set of patches.

That can often help a lot to figure out why something ended up the way it did, but you also don't turn your entire history into a flat pile of disparate commits.


Yeah, I do kinda like this setup too. You can have both readable (rewritten) history and structured sub-commits for "a change" rather than a totally flat stack. Plus I don't care about your local history, but I do care about the final history.

It's definitely how I prefer to review code (big changes broken up to isolated portions that are easier to validate, and the whole thing at once so you don't get lost in the trees), so it's how I would prefer to read it later too.

It does still have merge commits where stuff can hide though :/ and you've got to remember --first-parent :/ and other not-merge-focused tools also have problems with it :/


"But it is littering the commit history with useless commits!" is what I always hear


And the best answer is: "Why do you do useless commits?".

With `git commit --amend` and `git commit --fixup` you can arrange your commits to be clean, properly documented and self explanatory (and maybe atomic but that's a little harder). It takes a little time but it is hugely beneficial to code reviews and bug investigation.
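
For example (the commit id and upstream branch are placeholders):

    # Mark a follow-up fix as belonging to an earlier commit:
    git commit --fixup=abc1234
    # Before sharing the branch, fold every fixup! commit into its target:
    git rebase -i --autosquash origin/main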


Some people however see using features like amend, squash, and force push as potentially destructive actions in the hands of a novice, which can lead to loss of not only the author's work but also other people's. Using merge almost never results in any sort of loss and is easier to work with for those who still don't quite understand the risks.


You are associating the use of `amend` and `fixup` with force push. It's perfectly fine to rework your commit history locally and even force push to your own local branch. It should never be possible (except to people administrating your repo) to force-push to any public or even shared branch.

Nobody should be able to force-push to master (or any public branch) except on specific occasions. In that case, someone is authorized, performs their specific action and then gets de-authorized.

This is pretty basic.


"Force push" is something that should be restricted to a very few senior people anyway; once you do that, you can't rewrite shared history any more and a lot of the worries go away.


This is a great way to get me to clutter the shared repo with throwaway branches that I’ll later replace (deleting the old one—if you let me).


This is fine! This is a normal part of several popular git workflows. After all, a branch is just a pointer to a commit.

(Our workplace has a mix of github flow, which is one branch per PR: https://docs.github.com/en/get-started/using-github/github-f... ; Atlassian Gitflow https://www.atlassian.com/git/tutorials/comparing-workflows/... ; and the completely different Gerrit flow which ends up very rebase and amend heavy: https://gerrit-review.googlesource.com/Documentation/intro-g... )


I thought you meant blocking all force-push. Shared branches should absolutely be protected (with an “in case of emergency, break glass” option)


Meanwhile the biggest issues I actually see in novices are when their IDE presents them with buttons that don't coincide with a single version control action (can you guess what "sync repo" will do?) and using those puts their checkout into a weird state.

Usually when using the commands directly they're more careful, but have the mindset "an IDE wouldn't intentionally break something so this button must be safe to click".


Custom terms on top of git is a bafflingly bad decision.

IDEs in particular should expose the details all the time, and show the command that's being run, so it can teach people passively instead of misleading them and leaving them stuck when it breaks.


totally agree here. commits are not for saving "your-current-work". It's about marking a definite step of change in the realm of the project itself.

making commits atomic is harder because we tend to just write code, without first breaking up the requirement into atomic pieces


Commits are for saving your current work. Commit early, commit often. Just clean them up when you're done!

Don't push half-baked work on other people! You waste their compute cycles needlessly, from now until the end of time.


I sometimes wish git supported hierarchical commits.

I.e., git can retain two representations of a sequence of commits: the original sequence, and also a larger commit that (a) produces the exact same code change as the sequence, and (b) has its own commit message.


Isn't that what a branch and a merge commit do?


Yep, as long as you use "--no-ff" to force a merge commit (and presumably edit the merge commit message instead of just using the default).

For viewing history you can use "git log --first-parent" to view only those merge commits, which should satisfy the people who love a linear history, without actually losing the intermediate commits.
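
As a sketch (branch names are made up):

    git checkout main
    # Record a merge commit even where a fast-forward would be possible:
    git merge --no-ff shiny-feature
    # Read one entry per merged branch instead of every sub-commit:
    git log --first-parent --oneline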


I have entertained similar thoughts, but then on the other hand people already, and with some righteousness, criticize git for being too complex. It also requires careful assessment of where the wormhole ends, i.e. how many levels of grouped commits should exist.

Then I remember that I have enough trouble getting a few dozen people together to write well formed and understandable commit messages for one level of commit messages alone. This scheme would require people to expend more energy on constructing commits, which is at best something very few care about.

Then there are tickets and other corresponding information, but they could rot for all I care, as they so often do, unless a decent commit log is in place.


FWIW, Mercurial has this, and calls it changeset evolution:

https://www.mercurial-scm.org/doc/evolution/


Merge does that, yes, hence the preference for rebase flows.

(I'm surprised this got a downvote when that's how we got here: a situation in which a change was ""hidden"" in a merge commit that would have been explicit in a rebase workflow)


One idiot with rebase destroys history with no trace. I worked with such an idiot in a parallel team. I can't say how many weeks of work randomly got destroyed by said idiot.

I hate rebase on shared code. I don't care how clean it looks. Don't mess with history.


I really don't understand how you can lose weeks of work. The person that would have done the force push would have the original commit in their reflog. ORIG_HEAD would be set.

Everyone else that had a copy of the repo would have had a copy of the "lost" commits.

I really cannot imagine how many things would have to go wrong for weeks of work to be lost.


There is a hierarchy to these things:

- Person who destroys git history

- Person who hates destroying git history

- Person who knows how to recover "destroyed" history

- Person who knows how to truly destroy git history


> Person who knows how to truly destroy git history

The Gitsatz Haderach


> - Person who knows how to truly destroy git history

… tell me more!


A rebase can be undone, because the old commits keep hanging around in the repo. Rebase doesn't delete or rewrite anything, it just creates new commits and adjusts branch pointers, so the old stuff is still there just hard to get at because nothing points at it anymore.

You just need to find an old commit ID somewhere, normally the reflog.

The old stuff will go away on its own eventually due to git's self-maintenance procedures removing unreachable commits, or it can be done forcefully by adjusting the gc parameters to get rid of it.
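
A rough recovery sketch (the commit id is a placeholder):

    # List where HEAD has been; the pre-rebase tip is still in here:
    git reflog
    # Park a branch on the old commit so gc can never collect it:
    git branch rescued abc1234
    # Or jump the current branch straight back to its pre-rebase state:
    git reset --hard ORIG_HEAD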


I cannot imagine how one could _truly_ destroy git history. You could destroy it locally, sure no problem. You _might_ be able to destroy it on your remote, but if you're using something like Github/Gitlab/Bitbucket I'm sure they'll have a cache that isn't trivial to remove from. But even if you remove it locally and from remote, there's no way you're removing it from other peoples clones. And other people could have pushed to other remotes.

Stuff "leaks" so much in git, that it's really hard to lose work. The only way I could see someone losing work is if they never commit or if they never push. But even if you don't push and just rebase, you're not losing work. You would have to go out of your way to delete git history locally.


What went wrong was multiple teams on unsynchronized 2 week schedules, and a culture that said that we had to accept the force push when other teams released.

So we released at the end of our cycle. It gets used for, if a vague memory serves, end-of-month billing. Meanwhile someone on another team "merged" our code, and actually randomly dropped a big chunk of our work. A week later they release. Some (but not all) of our features disappear. We pull that and don't notice because we're on a new sprint. Wait until the end of the month, users try to do billing. "Hey, why did you take away those features you built for us a month ago?" "What, we never...?"

We had no clue what happened.

This got to repeat a couple of times before we figured out what must be happening. We made changes to the release process so we could track what was actually released each time, with its history. We tracked down who we thought was making the mistake, but didn't have enough evidence to prove it to his manager. That didn't stop the idiot from making the mistake, but it did streamline the process of recovering it. Meaning we had the version with our feature, we had current code, and "just" had to sort out conflicts rather than rewrite from scratch.

Now that you've heard the story, can you see how weeks of work could be lost before we figured it out? And can you understand how we could have lost history?

This was a decade ago. At my next job we had more competent people. But there we had a huge debates between rebase and merge people. There are arguments on both sides. My conclusion was that about 90% of the time, rebase makes things simpler and easier. But that remaining 10% of the time makes the 90% not worth it.

Just learn how to merge properly.


I'm pretty sure your code was still around at that point, by default git keeps stuff around for 90 days. Although to be fair I don't know if that's the case today nor if it was the case a decade ago.

What you're describing does sound awful, but I'm pretty sure that idiot could have found a way to mess up a merge. The entire workflow sounds completely fucked, I'm not convinced it's entirely fair to blame rebase in that case.

> Just learn how to merge properly.

I know how to merge and rebase properly. My favorite PR merge strategy is rebase + merge --no-ff. So your master branch is nice and linear, but you can still see where your PR merges came in. Lets you have a "all PRs get squashed" view of the world by just adding '--first-parent' to your git commands, but also lets you have the inner details for when you're git bisecting or spelunking trying to figure out why a certain line exists.

Most people hate what I describe though, similar to mixing spaces and tabs.


My code may have been around somewhere. I suspect I'd done gc, in which case it wasn't. But my git skills then were certainly not as good as they are now. (I'd only recently switched from svn at that point.)

I agree that the workflow was a mess in multiple ways. A lot of which were organizational decisions that I was in no position to influence.

Your favorite PR strategy is fine if you're doing it locally. However when it is done on master, you're going to have to get master again by force. Because changed history creates conflicts. Which means that you're going to have to hope that everyone only did it your way, and no idiot created conflicts in some other stupid way that you'll suffer for later.

I'd prefer to merge to head early. Merge to head often. Merge from head often. Don't have long-running shared branches. This does take some other forms of discipline though.


I've never worked anywhere on master directly. Always in feature branches that then get merged to master (ideally with my strategy). So basically master always moves forward and its history is never rewritten.

Master is always locked down anyway by "something" - no idea what the technical term for Github/Gitlab/Bitbucket is. Stopping people from force pushing to master prevents the sort of stuff that happened to you. Even if you don't have any "idiots", you really don't want a poor intern accidentally slightly pissing off everyone.

> I'd prefer to merge to head early. Merge to head often. Merge from head often. Don't have long-running shared branches. This does take some other forms of discipline though.

I agree with everything there, except I rebase instead of merge. So when I merge my branch to master, it's a nice neat little package that sits on top of master. It doesn't have the history of 10 merges I did while I was developing because I don't see the value in those merges.

But hey, to each their own. When I was younger, I used to get into heated debates about why I was right, now I don't really care. I'm either in a branch of my own and can do whatever I want, or working with someone and then I'll just copy whatever they do to not confuse them.


Unless the strategy is really bad, I'd prefer to go along with what everyone else does. When multiple people push their preferred optimum, the resulting inconsistency is clearly worse than a single suboptimal, but consistent, approach.


One never rebases shared code. They rebase their own work branch. Messing with history of master/main/integration branches should be blocked.

Rebase is a necessary part of a workflow even if you like merge commits. You're severely missing out if interactive rebases are not part of your toolbox.


I wouldn't consider rebasing your own local commits on top of a more recent remote master to be messing with history in any meaningful way, and that's the most useful method of rebasing.


I can give an example scenario.

Assuming "H" is the hash of the current state of the repository content, consider this initial state of the repository (most recent first):

    H(3) Implement feature B
    H(2) Implement feature A
    H(1) Initial commit
Now you implement "shiny feature", so your history in your branch looks like this:

    H(5) Shiny feature, improvements.
    H(4) Shiny feature, initial implementation.
    H(3) Implement feature B
    H(2) Implement feature A
    H(1) Initial commit
You tested H(4) and H(5), and everything looks good.

Then you `git pull --rebase`, and your history looks like this:

    H(10) Shiny feature, improvements.
    H(9) Shiny feature, initial implementation.
    H(8) Pulled commit C
    H(7) Pulled commit B
    H(6) Pulled commit A
    H(3) Implement feature B
    H(2) Implement feature A
    H(1) Initial commit
You test H(10) because it's the current state of your repo, looks good, and merge (or create PR, whatever).

With the usual pull request flows, `H(9)` (i.e. anything between your new "base" and your most recent commit) usually stays untested, entirely ignored by the developers, and you would only ever find out if you ever need to bisect.

Not usually a problem, unless you have a rule of "every commit should be verified/tested" and the untested commits have a change that doesn't prevent a build but still causes issues (e.g. something that's only visual, or a new config file was added to a "conf.d" directory and its presence changed some behavior, stuff like that).
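
One way to close that gap, as a sketch (the upstream name and test command are placeholders):

    git fetch origin
    # Replay local commits onto the new upstream, running the test suite
    # against every replayed commit so nothing like H(9) stays unverified:
    git rebase -x 'make test' origin/main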


To avoid this you can squash H(9) and H(10) before pushing to a shared branch, this way only one tested commit will be added on top of existing commits.


Rebasing unpushed commits is ok. But I have yet to see a workflow that provides good enough guardrails to make it something you can do safely.


Protect your main branch?


One of the great advantages of git is being able to pull from other people's feature branches, not just master. So protecting just master isn't good enough.


Yeah so you have them go through the workflow that doesn’t ruin things, like pull requests?


I don't want to have to go back and forth with someone to pull their branch. I want to just be able to pull anything they've pushed.


Isn't "protect your main branch" still the answer to this?

Your two feature branches would be unprotected so you can merge away if you like. When one of you wants to commit something to master, that's when you'd check for dodgy merges.

Also, "git cherry-pick" is a good alternative to merging for this use case.


Protecting the main branch is definitely a good practice, but the other potential hazard is:

- Having a developer on your team that rebases their own feature branch

- Then tries to "git push", only for it to be rejected since a force push is required

- Then performs a "git push --force", which will force-push all of their local branches, including feature branches from other developers that they may have checked out previously

Our team uses merges because they are safe from this kind of problem, although a rebase workflow would have cleaner history. I wish that "git push --force" would not push all branches by default, and just fail unless a (remote, branch) pair or --all is given.
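
A defensive habit that helps either way (the branch name is a placeholder):

    # Name the remote and branch explicitly, and refuse to overwrite any
    # remote commits that haven't been fetched locally yet:
    git push --force-with-lease origin my-feature-branch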


> - Then performs a "git push --force", which will force-push all of their local branches, including feature branches from other developers that they may have checked out previously

This is (part of) why, for most common operations, I use a Git GUI (SourceTree). Force pushing all branches can only be done by very explicitly selecting them all and initiating a force push; the default when pushing is to push only the currently active branch.

It's also overall much clearer and more intuitive to use than the Git CLI. I use it when I have to—there are things that I can't do through SourceTree, and a few things that are complicated enough that I just want to be 100% sure I know exactly what's happening—but for 99% of the Git operations I do, it handles them perfectly and without any worry that I've mistyped something or forgotten to specify a branch.


> Isn't "protect your main branch" still the answer to this?

No, the feature branches need to be protected or something, to enforce that they only rebase locally and don't rebase the parts that I've merged into my branch (and vice versa).

> Also, "git cherry-pick" is a good alternative to merging for this use case.

No it isn't, it means you get multiple unrelated commits for the same change, which causes conflicts and can be disastrous if a commit is deliberately reverted.


I'll often just do a

   git reset --hard origin/branch-name


Right but that doesn't help if you've done your own work on top of their changes.


I think rebase is generally the correct approach here. If you've done your own work on top of their old changes, rebase your work on top of their new changes.


That's possible but it requires a bunch of manual tracking and results in wasted/duplicate effort with people resolving the same conflicts multiple times.


Just use:

> git pull --rebase


    git config --global pull.rebase true


Right but assuming I have a branch that's diverged from theirs I have to do a fiddly git rebase --onto and likely resolve the same conflicts again.


This to me is a sign that some commits should be squashed, because it implies the same lines have changed multiple times in the commits that are ahead of the remote branch. It's worth doing the rebase interactively and squashing them up.


If you find yourself fixing the same rebase conflicts over and over again, because you for some reason need to work on conflicting changes simultaneously (which is of course best avoided for other reasons), use "git rerere".
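
Enabling it is a one-liner:

    # Record each conflict resolution and replay it when the same conflict reappears:
    git config --global rerere.enabled true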


I don't trust rerere, it can do major damage in cases like where a commit was reverted. And you still get multiple people solving the same conflicts, so even if each person only resolves each conflict once it's still wasteful compared to a merge workflow.


git pull --rebase --autostash


--force-with-lease

And only on working branches. I do this every single day.


Not good enough, that can mean you rebase changes that someone else has based further work on (but hasn't pushed it yet, or has pushed it to a different branch).


Why are you having people base their work off your in progress work? Git is not the issue with what you are describing.


> Why are you having people base their work off your in progress work?

To collaborate more closely and reduce (or get ahead of) conflicts. The whole point of using git at all is to be able to base your work off other people's in-progress work; if you're not interested in doing that then Subversion works better.


Rebase cannot destroy "weeks of work". No git command can delete commits. Unless you have some insane garbage collection policy that is very far from any defaults. This is your fault for not understanding your tools.


Didn't you guys have filesystem backups of a shared git repository?

This is exactly what backups are for.


I believe that the semantics of < > vs "" is actually compiler-dependent but on every compiler that matters, #including with angle brackets is the semantic for "the system header" whereas using quotes gives preference to files in your local source tree.

So for example if you #include <foo> then the compiler (actually the preprocessor, but whatever) looks in the system's standard location, whereas if you #include "foo" then it looks in the local tree.


I think that’s just the ordering though. “” will also end up searching the system paths, it will just check the local paths first.


You are right; a good explanation of the rules is in the C FAQ [1], which points to a newsgroup posting by Kaz Kylheku [2].

I am posting the summary here, although please do read the original if you have time:

The most portable thing to do is to use "" for including files within your project, and to use <> only for implementation supplied files.

(Disclosure: I was one of the contributors to the C FAQ).

[1] https://c-faq.com/cpp/inclkinds.html

[2] https://c-faq.com/cpp/inclk.kaz.html


What I find strange is that <> traditionally included system header files and "" included local files. They used different include paths, so you could have a header file in your sources with the same name as the system header file and then could control whether you are including one or the other based on using <> or "".

Anyway, I thought the distinction was lost in later compilers in favor of a single include path and then just taking the first file found when looking at potential matches through include path.

It seems the author of that merge thought the same thing. So, the distinction is actually still used by compilers?


With both GCC and Visual C++, the “” form first searches local paths and then system paths, while the <> form only searches system paths. Guess some BSDs are stricter about local paths.


> But the fact that a merge can have arbitrary changes in it always bothers me!

After that xz thing where they were trying to install a back door, having changes that are hidden like this is a big red flag.

In fact changing include <something.h> to include "something.h" with a hidden commit like this isn't a red flag, it's a big rotating alarm with a siren. Someone's trying to set things up to include malicious code via a faked system lib.


Sadly, not all of us can live in the tech equivalent of Bond films. There are only so many xz backdoors to go around.


There could be thousands of similar manchurian developers right now and it wouldn't even be a significant effort.


Until they're activated how are you going to know?


All commits but the first have parents. All commits point to the state of the file tree at that point.

A "merge commit" is nothing more than a commit claiming any number of parents greater than one. It is still its own file tree reference that decides how the tree looks, and nothing dictates that it should be related to the parents.


If that were true, then git log -p would have worked. The reality is that merge commits are treated differently from other commits by many parts of git. Saying that they are "just a commit with multiple parents" gives people the wrong impression.

Git is more than the data structure backing it. And many parts of git make all sorts of assumptions that treat things that are more or less identical in the data model as actually being different. Tags are not the same thing as branches, for example, even though they are stored in virtually the same way.


Well, yes - git log does have special handling of commits with multiple parents because everything it shows is a special cased lie. Why? Because commits do not contain diffs or patches, but are instead full snapshots of the repository as a whole at a point in time.

git log -p is a convenience tool that tries to show code progression, and so it comes up with these simple diffs. Showing a graph of N-way conflict resolutions would not help the user trying to get an overview. Other tools exist to track the origin of a particular change, such as git blame.

It is important to understand what a git commit actually is exactly because of the caveats of such convenience interpretations. Otherwise you'd have no idea where a conflict resolution is stored, and you'll run into the surprises mentioned here.

In my opinion, git also becomes a lot easier to work with once you understand the low-level details, as you realize which high-level tools are similar, compatible, or fit or unfit for a specific purpose.


What git log shows is not "a lie", it is a part of its data model. Git is all of its commands, not just the low level details. Commits are both snapshots of the entire repo, and diffs, and delta compressions - none of these is "a lie".


> Commits are both snapshots of the entire repo, and diffs, and delta compressions - none of these is "a lie".

Commits are never diffs. Commits are snapshots, and sometimes git computes a diff between two commits. Commits are also never delta compressions, but can be stored within a delta-compressed packfile.

Whether you like it or not, git is primarily its low level details. The porcelain stacked on top changes, and differs depending on the user's client (e.g., a GUI using libgit2). However, that "git log -p" is "part of git" does not change that git log -p is not trying to convince you that commits are diffs and show you a true chronicle. It instead assumes that you know what commits are, and that you are asking for an easy to read overview of what has been going on.

Accepting that commits are always solely snapshots will make the issues you run into when working with the porcelain easier to understand, especially when exposed to more than one client.

(Knowing about packfiles and delta compression can also be useful when looking into performance/resource utilization.)


*However, that "git log -p" is "part of git" does not change that git log -p is...


You are right that conceptually this is okay. But it is a UI problem that the commands the author tried didn't manage to show the difference between the merge commit and any of its parents.


Technically you can have multiple first-commits in a Git repository. For example, Linux had 4 initial commits in 2017: https://www.destroyallsoftware.com/blog/2017/the-biggest-and...


Indeed, through commits with multiple parents (merges), you can end up having multiple orphan commits (initial commits).

Multiple initial commits are a bit rarer, usually stemming from merging in entirely different git repos with their own separate history as part of consolidation.


Wouldn’t you have the same amount of merge conflicts with rebase? Especially if you don’t do it often, which you frankly also should with merge?

I have to admit that I never really understood the advantages of rebase, and what I mean by this is that I actually don't understand how the dangers of rebase outweigh any form of advantages. Especially because one of the major advantages of merge is that you can squash your local commit history when you submit it to your main branch.

What we do is that we tie every pull request to a relatively small feature task, and because we do this, we genuinely don’t care about the individual commits developers do. Which means they can commit really silly messages if they are heading to a meeting or if they are just tired at the end of the day. It also helps with them merging main into their branch often, because it doesn’t taint the history.

The biggest advantage we’ve seen, that maybe we didn’t expect, is that nobody ever fucks up our tree in a way that needs someone who actually understands git to solve. We’ve also locked down the use of force push so that is not available to anyone unless it’s absolutely needed. Part of the reason I set this up initially was to protect myself from me, but it’s been a good thing since.

But I’m actually curious if it’s wrong.


> Especially because one of the major advantages of merge is that you can squash your local commit history when you submit it to your main branch.

Squashing is in no-way limited to merging and is actually done by doing an interactive rebase. Nothing is stopping you from squashing without creating a merge commit. It's entirely separate.

If you're squashing everything anyway, what does merging even give you? Is your main branch just:

* merge B

* squashed commit B

* merge A

* squashed commit A

If you didn't merge, you'd have:

* squashed commit B

* squashed commit A

> What we do is that we tie every pull request to a relatively small feature task, and because we do this, we genuinely don’t care about the individual commits developers do.

Except eventually there is a large feature task and then you end up with a giant commit that is annoying when git-bisecting.

But at the end of the day, these things only matter if you are spelunking through git history and/or using things like git bisect. If your git history is "write-only & rollback", then none of this stuff matters.


> advantages of merge is that you can squash your local commit history

No, it's the other way around. Squashing is a type of rebase.

Most workflows involve both. Merges can also be fast-forward merges, which are indistinguishable from rebases. Choosing between a rebase and a merge operation is often the wrong question to ask. The question is what state you wish the repository to end up in.

> I’m actually curious if it’s wrong

Look at "git log". It is readable and easy to understand? It is obvious why each commit was made, and why alternative solutions were turned down?

Are you able to use "git bisect" to track down problems?

Then you're doing it right. If not, think about what a functional commit log would look like and how you would get there. Working together is culture, and what type of merges you decide to use is just a tiny part of that culture.


But a rebased commit can also have arbitrary changes!

---

P.S. Any commit can have any change. Or no change.

A "commit" is a version...a message, a tree, some metadata, and 0 or more parents. In fact it's not even a change/diff/patchset per se. Though will often compare it against its assigned parents. If it has multiple parents, you'd have to choose which to compare. If it has zero parents, you can't compare against any parents.


Yes, except git log will show all the commits that got into the branch, while with merge you need git log -m, otherwise there are invisible commits (and diffs) in a pretty common workflow. I don't know why this is the default behaviour.

Git log only shows one tree not parallel trees from the merge.


? Not sure what you mean.

git log will show all ancestors.

And git diff shows any difference between two refs.

Nothing invisible unless you deliberately make it so.


Git log (and many other tools as well) pretend that merge commits do not introduce changes. I learned about it the hard way when someone managed to implement an entirely new feature, contained within a hidden merge commit.

It's only partially the fault of Git - the entire idea of a merge requires new concepts like 3-way diff, which are not needed for rebased commits. I'm not even sure that most software like GitHub can display such a diff.


The blog post explains it pretty clearly: git log -p doesn't show the diff for those merge commits like it does for a normal commit.


The <> version searches the library path (usually /usr/include/* but can be modified with flags) whereas the "" searches the current working directory.


> I won't claim to understand C and the reason why <> is better than “”. I assume it is.

That one's obvious, you can type <> and you can't type “”.


    $ git show  d85c9944c55fb38f4eae149979a0f680ea125ecb  | wc -l
    11067
    $
From `man git-log`: "Note that unless one of --diff-merges variants (including short -m, -c, and --cc options) is explicitly given, merge commits will not show a diff, even if a diff format like --patch is selected, nor will they match search options like -S. The exception is when --first-parent is in use, in which case first-parent is the default format."

Presumably the author would have been happier using the -m flag in addition to -p.
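
For example:

    # Show a patch for merge commits as well while paging through history:
    git log -p -m
    # Or diff this particular merge against its first parent directly:
    git diff d85c9944c55fb38f4eae149979a0f680ea125ecb^ d85c9944c55fb38f4eae149979a0f680ea125ecb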


And that's one of the reasons why I advocate against merges in the codebase on every project I work for and in every HN thread where the topic of merges is mentioned.


That might reveal the depths of my ignorance of Git, but how do you manage moving changes from one branch to the other if you don't use merge? Edit: continuing to read the discussion, it seems it's rebase? I have some reading to do


rebase is git's swiss army chainsaw.

i use rebase frequently, but i never remember which direction the operation goes in. do you need to checkout the source branch or target branch? truly, it is unknowable. my workflow is to type `man git rebase` and hit space to page through the manual until the first ascii tree surgery diagram appears. then i stare at it until i remember that i need to have checked out my feature branch and am meant to type `git rebase main`. i have trained myself to read the man page every time, perform the operation correctly, then immediately forget.

https://git-scm.com/docs/git-rebase


That's why I always use git cherry-pick for specific commits that I want.

It's essentially a "cp thing-i-want ."

Combined with git reflog your repo becomes as understandable as a floppy disk.


I use "branch.autosetuprebase=local" and/or --set-upstream-to ; then I can just type "git rebase" while on the thing I want rebased and not have to think about it. (Useful in the gerrit workflow which kind of forces you to have stacks of rebased changes)


When people talk about avoiding merges, I think they mean this: https://trunkbaseddevelopment.com

The approach described on that site doesn't strictly rule out "git merge", but it emphasises short-lived branches and unidirectional commit flow. If you do things that way you find you just don't really need merges. The next step is to think "merges are rarely useful and sometimes dangerous, so let's just avoid them completely".


> Streaming small commits straight into the trunk

Gotta say, I find that horrifying. What about peer reviews? What about breaking up a change into smaller commits, none of which make sense until they're all together (changing the signature of a method, then changing the places that call that method, etc)?

It's worth noting that it mentions that working on a branch and the use of PRs are acceptable, but... the two statements appear to contradict each other. Why say "only do x" and "do thing that isn't x" on the same page?


I can’t find the quoted text in the parent post or article, but as someone who’s fought the short-lived branches crusade in the past (in favor of it) I’ll do my best to answer the criticism you raise :-)

> What about breaking up a change into smaller commits, none of which make sense until they're all together (changing the signature of a method, then changing the places that call that method, etc)?

The general-form solution to this is to make a three-part change: 1. add the new code, 2. migrate all the callers, 3. delete the old code, in three separate PRs (or more, if migrating the callers takes several PRs, or some of the old code can be deleted earlier than the rest). I believe arbitrarily large changes can be made this way, and as your organization grows, eventually all large changes _have_ to be made this way.

Isn’t that a lot of extra work? IME it’s a lot less work (and risk!) than resolving a massive merge conflict.

The problems with long-lived branches all derive from the basic problem that eventually the complexity of maintaining two parallel implementations affects the work of everyone at the company.

New person joins before `Big_Refactoring` is merged? You’re either onboarding them twice into two branches, or they’re getting nothing done while they wait for the merge so they don’t have to learn a bunch of code that’s going away soon anyway.

Someone else wants to make a significant change? They either carefully patch each PR into both branches, or they decide “screw it” and do all their work in `Big_Refactoring` anyway, diverging the branches _even more_ and creating more risk in the ultimate merge and more of an incentive for others to start developing in `Big_Refactoring` and make the problem even worse. Soon the feature branch is a de facto main with failing tests while there’s incredible pressure to just ram the merge through so all these changes can go out.

The only way to make it work is to demand that only one person develop in `Big_Refactoring` and everybody else manually cherry-pick their changes into both branches (which quickly just means implementing them twice). IME everyone finds this so annoying that small branches, feature flags, and three-part changes (which makes code sharing between the old and new implementations much easier) become broadly preferred anyway.

As far as PRs, I can’t speak to the linked article, but everywhere I’ve worked that implemented short-lived branches still did PR review. But the PRs had to be small and quick to review (which IME actually helps catch bugs too)


> Isn’t that a lot of extra work? IME it’s a lot less work (and risk!) than resolving a massive merge conflict.

> The problems with long-lived branches all derive from the basic problem that eventually the complexity of maintaining two parallel implementations affects the work of everyone at the company.

This seems to imply that there's a choice between A) committing to the main branch, and B) long lived branches. I always work on branches, almost always with multiple commits, and then merge it into the main development branch once the feature is complete. I almost never have to deal with complicated merge conflicts.

That being said, you're talking about short lived branches. The article talked about committing directly in the main trunk/master branch; which is what horrified me.


Or use squash merges.


> 7764c864b and 0264866ce, right? I should be able to sync to those with git checkout and see which one dropped it, yeah? Well, I'll spare you the effort and just say that BOTH OF THEM have the old code in it.

When you make a merge commit, the merge commit contains all the changes. It's what happens when you fix a merge conflict. The fix for the conflict only exists in the merge commit. Similarly you can just add whatever you want in the merge, and it won't appear in any other commit.


> That actually installed "protobuf-24.4,1" which is some insane version number I'd never seen before. All of my other systems are all running 3.x.x type versions.

They obviously changed their version numbering convention at some point, and this is protobuf 2.4.something.

Honestly, I see the shit she tripped on all the time. It doesn't even register anymore.


Reminds me of https://github.com/protocolbuffers/protobuf/issues/1491 , which has effectively been WONTFIX (why does github not have this useful distinction?) because Google are happy with how it works and it's really difficult to make this particular thing work with the (also broken) Python module import system.


What distinction? Issues can have custom labels, and there is a default `wontfix` label


Seems like precisely the sort of thing ESR's new de-autotools tool is designed to eliminate. https://gitlab.com/esr/autodafe


This reminds me of a comment my new boss made, "you like learning on hard mode". He meant that instead of following the docs to learn, I want to go find out how it works from first principles and then follow the docs, maybe improving them, based on what I saw from "beneath them" looking up.


I like to think of that as "actually learning"


If more people did this (across many different industries) life would likely be substantially better. However, humans always optimize for the wrong things.


That's just regular "learning". It's just that for some people it's a bit out of fashion.


Rather than the sed post-processing, the author could also have used -iquote for the place where protobuf is installed, which makes it findable by quoted includes.
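
Roughly (the install prefix and file name are hypothetical):

    # Directories added with -iquote are searched for the quoted ("...") form
    # of #include but not for the <...> form:
    c++ -iquote /usr/local/include -c build_tool.cc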


Is this a common solution or documented in obvious places?

It wasn’t until I just read her article that I’d even considered some systems/distros doing weird things like rewriting C include syntax for questionable reasons.

What a terrible thing to deal with, simply frustrating.


It is documented in the gcc and clang documentation. It's also described in the gcc manpage, but not the clang one.

I don't know how "common" it is, but when dealing with third party code, I run across <> / "" confusion quite commonly, so when that's a significant part of your job, you'll probably stumble upon this flag eventually.


I don't want to victim blame too much, but this line stood out to me

> There's no "body" to this commit. It's just a "Merge:" and two other commits

Commits are snapshots of repository state, and merges "obviously" have differences from their parents. So not having a "body" for a commit is a bit nonsensical in git (yes, technically you can make empty commits but that's a special case). These sorts of things are where having a good mental model of git is useful.

As I have had my share of hairy merges, it is pretty intuitive that merge commits can, and in many cases need to, have changes that are not part of either parent.

Maybe something like pijul (/darcs) would handle things differently here, but I believe that merges are fundamentally difficult problem.


Merges of course should have changes, but IMO they shouldn't have changes that are not resolving conflicts (either actual conflicts marked by git, or conflicts that manifest in failing tests, etc...). An entirely unrelated change stuffed into a merge commit is inappropriate.


> So not having a "body" for a commit is a bit nonsensical in git (yes, technically you can make empty commits but that's a special case). These sorts of things are where having a good mental model of git is useful.

This isn't a problem with the author or her mental model, it is a problem with `git log -p`. The output she is describing is exactly how merge commits show up there, with no other flags.


That's why reasonable projects limit their usage to the smallest reasonable scope, and use rebase/squash. You don't have to adopt a hard problem.

It actually escapes me why Linus decided to make git a merge-first VCS in the first place. There aren't many projects which are more linear in nature than the Linux kernel.


Merging a branch is one of the "special cases" where an empty merge commit can be used. Git will use fast forward by default but if you want to preserve the history as a separate branch you can use an empty merge commit instead.


Piper and Protobufs lmao. Based on personal experience, the two services are best used as litmus tests of a person’s character — the most insufferable googlers I've met are fans of both. Any other usage of either generally results in indescribable frustration.



