Nothing in the "plumbing" defines what merge/diff algorithm is used or is even needed. You can pick any of the merge strategies, or plug in any diff utility.
The plumbing cares about only one thing: that you have a valid data structure, one that, among other things, records "what were the previous state(s) before this commit?" as pointers to the parent commit(s), nothing more. I can construct a git tree by hand or with a shell script, just piping in full text or binary files, and git's porcelain will happily spit out "diffs" of the structure.
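To make that concrete, here is a minimal sketch of how object IDs are derived, assuming the classic SHA-1 object format. The helper names (`object_id`, `blob_id`, `toy_commit`) are made up for illustration, and the commit body is simplified (real commits also carry author and committer lines). Note that nothing here looks at whether the content is text or binary.

```python
import hashlib
from typing import List


def object_id(kind: bytes, body: bytes) -> str:
    # Git hashes "<type> <size>\0" followed by the raw body.
    # The body is treated as opaque bytes: text and binary are handled alike.
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return object_id(b"blob", content)


def toy_commit(tree_id: str, parent_ids: List[str], message: str) -> bytes:
    # Simplified commit body (hypothetical helper): one "tree" line,
    # then zero or more "parent" lines. The parent pointers are the
    # only record of "what came before".
    lines = ["tree " + tree_id]
    lines += ["parent " + p for p in parent_ids]
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


text_blob = blob_id(b"hello\n")
binary_blob = blob_id(bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF]))
empty_tree = object_id(b"tree", b"")

# A merge-style commit simply lists more than one parent.
merge_body = toy_commit(empty_tree, [text_blob, binary_blob], "toy merge")
```

The two "parents" above are blob IDs standing in for commit IDs purely to keep the sketch short; in a real repository they would be the IDs of commit objects.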
The porcelain on top of that decides what to do with that information. Standard builds admittedly only provide diff tooling that focuses on text; more expansive UIs can diff some binary data (e.g. GitHub shows image diffs in a UI better suited to that data, with side-by-side, swipe, or fade views that let someone examine the changes).
> The assumption that you are storing text files is very much built in to the design of git
I haven't dug deep into git's source code in a while, but the last time I did, this argument only really held for packfiles, where delta encoding is used to compact the .git directory's contents, as that does indeed seem tuned for generic text content rather than binary data.
That isn't related to git's diff/merge functionality though; it is a feature for saving disk space by cutting down on loose objects (full copies of the previous versions of files), trading increased compute time (rebuilding a previous version of a file from a set of diffs) for reduced disk space.
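A toy sketch of that space-for-compute trade, under loud assumptions: this is not git's actual pack format, and `make_delta` / `apply_delta` are hypothetical helpers built on Python's `difflib`. It only illustrates storing one version as "copy from base" and "insert literal" steps, then paying compute to rebuild the full file.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    # Describe `target` as a list of steps relative to `base`:
    #   ("copy", offset, length) -> reuse bytes already present in base
    #   ("insert", data)         -> literal bytes that base does not have
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no step: those base bytes are simply not copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    # Rebuilding the target costs compute; that is the price of the saved space.
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"line one\nline two\nline three\n"
v2 = b"line one\nline 2\nline three\nline four\n"
delta = make_delta(v1, v2)
rebuilt = apply_delta(v1, delta)
```

The same two functions accept arbitrary bytes, so the sketch says nothing about how well any real delta scheme performs on text versus binary content.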
Would you agree that keeping track of the ancestor(s) of a commit, and the fact that there can be multiple ancestors, is central to git's design? If so, why is that the case? Why is keeping track of ancestry, and the ability to have multiple ancestors, central to the design?
It's so you can do diff/merge. So yes, diff/merge is not directly part of the core. But the core is designed specifically to support diff/merge. That's the whole point. That's the reason the core data structures of git are designed the way they are. It's also the reason that git comes with a diff/merge algorithm built-in, and 99.999% of the people who use git use the default diff/merge. And the default diff/merge assumes text.
So yes, you can use git to store binary files, and you can even modify it to do something sane with them without ripping out everything and starting over from scratch. But that's not what git was originally designed for. That the design is adaptable to other use cases is a testament to Linus's foresight, but it does not change the fact that the assumption of text is woven deeply into the design. Dividing up the design into "plumbing" and "porcelain" does not change this. This part of the porcelain is as much a part of the design as the plumbing is.
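The claim that ancestry exists to support merging can be sketched as follows. This is a minimal illustration, not git's implementation: `parents` is a hypothetical dict mapping a commit name to its list of parents, and `merge_bases` finds the "best" common ancestors that a three-way merge would compare both sides against.

```python
from collections import deque


def ancestors(parents: dict, start: str) -> set:
    # Every commit reachable from `start` via parent pointers, including itself.
    seen = {start}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(parents: dict, a: str, b: str) -> set:
    # Common ancestors of both sides, minus any that are themselves
    # proper ancestors of another common ancestor.
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c in ancestors(parents, other) - {other} for other in common)
    }


# Toy history: A <- B, then B forks into C and D, and M merges C and D.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "M": ["C", "D"]}
base_of_fork = merge_bases(history, "C", "D")
base_after_merge = merge_bases(history, "M", "D")
```

Without the parent pointers there is no way to find that base, and without multiple parents the merge commit `M` could not record that both `C` and `D` have already been incorporated.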