Why Git Ain’t Better Than X

cookiecaper · on March 27, 2010

The author claims that a 10x speed differential is inconsequential. I don't agree. Note that in his sampling, things were already taking 2-3 seconds on a relatively small repository. What happens when you get to the size where a git diff takes 2-3 seconds? Does bzr take 20-30? Surely that's not acceptable.

I understand that people can go overboard on the optimization, and can ignore more important things because of misleading or unimportant benchmarks, but there is something to be said for speed when it's taken on balance.

This article didn't point out any reason I would trade git for the much slower Bazaar, just said that Bazaar is not as bad as it looks on the Git is Better Than X website. That's not really a very convincing argument if I'm already using git.

astine · on March 27, 2010

On small projects like many of the ones on Github, the difference in speed doesn't really make a difference. But when you get to really large projects, like the ones git was originally made for, it really does. Not everyone works on huge projects, but some of us do. The question you need to ask yourself is: "Is this project going to be large enough for it to matter?" That's hard to say.

skybrian · on March 27, 2010

It's amusing to hear the common myth that the Linux kernel is a huge project. No, a huge project is something like Gentoo or Ubuntu. Try to put all the source code for every package in a Linux distribution into a single Git repository and it (obviously) will fall over, because it would be tough to fit that on one machine. So if you're using Git, the best practice is "don't do that, then."

But I work at a large company where all the source code is in a single repository. Being able to do a partial checkout (your package and its dependencies, on the head branch only) is essential for us and it's not a workflow Git supports at all.

I do agree, however, that Git is fast enough for any open source project out there. It's just not scalable to large organizations.

js2 · on March 27, 2010

> But I work at a large company where all the source code is in a single repository.

Sure, Google does this with perforce.

You are correct, that if you desire a single truly massive repository, you will eventually have trouble with git. However:

> It's just not scalable to large organizations.

Of course it is. The solution is to have multiple repositories and then use something on top of git to tie the repositories together as needed. For example, this is how the android project is being managed:

http://source.android.com/download/using-repo

jedbrown · on March 27, 2010

I've heard of this scenario before, but don't understand this situation. What possible technical reason would justify a billion lines of code needing to be in one repository? If it's a matter of history, perhaps we just need a better import tool.

sorbits · on March 27, 2010

A company with a couple of applications will likely have a dozen or more shared libraries or other resources.

You could create a repository for each resource, but resources are merged and split over time, a change in a resource may require application changes as well, etc.

All in all it is often just simpler to make a nice directory structure and treat it as one repository rather than deal with dozens of repositories.

I have sort of this problem myself: I have 20 or so shared frameworks in my “big repository”, and I would like to release a few of them as F/OSS, but I am using Git and it is too much of a hassle to then put these frameworks in their own repositories.

zb · on March 28, 2010

Git submodules pretty much solve this problem in many cases.

sorbits · on March 28, 2010

But not in the case explained.

Not only are Git submodules still separate repositories (making merging and splitting the shared resources problematic), they also require maintenance in the repository they are added to, so in practice a symbolic link is much simpler (you do not need to update that link each time the submodule is updated).

dlsspy · on March 27, 2010

git does have the ability to do partial history clones and partial repository checkouts.

However, you can surely see that the goals of requiring the code to be in a single repository but requiring the ability to only access part of it at a time are conflicting.

There are numerous ways to assemble a product out of lots of subprojects. It would be ridiculous for something like ubuntu to try to put everything into a single repository when all of the common usage patterns have the users working as if they were independent (and often even under different administrative domains).

skybrian · on March 28, 2010

In Ubuntu it wouldn't make sense because they're taking most of the code from upstream anyway and it changes slowly; I was just using it to give a sense of scale.

The scenario where a single repository becomes handy is when you want to change a method in a shared library and fix all the callers in the same commit, and you can do that because the code isn't public and you know all the callers. (It's much the same reason Linux prefers drivers to be in-tree.)

But I'm not sure I entirely want to defend this model; just pointing out that it exists, and the people who like it aren't going to change how the whole company works just to adopt git.

ableal · on March 27, 2010

> Being able to do a partial checkout (your package and its dependencies, on the head branch only) is essential for us and it's not a workflow Git supports at all.

Collecting check-list points ;-). Thank you, this is a very good one. Anyone know off the cuff about Bazaar and Hg?

(P.S. Are you one of those Perforce users with large amounts of binary data? Those use cases being ignored is giving me pause about the whole DVCS hoopla, e.g. here http://news.ycombinator.com/item?id=1219082 )

stevelosh · on March 27, 2010

> This article didn't point out any reason I would trade git for the much slower Bazaar, just said that Bazaar is not as bad as it looks on the Git is Better Than X website. That's not really a very convincing argument if I'm already using git.

I'm guessing that's why the article is titled "Why Git Ain't Better Than X" and not "Why Bazaar is Better Than Git"...

ableal · on March 27, 2010

A couple of weeks ago (apropos http://news.ycombinator.com/item?id=1180434) I looked up the 'advertising' at http://doc.bazaar.canonical.com/migration/en/why-switch-to-b...

For v2.0+, they quote 1.0 sec commits on the Firefox 3.5 repo test (vs. 1.1 for hg, 0.35 for git).

[Pointers to binary data performance figures (e.g. repo size deltas vs. SVN, etc.) gratefully accepted]

cookiecaper · on March 27, 2010

Well, it doesn't show that Git isn't better, it just shows that Bazaar isn't as bad as it seems on that site. Probably just because of updates to bzr since it was made.

mrinterweb · on March 27, 2010

Personally, I am a big fan of Git's staging feature. I think Git takes the right approach here. I love having the ability to commit only certain things, and yes, (in some cases) only certain changes in a file. This allows for very granular commits if you are trying to keep good history/notes in your commit messages. This also allows you to easily commit changes specific to what you are trying to accomplish with the commit. I prefer small commits with comments specific to the changes rather than "well, it's working, time to commit everything". I'm not saying you can't do this with other DVCSs, but I like Git's default workflow for this.
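To make the idea concrete, here is a rough sketch of what staging buys you. It is a toy model in Python, not how git works internally: the names `working_tree`, `index`, `stage` and `commit` are made up for illustration, and a "hunk" here is just a label rather than real file content. In git itself, picking individual changes within a file is what `git add -p` is for.

```python
# Toy model only: a "hunk" is (file, description); real git tracks content, not labels.
working_tree = [
    ("parser.py", "fix off-by-one in tokenizer"),
    ("parser.py", "debugging print statements"),
    ("README", "document new flag"),
]
index = []      # hunks staged for the next commit
history = []    # list of (message, hunks) commits

def stage(hunk):
    """Move one hunk from the working tree into the index."""
    working_tree.remove(hunk)
    index.append(hunk)

def commit(message):
    """Commit exactly what is staged; unstaged hunks stay in the working tree."""
    history.append((message, list(index)))
    index.clear()

stage(("parser.py", "fix off-by-one in tokenizer"))
commit("Fix tokenizer off-by-one")

print(history)        # one commit containing only the tokenizer fix
print(working_tree)   # the other parser.py hunk and the README change remain
```

The point is only that the commit is built from the index, not from everything that happens to be modified, so the debugging prints in the same file never make it into history.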

stevelosh · on March 27, 2010

Mercurial's MQ extension is basically git's index on steroids.

With a single MQ patch you basically have git's index. You can continually add things to the patch and then finalize it into a commit when ready.

With multiple patches you have multiple indexes which can be pushed and popped, reordered, folded together, finalized into commits, etc.

To be fair, MQ's user interface needs some work. It's far from perfect.

jedbrown · on March 27, 2010

Hmm, I would have said that short-term branches with `rebase -i` are MQ on steroids. I always found MQ to be an odd beast, introducing a new set of commands that are far from orthogonal. Now that bookmarks are reasonably well-supported, albeit with somewhat surprising semantics and certain limitations (assuming you are used to git branches), I only use MQ as an analogue for `git stash`.

koenigdavidmj · on March 27, 2010

And if you don't like it, the Record extension ( http://mercurial.selenic.com/wiki/RecordExtension ) does most of what Git people would want, although it doesn't yet have the ability to split a modification into modifications of smaller granularity.

nsm · on March 28, 2010

try crecord ( http://mercurial.selenic.com/wiki/CrecordExtension ) for granularity

micampe · on March 27, 2010

> So I don’t need to mentally deal with repositories vs branches, I just store my large projects in a shared repo, and it’s all good.

And here I was thinking he was going to explain why it's better to have them separate. If I must decide whether I need a repository before creating branches, to me that means exactly that I do have to deal with repositories vs branches.

Deestan · on March 27, 2010

> I haven’t used Mercurial, but this page indicates that the basic “clone” command does a full expensive branch, while the “branch” command does a cheap local branch. So this argument only applies to non-distributed VCSes.

I don't know how branches work in Bazaar, but Git branches are cheaper than Mercurial's when you factor in administrative overhead.

Local Mercurial branches aren't disposable, like Git's are. In Git, I often whip up and subsequently dismiss 3-4 branches when tidying up my repository history (for example when I committed something in the wrong place, committed with wrong log message, etc...). Mercurial branches live forever in the repository graph and I have to explicitly "close" them, while in Git I just forget about them and the repository does too.

durin42 · on March 27, 2010

Git's branches aren't _branches_ per se. They're movable references to heads, and there's no indication of a commit having been done on a branch once it's merged and the ref deleted.

If you want that workflow in Mercurial, use bookmarks. Some work is left to make bookmarks pushable, but it's going to be done, as many people want that feature.
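A rough sketch of what "movable references to heads" means, as a toy model in Python. This is an illustration under simplifying assumptions, not git's actual storage format: the `commits` and `refs` dictionaries and the helper names are invented here.

```python
# Toy model only: commits are immutable records, branches are just name -> commit id.
commits = {}   # commit id -> tuple of parent ids
refs = {}      # branch name -> commit id (the movable pointer)

def commit(branch, new_id, extra_parents=()):
    """Record a commit on top of `branch` and move the branch pointer to it."""
    parents = ((refs[branch],) if branch in refs else ()) + tuple(extra_parents)
    commits[new_id] = parents
    refs[branch] = new_id

def reachable(start):
    """All commit ids reachable from `start` by following parent links."""
    seen, todo = set(), [start]
    while todo:
        cid = todo.pop()
        if cid not in seen:
            seen.add(cid)
            todo.extend(commits[cid])
    return seen

commit("master", "a")
refs["topic"] = refs["master"]      # creating a branch copies one pointer
commit("topic", "b")
commit("master", "c")
commit("master", "m", extra_parents=[refs["topic"]])  # merge topic into master
del refs["topic"]                   # deleting the branch removes only the name

print(sorted(reachable(refs["master"])))   # "b" is still part of the history
print(commits["b"])                        # but nothing stored here says "topic"
```

Commit "b" survives because the merge commit points at it, yet the record for "b" carries no branch name, which is the property being described above.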

dlsspy · on March 27, 2010

Don't delude yourself that there's anything correct about storing the name of a branch in a commit, or that that's what makes up a proper branch.

When I receive a changeset in mercurial and see that it was made on a branch called "dev", that means nothing to me. When I get changes from several locations and they all say it was made on "dev", that helps me even less.

If it's actually important to you to know that a change was done on a branch called "dev" (and I doubt it ever will be), it's fairly easy to know just by examining the merge commit that brought it in.

In practice, I'd say it's just confusing. I wrote "hg log -b" initially for viewing logs only within a branch. As it turns out, you have to be really careful to name your branches since they can be reused (or worse as shown above) and it's very difficult to figure out whether a change occurred on a given branch or a different branch that had the same name.

gecko · on March 27, 2010

Mercurial actually has lightweight branches like Git; they're just anonymous, as in Monotone, not named like Git. There's an extension for Mercurial, called bookmarks, that allows you to name your anonymous heads, giving you Git-equivalent functionality.

You're correct that Mercurial's (IMHO badly-named) "branch" command is much heavier-weight than Git branching. I consider them closer to changeset labels than branches.

prog · on March 27, 2010

bzr has the bzr-colo plugin to support co-located branches. Like I said in another comment, I think bzr, git and hg are comparable and there isn't a clear winner (and there need not be).

https://launchpad.net/bzr-colo

jobenjo · on March 27, 2010

We've been using bazaar at Fluther for a while and I have to agree with the author's sentiment for the most part.

The whole "Git is better" thing annoys me. DVCSs _are_ better, but I think it's mostly a wash between bzr, hg, and git. Yes, bzr is slower here and there (it used to be annoying, now it's fast enough that it's basically a non-issue). The parts I like more than git (though I'm no git pro) are the very flexible workflow options (we use multiple, and they're awesome), the excellent merge algorithm, and the directory/branch structure, to name a few.

Git is great. So is bazaar. It's all this machismo that bugs me. We should really be on the same side trumpeting why DVCSs are better.

prog · on March 27, 2010

I have to agree.

IMO bzr, git and hg are comparable and these debates are no different from emacs vs vim.

My personal preference is bzr but I quite like hg also. I also like git but the only thing that keeps me away from git is the whole fanboy culture around it.

xtho · on March 27, 2010

I personally don't care if X is 1ms faster than Y. I do wish, though, that the developers of X and Y would agree on common command-line arguments and workflow (and hide the details from me) so that I wouldn't have to write my own wrapper.

dlsspy · on March 27, 2010

I gave a long detailed response to this over on reddit:

http://www.reddit.com/r/programming/comments/biv72/why_git_a...

loup-vaillant · on March 27, 2010

He doesn't mention Darcs. Is this system so irrelevant? That would be too bad; I love the relative independence of its patches. (Plus, the UI is really good at reminding you what you could have forgotten.)

durin42 · on March 27, 2010

In my experience it doesn't take too long to find a pair of patches that commute which actually let you checkout a broken tree state. It's nice in theory, but in practice the DAG properties of the "mainline" DVCSes are very useful so that you can have guarantees about the non-brokenness of every point in the tree.

I think there's a lot of good to be had for commuting patches during code review, before they're final, but once finalized, they make the most sense in a DAG.
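A toy illustration of that failure mode, in Python. It is only a sketch under loud assumptions: it does not use darcs' real patch representation, and "builds" is reduced to "every called name is defined somewhere". The function names are invented for the example.

```python
# Toy model: a tree maps file name -> set of lines such as "def helper" or "call helper".
def add_helper(tree):
    """Patch A: define helper() in lib.py."""
    return {**tree, "lib.py": tree.get("lib.py", set()) | {"def helper"}}

def call_helper(tree):
    """Patch B: call helper() from app.py."""
    return {**tree, "app.py": tree.get("app.py", set()) | {"call helper"}}

def builds(tree):
    """The tree 'builds' if every called name is defined somewhere."""
    lines = set().union(*tree.values())
    defined = {line.split()[1] for line in lines if line.startswith("def ")}
    called = {line.split()[1] for line in lines if line.startswith("call ")}
    return called <= defined

base = {}
# The patches touch different files, so they commute: either order gives the same tree.
assert add_helper(call_helper(base)) == call_helper(add_helper(base))

print(builds(add_helper(base)))                 # patch A alone
print(builds(call_helper(base)))                # patch B alone: calls an undefined helper
print(builds(call_helper(add_helper(base))))    # both patches applied
```

Because nothing textual ties patch B to patch A, a system that only tracks textual dependencies will happily hand you B without A. In a DAG, B's commit would have A as an ancestor and that state could not be checked out.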

applicative · on March 27, 2010

Isn't it obvious that two patches in, say, different files might repair the same bug, but in incompatible ways? --It's the nature of programming, everything is connected. But they won't depend on each other from the point of view of the VCS, of course. If I pull and apply both, things will go wrong. Isn't this possible in all systems? I can always apply my way into a broken system.

When I have things right, and it compiles and I'm ready, I `darcs tag` -- for example. There are many ways of referring to patch-piles as adequate to compile together.

Though it's clear that `darcs` isn't good for all purposes, at least in its present form -- I don't think anyone says it is -- it is excellent for many; certainly almost everything up on GitHub would do better with `darcs`.

The most important advantage of it, it seems to me, is the complete transparency of the process and the underlying model; it is completely missing the black box aspect of something like `git`. People rightly praise many of the features of `git`, which I use, with amazement, in connection with some projects I help with. But I feel that all the enthusiasm for it is a bit like enthusiasm for something like Word, or some similar monster GUI program, on the part of people who have no idea what it's doing or how it goes about it; and all `git` tutorials are like tutorials for people about how to use Word. Learning new combinations of commands is like learning about esoteric drop-down panels in Word. --Though I suppose one can't blame `git` for the incredible tiresomeness of its enthusiasts.

loup-vaillant · on March 29, 2010

"`darcs` isn't good for all purposes"

I don't understand. What is Darcs not good for that Git is? And for what usage specifically is Darcs better? I like Darcs, mind you. I like it better than Git. However, I don't have enough experience with either to really make an informed choice, let alone a choice depending on intended usage.

chmike · on March 27, 2010

I'm about to adopt DVCS but can't decide between git, hg, bzr or whatever. These flame wars are paralyzing because they look like I have to pick a religion.

s_tec · on March 27, 2010

DVCS folks like to say that the real enemies are the people still using Subversion, rather than the competing DVCS tools. What they fail to realize is that DVCS fragmentation is a big part of the reason people like me continue using Subversion. If there were a clear DVCS winner, I would have switched years ago.

Fortunately, it looks like git is starting to take the lead in this fight. As a result, I've started moving some of my projects over. From what I've seen, git has the best internals. The absence of explicit deltas in the core database is a particularly brilliant move on git's part. In the long run, I hope that wins out over bzr and hg's slightly better user experience. After all, it's much easier to fix user interface problems than to change a tool's core architecture.

pieter · on March 27, 2010

Good to hear Bazaar speeds have improved. When I tried using it, it was painfully slow, taking 20-30 seconds for a simple 'bzr log'.

prog · on March 27, 2010

> When I tried using it, it was painfully slow, taking 20-30 seconds for a simple 'bzr log'.

When did you use it? On what history size? bzr 2.x series is quite fast.

dlsspy · on March 27, 2010

Here's the difference:

bzr log on my clone of emacs:

    23.040u 1.518s 0:25.31 96.9%	0+0k 0+73io 0pf+0w

git log on my clone of emacs:

    2.197u 0.226s 0:02.89 83.3%	0+0k 36+14io 0pf+0w
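For anyone who wants the ratio rather than the raw lines, here is a minimal sketch in Python. It assumes the lines are csh/tcsh-style timing output (user time, system time, elapsed wall-clock time, CPU percentage); the helper name `elapsed_seconds` is made up for illustration.

```python
import re

# Assumed format: csh/tcsh-style timing, "<user>u <sys>s <elapsed> <cpu>% ..."
TIMING = re.compile(r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s")

def elapsed_seconds(line: str) -> float:
    """Return the elapsed wall-clock time of one timing line, in seconds."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError(f"unrecognised timing line: {line!r}")
    seconds = 0.0
    # Elapsed time may be "ss.ss", "m:ss.ss" or "h:mm:ss"; fold left to right.
    for part in match.group("elapsed").split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

bzr_line = "23.040u 1.518s 0:25.31 96.9%\t0+0k 0+73io 0pf+0w"
git_line = "2.197u 0.226s 0:02.89 83.3%\t0+0k 36+14io 0pf+0w"

ratio = elapsed_seconds(bzr_line) / elapsed_seconds(git_line)
print(f"bzr log took {ratio:.1f}x as long as git log")
```

This compares elapsed time only, for one command on one repository, so it says nothing about other operations.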

prog · on March 28, 2010

What were the versions of bzr and git? Was this Windows or Unix? I suspect it's Unix only. It would be good to see the numbers on both platforms. Is this a limited rev history or the entire history? You also don't mention the exact commands you ran. Do you use n=0 or n=1 with bzr? I would normally use n=1, as n=0 is not of much interest. Such limited microbenchmarks merely spread FUD.

Also, while choosing a DVCS, the common operations have to be fast not all operations. So the interesting benchmark would provide numbers for all the common operation. Just running one operation can be misleading. I do much more with my DVCS than just run the log operation :-)

Also, I don't see myself doing a log of a full 20+ year history multiple times during the day (in fact I have never done that). The commands I normally use are fast enough on large repos with bzr (emacs included).

dlsspy · on March 28, 2010

You know, I had bzr 1.17 or similar. I upgraded to 2.1 and got the same time (slightly over 25 seconds). git version is 1.7.0.3. Both on my mac (I don't do Windows). In both cases, it was whatever "<cmd> clone <url>" does by default. So pretty much, all defaults all around.

I mentioned this because someone else was saying it was as fast. I don't know anything about the n= things... I just ran "bzr log > /dev/null" and "git log > /dev/null" (my shell automatically times all commands).

I tend to use and search logs a lot in my projects. With 2s to generate a log output, you won't think about the cost of looking through the whole thing for something (I did similar in an active hg project earlier today).

I do get your point. It's not a... holistic benchmark. In another thread, someone called foul on the article discussing size vs. bytes transferred. I thought it'd be better to think of that as time. Cloning roughly the same content via bzr and git (memcached master branch -- which is cloned at lp and contains just about the same history, though is somehow smaller at lp) takes a bit over 20 seconds and a bit under 4 seconds respectively.

I suppose it's valid to argue that you don't clone much, either. At some point, one would wonder whether you avoid certain things because they're slow.

juliusdavies · on March 27, 2010

X is a windowing system. It makes no sense to compare Git to X.

gojomo · on March 27, 2010

whygitisbetterthanxIsBetterThanwhy-git-aint-better-than-x:

- more readable font sizes and font-vs-background colors

- better use of whitespace

- supporting graphics

- minimal, tasteful use of color/shading