GitHub won because Git won. It was obvious by the late 00s that some DVCS was going to upend Subversion (and more niche VCSes like TFS). It ended up a two-horse race between Git and Mercurial. GitHub bet on Git. Bitbucket bet on Mercurial.
Git took the early lead and never looked back. And GitHub's competitors were too slow to embrace Git. So GitHub dominated developer mindshare.
It seems strange now but there was a period of time during the late 00s and early 10s when developers were pretty passionate about their choice of DVCS.
Not just that. They invented "pull requests" and offered (initially minimal) code review tools. This made contributing in the open much easier, and making small contributions vastly easier.
Something like git had to take over svn / cvs / rcs. It could have been Perforce, it could have been BitKeeper, which apparently pioneered the approach. But it had to be open source, or at least free. Git won not just because it was technically superior; it won because it was also free software.
Pull requests predate Git. The kernel developers used them in the Bitkeeper days:
> I exported this a patch and then imported onto a clone of Marcelo's
> tree, so it appears as a single cset where the changes that got un-done
> never happened. I've done some sanity tests on it, and will test it
> some more tomorrow. Take a look at it and let me know if I missed
> anything. When Andy is happy with it I'll leave it to him to re-issue a
> pull request from Marcelo.
I do not know to what extent Bitkeeper had browser-based workflows. Moving cross-repository merges away from the command line may actually have been innovative, but of course of little interest to kernel developers.
That's interesting. I know BK had "pulls", but iirc it didn't have a "request-pull" command, so clearly the "pull" terminology came from BK and the "request" part came from how people talked about it in email.
I actually just shot a video showing how BitKeeper was used. I'll post that and a blog post on our GitButler blog soon.
Mercurial also supported pull requests. The unique thing about GitHub was an easy central place to do them from, and ensuring they didn't get lost. Once you have a GitHub account you can fork a project, make a change, and open a pull request in a few minutes. Emailing a patch isn't hard, but with GitHub you don't have to look up which address to email it to; if you just open a pull request, it typically goes to the right place the first time.
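The fork-and-branch flow GitHub popularized can be simulated entirely with local repositories (a minimal sketch, assuming git is installed; the repo and branch names here are made up). The only GitHub-specific step is opening the pull request itself, which happens in the web UI:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# "upstream.git" stands in for the original project's repository.
git init --bare --quiet upstream.git
git clone --quiet upstream.git seed && cd seed
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit"
git push --quiet origin HEAD
cd ..

# "Fork" on GitHub is essentially a server-side bare clone.
git clone --bare --quiet upstream.git fork.git

# The contributor clones their fork, branches, commits, and pushes.
git clone --quiet fork.git work && cd work
git checkout -b fix-typo
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "fix typo in README"
git push --quiet origin fix-typo
# On GitHub you would now open a pull request from fork:fix-typo
# against upstream's default branch.
```

None of this requires looking up a maintainer's email address; the fork already knows where the change should eventually land.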
I remember we used a tool, I think it was Gerrit, before I'd heard of GitHub or pull requests. It worked with patches, which is also how we used to share code: by email. GitHub won because it had a cleaner UI and a likable name.
Git also massively benefitted from GitHub. Do you know a single person who even knows you can use git without a "forge" like GitHub, let alone knows how to or actually does it?
It's hard to remember, but there was a time when git was resisted. When I first started to use it, a lot of people were saying you don't need it, you only want to use it because it's hipster and the kernel uses it, but you're not the kernel, etc. It's exactly the same story with k8s all these years later (though the tide finally seems to be turning on k8s).
Without GitHub (or something else), git would have remained a weird kernel thing. But, equally, without git, GitHub would have had no raison d'être. It's a symbiotic relationship: GitHub completed the picture, and together they won.
I taught my research group Git version control in college. It was part of a "new student/researcher onboarding" series that we put all the new grad students and undergrads through. But we were in Radio Astronomy, so there was a lot of data processing and modeling stuff that required being comfortable within a remote ssh session and the basics of Linux/bash/python. I know it was already being used in Radio Astronomy (at least in the sub-field of Pulsar Astronomy) at the time and was part of the reason I didn't get pushback when I proposed making sure our group was trained up on using it.
We switched to Git as a whole in early 2009 since it was already a better experience than SVN at the time. Could be off by a year or two, given how long ago this was and the fact that I was working with the group through the end of high school in 2007-2008.
We only added GitHub to our training later, in the 2011-2013 era; until then we ran our own bare git repos on our department servers. And students/groups were responsible for setting up their own repos for their research projects (with assistance/guidance to ensure security on the server).
Last job also made use of our own internal bare repos, admittedly mirrors of our private GH projects, and our stack pulled from that mirror to ensure we always had an instance that was not dependent on an external vendor.
Current role also makes use of bare git repos for similar reasons.
I think the knowledge is there and plenty of people do it, it's just not news/blog-worthy anymore. It's not new or groundbreaking, so it gets little attention.
> Git took the early lead and never looked back. And GitHub's competitors were too slow to embrace Git. So GitHub dominated developer mindshare.
And Mercurial spent an enormous amount of effort going after Windows users and basically got absolutely nothing for it.
In my opinion, this was what really hurt Mercurial. Nobody in Windows-land was going to use anything other than the official Microsoft garbage. Consequently, every ounce of effort spent on Windows was effort completely wasted that could have been spent competing with Git/Github.
Sorry, but Git won because Github won. Lots of people loved (and still use) Mercurial. It lacked the network effect because Github didn't support it.
> GitHub bet on Git. Bitbucket bet on Mercurial.
Bitbucket didn't lose because of Mercurial. They lost because GitHub had a better product (in terms of sharing code, etc). It also didn't help that Bitbucket was neglected by Atlassian after its acquisition in 2010.
> It seems strange now but there was a period of time during the late 00s and early 10s when developers were pretty passionate about their choice of DVCS.
Sorry buddy, but there are still plenty of us Mercurial users. Maybe, just maybe, even dozens!
Me, I picked Python and Mercurial for primary language and DVCS, respectively: one of those worked out really well. I still miss hg and have never really gotten the hang of git.
Regarding Mercurial, would you happen to have recommendations for a GitHub/Bitbucket-like service that still works with hg?
As someone who used both git and hg, I must say I'm sorry git won. Its chrome sucks (though less than it did) and the naming is confusing as hell. Still, if everyone uses git, and you have to use BitBucket for hosting instead of GitHub/Lab... Nah, not worth it. Kudos to you for sticking with it!
I've only recently started to use Mercurial in earnest for one project (legacy reasons). It's the branches for me too, though my experience with it is limited.
I don't like how, every time you pull someone else's changes, you end up by default in a state that is probably similar to git's detached HEAD. With git, most of the time you are on a named branch: you know where you are, and you pull/push stuff from that named branch. With Mercurial some of the branches are unnamed, and it's still confusing to me why I'd want that. Perhaps the original designers didn't like having private local-only named branches, I don't know.
This may just be an artefact of my very limited experience with hg though.
When not sharing with others, bookmarks are the way to go, not branches. Mercurial bookmarks act more like git's branches. I think they've now made it so that you can share them too, but since no one else at work uses Mercurial, I don't have experience with distributed bookmarks.
Have you ever checked out code directly from a colleague's machine? GitHub is very central-looking from where I'm standing, and the differences between Git and SVN are very academic and do not really apply in practice any more.
GitHub allowing forks of repos to open PRs against one another is probably the only DVCS thing about all this. But this model does not apply to orgs hosting their proprietary code on GH, where developers don't have their own forks of their employer's repos. I'm pretty sure it would have been possible to replicate pull requests with SVN on GitHub in some alternative reality.
Git is still a DVCS, even if today it's not being used in the way it was designed to be used by Linus and co.
The key distinguishing characteristic is the fact that every git checkout contains the full repo history and metadata. This means a consistent network connection to the master server isn't necessary. In fact, it means that the concept of a "master server" itself isn't necessary. With Git, you only need to connect to other servers when you pull down changes or when you want to push them back up to the remote repository. You can happily commit, branch, revert, check out older revisions, etc. on just your local checkout without needing to care about what's going on with the remote server. Even if you treat your remote repo on GitHub as your "master", it's still a far cry from the way that centralized VCS works.
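That server-free workflow is easy to see for yourself (a minimal sketch, assuming git is installed; file and branch names are made up). No remote is configured at any point:

```shell
set -e
cd "$(mktemp -d)"
git init --quiet demo && cd demo

# Commit twice; both writes go only to the local object database.
echo "v1" > file.txt && git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
echo "v2" > file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -am "second"

git branch experiment                # branching: just a local pointer update
git log --oneline                    # the full history is already on disk
git checkout -q HEAD~1 -- file.txt   # restore an older revision of a file
cat file.txt                         # → v1
```

Every one of these operations would require a round-trip to the server in SVN or Perforce; here they all complete with the network cable unplugged.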
If you've never worked with true centralized VCS, it's easy to take this for granted. Working offline with a system like Perforce or SVN is technically possible but considerably more involved, and most people avoid doing it because it puts you far off of the beaten path of how those systems are typically used. It basically involves you having to run a local server for a while, and then later painfully merging/reconciling your changes with the master. It's far more tedious than doing the equivalent work in Git.
Now, it's important to note that Git's notion of "every checkout contains all the repo data" doesn't work well if the repo contents become too large. It's for that reason that things like sparse checkouts, git-lfs, and VFS for Git exist. These sorts of extensions turn Git into something of a hybrid, in between a truly centralized and a truly decentralized system.
If you want to understand more, here's a great tech talk by Linus himself from 2007. It's of note because in 2007 DVCS was very new on the scene, and basically everyone at the time was using centralized VCS like SVN, CVS, Perforce, ClearCase, etc.
Having been on call 24/7 for production services, I found "git log" absolutely essential, almost more so than the latest code. We were usually expected to roll back rather than take additional risk fixing forward on outages, so the question was "roll back to what?"
Git is a DVCS though. Just because GitHub exists does not exclude Git from the category of DVCS. You get a local copy of the entire history with Git, which is what pushes it into that category; it has nothing to do with GitHub. SVN is centralized in the sense that you are not grabbing an entire copy of the repo locally. These are not academic differences.
It's been a hot minute since I've used SVN at work, but in my last job where it was SVN, each dev checked out the entire repository locally. Even though you /could/ check out a section of the repo, it made no sense to do that, because you need the entire codebase to run locally. Branching was still a mess though, and Git has really innovated in this space. We used to all dev on `develop` branch, and we'd daily pull from the server, fix merge conflicts locally, and then push up to the server. On releases our lead dev would merge dev with master and run off a build.
I still maintain the differences are academic: even though Git is a DVCS (and I agree it is), and it is possible to use it as one, GitHub is the de facto standard and everyone uses it for work and OSS. So I posit we are actually using Git as a CVCS, and any argument about Git being better than SVN because it's a DVCS is moot, because nobody is using Git's distributed features anyway.
I think we are missing something here, would like to be corrected if wrong.
Git is a DVCS because when you clone/pull a repo it includes the entire working history of the repo. That's why it's distributed: you could pull a repo from somewhere and never need to touch that source again. It has very little to do with GitHub.
With SVN, which I have not used in recent history, your local copy historically did not include the full change history of the repo, and relied on an SVN server for that information.
I actually don't quite follow your arguments, because while yes, we tend to set up Git so that it is "centralized", the distinction is not about GitHub but about the fact that your local working copy is everything.
So much has gotten better thanks to distributed VCS that I think this perspective is a bit like a fish in water.
Every commit is identified by a globally unique content-addressable hash instead of a locally unique or centrally managed revision number. This means two people on opposite sides of the globe can work on the same project with no risk that they will think the same revision identifies different code, nor that they must coordinate with a distant server constantly to ensure consistency.
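The content-addressing is easy to demonstrate: a Git object ID is the SHA-1 of the content (plus a small type-and-length header), so two machines hashing the same bytes independently compute the same ID with no coordination at all:

```shell
# Works even outside a repository; no server or repo state involved.
# Any machine hashing the bytes "hello\n" gets the same blob ID.
echo "hello" | git hash-object --stdin
# → ce013625030ba8dba906f756967f9e9ca394464a
```

Compare that with SVN's revision numbers, which are only meaningful relative to one particular central server and must be handed out sequentially by it.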
Moreover, core VCS operations like committing and branching require no server interaction at all. Server interaction is a choice that happens when synchronization is desired, not a mandatory part of every VCS command. "Commit early and commit often" could never happen with CVS or SVN on a large or geographically distributed team. And, of course, you can continue working on a cloned Git repo even if the server goes down.
Finally, forking a repository is still common even in the age of GitHub dominance. In fact, GitHub natively understands Git's internal graph structure, and makes forking and pulling across forks pretty painless. Yes, those forks may all be hosted on GitHub, but there can be far more dynamic collaboration between forks than was ever possible on say SourceForge.
So sure, everybody working on the same code may have the same GitHub repository as their origin more often than not, but we are still miles ahead of the world of non-DVCS.
It's probably worth noting too that even the canonical example of Git in practice, Linux, is essentially "centralized" in the same way. Linus Torvalds's clone is "the" Linux kernel, any clone that differs from his is either not up-to-date or intentionally divergent and thus unofficial. A lot of work gets merged first in other people's clones (with some hierarchy) but Linux also has tens of thousands of contributors compared to the average Git repository's handful or less.
Git is a DVCS, and GitHub uses Git, so it's a DVCS because I can clone locally. GitHub is a central location, that's true, but I can still have my local clones, and I can host my own forks locally, anywhere I want, at a GitHub competitor, under multiple GitHub orgs / users, whatever. So, yes, it's a DVCS.
Yes, the article seems to miss this. I believe (at the time, and still) that git won because the cost to host the server side of it is orders of magnitude lower than the competitors' (svn, perforce, etc.). All those other revision control systems ended up with a big server cost that couldn't justify a free hosting service. Plus, git provided a reasonable (but still not great) solution to "decentralized development", which none of the others attempted.
I'm curious how you come to this conclusion. GitHub has always had fairly insane hosting problem sets. When someone clones the Linux repo, that's about 5 GB in one go. The full-clone issues and the problems of a few edge-case repos create sometimes crazy hosting costs and scaling problems. Most centralized systems only have to deal with one working tree or one delta at a time. Comparatively, not much goes over the wire in centralized systems in general.
Multiple other distributed version control systems in the 2000s had support for easy hosting. Darcs was actually the best in this era, IMO, because it was far simpler than both Hg and Git -- a Darcs repository was just a directory, and it supported HTTP as the primary pull/patch-sharing mechanism. So you could just put any repository in any public directory on a web server and pull over HTTP. Done. This was working back in like 2006 as the primary method of use.
In any case, the premise is still wrong because, as mentioned elsewhere, the distribution of repository sizes and their compute requirements is not smooth or homogeneous. The cost of hosting one popular mirror of the Linux kernel, or a project like Rails, for 1 year is equivalent to hosting 10,000 small projects for 100 years, in either SVN or Git. The whole comparison is flawed unless this dynamic is taken into account. GitHub in 2024 still has to carve out special restrictions and exemptions for certain repositories because of this (the Chromium mirror, for example, gets extended size limits other repos can't have).
Git also lacked a lot of techniques to improve clones or repo sizes of big repos until fairly late in its life (shallow + partial clones) because 99% of the time their answer was "make more repositories", and the data model still just falls over fast once you start throwing nearly any raw binary data in a repository at any reasonable clip (not GiB, low hundreds of MiB, and it doesn't become totally unusable but degrades pretty badly). This is why "Git is really fast" is a bit of a loaded statement. It's very fast, at some specific things. It's rather slow and inefficient at several others.
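The shallow-clone mechanics mentioned above look like this (a local sketch, assuming git is installed; the `file://` URL forces the real transport so `--depth` takes effect, since git ignores it for plain-path local clones):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"

# Build a repo with three commits to play the role of the server.
git init --quiet origin && cd origin
for n in 1 2 3; do
  git -c user.name=demo -c user.email=demo@example.com \
      commit --allow-empty -q -m "commit $n"
done
cd ..

# Shallow clone: fetch only the most recent commit, not the whole history.
git clone --quiet --depth 1 "file://$dir/origin" shallow
git -C shallow rev-list --count HEAD   # → 1
```

Partial clones (`git clone --filter=blob:none`) similarly defer downloading file contents until checkout, and `git sparse-checkout set <paths>` limits the working tree; both exist precisely because the everything-everywhere model strains on huge repos.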
Why didn't Mercurial win then? There were almost a dozen other distributed version control systems built in those early days, most of which I cannot remember, but all had the same distributed ideas behind them and should have been as easy to host (some easier).
At my university, performance. The CS department was clued into Linux development but also the Haskell world, so darcs use among students was high. Our underpowered lab machines and personal devices struggled with darcs for reasons I no longer remember, and a group of us made use of Mercurial for an OS project and had a rough go of it as the patch sets got more and more convoluted. Back in those days the core was C, but a lot of the logic was Python, which struggled on the memory-constrained devices available. One of us learned about git while trying to get into Linux kernel work, told the rest of us, and it was just comically fast, as I remember. I spent a tedious weekend converting all my projects to git and never looked back, myself.
Some years later Facebook did a lot of work to improve the speed of mercurial but the ship had sailed. Interesting idea though.