Is Git and GitHub a DVCS though? Have you ever checked out code directly from a ...

snovymgodym · 2024-09-10T14:55:39 1725980139

Git is still a DVCS, even if today it's not being used the way Linus and co. designed it to be used.

The key distinguishing characteristic is the fact that every git checkout contains the full repo history and metadata. This means a consistent network connection to the master server isn't necessary. In fact, it means that the concept of a "master server" itself isn't necessary. With Git, you only need to connect to other servers when you pull down changes or when you want to push them back up to the remote repository. You can happily commit, branch, revert, check out older revisions, etc. on just your local checkout without needing to care about what's going on with the remote server. Even if you treat your remote repo on GitHub as your "master", it's still a far cry from the way that centralized VCS works.
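A minimal sketch of that idea in Python, as a toy model rather than how Git is actually implemented. The `ToyRepo` class and its methods are hypothetical names invented for illustration. Each clone carries the whole commit history, commits touch only local state, and `push_to` is the only step that would need a network connection.

```python
import hashlib


class ToyRepo:
    """Hypothetical toy model of a clone: it holds every commit, not just the latest files."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id, message)
        self.head = None   # id of the most recent local commit

    def commit(self, message):
        # Purely local: nothing here talks to any other repository.
        parent = self.head
        commit_id = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
        self.commits[commit_id] = (parent, message)
        self.head = commit_id
        return commit_id

    def clone(self):
        # A clone copies the full history, so it can work on its own afterwards.
        copy = ToyRepo()
        copy.commits = dict(self.commits)
        copy.head = self.head
        return copy

    def log(self):
        # Walk the parent links from head back to the first commit (newest first).
        commit_id, messages = self.head, []
        while commit_id is not None:
            parent, message = self.commits[commit_id]
            messages.append(message)
            commit_id = parent
        return messages

    def push_to(self, remote):
        # The only step that needs a connection. Fast-forward only in this toy model.
        if remote.head is not None and remote.head not in self.commits:
            raise ValueError("remote has commits we lack; fetch and merge first")
        remote.commits.update(self.commits)
        remote.head = self.head


server = ToyRepo()
server.commit("initial import")

laptop = server.clone()
laptop.commit("fix bug")      # offline
laptop.commit("add feature")  # still offline
print(laptop.log())           # ['add feature', 'fix bug', 'initial import']

laptop.push_to(server)        # synchronize when convenient
```

The "server" here is just another `ToyRepo`; nothing in the model makes it special, which is the point about a "master server" being a convention rather than a requirement.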

If you've never worked with true centralized VCS, it's easy to take this for granted. Working offline with a system like Perforce or SVN is technically possible but considerably more involved, and most people avoid doing it because it puts you far off of the beaten path of how those systems are typically used. It basically involves you having to run a local server for a while, and then later painfully merging/reconciling your changes with the master. It's far more tedious than doing the equivalent work in Git.

Now, it's important to note that Git's notion of "every checkout contains all the repo data" doesn't work well if the repo contents become too large. It's for that reason that things like sparse checkouts, git-lfs, and VFS for Git exist. These sorts of extensions do turn Git into something of a hybrid VCS, in between a true centralized and a true decentralized system.

If you want to understand more, here's a great tech talk by Linus himself from 2007. It's of note because in 2007 DVCS was very new on the scene, and basically everyone at the time was using centralized VCS like SVN, CVS, Perforce, ClearCase, etc.

https://www.youtube.com/watch?v=MjIPv8a0hU8

d-man · 2024-09-10T19:15:12 1725995712

I’d say that the key distinguishing characteristic is that your non-merge interactions with Git do not rely upon a connection to the master server.

I think it would be possible to have a DVCS without the full repo history and metadata. Doubt that it would be worth the effort though.

erik_seaberg · 2024-09-10T19:18:03 1725995883

Having been on call 24/7 for production services, I found "git log" absolutely essential, almost more so than the latest code. We were usually expected to roll back rather than take additional risk fixing forward on outages, so the question was "roll back to what?"

infecto · 2024-09-10T11:52:40 1725969160

Git is a DVCS though. Just because GitHub exists does not exclude Git from the category of a DVCS. You get a local copy of the entire history with Git which is what pushes it into that category, nothing to do with GitHub. SVN is centralized in the sense that you are not grabbing the entire copy of the repo locally. Not academic differences.

beAbU · 2024-09-10T12:42:06 1725972126

It's been a hot minute since I've used SVN at work, but in my last job where it was SVN, each dev checked out the entire repository locally. Even though you /could/ check out a section of the repo, it made no sense to do that, because you need the entire codebase to run locally. Branching was still a mess though, and Git has really innovated in this space. We all used to dev on the `develop` branch, and every day we'd pull from the server, fix merge conflicts locally, and then push up to the server. On releases our lead dev would merge dev with master and run off a build.

I still maintain the differences are academic. Git is a DVCS (and I agree it is), and it is possible to use it as one. But given that GitHub is the de facto standard, and everyone uses it for work and OSS, I posit we are actually using Git as a CVCS, and any argument about Git being better than SVN because it's a DVCS is moot because nobody is using Git's distributed features anyway.

infecto · 2024-09-10T13:44:46 1725975886

I think we are missing something here, would like to be corrected if wrong.

Git is a DVCS because when you clone/pull a repo it includes the entire working history of the repo. That's why it's distributed: you could pull a repo from somewhere and never need to touch that source again. It has very little to do with GitHub.

I have not used SVN recently, but historically your local copy did not include the full change history of the repo and relied on an SVN server for that information.

I actually don't quite follow your arguments because, while yes, we tend to set up Git so that it is "centralized", the distinction is not about GitHub but that your local working copy is everything.

Snild · 2024-09-10T15:47:40 1725983260

I think it was a misunderstanding based on different views of what "the whole repo" means -- all the files or all the history.

It quite nicely demonstrated the difference in philosophies, albeit accidentally. :)

kbolino · 2024-09-10T13:56:06 1725976566

So much has gotten better thanks to distributed VCS that I think this perspective is a bit like a fish in water.

Every commit is identified by a globally unique content-addressable hash instead of a locally unique or centrally managed revision number. This means two people on opposite sides of the globe can work on the same project with no risk that they will think the same revision identifies different code, nor that they must coordinate with a distant server constantly to ensure consistency.
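To make the hashing point concrete, here is a small Python sketch. It assumes the default SHA-1 object format, where an object ID is the SHA-1 of a short header (`<type> <size>\0`) followed by the object's bytes. `git_object_id` is a hypothetical helper name, not part of Git. It hashes a blob for brevity, but commit objects are hashed the same way with a `commit` header.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID for an object of the given kind (assumes SHA-1 object format)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Two people hashing the same bytes get the same ID without talking to each other
# or to any server.
alice = git_object_id("blob", b"hello world\n")
bob = git_object_id("blob", b"hello world\n")
assert alice == bob

# Any change to the content produces a different ID.
assert git_object_id("blob", b"hello world!\n") != alice

print(alice)
```

In a SHA-1 repository, the same ID should come out of `git hash-object` run on a file with those contents. Because the ID is derived from the content, there is no counter to hand out and so nothing to coordinate.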

Moreover, core VCS operations like committing and branching require no server interaction at all. Server interaction is a choice that happens when synchronization is desired, not a mandatory part of every VCS command. "Commit early and commit often" could never happen with CVS or SVN on a large or geographically distributed team. And, of course, you can continue working on a cloned Git repo even if the server goes down.

Finally, forking a repository is still common even in the age of GitHub dominance. In fact, GitHub natively understands Git's internal graph structure, and makes forking and pulling across forks pretty painless. Yes, those forks may all be hosted on GitHub, but there can be far more dynamic collaboration between forks than was ever possible on say SourceForge.

So sure, everybody working on the same code may have the same GitHub repository as their origin more often than not, but we are still miles ahead of the world of non-DVCS.

It's probably worth noting too that even the canonical example of Git in practice, Linux, is essentially "centralized" in the same way. Linus Torvalds's clone is "the" Linux kernel; any clone that differs from his is either not up-to-date or intentionally divergent and thus unofficial. A lot of work gets merged first in other people's clones (with some hierarchy), but Linux also has tens of thousands of contributors compared to the average Git repository's handful or fewer.

immibis · 2024-09-10T17:18:14 1725988694

You only checked out the latest version of each file in the entire repository. You did not check out the entire repository, like you do in Git.

cryptonector · 2024-09-11T03:46:49 1726026409

Git is a DVCS, and GitHub uses Git, so it's a DVCS because I can clone locally. GitHub is a central location, that's true, but I can still have my local clones, and I can host my own forks locally, anywhere I want, at a GitHub competitor, under multiple GitHub orgs / users, whatever. So, yes, it's a DVCS.