
This was actually pretty fascinating to me. On one hand, I am astonished at how long it takes to perform seemingly trivial git operations on repositories at this scale. On the other hand, I'm utterly mystified that a company like Facebook has such monolithic repositories. Even back when I was using SVN a lot, I relied on externals and such to break up large projects into their smaller service-level components.

I'd be very interested to see some benchmarks on their current VCS solution for repositories of this scale.



From a followup post: “We already have some of the easily separable projects in separate repositories, like HPHP. If we could split our largest repos into multiple ones, that would help the scaling issue. However, the code in those repos is rather interdependent and we believe it’d hurt more than help to split it up, at least for the medium-term future. We derive a fair amount of benefit from the code sharing and keeping things together in a single repo, so it's not clear when it’d make sense to get more aggressive splitting things up.”


He notes that these repositories are somewhat broken up already, and wants to keep them together.

There are good reasons to keep code in one repository. In particular, git's submodule support has a number of nasty interface tradeoffs. I wouldn't say it breaks git, but you have to keep a clear understanding of all your submodules in your head when you have a lot of them.

OK, it pretty much breaks git to have submodules that are interdependent. I know this because I am currently moving one of my organizations off this exact plan -- it's the opposite of useful and speedy to have to worry about versions across a large number of backend / frontend repositories.

It is MUCH easier, and therefore better for developers, to keep them together and release them together.


Given that Facebook is compiled into a single 1 GB executable, a git repo with 1.3 M files doesn't really surprise me.


Oh God, it's like Amazon was 10 years ago.


What? Do you have a reference for that?


Here you go: http://www.facebook.com/note.php?note_id=10150121348198920

"We can build a binary that is more than 1GB (after stripping debug information) in about 15 min, with the help of distcc. Although faster compilation does not directly contribute to run-time efficiency, it helps make the deployment process better."



For that and other info, check out the "Push" Tech Talk given last year by the Facebook release engineering team's leader, Chuck Rossi: https://www.facebook.com/video/video.php?v=10100259101684977


Facebook Engineering posted a video about their build process in May 2011. Seek to 25:55 for the source: https://www.facebook.com/video/video.php?v=10100259101684977...


Not sure if it's still the case, but Google hosts all their internal source code on a modified version of Perforce, so they essentially have everything in one repo.


Remember that this is a synthetic repository.


What do Facebook and the National Institutes of Health have in common? I'm pretty sure this will end with Facebook building their own versioning system from scratch and giving it some kitschy name like "Retro".



