Monorepo or Multirepo? Role-Based Repositories (7mind.io)
141 points by optician_owl on Oct 12, 2019 | 115 comments



The main driver of success in either model is the tooling and practices an organization invests in to make it work. Google is successful with their monorepo because they have invested in building (Blaze), source control (Piper, Code Search), and a commitment to always developing at HEAD. Multirepo is currently easier for most companies because most public tooling (git, package managers) is built around multirepos. One place I see multirepos fall over is awful dependency management practices, internally and in open source. Many dependencies quickly become outdated and are not updated on a regular cadence, slowing down both authors and consumers. Better tooling can help here, but an organization needs real discipline to stay on top of things.


When I started at Google the tooling was not very good, and monorepo was pretty painful. They used perforce, and it simply couldn't keep up. Commits could take minutes. Code review was also unbearably slow. Blaze didn't exist yet; just before I started they had tools that generated million+ line makefiles that everyone hated. So yeah, you need good tooling, but even Google didn't have it for the first ~10 years of its existence.


I went to graduate school with a guy who ended up on Google's infrastructure team. He'd previously worked as head or lead dev for subversion (a little hazy on the details).

We worked on a small project where we put together statistical measures for codebases. It was a lot of fun, even if the infrastructure was out of my wheelhouse at the time.

Folks that can manage billion-line codebases are on a whole different level I think. I wonder sometimes how many folks like that there are.

EDIT: Looks like he left for a bit and is now back. Good on him!


I wonder why nobody has made a good public monorepo offering similar to what Google has internally. It would probably be a hit at many companies, since it fixes so many issues related to working in very large teams.


At large enough scale it causes a lot of problems and, like every other dev tool, breaks in ways that require a lot of work to put back together.

That said, there are some open source pieces to help. Facebook open sourced their Mercurial stuff so you can get version control at scale (and before that point you just use Perforce). Google open sourced Bazel. Google open sourced some parts of the underlying infra behind Code Search, but not enough to really work properly. And of course, at a lower level, there's a plethora of reasonable DB offerings, etc.

It would still require a lot of glue though.


It is just that Google's tooling around code works really well together. Code Search lets you view code with directory-based history, so you aren't swamped by others' commits. TAP runs all unit tests all the time, but with sectioned projects so you don't run every test on every presubmit. Sponge gathers every test log ever (even for the tests you run locally), so you can link full test logs to coworkers when you have problems (note that the source file links in those logs are actual links into Code Search). Critique gives you easy versioned code reviews where you can diff the code between any pair of snapshots to see its evolution, with presubmit checks and Sponge links for tests. Blaze provides a structured, directory-based dependency management system that makes partial checkouts and distributed cached builds work well.

I'd like a set of tightly coupled tools like that working outside of Google, but I guess it might be just a dream; it is a bit too big of a project.


> directory based commits

This whole thread is interesting because Subversion is exactly this, and it works with large code bases. We used to have these tools and we moved away from them.


Subversion works with large code bases, but not crazy massive codebases.

The version control is just the tip of the iceberg though, and is largely a solved problem: git or subversion, then perforce or straight to Facebook's mercurial stuff.

It's the other tooling that breaks on large enough monorepos, and that's where you have a hard time with publicly available tools. Searching your code takes a while. Cross references don't go wide enough or take too long to generate. Builds take too long. Refactoring tools either take too long or don't support a repo of sufficient size.

Mostly this isn't a problem though, because few repos are actually large enough to cause real problems.


Now that Git Virtual File System (VFS) is coming to GitHub, I think we shall see an uptick in monorepo adoption. Though what I said earlier about compatible tooling still applies. The repo style affects so much of the tooling: continuous integration, deployment, building, versioning, etc.

It really is not a small difference.


Git VFS (courtesy of Microsoft), which allows monorepos to be operable at enterprise scale, is out there, as is Uber's white paper on using ML to predict branch merge compilation success - an actual problem at enterprise scale. That doesn't cover all issues, but every (huge) company's workflow is different, so there's no "one size fits all" that's totally suitable - part of git's success is that it fits into whatever existing workflow a company already has. So I'm not sure the full suite of Google's tools would be a hit at all companies, especially anything smaller, and less dedicated, than Google. Bazel (and the internal Google project it springs from) needs one team to work on it and another to operate it; not all enterprises are willing to staff that work, and definitely not as heavily as Google does.


You know Google's monorepo is successful because they have so many of them! google3, Android, Chrome browser, ChromeOS...

Kidding aside, my point is Google recognizes obvious boundaries between e.g. their web stuff and android, and organizes their code accordingly.


Google uses no versioned libraries?


Libraries internal to Google are kept at the latest version and consumers are updated to use the latest APIs. Third-party libraries are checked into the monorepo at a specific version and everyone uses the same version.



I think they do, but there is only ever one single version in use - the version in the repo.


Yes, and that prevents duplication of work when it's time to upgrade: https://github.com/microsoft/TypeScript/issues/33272


If so, they would have a massive problem upgrading libraries like numpy because there are too many and too big breaking changes between releases.


Inside Google's main repo there are different build targets for libraries with incompatible API changes that are too difficult to fix all at once, e.g. there might be numpy_1_8 and numpy_1_10 separately.
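For illustration, a rough sketch of what that can look like in a BUILD file (the paths, rule choice and layout here are hypothetical, not Google's actual setup): two vendored copies sit side by side, each exposed as its own target, so callers pick one explicitly and migrate at their own pace.

    # third_party/numpy/BUILD (hypothetical layout)
    py_library(
        name = "numpy_1_8",
        srcs = glob(["1.8/**/*.py"]),
        imports = ["1.8"],
    )

    py_library(
        name = "numpy_1_10",
        srcs = glob(["1.10/**/*.py"]),
        imports = ["1.10"],
    )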

Python at Google muddled along for years without numpy at all so it's not like anyone would be seriously harmed by having an old release in the repo.


Monorepo shortcomings 1 and 2 seem like bullshit to me. Perforce, the VCS behind the monorepos at most companies I've worked at, supports access control. Monorepos do not prevent you from segmenting your code into modules and pushing binary/source packages into source control so that builds can avoid compiling everything (TiVo used to do this, and it worked well once you got the hang of it).

I feel like these debates are often fueled by false arguments. Either way you go, you're going to want to build support tools and processes to tailor your VCS to your local needs.


VCS access control is the wrong tool for solving the "people use code they shouldn't" complaint.

First, VCS ACLs will massively reduce the benefits you're supposed to get from a monorepo. How will you do global refactors in that kind of a situation? How does a maintainer of a library figure out how the clients are actually using it? (The clients must have visibility into the library, but the opposite is unlikely to be true.)

Second, let's say that I maintain a library with a supported public interface that's implemented in terms of an internal interface that nobody's supposed to use. How will VCS ACLs allow me to hide the implementation but not the interface? When clients kick off a build, the compiler needs to be able to read the implementation parts to actually build the library. It can't be that the clients have access to read the headers but then link against a pre-built binary blob. At that point you don't have a monorepo, you've got multirepos stored in a monorepo.

The actual solution is build system ACLs. Not ACLs for people, but ACLs for projects. Anyone can read the code, but you can say "only source files in directory X can include this header" or "only build files in directory Y can link against this object file".


VCS ACLs can allow for read-only access. You can also split public interfaces into their own header. If you want the maintainer of a library to be able to refactor clients of the library, then you have to grant them access to the client code. How does a multirepo solve this issue?

> How will VCS ACLs allow me to hide the implementation but not the interface?

If you don't give people access to the code, they can't build it. So what? Publish pre-built binaries from your CI system back to source control.

> At that point you don't have a monorepo, you've got multirepos stored in a monorepo.

I think it's a spectrum. It would be stupid to dogmatically stick to either extreme. You modify things in a pragmatic fashion to solve the problems you're facing. In my experience, starting with a monorepo and making exceptions as needed has worked better than the alternative.

Your post sounds similar to a lot of the multi/mono repo discussions. You've focused on one problem and one way to solve that problem without considering that there are many ways to work around it. Neither approach is going to be pain-free and both require tooling for special scenarios.


Bazel has this via the 'visibility' attribute on packages and build rules: https://docs.bazel.build/versions/master/skylark/build-style...
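For concreteness, a minimal BUILD sketch of the "anyone can read the code, only declared clients may depend on it" idea; the package and target names are made up:

    # mylib/BUILD (hypothetical)
    cc_library(
        name = "impl",
        srcs = ["impl.cc"],
        hdrs = ["impl.h"],
        # Implementation detail: only targets in this package may depend on it.
        visibility = ["//visibility:private"],
    )

    cc_library(
        name = "mylib",
        srcs = ["mylib.cc"],
        hdrs = ["mylib.h"],
        deps = [":impl"],
        # Supported interface: only these packages may link against it.
        visibility = [
            "//services/payments:__subpackages__",
            "//services/billing:__subpackages__",
        ],
    )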


> VCS access control is the wrong tool for solving the "people use code they shouldn't" complaint.

I agree

> The actual solution is build system ACLs.

Or, maybe, better languages enforcing better design. In most cases artifacts and libraries are not related to the domain; engineers create them just to establish artificial boundaries between code components, isolate unrelated things, enforce encapsulation and avoid accidental mixing of metalanguages.

It would be a lot better to have a smart compiler for this.

A tool which can prevent us from mixing different abstraction layers, creating unnecessary horizontal links between our components, etc, etc.

I have a couple of ideas about what such a thing might look like.


> Monorepo shortcomings 1 and 2 seem like bullshit to me.

It's a blog post and the author didn't try to build a total and exhaustive formal system. These shortcomings are not absolute truths, but they are real.

I've seen this multiple times: a small project evolves over years into a monster. Engineers add new components and reuse whatever other components they may need, creating horizontal links. At some point they feel like they've lost their productivity and they blame the monorepo, because it's easy to create horizontal links in a typical monorepo. So they try to build a multirepo flow and spend a lot of effort, time and money trying to make it work. At some point they feel that their productivity is even worse than it was before, because now they need to orchestrate things, so they merge everything back.

Same applies not only to VCS flows, but to system design as well.

When we discuss the monolith/microservices controversy, all the monorepo/multirepo arguments translate isomorphically to that domain. What is better, a monolithic app or a bunch of microservices? A role-based app, of course: https://github.com/7mind/slides/blob/master/02-roles/target/...


Monorepo/multirepo and monolith/microservice are orthogonal concepts. When organizations don't understand that, they may end up building a distributed monolith across multiple repos. (The "Distributed Big Ball of Mud" anti-pattern.)

Monorepo advocates are typically advocating for microservices, but within a single code base.

The way you provide access control is through code review and build system visibility.

In order to modify another group's code you require their approval on the review for that section of the code base. (Using mechanisms like github/gitlab owners files or rules within upsource.)

This still means that if one group needs to make extensive changes to another group's code, the path of least resistance may be to fork it into your own group's section of the repo.

Build tools provide another point of control. If you're using a tool like Bazel, the way you link to a component in another portion of the repo is through target names. The only targets your code will have access to are those that the owners have declared as being available for external builds.


> Monorepo/multirepo and monolith/microservice are orthogonal concepts.

Yes and no. In both the cases it's a story about components and their isolation.

> they may end up building a distributed monolith

Yup, seen that many times.

> Monorepo advocates are typically advocating for microservices, but within a single code base.

I'm advocating roles. Everywhere.

> If you're using a tool like bazel

If only Bazel supported Scala well enough...


> If only Bazel supported Scala well enough...

Many companies build their Scala code using Bazel[1]. For example, Databricks wrote about their experience using Bazel on a monorepo containing mostly Scala[2]. Can you share the specific concerns or issues you faced? Thanks.

[1] https://github.com/bazelbuild/rules_scala/blob/master/README... [2] https://databricks.com/blog/2019/02/27/speedy-scala-builds-w...

(Disclaimer: I work on Bazel)


Thank you, I know. Though I need to build a Scala.js project (and I have one small Scala Native project too). That is a total no-go for Bazel, unfortunately.


All of the supposed flaws of a monorepo in this article are actually flaws of git. This is a very common phenomenon. I often joke there are two kinds of developers: those who prefer monorepos and those who have never used perforce.


This is all true BUT I think the monorepo as described here is the act of treating all your projects as directly referencing each other.

Sure you could just use a manyrepo style of dependency tracking in a monorepo but I think that's not exactly what the author is exploring.


> This is all true BUT I think the monorepo as described here is the act of treating all your projects as directly referencing each other.

From what I read that is a correct assessment. What the OP is proposing is something of a strawman argument. No advocate of monorepos I've ever met believes that a monorepo should imply a monolith.

Generally they're advocating monorepos in order to develop microservices faster, and with less effort. Using a monorepo and the associated tooling sidesteps the pain that comes from complicated CI, the difficulty of sharing code, the difficulties of non-atomic cross-repo reviews, and the difficulties of making multi-app refactorings.


Can you elaborate on “monorepos do not prevent you from checking packages into source control” and how that helps to avoid recompiling everything? Why would you check a package into source control anyway? Surely source control is for source code? And I lean toward monorepos, btw, but there are still lots of obstacles, and monorepo proponents don’t tend to acknowledge them or offer clear suggestions for how to solve or work around them.


You can use something like a shared binary repo such as maven or you could just check in dependencies and not worry about an external server being available for builds.

>Surely source control is for source code?

This is just pedantry. Checking in binaries is a pragmatic solution that solves a lot of problems.


I was rather under the impression that checking in binaries was discouraged because it led to performance issues and tends to blow up the repository size. I don't think it's just pedantry.


I wasn’t trying to be a pedant, I’ve just never heard of anyone doing this. I was wondering how it helped solve the problem of not rebuilding everything.


In short, the binaries are already built. Usually it's faster to link to a prebuilt binary than to build from scratch.


So where do these binaries get built and how does the system know which binaries to rebuild for a given change? If developers are building binaries and committing them directly, doesn’t that open up security or even correctness issues? How does this approach satisfy compliance concerns (how can the CTO or a manager sign off on the changes that went into the binary if it’s just something a random developer committed?)? How does this scale to tens of deployments per day? These are hard monorepo problems, and they keep being handwaved away.


Suppose the binaries in question are build tools or similar: then this is good, because they never get rebuilt. The paperwork is done, the binaries get committed to version control, and everybody that builds the code then builds the code with the approved binaries. Everybody is happy.

Suppose the binaries are build byproducts, and people just check this stuff in, like, whatever. Well, if somebody needs to sign off on the output, that's a problem - so that person then doesn't use what's in the repo, but instead builds the output from scratch, from the source code, hopefully with known build tools (see above!), and signs off on whatever comes out.

But, day to day, for your average build, which is going to be run on your own PC and nowhere else, nobody need sign off on anything. If you link with some random object file that was built on a colleague's machine, say, then that's probably absolutely fine - and even if it isn't, it's still probably fine enough to be getting on with for now. If you work for the sort of company that's worried about this stuff, there's a QA department, so any issues arising are not going to get very far.

Overall, this stuff sorts itself out over time. Things that are problems end up having procedures introduced to ensure that they stop happening. And things that are non-problems just... continue to happen.


>So where do these binaries get built and how does the system know which binaries to rebuild for a given change?

For simple things, if the code in a directory changes then the CI system rebuilds that directory. You can have the CI system either validate that the binary matches or commit the binary itself. For more complicated things you'll have a build system such as Bazel which figures out what changed.


(Sorry for being terse—on mobile). Validate the binary matches what? If the compiler has to compile the artifact to verify the artifact provided by the developer, why bother having the developer commit the artifact? The CI system could just do it. Never mind that having a bit-for-bit reproducible build is incredibly difficult. Anyway, such simple cases where a whole app lives under a single directory are vanishingly rare.


>The CI system could just do it.

Depends if you want to wait for the CI system to upload or not. Also if you want CI to have commit permissions.

>Never mind that having a bit-for-bit reproducible build is incredibly difficult.

Debian is at something like 90% reproducible packages once they fix two outstanding things. Most languages will have settings and best practices at this point that will give reproducible builds.

>Anyway, such simple cases where a whole app lives under a single directory are vanishingly rare.

Then use Bazel once you get past that stage.

Look, to be blunt, it seems like you're trying to nitpick whatever anyone says while ignoring large parts of the answers. Fact is, many people at small and large companies use monorepos successfully. They work for those people; you can keep trying to argue they don't, or you can try to learn why they do.


> Depends if you want to wait for the CI system to upload or not. Also if you want CI to have commit permissions.

I guess you could deploy first and verify automatically later. Hadn’t thought of that.

> Debian is at something like 90% reproducible packages once they fix two outstanding things. Most languages will have settings and best practices at this point that will give reproducible builds.

Nevertheless, getting (and keeping) bit-for-bit reproducibility is a ton of work, especially for software that changes every day, and the benefits aren’t compelling for many projects.

> Then use Bazel once you get past that stage.

This seems to be the answer, but it’s not very satisfying since Bazel’s support for many popular languages (e.g., Python) is lacking and there are lots of rough edges to iron out.

> Look, to be blunt, it seems like you're trying to nitpick whatever anyone says while ignoring large parts of answers. Fact is, many people at small and large companies use monorepos successfully. They work for those people, you can keep trying to argue they don't or try to learn why they do.

I never understand why people get defensive about things like this. I’m not attacking monorepos. I manage a monorepo at my small company, and I’ve run into lots of issues trying to make it work. I’m here trying to understand why so many people rave about monorepos, but often don’t have good answers for things like “how to manage rebuilds?”. You see this as “nitpicking”, but the distinction between “just git diff a directory!” and “use something like Bazel” is important.


It's really not any different from depending on an exact version in some dependency manager. Instead of just the dependency config, you check in the binary. When a dev needs a newer version of a dependency they can pull it down and check it in. You wouldn't check in random nameless binaries, just hard copies of things you would have linked to from a dependency repository.

This doesn't work well for dependencies where you're expected to be using the latest version of something that changes 10 times a day.

The rest of your questions are fairly irrelevant, as they would be answered the same way as in the dependency repo case, i.e. use official binaries.

...but this is closer to multi-repo than monorepo. If you're in a monorepo you might as well use the source.


> So where do these binaries get built and how does the system know which binaries to rebuild for a given change?

By the CI. All major CI/CD tools support rules like: build binary x whenever a file under x-src/* changes; commit binary x when the ref matches /v[0-9.]+/; don't allow developers to manually push to these refs/paths; (run a script to) bump y's dependency on x whenever binary x changes; merge the bumped version if all tests still pass; etc.


The problem is dependency graphs aren’t strictly hierarchical, so it doesn’t suffice to say “rebuild whenever something under this directory changes”.


Not sure how people do this in practice, but in principle it seems rather straightforward.

A compiler is just a program that takes some input and creates some output. Both the compiler and the input can have a cryptographically secure hash. Putting both in a sealed box, like a docker image, with its own hash, gives you a program that takes no input and produces some output.

If the box changes, run it on a trusted machine and save the output together with a signed declaration of which box version produced it.
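A toy sketch of that idea (Python; the file paths and cache layout are made up): hash the compiler together with its inputs, and if that combined key has already been built and signed on a trusted machine, reuse that output instead of rebuilding.

    import hashlib
    from pathlib import Path

    def file_digest(path: Path) -> str:
        # SHA-256 of the file contents.
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def box_key(compiler: Path, inputs: list) -> str:
        # The "sealed box" is fully described by the compiler plus its inputs,
        # so their combined hash identifies the output deterministically.
        h = hashlib.sha256()
        h.update(file_digest(compiler).encode())
        for src in sorted(inputs):
            h.update(str(src).encode())
            h.update(file_digest(src).encode())
        return h.hexdigest()

    # If box_key(...) matches a key already built and signed on a trusted
    # machine, fetch that machine's artifact instead of rebuilding locally.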


Docker makes this drastically easier (you need the exact same versions of all libraries and the compiler), but there are still compile-time things that are unique per compile. Debian has been working hard to get hashes of binaries to be useful, but the work is far from trivial.

(See also: trusting trust)


I see this as inherited technical debt though, not a flaw in principle. It would be nice if we could solve this at the foundational level instead of forcing every dev organization to struggle with it on their own.

Edit: I think we're getting there though; with all the efforts going on with containers, WebAssembly, blockchains, IPFS and so forth, it's getting closer.


At Google we check the source of every library into the monorepo and compile everything ourselves with cached builds from a central server; I don't think we use package managers.


You don't have to use a package manager, that's just the approach the TiVo folks came up with a couple decades ago. They use RPM to package independent software modules and check them into (IIRC) a separate build repository which saves the last n months of work. A local config file is used to choose the binary package version to use, or, alternatively, the locally built files to use. They probably could have just made tarballs, since I don't think they used any of the dependency checking.


How do you track dependencies of dependencies? Do you need to manually add the full dependency tree and re-implement the dependency tracking through your internal system? If a project uses Maven or Gradle, do you need to rewrite those files to point to your internal builds instead?


Not a Googler, but I think the answer is: yes. At least, it is for my monorepo company.

Usually somebody else has already gone through the work of doing it for you. Sometimes there are tools that do the translation for you. For example, Go modules are quite easy to translate to a BUILD file.

It’s actually not as bad as it sounds. You only have to do the hard stuff once, and every engineer in the org who uses it in the future is thankful for it.


They use a tool called Blaze (Google around for “Bazel”, the open source tool inspired by it). Basically you model the dependency tree such that the tool knows which targets are affected by a certain change, and then Blaze builds them in a clean-room environment in which an undeclared dependency would cause the build to fail (hermetic builds). As far as I’m aware, this is the only way to sustainably operate a monorepo, but I would be happy to learn more if someone has other solutions.
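For illustration, a tiny sketch of the kind of target graph Blaze/Bazel works from (all names hypothetical): every target lists its deps explicitly, so the tool can compute which targets a given change affects, and anything undeclared simply isn't visible inside the build sandbox.

    # payments/server/BUILD (hypothetical)
    java_library(
        name = "server",
        srcs = glob(["*.java"]),
        deps = [
            # Declared deps are the only code visible during the build;
            # importing anything else fails the hermetic build.
            "//payments/api",
            "//common/logging",
        ],
    )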


I assume you mean third party dependencies that are not in the monorepo? Pretty much yes, monorepos struggle if they are expected to handle dependencies that aren't stored in the monorepo, so step 1 of using a dependency from outside of a monorepo should be to copy the source into the monorepo (and transitively copy the source of dependencies, etc).


Full dependency tree yep. No build in google's main repo ever retrieves code externally.


It's version control, not necessarily just source control! If something could benefit from being versioned, why would you not check it in? You then guarantee everybody has the same version. That's exactly what this thing is there for.

Git's design can limit its usefulness in this respect - though perhaps you could solve this to some extent with git LFS? - but not all version control systems have this problem.


git annex (or git LFS, if you buy into GitHub's NIH) is requisite if you want to use git like this, broadly. git will happily store any and all binaries you ask it to, but upon a (blind) clone it will grab every single revision of said binary, taking up however much space that takes.

(partial clones avoid this, but, as git isn't designed for this use case, grabbing all of history happens far too easily.)


I regularly join projects where someone has decided to place the project's code in half a dozen different repositories.

Even though it's one project.

Even though they refuse to allow a release of a single component - it must all be released together without forwards/backwards compatibility.

I think most of the time, the mono/multi debate is spoiled by people who feel they can have their cake and eat it too.


I think that whether to use a mono or multi repo depends on whether you're willing to dump money into updating everything at once, or not. If not, monorepos are really a big hindrance. It's better to split on the project boundary (things that may have different development paces), and use git worktree for having different versions of libraries checked out for building/bundling.

It works fairly nicely with meson, as you can simply checkout a worktree of a library into a subprojects directory, and let individual projects move at their own paces even if you don't do releases for the libraries/common code.

It's not really clear why having to update every consumer in sync with library changes is beneficial. Some consumers might have been just experiments, or one-off projects, that don't have enough ongoing value to justify constantly porting them to new versions of the common code. But you may still want to get back to them in the future, and want to be able to build them at any time.

It's just easier to manage all this with individual repos.


> whether you're willing to dump money into updating everyting at once

I think the majority of projects in this world only update everything at once. They haven't invested in the testing and sensible APIs needed to allow updating small pieces of their solution.

From my experience, I also think the majority of people who think they have a library and need multi repos to deal with that, don't have a library.

To further clarify, one user of your library means you could stop pretending you have a library and avoid the pain.

I don't mean to insist these problems do not exist, I simply don't think many people have them.


Project != company. Project != consultancy.

Monorepo just didn't work for me. I have ~10 web projects for different customers + my personal projects that use various versions of some common code.

It doesn't make any financial sense to evolve my common code by updating all the customers' code for free when they're not even asking for it. So on this level it doesn't work.

Even within a single company with just two devs and around 90 repos for the main product and plugins, it was hard to justify making a monorepo with plain git, because the plugins and the main app had different release schedules and priorities, so it was never economical to port all the plugins to the new version of the main app right away on every change.

I still think going multi vs mono is a business decision, rather than a technical one. You'll have to have special tooling for either case, just a different one.


Yep, I agree and that all makes sense.

Don't you find those projects within the company wanting multi-repos also?


They have multi-repos. There was some thought given to an idea to transition to mono-repo, but it never happened.


One of the things I've done at a couple companies now is flatten multi into mono - it just simplifies everything, it's all deployed as one unit, so easier to track and do changes across different parts of the code base in unison.

I have typically left mobile iOS/Android in separate repos however - they have a different deployment cadence, so you need to manage breaking changes differently anyway.


There's a lot of people on here defending their current workflow, whatever that is.

I for one find it refreshing that people are willing to think about different workflows, even if they are different.

It feels like what is described is a cross between a good language package manager and git submodules. It's an interesting space to explore, because a lot of nice things come out of submodules, but it's not a proper package manager.

A proper dependency manager that puts code in a workspace and manages it as you are working on it, in a non-clunky way, is not something we have right now, and it may be a game changer. Thanks to the authors for sharing.


I'm curious: how would most people here define monorepo vs multirepo?

On the surface, most people seem to think of a monorepo as a source control management system that exposes all source code as if it's a traditional filesystem accessed through a single point of entry. Multirepo, in contrast, seems to be about multiple points of entry.

But that's a superficial and uninteresting distinction. All the hard parts of managing code remain for both and, for a sufficiently large organization, you'll still need multiple dedicated teams to build tooling to make either work at scale. All the pros listed in the article need a team to make them work for either approach, and all the cons are a sign that you need a team to make up for that deficiency in either approach.

Aesthetically a single point of entry appeals to me, in that it allows for a more consistent interface to code. But I'd go for good tooling above that in a heartbeat.


I've shifted to focusing on repo == team. If your organizational structure is to have many little teams that are independent from each other, then you build your source code management to reflect that.

I built my engineering staff to focus on any of the initiatives that my boss hands to me (changes week/week) - so we went monorepo so we could move between those projects/apps/programs quickly.

We knew that we didn't want to pay the maintenance cost just because microservices/multirepo was a buzzword AND we wanted future ventures to get faster (example: we solved identity for authn/authz once and now every app that needs it after can leverage it and we can upgrade identity and all of its consumers in one pull request).


This is my conclusion too. The team becomes the fundamental entity and projects/products belong to teams. Everything the team produces stays in the team's repo. So you always know which team owns what. It is also easier to review, supervise and clean up.


Sounds like that would enforce Conway's Law a lot and make cross team collaboration more difficult.


It's easy to use a monorepo in a way that feels like a multirepo, and vice versa. I'm inclined to say that the defining difference is around versioning. To put it another way, can you choose to ignore that your dependencies have upgraded?

In a monorepo your builds are at the same point in time horizontally across all of your dependencies. You build together or not at all (though not necessarily at HEAD). In a multirepo you have the option to build against any (or some subset of) point-in-time snapshots of your dependencies on a dependency-by-dependency basis.

If you have a single monorepo that all of the code is in, but your build system allows you to specify what commit to build your dependency targets at instead of forcing you to use the same commit as your changes, you actually have a multirepo. If you have a bunch of repos but you build them all together in a CI/CD pipeline that builds each at its most recently released version, then you actually have a monorepo.


Why not both? I've been using https://github.com/mateodelnorte/meta and having a great time so far, it's just that GitHub (and others) don't have a simple way to bundle multi-repo commits in pull requests.


I agree with Christian... Why not both? Lots of teams I interact with have great reasons for a monorepo, which they admit requires some work in tooling and processes, and they claim they're successfully releasing software faster and with less effort than if their code lived in disparate repos. I believe teams must choose the appropriate patterns that work best for their architectures and situations.


What's the current state of git submodules? It seems like you could get some of the benefits of monorepos, in that you can reference dependency projects directly like in a monorepo. You can, in theory, treat many projects like a single code base.

I don't see it used very often though. Why not?


Even with submodules it's still a PR per repo. Global, atomic changes are super powerful.


There is another tool called git-subtree that should solve these problems, I think. But I've never seen it in use


git subtree is just a wrapper around subtree merges, I don't think they solve the same problem as submodules.


Good point! Hmm. Maybe git(hu|la)b could solve this with some kind of pull request batching.


Because if some change requires you to make commits to 3 repos, the submodule approach will require you to make 4 commits instead (one per repo, plus one in the parent repo to bump the submodule pointers). Very annoying.


In one of my first jobs like 15 years ago at a large software company we had just moved to a monorepo.

It was introduced to counterbalance what many saw as a big mess. The result was a lot of process being introduced, which slowed everything down, but that was probably necessary at that stage. To my knowledge the company keeps switching back and forth, but new projects that need to move fast are typically still done independently.


I would expect you need really good training in place to make it work. e.g. Microsoft uses a git monorepo for the Windows codebase; obviously that is not something you could just come in on and do a "git clone" as you might on a small project.


I bet you could address this with a third approach: a metarepo. The metarepo is a repo that uses submodules to combine your multirepo ecosystem into a simulated monorepo. The metarepo is what ultimately gets built and deployed, so there are no versioned dependencies to manage. Local development usually happens at the multirepo level, and the metarepo is managed mostly via CI.


Good idea. But what gets checked in at the metarepo level? The names of the branches that are checked out in the submodules under it?

Can you have two metarepos, each with its own set of checked-out branches of the same original submodules?


I believe submodules pin specific commits (though they can be configured to track a branch).


Also Google's repo tool. I like the idea of manifests and the commands it includes for creating feature branches across repos


For TypeScript/JS projects I think NX CLI is pretty awesome as it handles multiple frameworks


So, in a monorepo world, isn't it often that you have to deploy components together, rather than "it's easy to"? How are services deployed only when there has been a change affecting said service? Presumably monorepo orgs aren't redeploying their entire infrastructure each time there's a commit? Are we talking about writing scripts which trigger further pipelines if they detect a change in a path or its dependencies? How about versioning - does a monorepo work with semver? Does it break git tags, given you have to tag everything?

So many questions, but they're all about identifying change and only deploying change...


Each service has its own code directory, and there's one big "shared code" directory. When you build one service, you copy the shared code directory and the service-specific directory, move to the service-specific folder, run your build process. The artifact that results is that one service. Tagging becomes "<service>-<semver>" instead of just "<semver>". You may start out with deploying all the services every time (actually hugely simplifies building, testing, and deploying), but then later you break out the infra into separate services the same way as the builds.


> Are we talking about writing scripts which trigger further pipelines if they detect a change in a path or its dependencies

Unless one enforces a perfect one-to-one match between repo boundaries and deployments, this is also an issue with multirepos.

In practice, it's straightforward to write a short script that deploys a portion of a repo and have it trigger if its source subtree changes and then run it in your CI/CD environment.


I worked at a big bank in the UK using a monorepo "cuz Google uses it". Error number 1: you're not Google. The clones were gigantic; Jenkins would time out cloning the whole project when all it needed was a bunch of files. Merge conflicts all over the place, but the best part: we had scripts in our pipeline literally removing folders after cloning the repo to avoid automatic inclusion of libs etc. In my opinion, separation of boundaries is one of those things that shouldn't be messed with.


Monorepos with Git don't play together nicely. Perforce is key if you have lots of devs on a monorepo.


I don't understand the isolation difference. You can hide, protect and branch code in a monorepo so why is isolation a concern?


It depends on which VCS you use. Git, for example, doesn't have any native support for hiding or protecting code in particular folders within the repository.


Hmm seems unfair to judge monorepos on what git is capable of. I hate perforce but it accomplishes this easily.


For all its benefits, Git has plenty of limitations that don't exist in other systems.


git has hooks for access control (which is how e.g. gitolite manages permissions, albeit at the whole repo level - I'm not familiar with an open-source hook that does directory level).

With respect to hiding, git has sparse checkouts that can give you a limited view of a repository (for performance reasons - not for security reasons)

But that's just today's git. Other VCSs like perforce provide much finer grained access control and hiding.


We do multi-repo. It makes things a little slower, because we have to get commits into our common libs repos (there are two) before we can get the app/product repos updated. Using the environment's package manager (composer, npm, yarn) rather than git submodules helps a lot.


GoCD provides “fan in”, which supports monorepos:

https://docs.gocd.org/current/advanced_usage/fan_in.html


Amazon does multi-repo. I don't see what the problem or debate over this is. We seem to be handling it pretty fine despite a massive-scale SOA.


The multi-repo pattern certainly meshes well with Amazon's team structure, and of course integrates well with the build system and deployment system, given that they were created around it. But "handling it pretty fine" seems like a stretch.

When last I was there things were finally beginning to burst at the seams. Platform architecture migrations were failing or being abandoned over too many untracked dependencies on specific versions of platform-provided libraries. (RHEL5, anyone?) Third-party had become a jungle of unmaintained libraries with dozens of versions that nobody ever upgraded, that may or may not have security vulnerabilities or known bugs, and many teams hadn't released new versions of their clients/libraries into Live for years in fear of breakage. The Builder Tools team was talking about giving up and abandoning both Brazil and Live as unsalvageable. Framework teams (Coral) were throwing their hands up in the air about how Coral-dependent services would not be able to upgrade to Java 11 without fixing a bunch of breaking changes that they would never agree to fix. The solutions being proposed to these problems by the Builder Tools team looked a lot like moving toward a monorepo, at least conceptually.


When I was there, they were migrating away from perforce because they could no longer scale perforce fast enough to meet demand. I've not seen this talked about much outside of Amazon.

It was also a huge day-to-day quality of life improvement for the users (the developers.) There are UX problems with git, but they pale in comparison to the UX problems with perforce which is truly unpleasant software.


The Alexa division migrated aggressively to git as soon as it was available and nobody publicly voiced any regret about losing perforce.


Several of the people I worked with at Amazon were skeptical of git, at least initially. Some people prefer tools they already know, preferring routine and habit over learning a new tool. And I totally respect that, by the way; git's UX is superior in mainly aesthetic ways, and in terms of tactical productivity it's more of a wash. I still think git has the edge, but there is nothing to say a seasoned developer who's used perforce for years isn't being exceptionally productive with it.

Nearly everybody I talked to about it eventually came around to prefer git though. Once you've been forced to swallow the bitter pill of learning something new and changing your workflow, I think the advantages begin to shine through.

On the other hand, maybe I'm just biased because I was proficient in git years before I was ever exposed to perforce. So maybe it was myself who was balking at learning something new, and that's why I was so relieved when my team switched to git. But I do genuinely believe that git has a superior UX.


Our team did have the complicating factor that we were doing private builds, which meant that our source code was in a private subversion repository and perforce was used to track the brazil primitives and private build decryption key stuff.

Once git support was good enough, leadership was very supportive of an en masse exodus.

I also think my team was pretty junior, which meant they'd never actually seen perforce, so as you say, moving to git was going back to something familiar for nearly everybody.


I'm curious about those who use a monorepo with microservices: how do you solve CI/CD? Is Bazel the only solution?


CI and CD are more workflows than tools. It doesn't really matter what your repo setup is, you just adapt your workflows to it. On one project I work on we use a monorepo for a handful of microservices. We use standard GitHub flow, no special repo consideration for the CI.

For CD, we have scripts that ask what service you want to build, and they specifically package that service using the set of files & processes dedicated to that service. The build generates a versioned artifact. After that, repo doesn't matter at all, we're just moving service artifacts around.


The cons of multi-repo are all anti-patterns for microservices anyway. If you're doing microservices you shouldn't have build dependencies on other projects. They should only call each other at the network level.


Calling each other at the network level is still a dependency. (And even a build dependency if you use something like protobuf or other protocol description files.)


A network dependency is not a build dependency. Protobuf files should be copy-pasted, not referenced directly. Saying you need a single repo to build correctly for your network dependencies is like saying you can't use a third-party system (AWS, etc.) without having a link to their code base.


The point is that when you want to make a change in the "API" (or call it "protocol"), you need to touch the different repositories and coordinate to use the right versions together.

As for copy/pasting protobuf files: it works, but it makes it more difficult to keep them in sync.

And I did not say you need a single repo. I'm saying the stated disadvantages of multi repo are real.


That’s the part about monorepos I can’t quite wrap my mind around - yes deploying a single large change to many different systems simultaneously is cool in theory, but how does it actually pan out? Deployment is never instant, so any system-to-system breaking changes would cause a short downtime while everything deploys. In the world I operate in, that’s absolutely not acceptable.

Not that you can’t still make your changes backwards compatible with themselves. But if I’m going to have to deploy everything in two steps anyway, what’s the point?


But... we do better things than microservices: https://github.com/7mind/slides/blob/master/02-roles/target/...


I read this; it's not worthwhile, and it's OT.


> not worthwhile

Why so?

> and OT

Not at all. In both cases it's about roles.


It misses the point of microservices in the same way that posts titled 'sharding' instead talk about load-balancing.


I am a big fan of monorepos. I've worked on a few open source projects that used multi-repos, and at some places that used a hybrid approach. I agree with some of the ideas this article has put into writing, but I wanted to provide some pointers from my experience.

Some background: at my current place of employment I have 28 services, which should be 30 in the next few days, so I think my current use case is very representative of a small to medium monorepo. At my last job, right before this one, we had sort of a monorepo strung together with git submodules, although each project was developed independently with its own git repo + CI.

> Isolation: monorepo does not prevent engineers from using the code they should not use.

Your version control software does not prevent or allow your developers from using code they should not use. It is trivial to check in code that does something like this:

    import "~/company/other-repo/source-file.lang" as not_mine;
Or even worse in something like golang:

    import "github.com/company/internal-tool/..."
Because of this, it is my opinion that it is impossible to rely solely on your source control to hide internal packages/source/deps from external consumers. That responsibility - preventing people from touching deps they shouldn't - has to be pushed upwards in the stack, either to developers or to tooling.

> So, big projects in a monorepo have a tendency to degrade and become unmaintainable over time. It’s possible to enforce a strict code review and artifact layouting preventing such degradation but it’s not easy and it’s time consuming,

I think my above example demonstrates this is something that is not unique to monorepos. The level of abstraction that VCSs operate at is not ideal for code-level dependency concepts.

> Build time

Most build systems support caching. Some even do it transparently. Docker's implementation of build caching has, in my experience, been lovely to work with.

---- Multi repo section ----

> In case your release flow involves several components - it’s always a real pain.

This is doubly or triply true for monorepos, because the barrier to cross-service refactors is so low. Due to a lack of good rollout tooling, most people with monorepos release everything together. I know my CI essentially does `kubectl apply -f`. Unfortunately, due to the nature of distributed compute, you have no guarantee that new versions of your application won't be seen by old versions (especially so with zero-downtime deployments like blue-green/red-black/canary). Because of this you constantly need to be vigilant about backwards compatibility. Version N of your internal protocol must be N-1 compatible to support zero-downtime deployments. This is something that new members of a monorepo team have huge difficulty working with.

> It allows people to quickly build independent components,

To start building a new component all one must do is `mkdir projects/<product area>/<project name>`. This is a far lower overhead than most multi-repo situations. You can even `rm -r projects/<product area>/<thing you are replacing>` to completely kill off legacy components so they don't distract you while you work. The rollout of this new tool went poorly? Just revert to the commit beforehand and redeploy, and your old project's directories, configs, etc. are all still in the repo. Git repos present an unversioned state that inherently can never be removed if you want a source tree that is green and deployable at any commit hash.

--- Their solution ---

I accomplish the same thing with a directory structure. As mentioned before, if you just put your code into a `projects/<product area>/<project>` structure, you can get the same effect they are going for by minimizing the directory layout in your IDE's file view. The performance hit from having the entire code base checked out is very much a non-issue for >99% of us. Very, very few of us have code bases larger than the Linux mainline, and git works fine for that use case.

Also, any monorepo build tool like Bazel, Buck, Pants, and Please.build will perform adequately for the most common repo sizes and will provide you hermetic, cached, and correct builds. These tools also already exist and have a community around them.

[0] - https://docs.microsoft.com/en-us/azure/devops/learn/git/git-...



