
What you describe is microservices developed in a monorepo, and a lot of companies (including the one I currently work at) have gone this route.

Some people might disagree, but imo the cult of microservices does not require 1 repo per microservice.

The tools you describe are build-graph management tools (Bazel, Pants, Buck, etc.) and RPC tools (gRPC + protobufs, Cap'n Proto), and they are indeed pretty cool, albeit to a niche crowd.
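For readers who haven't touched the RPC side, a minimal gRPC/protobuf interface looks like the sketch below (the service and message names are invented for illustration):

```protobuf
// Hypothetical inventory service: one RPC, two messages.
syntax = "proto3";

package inventory;

service Inventory {
  // Fetch a single item by ID.
  rpc GetItem (GetItemRequest) returns (Item);
}

message GetItemRequest {
  string item_id = 1;
}

message Item {
  string item_id = 1;
  string name = 2;
  int64 quantity = 3;
}
```

The protobuf compiler then generates client/server stubs for this in each language, which is a big part of the appeal.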



I think the key difference here is that there is no network in between components in a componentized monolith; each component runs the entire “monorepo”.


Whether there’s actually network between components is something a platform team can handle based on their best judgement. Having collections of containers that always run together is a common pattern.


Certainly, but such a system is not a monolith. A core trait of the monolith is that there are no network calls between components.


This negative-space definition of "monolith" is unhelpful to the point of irrelevance. It's unreasonable, in the sense that adopting it gives us nothing to reason about, as with the comment above. By such a standard the last monolithic in-service system was a Burroughs mainframe ca. 1975. I've got statically linked binaries that would fail this definition.

Even the plainest Rails application depends on network traffic, including to communicate with parts of itself. It cannot function without an operating system, which is also talking to parts of itself via network protocols, and this runs on a server whose internal buses are themselves a distributed system.

It's networks, all the way down, and a heads-in-the-sand attitude doesn't help us reason about performance, reliability, scalability, maintainability et cetera.

Put this in a "Falsehoods programmers believe ...": calling a stateless function in a stack-based language to compute an immutable result won't lead to a network call.

Monolithic applications are defined by something they are, not something they don't do, and what they are is a single unit of code for development and deployment purposes that includes everything necessary to fulfil an entire system's purpose. The issue of intentionally crossing a network boundary, and when, and why, is its own topic in comparative systems architecture, but it's analytically orthogonal.


Is there really that big of an advantage to avoiding the network boundary though?


Absolutely:

* Avoid network and JSON serialization overhead

* Perform larger refactorings or renamings without considering deployment staggering or API versioning

* Testing locally is far easier

* Debugging in production is far easier

* Useful error stack traces are included for free

* Avoid a dependency on SecOps to make network changes to support a refactoring or to introduce new components (probable, in my experience, at least in larger security-software organizations)

If an organization is pursuing or will pursue a FedRAMP certification, as I understand it, that organization must propose and receive approval every time data may hit a network. Avoiding the network in that case may be the difference between a 50-line MR that's merged before lunch and a multi-week process involving multiple people.


FWIW, I think that gRPC/protobufs have pretty compelling answers to each of the historically-valid complaints you've listed here.

- CPU cycle overhead: this is valid if the overhead is very high or very important. Otherwise, most companies would love to trade off CPU cycles for dev productivity.

- refactorings/renamings without deployment staggering: protobufs were specifically designed with this in mind, insofar as they support deprecating fields and the like. However, writing a deprecatable API is a skill, even with protos. If you have many clients and want to redo everything from scratch, you will have problems.
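To make the field-deprecation point concrete, here's a hypothetical schema evolution (not from any real codebase) for renaming a field without staggering deployments:

```protobuf
// Phase 1 of a hypothetical username -> display_name rename.
message User {
  // Old field, marked deprecated but kept so existing clients still work.
  string username = 1 [deprecated = true];
  // New field under a fresh tag; servers populate both during migration.
  string display_name = 2;
}
```

Once every client has migrated, the old field can be deleted, with `reserved 1;` and `reserved "username";` added so the tag and name can't be accidentally reused.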

- "testing locally" (which I take to mean integration testing locally) is the only one that requires some imagination to solve, assuming all your traffic is guarded by short-term-lease certs issued by Vault or something similar. But even this is quite achievable.

- error stack traces included for free: may I introduce you to context.abort(). It's not a stack trace by default, but you can wrap the stack trace into the message if you care to. OpenTracing isn't quite free in a performance sense, but in a required-eng-time-to-set-up-and-maintain sense, it is pretty cheap.
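As a sketch of that wrap-the-stack-trace-into-the-message idea (plain Python; `error_details` is an invented helper, and the real call would be `context.abort(code, details)` on the gRPC servicer context):

```python
import traceback

def error_details(exc: Exception) -> str:
    """Format an exception's stack trace into an RPC error-details string.

    Hypothetical helper: the returned string would be passed as the
    second argument of context.abort(code, details) in a gRPC servicer.
    """
    return "internal error\n" + "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )

try:
    raise KeyError("item-42")
except KeyError as exc:
    details = error_details(exc)

# `details` now carries the full server-side traceback, so the caller
# sees file/line context instead of a bare status code.
```

Whether to ship tracebacks to callers is a judgment call; for internal services the debugging value usually outweighs the leak of internals.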

- dependency on secops to make network changes: I've never encountered this, but I bet you that a good platform team can provide a system where application teams effectively don't need to worry about this. It's impossible to overcome this challenge in an existing company that's used to doing things this way, though.


> cpu cycle overhead

The original poster's point was CPU and network overhead. A local procedure/function call or message-send takes on the order of one or up to a few nanoseconds. Depending on how you organize things, an IPC is going to be in the microsecond or even millisecond range. That's a lot of orders of magnitude. It's also latency that you just aren't going to get back, no matter what extra resources you throw at it. [1][2]
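A quick back-of-the-envelope measurement of the local side of that gap (results vary by machine, and the 1 ms RPC figure is an assumption, not a measurement):

```python
import timeit

def add(a, b):
    return a + b

# Time a plain local function call; on typical hardware this lands in the
# tens-to-hundreds-of-nanoseconds range.
n = 1_000_000
per_call_s = timeit.timeit(lambda: add(1, 2), number=n) / n

# Compare against an assumed 1 ms round-trip RPC:
rpc_s = 1e-3
ratio = rpc_s / per_call_s
print(f"local call: ~{per_call_s * 1e9:.0f} ns; a 1 ms RPC is ~{ratio:,.0f}x slower")
```

Even with that fairly generous 1 ms assumption, the ratio typically comes out around four orders of magnitude.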

In the early naughties, a very SOA/microservice-y BBC backend system that I re-architected as a monolith became around 1000x faster. [3]

In addition, in-process calls are essentially 100% reliable; network calls, and the various processes attached to them, not so much (see [1], again). The BBC system didn't just become a lot faster, it also became roughly 100 times more reliable, and that's probably low-balling it a bit. It essentially didn't fail for internal reasons after we learned about Java VM parameters. And it was less code, did more, and was easier to develop for.

[1] https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...

[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115...

[3] https://link.springer.com/chapter/10.1007%2F978-1-4614-9299-...


Ah gotcha, thank you for locking in on the issue. You're absolutely right that network hops introduce overhead (I was intending to wrap I/O blocking on network calls under the banner of CPU cycles, adjacent to serialization).

Like any other design decision, there's a trade-off here. (see my other comments in this tree, about how many 9's in reliability/latency you're targeting).

If you're working in an environment where sub-5ms latency to the 4th or 5th 9 is critical, inter-machine communication is not for your application, period.

Reliability, as an orthogonal concern, is one that has improved incredibly since the early aughts. The "transport" and error-handling layer of open-source RPC frameworks has improved by orders of magnitude. I'd recommend taking a long look at the experiences of companies built on gRPC.

It's much easier to build a reliable SOA-esque system today than it was even 5 years ago. It's been an area of rapid progress.


Yes, obviously these are trade-offs.

However, I find the way you framed these trade-offs decidedly... odd, in terms of "who needs that kind of super-high performance and reliability?", as if achieving it were only possible through herculean effort that just isn't worth it for most applications.

The fact of the matter is that a local message-send is also a helluva lot easier than any kind of IPC. It's also easier to deploy, since it comes in the same binary and is already there, and easier to monitor (no monitoring needed).

So the trade-off is more appropriately framed as follows: why on earth would you want to spend significant extra effort in coding, deployment and monitoring, for the dubious "benefit" of frittering away 3-6 orders of magnitude of performance and perfect reliability?

Of course there can be benefits that outweigh all these negatives in effort and performance/reliability, but those benefits have to be pretty amazing to be worth it.


> as if achieving that were only possible through herculean effort

I encourage you to reread my comments; I'm not suggesting anywhere that high performance requires exceptional effort.

In fact, I'm actively admitting that for applications where high-performance is required, IPCs/RPCs are not an option.

> just isn't worth it for most applications

Performance is valuable, but it's one dimension of value.

My premise is that, given the maturity of RPC frameworks and network tooling in 2020, most already-networked applications can afford to trade the performance hit of additional hops on the backend.

Whether what you get in exchange for that performance hit is valuable?

That is mostly a function of the quality of your eng platform.

> a local message-send is also a helluva lot easier [on the programmer?] than any kind of IPC

This strongly depends on your engineering org, although it seems like this is the point that's hardest to imagine for some people.

If you're on a team that depends on the availability of data maintained by N other teams (given, again, the maturity of RPC frameworks and network tooling in 2020), it is much easier to apply SLOs and SLAs to an interface that's gated by an RPC service.

> spend significant extra effort in coding, deployment and monitoring

The extra effort here is made completely negligible by the existence of a decent platform team.

FWIW, I wouldn't have been able to imagine it if I hadn't experienced it myself.

> benefits have to be pretty amazing to be worth it

I still think you're overestimating some of the costs (see above). FWIW, I've worked in an RPC-oriented environment for years now, and reliability has never been a concern. Our platform team is pretty good, but we are not a Google-esque company (200 engineers, including eng managers).

The performance trade-off has been demonstrably worthwhile, because we've used it to purchase a degree of team independence that would not have been otherwise possible.


>In fact, I'm actively admitting that for applications where high-performance is required, IPCs/RPCs are not an option.

But you're framing it as "...for applications where high-performance is required", as if taking the performance, expressiveness and reliability hits should obviously be the default, unless you have very special circumstances.

My point is, and continues to be, that it's the other way around: you should go for simplicity, reliability and performance unless you have, and can demonstrate you have, very special requirements.


Thrift or protobuf is a huge step up from the alternatives, but you still have a lot of overhead. Generics are limited and you're essentially forced to "defunctionalise the continuation" everywhere: any time you want to pass a callback around you have to turn it into a command object instead.


I don't disagree with you, this actually sounds like the beginning of a super interesting conversation.

Can you share some examples of the generics problem and "defunctionalizing the continuation"?

Does google's `any` package help with the generics problem you describe? (Acknowledging that it's obviously clunky)


> Can you share some examples of the generics problem and "defunctionalizing the continuation"?

Well, the generics problem is that you don't have generics. So you just can't define a lot of general-purpose functions in gRPC, and have to make a specific version of them instead. Even something like "query for objects like this and then apply this transform to the results" just can't be done, because there's no way to pass the transformation over the wire; you have to come up with a datastructure that represents all the transformations you want to do instead. "Defunctionalizing the continuation" is the technique for doing that (https://www.cis.upenn.edu/~plclub/blog/2020-05-15-Defunction... is an example), but it's a manual process that requires creative effort each time.
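A tiny Python sketch of what that defunctionalization looks like in practice (the `Scale`/`Clamp` commands are invented examples of "a datastructure to represent all the transformations"):

```python
from dataclasses import dataclass
from typing import Union

# Closures can't cross the wire, so each supported transform is reified
# as plain, serializable data -- the "command objects" described above.

@dataclass
class Scale:
    factor: float

@dataclass
class Clamp:
    lo: float
    hi: float

Transform = Union[Scale, Clamp]

def apply(t: Transform, x: float) -> float:
    """Server-side interpreter: one branch per command the API supports."""
    if isinstance(t, Scale):
        return x * t.factor
    if isinstance(t, Clamp):
        return min(max(x, t.lo), t.hi)
    raise ValueError(f"unknown transform: {t!r}")

# Instead of sending `lambda x: x * 2` over RPC, the client sends Scale(2.0):
results = [apply(Scale(2.0), v) for v in [1.0, 2.5]]
```

The cost the parent describes is that every new transform means touching both the command type and the interpreter, rather than just writing a lambda.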

> Does google's `any` package help with the generics problem you describe? (Acknowledging that it's obviously clunky)

Not really, because you don't have the type information at compile time. Erased generics are fine in a well-typed language, but just using an any type you can't even do something like: a function that takes two values of the same type.
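The "two values of the same type" point can be shown with Python's typing (an analogy to the protobuf situation, not protobuf code): a `TypeVar` lets a checker enforce the relationship, while `Any` silently drops it:

```python
from typing import Any, TypeVar

T = TypeVar("T")

def first_typed(a: T, b: T) -> T:
    # A type checker enforces that a and b share a type, and that the
    # result has that same type -- what erased generics still give you.
    return a

def first_any(a: Any, b: Any) -> Any:
    # With Any, the relationship is gone: mixing an int and a str
    # passes the checker, and callers learn nothing about the result.
    return a

x = first_typed(1, 2)    # checker infers int
y = first_any(1, "two")  # no complaint, no information
```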


People who are downvoting the parent comment: I’d love to know why? I won’t claim expertise here, but it doesn’t strike me as clearly incorrect.


How are you getting around API versioning with independently deployable components?


If you call a piece of functionality from your own single deployable that you are refactoring, it’s much more like refactoring a function call than if it were an independent micro-service across a network.


What is a network?


Any application boundary that requires that you serialize your calls/requests to the other service/component in some form.

Any form of IPC basically.


I think there used to be, before "off-the-shelf" RPC frameworks, service discovery, and the like were mature. There still are, for very small companies.

In 2020, if you have an eng count of >50: you use gRPC, some sort of service discovery solution (consul, envoy, whatever), and you basically never have to think about the costs of network hops. Opentracing is also pretty mature these days, although in my experience it's never been necessary when I can read the source of the services my code depends on.

Network boundaries are really useful for enforcing interface boundaries, because we should trust N>50 programmers to correctly implement bounded contexts about as much as we trust PG&E to maintain the power grid.

That being said, if you have a small, crack team, bounded contexts will take you all the way there and you don't need network boundaries to enforce them.


It depends on your speed requirements and whether calls are being sent async or not. Also keep in mind that even with internal APIs, an API call usually crosses multiple network boundaries (service1 --> service2 (potential DNS lookup) --> WAF/security proxy --> firewall --> load balancer --> SSL handshake --> server/container firewall --> server/container). Then you get into whether the service you're calling calls other APIs, etc. You can quickly burn 50ms or more with multiple hops. If you're trying to return responses within 200ms, you now have very little margin.
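The hop arithmetic can be made concrete with illustrative (guessed, not measured) per-hop costs:

```python
# Hypothetical per-hop latency costs in milliseconds for the chain above.
hops_ms = {
    "dns_lookup": 1.0,
    "waf_proxy": 5.0,
    "firewall": 0.5,
    "load_balancer": 1.0,
    "tls_handshake": 10.0,  # cold connection; near zero when reused
    "service_time": 20.0,
}
one_call = sum(hops_ms.values())  # 37.5 ms for a single downstream call
chained = 3 * one_call            # three dependent services deep
budget = 200.0
print(f"one call: {one_call} ms; three deep: {chained} ms of {budget} ms")
```

Under these made-up numbers, three dependent calls consume over half a 200ms budget before any other application work happens.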


Acknowledging that there are indeed many hops, I think it might be a bit disingenuous to say 50ms is easy to burn, depending on which percentile we're talking about.

IIRC, a typical round trip service call at my current place of work (gRPC with protobufs, vault/ssl for verification, consul for dns, etc) carries a p99 minimum latency (i.e. returning a constant) of around 2ms.

A cold roundtrip obviously takes longer (because DNS, ssl, etc).

It depends on how many 9's you want within 10ms, but there are various simple tricks (transparent to the application developer) that a platform team can apply to get you there.

As a sidenote on calling other APIs, my anecdata suggests that most companies' microservice call graphs are at most 3-4 services deep, with the vast majority being 1-2 services deep.

This doesn't show the call graph, but it does demonstrate how many companies end up building a handful of services that gatekeep the core data models, and the rest simply compose over those services: https://twitter.com/adrianco/status/441883572618948608/photo...


It depends, but it means you don’t have to serialize/deserialize data, or deal with slow connections, retries, network failures, circuit breakers, etc.


I agree 100%. It gives you the boundaries but also the whole world maps to a single revision in VCS



