Does this story seem kinda…fake…to anyone else? Like, obviously companies do sometimes make decisions this stupid, but the way this is written seems a little too carefully optimized to make for a morality play of the kind HN enjoys. (And there's a potential motive, since there's a whole bunch of links to paid books and such, somewhat clumsily tied to the main narrative.)
> “It’s not complex if you do it right. Netflix — “
> “WE’RE NOT NETFLIX!” I finally snapped. “Netflix has 500 engineers. We have 4. Netflix has dedicated DevOps teams. We have one guy. Netflix has millions of users. We have 50,000.”
Then
> Lesson 5: The Monolith Isn’t Your Enemy
> A well-structured monolith can:
> Scale to millions of users (Shopify, GitHub, Stack Overflow prove this)
Because Shopify, Github and Stack Overflow have 4 engineers each as well.
It kind of seems real because it reads like it's written by the kind of person that would make high level arch decisions without even understanding what the f they are doing.
This criticism seems like a complete non-sequitur to me. They didn't claim that Shopify, Github and Stack Overflow scaled to millions of users with 4 engineers each. Is the implication that, because Netflix and those companies both had to hire more engineers to scale, the decision between monoliths and microservices has no impact on a 4-person team? I genuinely don't understand what you're trying to imply.
Based on my experience microservices do introduce additional fixed costs compared to monoliths (and these costs can be too expensive for small teams), so everything you've quoted makes complete sense.
In the interest of helping you understand what I was saying, the two quotes are completely contradictory (even if the base argument is correct/valid).
The first one says we shouldn't follow Netflix's example because it is a massive company with an enormous team. The second one says we should follow the example of these companies instead, while ignoring that they are also a huge company with a massive team.
So the criticism/joke stems from the logical inconsistency between the two. The fact that you stopped with microservices, using a rant about Netflix, while at the same time lauding monoliths, using companies of similar scale as examples, highlights your lack of understanding of using team scale as a reason to pursue either alternative. Dealing with such a person in management is common where they often contradict their own reasoning and pick whatever they fancy at that time. You cannot argue logically when the system changes are not based on objective standards but subjective standards, where you can be wrong for one thing but they can be right for the same thing.
That's why it seems like the person making the decisions is lost in terms of the choices they're making.
Interesting. If I only look at the lines you quoted I can see how you arrive at your interpretation of those two quotes. But if I read them in the industry context, they are a concise response against common arguments for microservices. I'll explain the line of argumentation as I understand it.
- We know that full rewrites are expensive & can kill growing companies, so it's best to start with an architecture that you can keep as you scale
- Common argument for microservices: They scale best, look at Netflix etc.
- Counter argument 1: Netflix has a large team, and microservices add fixed complexity that can kill small teams
- Counter argument 2: Monoliths can also scale (see examples)
That was my initial understanding as someone who has had these discussions before. I don't think I'm adding any arguments, my first point is pretty much universally accepted and known. The author is just assuming a certain level of industry knowledge.
The problem is the article isn’t coherent around this point, because it uses scale vaguely. If you look at the pitch the thing they focus on is _failure domain isolation_ but then the article immediately pivots to how attractive scaling is. Failure domain isolation doesn’t contribute to scale in the performance sense, it can tenuously be tied to scaling teams but that wasn’t part of the pitch.
In fact, I don’t think “scale” is ever part of the pitch of micro services. Independent scaling maybe if you have some particular hot spot. But the real pitch for micro services is and always has been about isolation. Isolating failure domains, teams and change management. That’s been the story since the Bezos letter and if the leadership didn’t understand that it’s a leadership skill issue. Not an architectural problem.
So this is a story about bad technical leadership, not a particular architecture. And if anything the initial pitch by the architect is the most technically valid leadership in the story (as poor as it is). They failed to understand the problem space but at least they identified what problem the architecture would solve. The rest of engineering leadership did the classic pointy haired boss thing of not listening and hearing what they wanted. They paid for it.
> That was my initial understanding as someone who has had these discussions before. I don't think I'm adding any arguments, my first point is pretty much universally accepted and known. The author is just assuming a certain level of industry knowledge.
Or they're just bad at communicating and likely decision-making as well. I would say you're giving the author too much credit to be honest but I get your point. It's a poorly-written article in general imo.
It's amazing what modern hardware can do when used correctly.
Consider moving to micro-services only AFTER reasonable algorithms on commodity bare metal show real capacity limits. There's still higher spec bare metal to carry said designs through a refactor / expansion based on where the performance bottlenecks are. Even absent literal micro-services there's still partitioning / sharding which can spread out many of the pain points.
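For anyone wondering what partitioning without microservices might look like, here is a minimal sketch of hash-based sharding inside a single app. The shard names and the `shard_for` helper are made up for illustration, and a real setup would also need a plan for resharding when the shard count changes.

```python
import hashlib

# Hypothetical shard names; in practice these would map to separate databases.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same key always lands on the same shard.
print(shard_for("user:42") == shard_for("user:42"))
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) is used so every app instance agrees on where a key lives.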
Early Stack Overflow, in 2008, was an effort of mostly one engineer (Jeff Atwood) and scaled on that single man power to the tens of thousands of users.
I think you're getting caught up on the wrong thing here. The issue isn't that monoliths can scale. The issue is the inherent flaw in their logic within the confines of the post.
Netflix also didn't appear into existence with an army of engineers. It also scaled from a few engineers to what it is today. Which means you can scale using both depending on your setup. That cannot be the reason why you pick one arch over another. The reason has to do with your own setup, your company, and the application's specific context, all of which the author is missing.
Broadly, it feels like decision making without context or an understanding of the why behind the decisions.
The specific comment has nothing to do with how Github or Stack Overflow scaled etc.
The comment is not based on Netflix or Stack Overflow. Just the person making the decision not having any consistent basis in their reasoning. There is a reason why they are self-admittedly not very good at their job.
I have explained it enough, not sure what you're missing at this point.
We are talking with an audience reading our comments. You think you are talking to me, but you are talking to a much wider audience.
Everyone, including me, is making some assumptions when writing something. When it is technical, it is easy to verify: obtain the source code, run it, check the results. When it is somewhat managerial, it is much harder to verify.
For example, the original post's emphasis on Stack Overflow being a scaling monolith (pun intended) may refer to the point in time when SO was run by basically one man, yet still scaling.
You dismiss it, that's okay. You do not answer my or OP's points, that's okay too.
Our readers are smart enough to judge OP (and ours) points on their merits.
> Our readers are smart enough to judge OP (and ours) points on their merits.
If we go solely by that criterion, the upvotes on the original comment do prove that people are smart enough to judge the points on their merits. It was good talking to you and to a much wider audience through you.
Agree on it being AI, but what really screamed "AI slop" to me was the emojis. I don't know any tech bloggers who use emojis, but everything that ChatGPT or Gemini generates always has too many emojis.
This article looks like a giant stack of bullshit, trying to surf the wave of trendy topics.
If you are small and do not have scaling problems, it is highly unlikely that you will see a real difference between a monolith and microservices except at the margin.
But lots of things look off in the article:
Billing needed to
...
Create the order
What? Billing service is the one creating the orders instead of the opposite?
Monday: A cascading failure took down the entire platform for 4 hours. One service had a memory leak, which caused it to slow down, which caused other services to time out, which caused retries, which brought everything down. In the monolith days, we would’ve just restarted one app. Now we had to debug a distributed system failure.
Hum, they could have restarted the service that failed. But if they had a leak in their code, then even as a monolith the whole app would have gone down until the thing was fixed, even with constant restarts. And I can't imagine the quality of a monolith that is constantly restarting in full...
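To put rough numbers on the "which caused retries" part of that quote, here is a back-of-the-envelope sketch. It assumes a linear call chain where every hop makes the same number of attempts and every attempt times out; `worst_case_calls` is a made-up helper, not anything from the article.

```python
def worst_case_calls(attempts_per_hop: int, hops: int) -> int:
    """Upper bound on calls reaching the last service for one user request.

    Assumes every hop in a linear call chain makes up to `attempts_per_hop`
    attempts (first try plus retries) and every attempt fails or times out.
    """
    if attempts_per_hop < 1 or hops < 0:
        raise ValueError("attempts_per_hop must be >= 1 and hops >= 0")
    return attempts_per_hop ** hops


# One attempt per hop: no amplification.
print(worst_case_calls(1, 3))
# Three attempts per hop across three hops.
print(worst_case_calls(3, 3))
```

Under those assumptions, three attempts at each of three hops can turn one user request into 27 calls against the service that is already struggling, which is why naive retries tend to make a slowdown worse.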
Finally, it claims that on Monday their service started to be slow, and by Wednesday the customer was already threatening to leave because of the slowness. That doesn't look like a customer who is very hooked on or in need of your service if they want to leave after only 2 days of issues.
Also, something totally suspicious: even if a small or moderately sized company can still have people pushing an architecture they prefer, no company with only a few months of cash runway will decide to do a big refactor of its whole architecture if everything was fine in the first place and no problems had been encountered. What will happen in theory is that you start to hit a wall, degrading performance at scale or something like that, and then decide that you have to do something, a rework. And then there will be the debate and decision about monolith, microservices, or whatever else...
The mistake here is having an architect who is not shipping product. Architects whose job it is to define 'rules' and 'patterns' without actually implementing anything are almost always a bad idea. Just focus on shipping. Have at least one experienced engineer who can guide the development, but don't hand those decisions over to some 'architect' who is not even going to write 10 lines of code in your codebase.
> We had 4 backend developers and a DevOps guy who was already stretched thin.
The mistake here was having an architect full stop. The team is too small, a good tech lead can manage to plan a service with 50k MAU (and way beyond) without an architect. The problem with some companies that get millions in seed funding is that they need to spend the money and they do so by adding roles that shouldn't exist at that stage.
Another favourite antipattern: making devops a bottleneck. Don’t over-engineer production, don’t buy abstraction you can’t afford, and educate your colleagues to lower the bus factor.
Dedicated devops that aren’t co-founders are notorious for cv optimizing: working with cool, but time-consuming stuff they don’t yet master, at the cost of delivery-time risk.
One thing I’ve learned is that you should be wary of spending too much time on things that customers don’t see. Customers don’t care about backend engineering unless it results in benefits they can actually see, and if you spend too long on invisible features they’ll think your platform is stagnant and move somewhere else.
> ...you should be wary of spending too much time on things that customers don’t see
I don't think this is entirely true, because there are some things that will help you ship faster, like good architecture and a system design that is as simple as possible. These are worth investing in, despite being invisible to the end user, because doing them well can result in a faster pipeline and more stability.
Premature distribution killed the startup, not microservices. You split the system before the boundaries were real, paid the tax in latency and coordination, and skipped the hard parts that make it viable: event-driven boundaries, local read models, and boring failure handling and comprehensive logging. Start with a modular monolith, earn your boundaries, then extract.
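A minimal sketch of what "earn your boundaries, then extract" can look like in code, assuming a hypothetical orders module that only talks to billing through an interface. All class and method names here are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Billing(Protocol):
    """The boundary: the orders module only knows this interface."""

    def charge(self, customer_id: str, amount_cents: int) -> str: ...


@dataclass
class InProcessBilling:
    """Billing as a plain module inside the monolith (hypothetical)."""

    charges: list[tuple[str, int]] = field(default_factory=list)

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.charges.append((customer_id, amount_cents))
        return f"charge-{len(self.charges)}"


@dataclass
class Orders:
    billing: Billing

    def place_order(self, customer_id: str, amount_cents: int) -> dict:
        charge_id = self.billing.charge(customer_id, amount_cents)
        return {"customer": customer_id, "charge_id": charge_id}


orders = Orders(billing=InProcessBilling())
print(orders.place_order("cust-1", 500))
```

If the boundary holds up in production, `InProcessBilling` could later be swapped for a client that calls a separate service without touching `Orders`. If it does not hold up, moving the line is a refactor rather than a migration.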
Ask your coworkers how many of them got any formal training in distributed systems in college. You’re going to find out it’s not many. So far I haven’t found anyone who didn’t go to Berkeley or UIUC. WTF is going on with universities?
Ironically posted on Medium, which showed me the text, then blanked the whole screen to replace the text with light grey polyfills, and then showed me the same text again... several seconds later.
That's because Medium is a bunch of APIs and (micro) services, not a monolith like it should be.
Heck, it could be plain static HTML because it's just text for crying out loud!
Instead, it uses a GraphQL query through JSON to obtain the text of the article... that it already sent me in HTML.
Total page weight of 17 MB, of which 6.7 MB is some sort of non-media ("text") document or script.
This is user-hostile architecture astronaut madness, and is so totally normal in the modern internet that nobody even bats an eye when text takes appreciable amounts of time to render on a 6 GHz multi-core computer with 1 Gbps fibre Internet connectivity.
Your customers hate this. Your architects love it because it keeps them employed.
Those grey loading placeholders for text are called skeleton loaders BTW, polyfills are libraries used to support newer browser APIs in older browsers and not something you can exactly see on a website (without checking the devtools)
A simple modern Dotnet monolith with Postgres on a Linux server could deliver a much better end user experience, and it probably would take a lot less server resources than the current mess.
I tried to explain this to a team that eventually lost their customers to competitors who could generate less interesting pages far cheaper per request. Instead they went off on a two year jag trying to cache page sections.
You know a team has lost the architectural plot when their answer for all performance problems is more caching. And once you add caching it’s hard to sell any other sort of improvements because the caching poisons the perf analysis.
Their solution took forever because the system was less deterministic than we even knew. They were starting to wrap it up when I went on a tear cleaning up low level code that was nickel and diming us. By the time they launched they were looking at achieving half of the response time improvement they were looking for, in twice the time they estimated to do so. And they cheated. They were making two requests about 10% of the time, which made the p50 time into a lie, because two smaller requests pull down the average but not the cost per page load. But I scooped them and made the slow path faster, undercutting another 25% of their perf improvements.
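To illustrate the two-requests trick with made-up numbers (these are not the real figures from that system): split the slowest 10% of page loads into two smaller requests each and the per-request p50 improves, even though the total time spent serving those page loads is identical.

```python
from statistics import median

# Made-up latencies in ms, one entry per page load, one request each.
before = [80] * 40 + [120] * 50 + [200] * 10

# Same work, but the 10 slowest page loads are now two 100 ms requests each.
after = [80] * 40 + [120] * 50 + [100] * 20

p50_before = median(before)
p50_after = median(after)
total_before = sum(before)
total_after = sum(after)

print(p50_before, p50_after)      # per-request p50 drops
print(total_before, total_after)  # total time spent is unchanged
```

The metric got better because the denominator changed, not because any page got cheaper to serve.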
I ended up doing more to improve the Little’s Law situation in three months of working on it half time than they did in two man years. And still nothing changed. They are now owned by a competitor. That I believe shut down almost all of their services.
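For anyone who hasn't run into Little's Law: average requests in flight equals arrival rate times average time in the system. A tiny sketch with illustrative numbers only (the `in_flight` helper and the figures are my own, not measurements from that system):

```python
def in_flight(arrival_rate_per_s: float, mean_latency_s: float) -> float:
    """Little's Law: L = lambda * W (average requests in the system)."""
    return arrival_rate_per_s * mean_latency_s


# Same traffic, half the latency: half the concurrent work to carry.
print(in_flight(200, 0.25))
print(in_flight(200, 0.125))
```

At a fixed request rate, cutting mean latency in half cuts the average number of requests the servers are holding at once in half, which is where the capacity savings come from.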
Monoliths vs. microservices has nothing to do with server-side rendering vs. GraphQL. Architecturally monolithic Web apps use GraphQL all the time.
I'm not sure why Medium does the weird blanking thing but my guess is that it's because it's deciding whether to let you read the article or instead put up a paywall. There are a lot of SPA sites out there, many of which aren't particularly economical with frontend resources, and they generally don't do that unless they're trying to enforce some kind of paywall or similar.
I do in fact think it's pretty unlikely that any performance degradation you observed was directly caused by microservices, as opposed to changes that directly affected the frontend.
Monoliths generally server side render. Server side rendering is fast, consistent and performant, the state of the client won't get into wonky territory since they are a button click away from getting current, known good state from the server.
That’s not a microservice vs monolith thing. That’s a client-side single-page app vs server-side rendering thing. Although, granted, I more often see microservice architectures with single-page apps than with server-side rendering.
Pretty sure making a product that people don’t want killed your startup. This is like saying using Python vs Go killed your startup which is absurd (unless your startup is high frequency trading or something).
One of the tricks of the startup dance is attaching improvement to revenue. Any work that’s done to support new customers will get approved. And work that looks like a loss leader, such as to retain existing customers, they may lean on the business or support people to paper over.
I do not agree fully with this article, but it does give food for thought and has some valid points:
- don't blindly jump into a new architecture because it's cool
- choose wisely the size of your services. It's not binary, and often it makes sense to group responsibilities into larger services.
- microservices have some benefits, moduliths (though not mentioned in the article) and monoliths have theirs. They all also have their set of disadvantages.
- etc
But anyway, the key lesson (which does not seem like a conclusion the author made) is:
Don't put a halt to your product/business development to do technical-only work.
I.e., if you can't make a technical change while still shipping customer value, that change may not be worth it.
There are of course exceptions, but in general you can manage technical debt, architectural work, bug fixing, performance improvements, dx improvements, etc, while still shipping new features.
Microservices solve a people problem, not a technical one. Until ~20 backend devs there is no point in moving to them. Monoliths are better in terms of performance, reliability and dev speed.
Microservices solve a logistical problem. Rob wants to push code every two days. Steve wants to push every three. Thom deals with business who wants to release at whim and preferably within a few hours. Their commissions and bonuses are not reduced by how much chaos they cause the engineering team. It’s an open feedback loop.
As you add more employees they start tripping over each other on the differences between trunk and deployed. That’s when splitting into multiple services starts to look attractive. Unfortunately they create their own weather and so if you can use process to delay this point you’re gonna be better off.
Everyone eventually merges code they aren’t 100% sure about. Some people do it all the time. However microservices magnify this because it’s difficult to test changes that cross service boundaries. You think you have it right but unless you can fit the entire system onto one machine, you can’t know. And distributed systems usually don’t concern themselves with whether the whole thing will fit onto a dev laptop.
So then you have code in preprod you are pretty sure will work but aren’t completely sure. Stack enough “pretty sure”s over time and as team sizes grow you’re gonna have incidents on the regular. Separate deployment reduces the blast radius, but doesn’t eliminate it. Feature toggles reduce it more than an order of magnitude, but that still takes you from problems every week to a couple a year. Which in high SLA environments still makes people cranky.
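For reference, a feature toggle in its simplest form looks something like this. It is a sketch with a hypothetical in-memory store and a made-up `new_pricing` flag; real setups usually read toggles from config or a flag service so they can be flipped without a deploy.

```python
# Hypothetical in-memory toggle store.
TOGGLES = {"new_pricing": False}


def is_enabled(name: str) -> bool:
    # Unknown toggles default to off, so half-merged code stays dark.
    return TOGGLES.get(name, False)


def price_cents(base_cents: int) -> int:
    if is_enabled("new_pricing"):
        # New, not-yet-trusted path: 10% discount using integer math.
        return base_cents * 90 // 100
    # Existing behaviour stays the default.
    return base_cents


print(price_cents(1000))
```

The "pretty sure" code ships dark, and turning it off again is a config change rather than a rollback.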