> Or is it time someone started work on a distributed operating system?
For stuff like this, we've had it since the '80s: Erlang's (and its sleek offspring Elixir's) BEAM VM. A virtual machine designed with concurrent, parallel, and distributed systems in mind? Check. A standard library containing batteries-included solutions to most design and technical challenges you'll run into while building such systems? Check. Tooling for deployments, diagnostics of running systems, and the ability to pull open a REPL for hands-in-the-meat debugging? Check, check, check.
That's kind of what HP NonStop is: a distributed operating system running as one huge cluster.
If you followed their coding practices and used their native libraries, you could almost always do things like freeze a process, move it to an entirely different CPU (which could be a totally different physical server), and restart it without losing the work in progress. Processes could auto-restart and resume from the last checkpoint, you could add more processes to handle the messages in a queue, and there were all kinds of other niceties, built into the OS and layered services, that everyone keeps reinventing.
> So do we all have to keep reinventing these wheels, but only after a production outage?
We live in a world where programmers' "consensus" is that checked exceptions are bad and we need to remove them from Java. People generally just don't care about anything except the happy path.
That's not the reason to remove checked exceptions from Java. The reason is that checked exceptions don't compose: they don't play well with functional-style code, which otherwise works pretty well in Java.
Fun fact: you can actually parameterize functions and types over exception types in Java; `<T extends Throwable> ... throws T` will type-check as expected.
Of course, if you solve a problem using generics, you will now have 2 problems instead of 1...
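For anyone who hasn't seen the trick, here's a minimal sketch of the `<T extends Throwable> ... throws T` pattern (the names `ThrowingSupplier` and `attempt` are made up for illustration):

```java
import java.io.IOException;

public class ThrowsDemo {
    // A functional interface whose thrown exception type is a type parameter.
    @FunctionalInterface
    interface ThrowingSupplier<T, E extends Throwable> {
        T get() throws E;
    }

    // The generic E propagates: the caller sees `throws E` resolved to the
    // concrete exception type of whatever lambda it passes in.
    static <T, E extends Throwable> T attempt(ThrowingSupplier<T, E> body) throws E {
        return body.get();
    }

    public static void main(String[] args) {
        try {
            // E is inferred as IOException here, so the compiler forces us
            // to catch (or declare) exactly IOException -- no catch-all.
            String s = attempt(() -> {
                if (args.length == 0) throw new IOException("no input");
                return args[0];
            });
            System.out.println(s);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

When the lambda throws no checked exception at all, `E` is inferred as `RuntimeException`, so the call site needs no `try`/`catch` — which is what makes the pattern pleasant for mixed callers.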
Does it work for a variable list of things, or would supporting two different checked exceptions require two generic params? If it's one per param I think that's neat but probably limited in use to "your exception can contain a generic value", like `throw new InsertFailedException(value)`.
I think you might be able to trick the type checker to accept union types, but I'm not sure. I know intersection types are possible, but they are not really useful for exceptions.
Regarding practicality - I've used this feature when implementing a visitor class API, so it definitely has some use cases.
The biggest problem is that all it takes is a single method in the chain which does not support this pattern (think java.util.stream). For internal code it's pretty easy to decorate all functions that take callback lambdas, etc.
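A sketch of the visitor use case mentioned above, with the visitor parameterized over the checked exception its callbacks may throw (all names here are invented for illustration):

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class VisitorDemo {
    // The visitor is parameterized over the checked exception its
    // callbacks may throw; a non-throwing visitor infers RuntimeException.
    interface Visitor<E extends Exception> {
        void visit(String value) throws E;
    }

    // walk() propagates whatever E the concrete visitor declares.
    static <E extends Exception> void walk(String[] items, Visitor<E> v) throws E {
        for (String item : items) v.visit(item);
    }

    public static void main(String[] args) throws IOException {
        Writer out = new StringWriter();
        // Writer.write(String) declares IOException, so E is inferred as
        // IOException and main must declare it -- no wrapping needed.
        walk(new String[]{"a", "b"}, out::write);
        System.out.println(out);
    }
}
```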
And stuff like the Stream API does not use these generics, so you end up wrapping exceptions in RuntimeException anyway, which... again defeats the point of checked exceptions.
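The usual workaround looks something like this (a minimal sketch; `parse` is a made-up stand-in for any method that declares a checked exception):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class StreamWrapDemo {
    // Stream.map takes a java.util.function.Function, which declares no
    // checked exceptions, so a throwing call must be wrapped.
    static String parse(String s) throws IOException {
        if (s.isEmpty()) throw new IOException("empty token");
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        List<String> result = List.of("a", "b").stream()
                .map(s -> {
                    try {
                        return parse(s);
                    } catch (IOException e) {
                        // Smuggle the checked exception out as an unchecked
                        // one -- the compiler can no longer help the caller.
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
        System.out.println(result);  // [A, B]
    }
}
```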
"So do we all have to keep reinventing these wheels, but only after a production outage?"
Lotta cynical replies, and mine is going to sound like one of them at first, but I actually mean it in a relatively deep and profound way: Time is hard. You can even see it in pure math, where Logic is all fun and everyone's having a great time being clever and making all sorts of exciting systems and inferences in those systems... and then you try to build Temporal Logic and all the pretty just goes flying out the door.
Even "what if the reply takes ten seconds" is just the beginning. By the very nature of the question itself I can infer the response is expected to be small. What if it is large? What if it might legitimately take more than ten seconds to transfer even under ideal circumstances, but you need to know that it's not working as quickly as possible? Is your entry point open to the public? How does it do with slowloris attacks [1]? What if your system simply falls behind due to lack of resources? The difference between 97% capacity and 103% capacity in your real, time-bound systems can knock your socks off in ways you'd never model in an atemporal system that ignored how long things take to happen.
Programming would be grungy enough even if we didn't have these considerations, but I'm not even scratching the surface on the number of ways that adding time as a real-world consideration complexifies a ton of things. Our most common response is often just to ignore it. This is... actually often quite rational, a lot of the failure cases can be feasibly addressed by various human interventions, e.g., while writing your service to be robust to "a slow internal network" might be a good idea, there's also a sense in which the only real solution is to speed up the internal network. But still, time is always sitting there crufting things up.
One of my favorites is the implicit dependency graph you accidentally start creating once your business systems guys start doing "daily processes" of this and that. We're going to do a daily process to run the bills, but that depends on the four daily dumps that feed the billing process all having been done first. By the way, did you check that the dumps are actually done, and not still in progress as you're trying to use them? And those four daily dumps each have some other daily processes behind them, and if you're not very careful you'll create loops in those processes, which introduce all sorts of other problems.

In the end, a set of processes that wouldn't be too difficult to deal with in perfect atemporal logic land becomes something very easy to sleepwalk into a nightmare world, where your dump is scheduled to run between 2:12 and 2:16 and it damned well better not fail for any reason, in your control or out of it, or we're not doing billing today. (Or even the nightmare world where your dump is scheduled to run after 3pm but before 1pm every day... that is, these dependency graphs don't have to get very complicated before literally impossible constraints start to appear if you're not careful!)

Trying to explain this to a large number of teams at every level of engineering capability (frequently going all the way down to "a guy who distrusts and doesn't like computers who, against his will, maintains a spreadsheet, which is also one of the vital pillars of our business") is the sort of thing that may make you want to consider becoming a monk.
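The dependency-graph part of this is just topological ordering. A small sketch (hypothetical job names, Kahn's algorithm, assuming every dependency also appears as a key) shows how an "impossible schedule" surfaces as a cycle:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JobGraph {
    // deps maps each job to the jobs it depends on. Kahn's algorithm:
    // if we cannot order every job, the schedule contains a cycle and
    // is literally impossible to run.
    static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> remaining = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (var e : deps.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue())
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
        }
        Deque<String> ready = new ArrayDeque<>();
        for (var e : remaining.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<String> out = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.remove();
            out.add(job);
            for (String dep : dependents.getOrDefault(job, List.of()))
                if (remaining.merge(dep, -1, Integer::sum) == 0) ready.add(dep);
        }
        if (out.size() != deps.size())
            throw new IllegalStateException("cyclic dependency -- impossible schedule");
        return out;
    }

    public static void main(String[] args) {
        // billing needs both dumps; dump2 also needs dump1.
        Map<String, List<String>> deps = Map.of(
                "billing", List.of("dump1", "dump2"),
                "dump1", List.of(),
                "dump2", List.of("dump1"));
        System.out.println(order(deps));  // [dump1, dump2, billing]
    }
}
```

The "after 3pm but before 1pm" constraint is the time-window version of the same thing: the contradiction only shows up once you chain enough windows together, which is exactly why nobody notices until billing doesn't run.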
I believe that, in terms of firm theory and how technology plays into the organizational side, we're reaching the limits of current paradigms. Over the last three to four decades, transaction costs grew (more regulation of personal data, more complicated cross-border contracts as services became dominant in most economies; free trade agreements typically cover goods but not services) while coordination costs fell (most business-facing software can now be used as a metered service in the browser). This favored growing corporations.
I've seen in my lifetime conglomerates fall out of favor ('synergies' failed to materialize) and then rise up again but this time in the computer technology sector - are you in the Apple, Microsoft or Google corporate tech garden?
But now interest rates are back, and investors can no longer just park wealth in businesses that grow revenue but not profit. So ballooning complexity can't be dealt with by throwing bodies (and pay raises) at the problem anymore.
I hope this leads to more niche-player offerings and less SaaS where small local shops are just independent sales outfits for the cloud borgs.
Once. It's more work once instead of over and over. That's the point of operating systems, standard libraries, and modules!
I see this weird backlash in modern development against having common, standard platforms. I suspect it comes from the Python and JavaScript world, where having "no batteries included" is seen as a good thing instead of a guaranteed mess of dozens of half-complete, incompatible frameworks.
I'm coming from the perspective of Windows and comparing it to, say, Azure or AWS. All three have some concepts of access control, log collection, component systems, processes, etc...
But all three are proprietary. Kubernetes goes a long way, but it isn't a user-mode system that can be directly accessed from code. Compare with Service Fabric, which has a substantial SDK component that integrates into the applications.
As an example, here's a really basic thing that is actually absurdly difficult to solve well: web application session state.
If you have sticky load balancing using cookies, then the session state is accessed on one VM something like 99.99% of the time... except for the rare request when it isn't. This could be due to a restart, load rebalancing, or whatever.
If you put the session state into something external like Redis, then a zone-redundant deployment will eat a ~1ms delay on every page render, every time.
Service Fabric uses a model where it keeps three replicas of the state: one in the original web server, and two elsewhere. This way, reads are in-process on the same VM most of the time, resulting in nanosecond latencies. Writing the state can occur asynchronously, after the page response is already being sent.
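This is not the Service Fabric API, but the read-local / replicate-asynchronously pattern described above can be sketched roughly like this (all names hypothetical):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: reads hit an in-process copy; writes update the
// local copy immediately and replicate to peers off the request path.
public class LocalFirstSessionStore {
    private final Map<String, String> local = new ConcurrentHashMap<>();

    // Stand-in for shipping the update to replica nodes over the network.
    private CompletableFuture<Void> replicate(String key, String value) {
        return CompletableFuture.runAsync(() ->
                System.out.println("replicated " + key));
    }

    // In-process read: no network hop on the hot path.
    public String get(String sessionId) {
        return local.get(sessionId);
    }

    // Returns once the local copy is updated; replication completes
    // after the page response has already gone out.
    public CompletableFuture<Void> put(String sessionId, String state) {
        local.put(sessionId, state);
        return replicate(sessionId, state);
    }

    public static void main(String[] args) {
        LocalFirstSessionStore store = new LocalFirstSessionStore();
        store.put("sess-1", "cart=3 items").join();
        System.out.println(store.get("sess-1"));
    }
}
```

The design choice worth noting is the trade: reads stay local and fast, but a write acknowledged only locally can be lost if the node dies before replication finishes, which is why a real system quorum-acks the replicas for durability-critical state.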
I'd like to see concepts like this, along with all sorts of service-to-service communication patterns, consolidated into an "operating system like platform" designed for the mid-2020s clouds instead of 1990s server farms.
We’re getting there, but it takes time to agree on what the best implementation of a reinvented wheel looks like. A good example is OpenTelemetry, which is an obvious idea in hindsight but looks like it will take about a decade to ship.
Or consider how we move the goalposts when we reach a goal. Kubernetes standardized certain aspects of the cloud, but now that we have that, instead of celebrating we bemoan its complexity and its lack of utility at solving actual application or organizational challenges, such that we still need cloud APIs plus container images plus all this other complexity. But hey, we did solve the problem of distributing code to run on machines; it’s just that in hindsight it doesn’t seem like it was that hard. We adjust pretty quickly to the “new normal,” and it’s not even a decade since Docker and Kubernetes appeared on the scene.
> I see this weird back-lash in modern development against having common, standard platforms. I suspect it comes from the Python and JavaScript world, where having "no batteries included" is seen as a good thing, instead of a guaranteed mess of dozens of half-complete incompatible frameworks.
Kind of odd to have Python included there as Python's motto for years was (is?) literally "Batteries Included".
"I see this weird back-lash in modern development against having common, standard platforms."
I think it has always been that way.
It comes down to personality types. Many devs I've met think that the implementation they wrote themselves is simpler and easier to understand vs learning a platform api or existing library.
They tend to shrug it off when I point out security or other potential problems.
At least in web development rolling your own is usually the pragmatic choice. It won't break opaquely upon update, you can fix it yourself, it only does what you need. Library and platform updates have a much higher chance of breaking something because of the large impact surface, feature updates being conflated with security updates, insufficient testing, and such breakages are much harder to resolve because they are a black box to you. Really nothing to do with personalities.
So do we all have to keep reinventing these wheels, but only after a production outage?
Or is it time someone started work on a distributed operating system? Vaguely like Kubernetes but full-featured?
I keep seeing the same patterns being re-engineered over and over. Maybe it’s time to refactor these out…