Using Java's Project Loom to build more reliable distributed systems (jbaker.io)
205 points by CHY872 on May 9, 2022 | 91 comments



Cassandra already does this[1] (and a whole lot more besides, to introduce chaos into the simulation), and just accepts that there's a huge penalty from pausing and scheduling threads.

I'm really looking forward to being able to use Loom to run much more efficient cluster simulations.

[1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10...


What sort of performance characteristics does Cassandra see from its implementation? I was fairly impressed with the 2.5M context switches per second (a fairly random unit, I know) I saw from virtual threads.

Have you found that the approach has made it easier for you to ship improvements to Cassandra?


It's very slow, particularly since introducing bytecode weaving to introduce chaos in concurrency-sensitive places. Presently it takes a single thread in the region of twenty minutes to simulate a cluster from startup to shutdown, and to run a thousand linearisable operations (up from a couple of minutes prior to those changes). I haven't got any numbers on the rate of context switches per second, but it is certainly a great deal less than 2.5M/s.

This kind of approach is in my opinion a _requirement_ for work on distributed consensus protocols, which the Cassandra community is improving at the moment: the upcoming 4.1 release has some major improvements to its Paxos implementation that were validated by this simulation, and the following release is expected to contain a novel distributed consensus protocol we have named Accord, for supporting multi-key transactions.

So, I'm not sure it has affected the speed or ease of shipping improvements so much as enabled a class of work that would previously have been impossible to do safely (IMO).


The FoundationDB team is working on distributed systems simulation testing via a new startup... Antithesis. See https://antithesis.com/

Stardog, an enterprise knowledge graph platform (stardog.com), is an early adopter of Antithesis, and we've found it to be very helpful for HA clustering, distributed consensus, etc.

Not a stakeholder, just a satisfied early adopter.


This is the first reference to Antithesis being used I've seen since it was announced.

Can you describe what their tools are doing for you and how you're integrating with them?


I feel like the unsung winner of Project Loom is going to be Clojure. With its immutable-first data structures, it should be relatively straightforward for the Clojure project to expose the benefits of Project Loom to their ecosystem; as a language it's designed to fit well with Loom's execution model.


Loom threads have the same basic semantics as normal threads. There's nothing special about them that relates to immutability.


I think GP's point is that immutable objects (which Clojure uses "by default") can be easily used in concurrent settings. Immutable objects can be shared without concern between threads, whereas mutable objects may need to be copied or have their operations synchronized somehow.
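A rough sketch of the difference (class names hypothetical; thread spawning uses the virtual thread API from recent preview JDKs): the immutable record can be handed to any number of threads as-is, while the mutable counter needs a lock on every access.

    import java.util.List;

    // Immutable: safe to share across any number of threads with no locking.
    record Point(int x, int y) { }

    // Mutable: every access must be synchronized to stay correct.
    class Counter {
        private long value;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    public class SharingDemo {
        public static void main(String[] args) throws InterruptedException {
            var point = new Point(1, 2);   // read freely from any thread
            var counter = new Counter();   // every access pays for a lock
            List<Thread> threads = List.of(
                Thread.startVirtualThread(() -> System.out.println(point.x() + counter.get())),
                Thread.startVirtualThread(counter::increment));
            for (Thread t : threads) t.join();
        }
    }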


> may need to be copied or have their operations synchronized somehow

Maybe through some kind of non-preemptive user-space scheduling you mean?


Right, but you'd still need to synchronize that with some concurrency primitives (like a mutex or semaphore), and that has the potential for bugs. Whereas an immutable structure can't suffer from that problem, which can make concurrent programming easier to reason about.


> Right, but you'd still need to synchronize that with some concurrency primitives (like a mutex or semaphore)

Why? If you know your thread won't be interleaved with any other until you reach a well-defined point, how's that different to using a mutex or semaphore?


IIUC, Loom threads can be scheduled across multiple OS threads (much like goroutines), so there is (at least the possibility of) real concurrency going on.


That doesn’t solve the problem in parallel contexts, only for concurrent ones. But chances are you are not interested in running everything single-threaded.


I think a Clojure with first class legitimate actor semantics would be unreal. Clojure core will espouse async and tell you why actors are bad but man I love the actor model.


https://www.clojerl.org

Clojure for the Erlang VM

:)


Why should Clojure benefit more from Loom than other languages? I think it simplifies reactive/concurrent programming in any JVM language.


Same reason I always felt extremely comfortable with concurrent programming in F#. The immutable first principle makes it easier to reason about what the program is doing during concurrent execution.

Of course all projects will benefit from Loom. I am merely positing that Clojure in particular could leverage this both most quickly and to very deep positive effect, due to the nature of the language itself putting immutability first, which, if I recall correctly, they already built up a pattern of concurrent execution around.

Kotlin too will be able to optimize its concurrency story.

I simply wanted to posit an observation I hadn't seen elsewhere on the web about Project Loom yet.


Clojure was explicitly designed to make reasoning about concurrency simpler (most data is immutable and can be referenced in multiple routines without concern; mutable reference types use the same data semantics, wrapped in transactions) with minimal performance impact (immutable data types are implemented as persistent data structures, sharing memory rather than naive strategies like copy-on-write). I don’t know that it’ll necessarily benefit more from Loom than other JVM languages, so much as it’s well positioned to benefit from concurrency improvements generally because its underlying design encourages code well suited for taking advantage of concurrency generally.

Edit to add: I hadn’t initially thought to speculate on the language itself taking advantage, but it occurred to me immediately after posting this comment. Another potential benefit is that many of its fundamental abstractions (most collection APIs are declarative) lend themselves really well to making those operations concurrent, which is a common FP/lisp hypothetical but becomes more tangible if coroutines are less expensive. This isn’t dissimilar to how RDBMS query planners can provide wild performance improvements without any semantic changes, because SQL queries express “what”, not “how”.


I’d think it would also make a huge difference for Akka.


This was linked in the article and is a fascinating read by itself: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind


Discussed at the time:

Close Encounters of the Java Memory Model Kind - https://news.ycombinator.com/item?id=11955392 - June 2016 (65 comments)


Shipilev’s entire blog is worth reading. He’ll even email you back if you have questions.

I really enjoy his anatomy quarks series:

https://shipilev.net/jvm/anatomy-quarks/


Unlike goroutines, seems here you have control over the execution schedule for the virtual threads if you provide an executor. This is pretty great.

Think this will obsolete Go over the next few decades.


I don't think it will. If everyone was clamoring for Java and settled on Go only because of goroutines, then sure, but I think Go was liked for a lot of reasons aside from that. I also don't often see people complain about wanting more control over the scheduler for Go (could be that I just miss those).

I'd be surprised if Go adoption plummeted because of this, but who knows, I sure don't have a crystal ball.


Sane concurrency is -one- of the reasons people reach for Go, and sure, that may no longer be a differentiator. But it's definitely not the only one I've heard people toss around (and, agreed, I've never heard anyone bemoan the lack of control over the scheduler). In fact, the introduction of virtual threads with no new memory semantics means, I think, that it still misses one of the main benefits of goroutines (channels and default copying semantics); everything in JVM land will by default still use shared memory and pass-by-reference semantics.

I think it's all a moot point though, as it basically just demonstrates the next iteration of Paul Graham's Blub Paradox. With every iteration of new improvements for the JVM it reinforces the belief of many that the JVM is the best tool for every job (after all, it now just got cool feature y they just now learned about and can use and OMG Blub-er-Java is so cool, who needs anything else?!), and reinforces the belief of many others that the JVM is playing catchup with other languages (it only just -now- got feature y) and there are often better tools out there.


Go structs are copied but not deeply; collections are inherently passed by reference and everything is mutable. Scala and Kotlin get immutability right and Java is getting there with unmodifiable collections and records.


I don't know that I'd say Scala gets immutability right in that it still provides you equal access to the mutable collections (and I have basically no experience with Kotlin), but I cede the point it's way better than either Go or Java here. I readily admit Golang gets this wrong, just, -slightly- better than Java. I'm coming from an Erlang background, and that's the main influence I'm looking at concurrency from; the JVM as a whole gives me a sad when it comes to helping me write correctly behaving code.


It might be equally accessible, but mutable collections require you import classes from scala.collection.mutable that have very different names, e.g., mutable.ArrayBuffer vs List.

The path of least resistance in Scala leads to immutable collections.


Channels are just blocking queues with weirder semantics.


At first I misread "wider" and agreed.

In Go/Kotlin, one can atomically take from/send to exactly one channel with the `select` call.

Then I agree with "weirder" as well, in the case of Go channels. A send on a nil channel blocks forever. Why?


Haha yep, it's those edge cases, to quote one of the many web pages documenting it:

  Channels exhibit the following properties:

    * send to a nil channel blocks forever
    * receive from a nil channel blocks forever
    * send to a closed channel panics
    * receive from a closed channel returns the zero value immediately

I have this bookmarked because I don't write Go enough to remember it by heart.


Not on the first go around. AFAIK, they are looking at exposing more of the internal scheduling API but that's likely not going to be a part of the initial release.

The executor services referred to in the blog are for the order of execution of tasks on the virtual thread pool. For a "virtualThreadExecutor" service, every task will get a virtual thread and scheduling will happen internally.

You can still use a fixed thread pool with a custom task scheduler if you like, but probably not exactly what you are after.
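For reference, a minimal sketch of what that per-task executor looks like in recent preview builds (API details may still shift before GA):

    import java.util.concurrent.Executors;

    public class VirtualExecutorDemo {
        public static void main(String[] args) {
            // Each submitted task gets its own virtual thread; mapping those
            // onto carrier (OS) threads is handled internally by the JDK.
            try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10_000; i++) {
                    int id = i;
                    executor.submit(() -> System.out.println("task " + id));
                }
            } // close() waits for submitted tasks to finish
        }
    }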


This blogpost does rely on plugging in an executor. While the API was removed, it’s one private variable away (documented in footnote). As you say, it seems like it’s an ‘on the way, but later’ thing - the last Loom preview I used (a while ago) actually had the API so when I started drafting this post I was unhappily surprised!


Right, custom schedulers aren't quite ready for release yet -- there's a small missing piece in the VM that's required to fully preserve the spec, and they need a lot of testing that we don't yet have -- so we decided to go ahead without them and add them later.


What’s the missing piece?


A technical detail. Monitors (synchronized) record their owners as the OS thread, which makes the VM not know whether the carrier or the virtual thread owns a monitor, as they both share the same OS thread.
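A hedged illustration of the symptom (the diagnostic flag exists in recent preview builds): blocking inside a synchronized block pins the virtual thread to its carrier, and running with -Djdk.tracePinnedThreads=full prints a stack trace when that happens.

    public class PinnedDemo {
        static final Object lock = new Object();

        public static void main(String[] args) throws InterruptedException {
            Thread vt = Thread.startVirtualThread(() -> {
                synchronized (lock) {       // monitor ownership is recorded against the OS thread
                    try {
                        Thread.sleep(100);  // parking while holding the monitor pins the carrier
                    } catch (InterruptedException ignored) { }
                }
            });
            vt.join();
        }
    }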


For any future readers, I think this is the bug: https://bugs.openjdk.java.net/browse/JDK-8281642


IMO, they are focusing on the right thing here. Getting a good virtual thread API to GA will be paramount for the decisions around scheduling and continuations in the future.

Maybe a little disappointing for low-level nuts and other languages like Kotlin, but the right move IMO. Virtual threads alone will be a huge benefit to the ecosystem. The other stuff will help, but won't have near the same impact.


Agreed. And the pluggable scheduling is still a key part of it, just doesn’t need to be a thing to get right out of the gate. I’m honestly mega excited that Loom is even a real thing that exists and that you can use.


Java moves a lot slower, so I don't think it will obsolete Go.

If anything, if Loom is great, then it will keep Go on its toes, and hopefully Go will also evolve due to external pressure.


I beg to differ. Go has been moving at a glacial pace. Generics took forever to implement, aren't even feature complete (no type parameters on methods), and have no integration with the standard lib. Meanwhile Java is adding lots of new features and versions at a quick pace.


I've heard some variation of "$java_feature will make $language obsolete" for years now, most recently wrt kotlin/scala, and it's never held true. It's great for the people who use Java, but there are tons of reasons why other people use other languages.


I've noticed the same behavior.

Imitation can only get you so far. Java is changing, and in many cases for the better, by absorbing features from other languages. However, I still think several other languages do a better job curating features to fit a niche.

That said, Loom appears to be a serious upgrade for JVM languages. Now, if startup could get an order of magnitude faster...


Java record vs. Kotlin dataclass?


That was the recent incident, yeah. Records + Pattern Matching led to many saying you no longer need Kotlin, even here on HN.


Well, for a lot of people those things are enough to go back to Java and avoid having to rely on the Kotlin ecosystem (which is not free of problems).

At which point would you say Java has improved enough to catch up with Kotlin (supposing Kotlin does not also keep improving)? As a long-term user of Kotlin, I would say I would not reach for Kotlin anymore for new projects. The last remaining big thing Kotlin gives is non-nullability, but with simple tools, Java also has that already.


I wouldn't reach for Kotlin for backend projects at all tbh, since the ecosystem on that side is (relative to Java) immature and doesn't always play well with standard Java tools such as JPA. Non-standard tools are half-baked, inconsistently maintained and not ready for primetime. But for apps, like in mobile, the ecosystem is rich and I would prefer it over Java, especially with advances such as KMM and KotlinJS.

My point being, Kotlin vs Java isn't just about language features, it's about community, ecosystem, use cases etc.

(Fwiw, personally I prefer Kotlin because it's more expression oriented than Java.)


One of the unsung heroes of Go is how goroutines sit on top of channels + select. Blocking and waiting on one queue is easy; blocking and waiting on a set of queues, for any of them to get an element, is a good deal trickier. Having that baked into the language and the default channel data structures really does pay dividends over a library in a case like this.

You can kinda do this with futures but I suspect it'll be wildly inefficient. I really hope Java gets something to fill this niche. We already have a menagerie of Queue, BlockingQueue, and TransferQueue implementations. What's a few more?
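There's no select, but as a hedged sketch of one workaround (all names hypothetical): virtual threads are cheap enough that you can dedicate one forwarder per source queue and fan everything into a single merged queue. Note this eagerly drains the sources, so it only approximates select's shape, not its atomicity.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class SelectDemo {
        record Tagged(String source, Object value) { }

        // One cheap virtual thread per source, each pumping into 'into'.
        static void forward(String tag, BlockingQueue<Object> from, BlockingQueue<Tagged> into) {
            Thread.startVirtualThread(() -> {
                try {
                    while (true) into.put(new Tagged(tag, from.take()));
                } catch (InterruptedException e) { /* shutdown */ }
            });
        }

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Object> a = new ArrayBlockingQueue<>(16);
            BlockingQueue<Object> b = new ArrayBlockingQueue<>(16);
            BlockingQueue<Tagged> merged = new ArrayBlockingQueue<>(64);
            forward("a", a, merged);
            forward("b", b, merged);

            a.put("hello");
            Tagged first = merged.take(); // whichever source delivered first
            System.out.println(first.source() + ": " + first.value());
        }
    }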


I guess the Structured Concurrency JEP below addresses the problems you'll get to solve. It'll enable things like AND / OR combinations of virtual threads which IMHO looks like a better way to solve this rather than having a special syntax for select.

https://openjdk.java.net/jeps/8277129

But frankly I'm afraid of how these changes affect garbage collection since more and more vthread stacks are going to be in the heap (I hope they are contemplating some form of deterministic stack destruction along with the above JEP).
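For the curious, a hedged sketch of the "AND" shape from that JEP's incubating API (package and method names as of the JDK 19 incubator module, which needs --add-modules jdk.incubator.concurrent; they may well change), with fetchUser/fetchOrder as hypothetical stand-ins:

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;
    import jdk.incubator.concurrent.StructuredTaskScope;

    public class AndDemo {
        static String fetchUser()  { return "user";  } // hypothetical stand-in
        static String fetchOrder() { return "order"; } // hypothetical stand-in

        public static void main(String[] args) throws InterruptedException, ExecutionException {
            // "AND": both subtasks must succeed; the first failure cancels the other.
            try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
                Future<String> user  = scope.fork(AndDemo::fetchUser);
                Future<String> order = scope.fork(AndDemo::fetchOrder);
                scope.join();          // wait for both forks
                scope.throwIfFailed(); // propagate the first error, if any
                System.out.println(user.resultNow() + " " + order.resultNow());
            }
            // The "OR" dual, StructuredTaskScope.ShutdownOnSuccess, returns
            // the first subtask to succeed and cancels the rest.
        }
    }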


> Having that baked into the language and the default channel data-structures really does pay dividends over a library in a case like this.

Kotlin coroutines have the bare minimum in the language, and implement the rest (e.g. channel, select, `go`/`launch`) in libraries. Could you explain what the dividends for Go are?


Channels on the JVM would be sweet. You can do the same with Futures, and it's probably not even slower, but it is a lot more clunky. I suspect it's never gonna happen; too big a change. Maybe Kotlin will do it.



Interesting. Not quite as clean as if it were a built-in keyword, but close enough. Seems this is still experimental and not available by default.


Go does not need 128MB of memory to run hello world in a container.

People don't pick up Go over Java because of goroutines, Java is still and will forever be an "enterprise" language behind many layers of abstractions.


The JVM is a master of gradually closing the gap. Over the last decade:

- Garbage collectors have required far less tuning; with G1, Shenandoah, ZGC it's likely that your application will need little tuning on normal sized heaps.

- Modules, jlink, etc. allow one to build a much smaller Java application by including only those parts of the JVM that one needs.

- Graal native images are real. These boast a far lower startup overhead and much lower steady state memory usage for simpler applications.

Probably my counterexample of choice is this: https://github.com/dainiusjocas/lucene-grep - it uses Lucene, one of the best search libraries (core of Elasticsearch, Solr, most websites), which is notoriously not simple code, to implement grep-like functionality. In simple cases, they demonstrate a 30ms whole process runtime with no more than 32MB of RAM used (which looks suspiciously like a default).

The JVM is fast becoming a bit like Postgres... one of those 'second best at everything' pieces of tech.


FWIW, this criticism no longer applies to people using AOT compilation. From my macOS laptop:

     /usr/bin/time -l ./hello-world
    Hello World!
        0.00 real         0.00 user         0.00 sys
             3231744  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                 841  page reclaims
                   1  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   2  voluntary context switches
                   4  involuntary context switches
            22395110  instructions retired
            18507246  cycles elapsed
             1294336  peak memory footprint
So "peak memory footprint" for hello world is 1.2 MB and it starts instantly.

Now, not everyone can/will use AOT compilation. It's slow to compile and peak performance is lower unless you set up PGO, plus it may need a bit of work in cases where apps assume the ability to do things like generate code on the fly. But Go can't do runtime code generation easily at all, and if you are OK with those constraints, you get C-like results.
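For anyone wanting to reproduce something like this, a hedged sketch (GraalVM required; I believe native-image lowercases the executable name by default):

    // HelloWorld.java
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello World!");
        }
    }

Then `javac HelloWorld.java && native-image HelloWorld` should produce a standalone ./helloworld binary you can measure with /usr/bin/time -l as above.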


AOT is not the norm nor ready. Graal is far, far from being usable.


Low memory consumption also has a price in this case. On a 1 TB server machine, guess which platform will have better throughput by far? Go’s GC will die under that load.

Writing a “hello world”-scoped microservice is a tiny niche.


Microservices are 100x more popular than apps that need 1TB of memory.

Also, do you have proof that Go will die under large memory usage? It's FUD.


> Also have you proof that Go will die under large memory usage?

The Debian binary-tree test is designed to create a ton of allocations and stress the GC. Go comes in at 12.23 seconds with Java at 2.65 [0]

Discords famous article about moving a service off Go because of GC issues [1]

I think there is definitely enough evidence to suggest that Go’s GC does have performance issues and doesn’t give you the knobs to tune it.

[0] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[1] https://discord.com/blog/why-discord-is-switching-from-go-to...


That's one benchmark where Go is slower than Java; what about the others, where Go is faster or as fast but uses between 2 and 20x less memory? Looks like the GC is not that bad after all.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

As for Discord, it was a specific use case; it does not mean Go has GC issues overall. How would Java have compared against Rust? I'm 100% sure it would have performed worse, but you can't say for sure that Java would have performed better either, especially after a rewrite; rewriting the Discord service in Go again could have fixed the issue. No one knows.


The others don't matter, as we're talking about GC performance under load. Moving the goalposts to benchmarks that don't put the GC under load isn't relevant to the conversation.


So all memory in Go is managed by the GC but somehow other benchmarks don't matter, right ...

The JVM is probably heavily optimized for what a binary tree is doing; that does not mean the JVM overall is better for all use cases.


The stack is not managed by the GC in the ordinary meaning, so benchmarks that only allocate on the stack literally don't matter.

And by “what a binary tree is doing” you mean, like, garbage collecting no-longer-used objects? Like, why is it hard to believe that the runtime on which perhaps the majority of serious, huge web services run (Twitter and Apple's web services, but Google as well, are huge Java shops), the likes of which handle 325,000 transactions per second (Alibaba), underwent a tremendous amount of engineering and is definitely the queen of the GC category?


Go has an advantage in the niche cases where you can get away with value types. But that is a very rare use case, reminiscent of embedded programs. You almost always need heap allocations, especially for long-running, large apps, and Java has state-of-the-art GC implementations on both the throughput and low-latency fronts.


Well, your "niche" use case is one of the reasons why Go uses less memory than Java most of the time.


Other benchmarks don't stress the GC enough; only the binary-trees one does.


Sure. What about apps that need 300MB? Or a few GBs?

Where you can get away with barely any allocations is a much smaller niche, even for microservices. And Java’s GC is in an entirely different generation of GCs compared to Go’s.


I'm curious why you think a language feature would obsolete an entire programming language. Do you imagine that go programmers secretly wish they were writing Java syntax, in my experience this is very much not true.


I'm tempted to make cheap shots about sets that do actual set operations without a for loop, but I'm hoping Go >=1.19 will start introducing some nice generic collections in the stdlib.


I spent the last few years working really hard to understand async frameworks/libraries like Vert.x, Mutiny, etc. I've gone from "I hate it" to "I can live with it". Now I don't have to. :) Yay.


This is an interesting idea. I've been idly thinking about how to make a framework for detecting DB anomalies in Django applications (say, missing transactions, race conditions, deadlocks etc.) which can be hard to detect. You can strap your application to Jepsen but this is a more white-box approach and probably harder to grok for the average Python developer.

I like the OP's idea of using a virtual thread implementation to parallelize the application layer, while having the test code implement a custom executor to control the interleaving of the units-of-work being scheduled. I can see a few ways this approach could be used more widely; for example in Django you could write a DB driver wrapper and have this as the "test executor". Or for remote API requests, run the test code in green threads and then have a test executor that intercepts and chooses how to interleave the requests/responses.
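A minimal sketch of that test-executor idea (all names hypothetical, and sidestepping the fact that Loom's virtual-thread scheduler isn't publicly pluggable yet): queue up units of work and let a seeded RNG pick the interleaving, so a failing schedule can be replayed from its seed.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.Executor;

    // Single-threaded, so there is no real parallelism; the "concurrency"
    // comes entirely from the seeded shuffle of task order.
    final class RandomOrderExecutor implements Executor {
        private final List<Runnable> tasks = new ArrayList<>();
        private final Random rng;

        RandomOrderExecutor(long seed) {
            this.rng = new Random(seed); // replay a failure by reusing its seed
        }

        @Override
        public void execute(Runnable task) {
            tasks.add(task);
        }

        // Run until no tasks remain, choosing the next one pseudo-randomly.
        void runToCompletion() {
            while (!tasks.isEmpty()) {
                Runnable next = tasks.remove(rng.nextInt(tasks.size()));
                next.run(); // may enqueue more work via execute()
            }
        }
    }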


So Loom is just green threads? You have to implement a CPS transformation on top? If so, great decision!


It's more accurate to say Loom is [a particular type of] continuations in the JVM, which can be used to implement green threads / fibers.

You can see the implementation of VirtualThread here: https://github.com/openjdk/loom/blob/fibers/src/java.base/sh...

This uses the internal 'one-shot delimited continuation': https://github.com/openjdk/loom/blob/fibers/src/java.base/sh...

So, at least in principle, there is scope for other styles of concurrency to be implemented over this.


Loom is just green threads (referred to as virtual threads).


For any older Java programmers who may have a bad taste in their mouth when they think of green threads: when Java first came out, it had green threads, and all N threads that were spawned were tied to a single kernel thread. It also required cooperative multithreading, where one green thread may need to yield.

Java upgraded to native threads, and then you could have N Java threads bound to N kernel threads. This was way better, but had downsides: you're limited in how many threads you can spawn, you need a thread pool to help manage them, and any long-running tasks could effectively deplete your thread pool.

With Loom, now you have M green threads mapped to N kernel threads. These green threads are way cheaper to spawn, so you could have thousands (millions even?) of green threads. Blocking calls won't tie up a kernel thread. So if you have many long-running IO tasks, they aren't going to waste a kernel thread and have it sit around idle waiting on IO. This is similar to async libraries, but without the mental overhead. You should be able to just code synchronously and the JVM will take care of the rest.
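A hedged demo against the current preview API (JDK 19 early access at the time of writing): a hundred thousand sleeping virtual threads finish in roughly a second, where the same number of platform threads would exhaust the OS.

    import java.util.concurrent.CountDownLatch;

    public class ManyThreadsDemo {
        public static void main(String[] args) throws InterruptedException {
            int n = 100_000;
            var latch = new CountDownLatch(n);
            for (int i = 0; i < n; i++) {
                Thread.startVirtualThread(() -> {
                    try {
                        Thread.sleep(1_000); // parks the virtual thread; the carrier OS thread is freed
                    } catch (InterruptedException ignored) {
                    } finally {
                        latch.countDown();
                    }
                });
            }
            latch.await(); // completes in about a second, not 100,000 of them
        }
    }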


> all N threads that were spawned were tied to a single kernel thread. It also required cooperative multithreading, where one green thread may need to yield.

This is pretty much how JavaScript operates. You can consider calling an async function as spawning a user-level "thread"; chained-up callbacks are the same thing, but with manual CPS transform.

This has always perplexed me. Why is N:1 threading hated, but Node.js so loved?


You can still introduce very hard to debug blocking operations in nodejs, which will cause the whole runtime to slow down.


no.

javascript has exactly 1 thread (with the exception of web workers, but they are a pain to use).

the whole idea is to not have to rely on "async" stuff.


>> single kernel thread

> This is pretty much how JavaScript operates

> javascript has exactly 1 thread

No disagreement here. I understand what you are saying. Would be great if you had tried to understand what I said.

Maybe my explanation is lacking. Let me quote pron.

> Again, threads — at least in this context — [...] refer only to the abstraction allowing programmers to write sequences of code that can run and pause.

https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.ht...

A kernel thread running code line-by-line is a "thread"; callbacks that are sequenced (and have the dreaded pyramid indentation hell) form a "thread"; when an async function is called there is also a "thread".

In JavaScript the latter two kinds of "threads" are run by one single kernel thread, i.e. N:1 threading.


No. Callbacks are not a form of threads. Async is not a form of threads.


If you take the word "thread" to mean only "OS thread", then of course you are right, in the most boring way.

If you agree that Project Loom virtual threads are threads, consider this. When the pluggable executor is available, you start a few virtual threads, confining them to the UI thread. Semantically, how are they different from calling async functions in JavaScript?

---

>> abstraction allowing programmers to write sequences of code that can run and pause.

> No. Callbacks are not a form of threads. Async is not a form of threads.

Simply a "No" adds nothing to the conversation. Do you not see that they are "sequences of code that can run and pause"? Or do you not agree with the wider meaning of the word "thread"?


I once did something similar in C# to deterministically test some concurrency-heavy code (injecting custom TaskScheduler instances). It gave me a lot of confidence that it actually worked.


If you're looking for similar concurrency testing in the dotnet world, check out Coyote: https://microsoft.github.io/coyote/

Explainer here: https://innovation.microsoft.com/en-us/exploring-project-coy...


Any (relatively easy) way we can try virtual threads out yet?


You can download JDKs from https://jdk.java.net/loom/ that support it


It should be available in the 19 EA builds:

https://jdk.java.net/19/


Will Loom get support for BlockHound? https://github.com/reactor/BlockHound BlockHound currently supports both reactive streams and Kotlin coroutines.


Project Loom is about deprecating "non-blocking" and "reactive" as concepts. The whole point is to just let the code block.


How does it replace the backpressure mechanism from reactive solutions?


Send to a full `BlockingQueue`, producer will block. Automatic backpressure.
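A minimal sketch of that (queue bound and timings arbitrary): a bounded queue plus a slow consumer parks the producer's virtual thread cheaply, and that parking is the backpressure.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BackpressureDemo {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16); // the bound is the backpressure

            Thread producer = Thread.startVirtualThread(() -> {
                try {
                    for (int i = 0; ; i++) queue.put(i); // parks when the queue is full
                } catch (InterruptedException e) { /* shutdown */ }
            });

            Thread consumer = Thread.startVirtualThread(() -> {
                try {
                    while (true) {
                        queue.take();     // parks when the queue is empty
                        Thread.sleep(10); // slow consumer; the producer can't run ahead
                    }
                } catch (InterruptedException e) { /* shutdown */ }
            });

            Thread.sleep(500);
            producer.interrupt();
            consumer.interrupt();
        }
    }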



