While I do agree, with pointer chasing down a persistent data structure being ba...

MillenialMan · on Oct 24, 2021

Threading doesn't compensate for that degree of slowdown, and itself has overhead. You'll get something back, but not much.

dgb23 · on Oct 24, 2021

Again, generalisations about performance like that is just not very useful. There are use cases where you get significant benefits from using persistent data structures.

Aeron comes to mind as an example in regards to high performance solutions that uses them. But a more fundamental reason is that immutability can be part of your domain logic. Think video games like braid, or business applications where you need to derive a lot of different views and go back and forth between them, or domains where data just is inherently immutable such as accounting and so on.

MillenialMan · on Oct 24, 2021

I don't really agree. Cache friendliness is almost always a relevant factor as soon as performance becomes an issue. I get what you're saying but as I see it immutability gives you architectural efficiency and in some cases space efficiency, but rarely processing speed.

> Think video games like braid

Braid doesn't use immutable data structures, it uses a historical record (and immutability would be incompatible with some of the mechanics).[1] The author of Braid is actually quite famous for his dislike of functional languages and immutability. He doesn't even like garbage collection because of the overhead it introduces.

Interestingly, he was talking about data structures for codebase representation (in the context of a compiler) a while back, and someone mentioned HAMTs. I'm definitely curious if they would work well there.

[1] https://www.youtube.com/watch?v=8dinUbg2h70

dgb23 · on Oct 24, 2021

You’re picking out an admittedly bad example in order to miss the point. Different data structures have different tradeoffs. Not every problem lends itself to the same type of optimization, so it’s not helpful to make these generalizations.

Continuing with the example regardless, change records and persistent data structures have different performance characteristics. The former is going fast if you move incrementally between states, the latter enables arbitrary access, comparison and efficient in memory caching of multiple views.

It would be interesting to explore and measure the trade offs under certain use cases.

MillenialMan · on Oct 25, 2021

I understand your point. I'm saying: the subset of problems that benefit in a performance sense from immutability is very small. The vast majority of the time, cache misses slow down algorithms. That's a perfectly reasonable generalisation and I don't really understand why you think it's untrue.

Re: change records, I believe Braid uses a historical record of absolute states, not a series of diffs. The state at a particular time is recreated from a minimal representation (a couple of arrays). That's much more efficient than throwing multiple iterations of the state in a HAMT.

dgb23 · on Oct 25, 2021

> The vast majority of the time, cache misses slow down algorithms. That's a perfectly reasonable generalisation and I don't really understand why you think it's untrue.

I don't say it is untrue. But not every use case lends itself to CPU cache optimization. Your access patterns might just happen to be arbitrary or fragmented from a layout perspective.

I would argue that this is a very common case, especially for IO bound applications that operate on tiny pieces of data and have deeply nested dispatch rules that each query some section of memory that you don't know it will query in advance.

Or in other words: The clearer your computation pipeline can be, the more you can benefit from CPU cache optimization.

> Re: change records, I believe Braid uses a historical record of absolute states, not a series of diffs. The state at a particular time is recreated from a minimal representation (a couple of arrays). That's much more efficient than throwing multiple iterations of the state in a HAMT.

You caught my interest, I will have to study this. I can't really imagine how it works from your explanation (my bad, not yours). I assumed when you said "change records" it meant that it would be suboptimal to access an arbitrary state (linear as opposed to constant).

reitzensteinm · on Oct 24, 2021

In my experience you can usually achieve near linear speed up. My machine can run 24 threads.

MillenialMan · on Oct 24, 2021

Fair enough on scaling. But 24 is still a lot less than two to three orders of magnitude.

reitzensteinm · on Oct 24, 2021

Yes, it is. I'd probably ballpark Clojure at 100x slower than the kinds of low level C# I usually write (gamedev). But threading C#, especially low level imperative C#, is so awful I often don't bother unless it's very important or there's an embarrassingly parallel element (genetic algorithms and simulating sound waves on a 2D grid are two cases I've pulled the trigger where both were true).

This leaves Clojure as 1/4 the overall speed, which seems about right. However that's based on a hugely unscientific gut feeling, because generally I don't touch Clojure if performance matters and I don't touch C# if it doesn't.

By the way, I've implemented persistent data structures on disk for a problem they were particularly useful for. If stalling for a cache miss feels bad, try waiting for an SSD :)

gpderetta · on Oct 24, 2021

Having to use 24 cores to get to 1/4th of the performance of a single threaded C# application seems particularly awful.

Especially as C# is a relatively pleasant language to use.

It doesn't matter how good threading in clojure is if you don't even need to use it to beat clojure on another language (also: great scaling is pointless if you start so far behind).

reitzensteinm · on Oct 24, 2021

Yes, hammering in a nail with a screwdriver is particularly awful. I love using Clojure where it's appropriate, but I'm not in the least bit tempted to use it for game dev.

gnuvince · on Oct 24, 2021

Could you share an example program that does that?

_bsless · on Oct 24, 2021

Take that with a grain of salt, from experiments run by myself and others I can share this does not scale linearly.

Hope some performance experts can chime in but this might be related to data structure sharing between caches or allocation contention in the JVM.

reitzensteinm · on Oct 24, 2021

If you've got some code, I'm happy to take a look.

_bsless · on Oct 24, 2021

You could start with these:

https://clojureverse.org/t/parallel-transducing-context-figh...

https://www.reddit.com/r/Clojure/comments/emylbs/any_idea_wh...

https://clojurians.zulipchat.com/#narrow/stream/151168-cloju...

https://github.com/joinr/paralleltest/issues/1

Anything in particular you're looking for?

reitzensteinm · on Oct 24, 2021

I don't have access to the Zulip chat, but the other benchmarks are basically testing allocating in a hot loop. I'm not surprised that doesn't scale linearly, and it's certainly not representative of real world code I've ever written.

If you have code you wrote to achieve something and hit a wall with scaling, I'm happy to take a look.

_bsless · on Oct 24, 2021

Up to 4 threads I usually see linear scaling as well, but it begins to drop off afterwards, although I don't have a representative example at hand.

I'd like to see a good example if you have one available, most of my performance work was done in a single-threaded context until now

reitzensteinm · on Oct 24, 2021

I don't really have anything off hand.

But code primarily slowed down by pointer chasing down big maps, which is MillenialMan's complaint and fits my own experience, will absolutely be sped up linearly.

A bunch of cores sitting around waiting for the next HAMT node to come in will not interfere with each other in the slightest.