Hacker News
Nuke: A memory arena implementation for Go (github.com/ortuman)
80 points by thunderbong on March 5, 2024 | 27 comments



Looks like the arena is completely unsound if you mix types with different alignments?

  arena := NewSlabArena(8192, 1) // 8KB

  var b *byte = New[byte](arena)
  var i *int = New[int](arena)

  fmt.Printf("Pointer address: %p\n", b)
  fmt.Printf("Pointer address: %p\n", i)
Result:

  Pointer address: 0x14000198000
  Pointer address: 0x14000198001
I'm not a Go language lawyer, but I assume this is just immediate UB. OP, it's fine to publish a library without experience in manual memory management, but maybe put a disclaimer in the README?


The fact that the author calls it a slab arena makes me think they did, indeed, intend for it to be used with a single type per arena. I do wonder why you'd want an allocator that is both slab and arena, though. I assume there is some use case but nothing immediately comes to mind.


> The fact that the author calls it a slab arena makes me think they did, indeed, intend for it to be used with a single type per arena.

Maybe? But that seems strange, since they seem to have intended it to be used for http servers making per-request allocations.

Come to think of it, the arena is probably still unsound with just ints, because the underlying allocation is just for a `[]byte`, which I don't think is guaranteed to be aligned to 8 bytes. Might be on most platforms, though.
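
For what it's worth, the usual fix is to round the bump offset up to the value's alignment before handing out the pointer; aligning the absolute address also covers a backing array that isn't itself 8-byte aligned. A rough sketch of the technique (not the library's code; the names are made up for illustration):

  package main

  import (
      "fmt"
      "unsafe"
  )

  // newAligned sketches alignment-aware bump allocation over a byte slab.
  // The key step is the padding: the object's address is rounded up to a
  // multiple of its alignment before the slot is handed out.
  // NOTE: if T contains pointers, the GC still won't see them inside a
  // []byte slab (see the sibling comment below about heap references).
  func newAligned[T any](buf []byte, offset *uintptr) *T {
      var zero T
      size, align := unsafe.Sizeof(zero), unsafe.Alignof(zero)

      base := unsafe.Pointer(unsafe.SliceData(buf))
      // align is always a power of two, so this computes the padding needed
      // for the absolute address to become a multiple of align.
      pad := (align - (uintptr(base)+*offset)%align) % align
      if *offset+pad+size > uintptr(len(buf)) {
          return nil // slab exhausted; a real arena would grab a new slab
      }
      p := unsafe.Add(base, *offset+pad)
      *offset += pad + size
      return (*T)(p)
  }

  func main() {
      buf := make([]byte, 8192)
      var off uintptr
      b := newAligned[byte](buf, &off)
      i := newAligned[int](buf, &off)
      fmt.Printf("%p\n%p\n", b, i) // the *int now lands on an 8-byte boundary
  }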


There is also the issue where you have a pointer in the arena pointing to something on the heap. The GC will gladly kill the object on the heap as it has no idea something is still pointing to it.
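
Concretely, here's a small self-contained demo of that failure mode, using a raw []byte slab in place of the library (which, per the sibling comment, is what backs it). The only reference to the heap object lives inside the slab, and the GC never scans a byte slab for pointers:

  package main

  import (
      "fmt"
      "runtime"
      "time"
      "unsafe"
  )

  type payload struct{ data [64]byte }

  func main() {
      // A plain []byte slab stands in for arena memory. The GC treats a
      // byte slab as pointer-free and never scans its contents.
      slab := make([]byte, 1024)

      // Store a *payload into the slab via unsafe. This is now the only
      // reference to the heap object, and it is invisible to the GC.
      slot := (**payload)(unsafe.Pointer(unsafe.SliceData(slab)))
      *slot = &payload{}
      runtime.SetFinalizer(*slot, func(*payload) {
          fmt.Println("collected: the slab still 'points' at freed memory")
      })

      runtime.GC()
      time.Sleep(100 * time.Millisecond) // give the finalizer a chance to run
      runtime.KeepAlive(slab)            // the slab survives the GC; its hidden pointee does not
  }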


Why wouldn't this just use generics to allocate a big slice of the type so that the GC can know whether or not the arena may contain pointers to the heap?
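
To make the suggestion concrete, the typed-slice approach would look roughly like this (nothing here is the library's API; the names are made up). Because the slab is a []T rather than raw bytes, the GC scans it like any other slice, so pointers stored inside the objects keep their targets alive:

  package main

  import "fmt"

  // TypedSlab hands out slots from one pre-allocated slice of T.
  type TypedSlab[T any] struct {
      items []T
      next  int
  }

  func NewTypedSlab[T any](capacity int) *TypedSlab[T] {
      return &TypedSlab[T]{items: make([]T, capacity)}
  }

  // Alloc returns the next free slot; alignment is correct by construction.
  func (s *TypedSlab[T]) Alloc() *T {
      if s.next == len(s.items) {
          return nil // full; a real implementation might grow or chain slabs
      }
      p := &s.items[s.next]
      s.next++
      return p
  }

  // Reset forgets all allocations at once; zeroing the used prefix lets the
  // GC reclaim whatever the old objects pointed to.
  func (s *TypedSlab[T]) Reset() {
      clear(s.items[:s.next])
      s.next = 0
  }

  type user struct {
      name    string
      friends []*user
  }

  func main() {
      slab := NewTypedSlab[user](1024)
      u := slab.Alloc()
      u.name = "alice"
      fmt.Println(u.name, len(u.friends))
      slab.Reset()
  }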


You can do that, but you end up with something other than what the author wrote. The use cases are different. Arenas, at least in principle, should generally be able to allocate anything, not just a single type. If you only have a single type, you'd just use a typed pool or pre-allocated slice, like you would in any language, rather than an arena.


Generics in Go, as they're implemented today, sadly have a fair bit of performance overhead. I haven't tried it, but my assumption would be that Go generics are not fast enough to make an effective arena allocator. I'd be thrilled if someone could prove me wrong, though!


The performance overhead is when you're calling a method on the generic type--Go has to look up the specific implementation in a dictionary. Pretty sure that doesn't apply to straight-up container use cases like this one.


That is a really good question.


Upvoted because I want to see comments from people more knowledgeable than me.


I was hoping to see a benchmark against the Go garbage collector, but unfortunately there is none, so it is difficult to assess the usefulness of this library.



I'd be really interested to know and understand when and how to use these in real workloads.


This is a very sharp tool and I find it's really rare to need it.

I do a lot of profiling and performance optimization, especially with Go. Allocation and GC are often a bottleneck.

Usually when that happens you look for ways to avoid allocation, such as reusing an object or pre-allocating one large chunk or slice to amortize the cost of smaller objects. The easiest way to reuse a resource is something like creating one instance before the loop and clearing it out between iterations.

The compiler and GC can be smart enough to do a lot of this work on their own, but they can't always see enough (a common example is that when you pass a byte slice to an io.Reader, Go has to heap-allocate it, because it doesn't know whether a Reader implementation will store a reference, and a retained reference to stack memory would be bad).
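
A tiny example of the io.Reader case (illustrative; building with -gcflags=-m typically shows the buffer escaping):

  package main

  import (
      "fmt"
      "io"
      "strings"
  )

  // src is deliberately typed as io.Reader so the concrete type is not
  // visible at the call site below.
  var src io.Reader = strings.NewReader("hello")

  // The 64-byte buffer escapes to the heap: Read is an interface call, so
  // the compiler can't prove the concrete Reader won't retain a reference
  // to buf.
  func readHeader(r io.Reader) (int, error) {
      buf := make([]byte, 64)
      return r.Read(buf)
  }

  func main() {
      n, _ := readHeader(src)
      fmt.Println(n) // 5
  }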

If you can't have a clean "process requests in a loop and reuse this value" lifecycle, it's common to use explicit pools.
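
sync.Pool is the standard tool for that; a minimal sketch of the explicit-pool pattern:

  package main

  import (
      "bytes"
      "fmt"
      "sync"
  )

  // bufPool reuses scratch buffers across requests instead of allocating a
  // fresh one each time. Get returns a previously used buffer when one is
  // available, or calls New.
  var bufPool = sync.Pool{
      New: func() any { return new(bytes.Buffer) },
  }

  func handle(payload string) string {
      buf := bufPool.Get().(*bytes.Buffer)
      buf.Reset()            // clear leftovers from the previous user
      defer bufPool.Put(buf) // hand it back when this "request" is done

      buf.WriteString("processed: ")
      buf.WriteString(payload)
      return buf.String()
  }

  func main() {
      fmt.Println(handle("a"))
      fmt.Println(handle("b"))
  }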

I've never really had to do more than that. But one observation people make is that a lot of allocations are request-scoped, and it's easier to bulk-clean them up. Except that requires that nothing else stores pointers to them, and Go doesn't help you enforce that.

Also this implementation in particular might not actually work because there are alignment restrictions on values.


Honestly, I expect the answer in Go is that if you're even tempted to use this, you are doing at least one of: 1. premature optimization, or 2. experiencing the consequences of choosing the wrong programming language for your project.

Language selection is really important, and I think too many engineers approach it rather willy-nilly, with way too much bias towards what they like or may already know. Both of those are legitimate considerations! But they shouldn't be determinative. You need to calmly and rationally look at all the tradeoffs the languages offer. I think the vast majority of projects with the sort of performance requirements that make arenas necessary could have had that requirement determined from the beginning, and the conclusion reached that Go was not a good choice, despite matching on some criteria. If this degree of memory performance is a critical requirement for your project, you're looking at a list of possible languages I could count on one hand, and Go's not on it.

(Though based on what I see in the world right now, the more common problem is people getting a project and grotesquely overestimating the performance they need: like the guy tasked with writing a web site that will perform up to 5 entire CRUD updates per second at maximum load posting questions about whether they need the web framework that does six million requests per second or the one that does ten million. But both over- and underestimating requirements is a problem in the real world.)

I would think not twice, but more like a dozen times, about using a package like this. I would need to be backed into it by sheer desperation: some large code base that I simply cannot fix any other way, can't extract into an external service/microservice/library for the task, or address by literally almost anything else. Using it would represent my program reaching the end of its "design budget", if not exceeding it.

And unless the decision was just so far back in the mists of history that it is completely irrelevant now (e.g., the decisions were all made by people no longer on the project), there'd be a postmortem on how we made the mistake of picking an inappropriate language.


It's pretty popular in web servers. You create a new arena per HTTP request, and then everything you need for the context of that request, you allocate from the arena. When the request is done, you free the entire arena.

Nginx does this. As does my Passenger application server.

Apache... kind of does this, but it fakes it. It allocates every object individually, and the "arena" is only used for linking all those allocations together so that Apache can free all of those allocations (individually). facepalm


Instead of freeing it you can just zero it out and keep it. That way you don't have to allocate memory for the next HTTP request. IIRC this was how Varnish managed the per-thread worker memory.


You can use it in all kinds of producer-consumer designs, but this seems quite dangerous in Go. It's like making a huge effort to remove all the safeties and then aiming at your foot.


For Go? Honestly not sure. In principle it can be faster but as someone pointed out in another comment this one has some subtle issues.

In manually memory-managed languages like C or C++, you can use these to allocate a fixed amount of memory at program init (keeps system resources under control), to get contiguous allocations (friendly for caches), to keep your memory management sane (a clear start and end to the lifecycle of an object), and to be very performant (if used correctly).


I'd never heard of this concept. I found this:

https://github.com/Enichan/Arenas

For C#.

I've been trying to get more into memory management in C#, so this might be something good for that.


Another one is https://github.com/xoofx/Varena

Arguably, there's less need for arenas in C# in most scenarios than in Go, because of easy object and array pooling out of the box with ArrayPool<T>, ObjectPool<T> (Sdk.Web workload), and stackalloc/InlineArray and co. You can also just use malloc/free directly with NativeMemory.Alloc/Free instead.


I might be a bit ignorant here, but I was under the impression that this is what every program already does. Before virtual memory, you would not call into the operating system for every allocation; you would have an in-process memory allocator that obtains large chunks of memory from the operating system and satisfies allocations from that pool. The advent of virtual memory removed the need to explicitly obtain memory from the operating system: just access it and the page-fault handler will transparently give you some memory. But you still would have an allocator that keeps track of all your allocations. Even the use of a garbage collector would not fundamentally change this.

If I am not misunderstanding what this project tries to do, then they are essentially adding another layer: they get large chunks from the allocator and then use those to satisfy allocations. This seems essentially like having a second allocator on top of the existing one. If the existing one does not work well in certain scenarios, there might be some performance to be gained by using a different allocator, even on top of the existing one. But I wonder if this is the best way to solve the issue; this seems more like a workaround than a fix. Would it not be better to tune the existing allocator, or make it configurable or even swappable? This of course requires more fundamental changes - language, runtime, compiler - instead of just being a library. Or maybe Go already has facilities to customize memory management?

EDIT: I think I got this wrong; the actual goal seems to be to allocate several objects in a contiguous chunk of memory, i.e. an array of objects - not to be confused with an array of pointers to objects - for improved locality.


> I might be a bit ignorant here, but I was under the impression that this is what every program already does.

It is not. Arena allocators have a fundamentally different API, because they don't allow you to de-allocate individual objects in the arena - everything must be de-allocated at once. For specific workloads, like repeatedly allocating a large number of short-lived objects, this can be a huge speedup, and also significantly reduce memory usage.


So it is like allocating an array of objects, but I guess it does not have to be homogeneous, i.e. you can decide after the initial allocation what objects you want to live in your chunk of memory. Allocation gets simpler, as you just have to move one pointer forward past the newly allocated object. Memory usage would actually remain the same or even increase - ignoring the overhead of data structures to track allocations - as the lifetime of objects is now tied to the lifetime of the arena. You could get essentially the same behavior from a normal per-thread allocator if you never freed anything and the allocator satisfied all allocations from a single pool. You would of course still pay the management overhead and have to free every object individually. More interesting scenarios arise when you use several arenas with different lifetimes or mix arena allocation with normal allocation.
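
Concretely, that "move one pointer forward" step is essentially the entire allocator. A minimal illustrative sketch (single fixed slab, byte-oriented, ignoring the alignment and GC caveats raised elsewhere in the thread):

  package main

  import "fmt"

  // bumpArena is a toy single-slab bump allocator: allocation just advances
  // an offset, and Reset frees everything at once by rewinding it.
  type bumpArena struct {
      slab []byte
      used int
  }

  func newBumpArena(size int) *bumpArena {
      return &bumpArena{slab: make([]byte, size)}
  }

  // Alloc carves n bytes off the slab; it is O(1) and allocation-free.
  func (a *bumpArena) Alloc(n int) []byte {
      if a.used+n > len(a.slab) {
          return nil // out of space; a real arena would chain another slab
      }
      b := a.slab[a.used : a.used+n : a.used+n]
      a.used += n
      return b
  }

  // Reset "frees" every object at once; there is no individual free.
  func (a *bumpArena) Reset() { a.used = 0 }

  func main() {
      a := newBumpArena(1 << 16)
      hdr := a.Alloc(16)
      body := a.Alloc(1024)
      fmt.Println(len(hdr), len(body), a.used) // 16 1024 1040
      a.Reset()                                // both allocations released together
      fmt.Println(a.used)                      // 0
  }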


What you say is true for pure arenas. There are also hybrids like Immix that mix them with other methods:

https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/immix/


I only took a brief look, but from an API perspective, this seems identical to "normal" garbage collectors? In the sense that from the programmer's perspective, all the allocations live forever?


Arena allocators add the ability to free all allocations in an entire arena with a single call. That means that you don’t have to keep track of allocations made. That can simplify APIs that take and/or produce pointers.

They also allow you to have multiple arenas.

A use case is a web server, where each request creates an arena, allocates scratch memory in it, and destroys the arena after handling the request.



