That was my first thought too. I really like the idea of an array of interconnected nodes. There's something biological about thinking in terms of topology and diffusion between neighbours that I find appealing.
Data transfer is slow and power hungry - it's obvious that putting a little bit of compute next to every bit of memory is the way to minimize data transfer distance.
The laws of physics can't be broken, yet people demand more and more performance, so eventually this problem will become worth the difficulty of solving it.
That minimizes the data transfer distance from that bit of memory to that bit of compute. But it increases the distance between that bit of (memory and compute) and all the other bits of (memory and compute). If your problem is bigger than one bit of memory, such a configuration is probably a net loss, because of the increased data transfer distance between all the bits.
Your last paragraph... you're right that, sooner or later, something will have to give. There will be some scale such that, if you create clumps either larger or smaller than that scale, things will only get worse. (But that scale may be problem-dependent...) I agree that sooner or later we will have to do something about it.
Cache hierarchies operate on the principle that the probability of a bit being operated on is inversely proportional to the time since it was last operated on.
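A software caricature of that recency heuristic, i.e. least-recently-used replacement (the sizes and the linear scan are made up for illustration; real hardware tracks recency very differently):

```c
#include <stdio.h>

#define LINES 4

struct line { long tag; long last_used; int valid; };
static struct line cache[LINES];
static long now = 0;

static int access_line(long tag) {
    int victim = 0;
    for (int i = 0; i < LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            cache[i].last_used = ++now;              /* hit: refresh recency */
            return 1;
        }
        if (!cache[i].valid || cache[i].last_used < cache[victim].last_used)
            victim = i;                              /* remember the least recently used slot */
    }
    cache[victim] = (struct line){ tag, ++now, 1 };  /* miss: evict the LRU line */
    return 0;
}

int main(void) {
    long refs[] = { 1, 2, 3, 1, 4, 5, 1, 2 };
    for (int i = 0; i < 8; i++)
        printf("ref %ld -> %s\n", refs[i], access_line(refs[i]) ? "hit" : "miss");
    return 0;
}
```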
Registers can be thought of in this context as just another cache, the memory closest to the compute units for the most frequent operations.
It's possible to have register-less machines (everything expressed as memory-to-memory operations), but it blows up the instruction word length; better to let the compiler do some of the thinking.
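Back-of-envelope illustration of that encoding blow-up (the 5-bit register field and 64-bit address are assumed figures, e.g. a 32-register machine using full virtual addresses as operands):

```c
#include <stdio.h>

int main(void) {
    /* Assumed figures: 32 architectural registers -> 5-bit register number,
     * and a full 64-bit virtual address per memory operand. */
    const int reg_field_bits  = 5;
    const int addr_field_bits = 64;
    const int operands        = 3;   /* dst, src1, src2 for a three-operand add */

    printf("register-register add, operand fields: %3d bits\n",
           operands * reg_field_bits);
    printf("memory-to-memory add,  operand fields: %3d bits\n",
           operands * addr_field_bits);
    return 0;
}
```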
Indeed you can take this further and think of three address spaces:
- Visible register file. 4- to 6-bit address space, up to about 2 KB in size. Virtualized as hidden (hardware) registers. Single-cycle access. Usually little or no access control or fault handling: if it exists, you can read/write it.
- Main memory, 32- to 64-bit address space. Virtualized as caches, main RAM and swap. Access may be as low as 5 cycles for L1d, hundreds for main RAM, up into the millions if you hit the swap file. Straightforward layer of access controls: memory protection, segfault exceptions and so on.
- Far storage, URIs and so on. Variable-length address space, effectively infinite. Arbitrarily long access times, arbitrarily complex access controls and fallbacks.
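As a loose software analogy of those three tiers (all the names, sizes and the file-standing-in-for-far-storage are invented for illustration, not how any real system lays this out):

```c
#include <stdio.h>

/* Tier 1: a tiny fixed array stands in for the register file.
 * Tier 2: a larger array stands in for main memory.
 * Tier 3: a file on disk stands in for far storage. */
#define REG_SLOTS 8
#define MEM_SLOTS 4096

static int regs[REG_SLOTS];
static int mem[MEM_SLOTS];

static int load(long addr) {
    if (addr < REG_SLOTS)
        return regs[addr];                 /* tier 1: tiny address space, cheapest access */
    if (addr < MEM_SLOTS)
        return mem[addr];                  /* tier 2: bigger, slower, protected in real hardware */

    /* tier 3: "far storage" - here just a file lookup keyed by the address */
    int value = 0;
    FILE *f = fopen("far_storage.bin", "rb");
    if (f) {
        fseek(f, (addr - MEM_SLOTS) * (long)sizeof(int), SEEK_SET);
        if (fread(&value, sizeof value, 1, f) != 1)
            value = 0;                     /* miss or short read: arbitrary fallback */
        fclose(f);
    }
    return value;
}

int main(void) {
    regs[3] = 42;
    mem[100] = 7;
    printf("%d %d %d\n", load(3), load(100), load(100000));
    return 0;
}
```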
Long ago I thought that, at least for very generic / task-agnostic operations such as wiping, moving, or duplicating chunks of memory, a chip-on-DIMM could be of use (but maybe this is already the case and I just don't know about it).
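Something like the interface below is roughly what I have in mind; it's a hypothetical sketch (the names are invented, no such standard API that I know of), with a plain CPU fallback standing in for the on-DIMM engine:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handle for pushing bulk memory operations down to logic on the
 * DIMM itself. A real driver would queue the request to the module; this
 * fallback just runs on the CPU, so the data still crosses the bus. */
struct dimm_offload {
    int available;   /* would be probed at boot on real hardware */
};

static void dimm_wipe(struct dimm_offload *d, void *dst, size_t len) {
    (void)d;
    memset(dst, 0, len);             /* CPU fallback for the sketch */
}

static void dimm_copy(struct dimm_offload *d, void *dst, const void *src, size_t len) {
    (void)d;
    memcpy(dst, src, len);           /* an on-DIMM engine could do this without the CPU touching the data */
}

int main(void) {
    struct dimm_offload dev = { 0 };
    uint8_t a[64], b[64];

    dimm_wipe(&dev, a, sizeof a);     /* wiping */
    a[0] = 1;
    dimm_copy(&dev, b, a, sizeof b);  /* duplicating */
    printf("b[0] = %d\n", b[0]);
    return 0;
}
```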