Rethinking the Computer Chip in the Age of AI (upenn.edu)
70 points by sizzle on Oct 9, 2022 | hide | past | favorite | 10 comments


Find a preprint of the paper referred to here: https://arxiv.org/abs/2202.05259

Here's the abstract:

The deluge of sensors and data generating devices has driven a paradigm shift in modern computing from arithmetic-logic centric to data centric processing. At a hardware level, this presents an urgent need to integrate dense, high-performance and low-power memory units with Si logic-processor units. However, data-heavy problems such as search and pattern matching also require paradigm changing innovations at the circuit and architecture level to enable compute in memory (CIM) operations. CIM architectures that combine data storage yet concurrently offer low-delay and small footprint are highly sought after but have not been realized. Here, we present Aluminum Scandium Nitride (AlScN) ferroelectric diode (FeD) memristor devices that allow for storage, search and neural network-based pattern recognition in a transistor-free architecture. Our devices can be directly integrated on top of Si processors in a scalable, back-end-of-line process. We leverage the field-programmability, non-volatility and non-linearity of FeDs to demonstrate circuit blocks that can support search operations in-situ in memory with search delay times < 0.1 ns and a cell footprint < 0.12 μm2. In addition, we demonstrate matrix multiplication operations with 4-bit operation of the FeDs. Our results highlight FeDs as promising candidates for fast, efficient, and multifunctional CIM platforms.


Ah. Thanks.

"Three of the most important functions or operations are: 1) on-chip storage, 2) parallel search, and 3) matrix multiplication." All of which they claim to be able to do, more or less.*

Storage:

"...we also show that FeD devices can be programmed into 4-bit, distinct, conductive states with superior linearity and symmetry via electrical pulsing."

I think they mean they can distinguish at least 16 different analog levels, not that they are storing 4 discrete binary bits.

On-chip storage is a crossbar array of diodes whose state can be changed. It's a coincident-current addressing system, like old core memory. So you need 2N lines to address N^2 cells.
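A rough sketch of what I mean by coincident addressing (names and structure are mine, not from the paper): a cell switches only where a driven row line and a driven column line intersect, so N row lines plus N column lines reach N^2 cells.

```python
# Toy model of coincident (crossbar) addressing: a cell at (row, col) is
# selected only when both its row line and column line are driven, so
# N + N address lines cover N*N storage cells. Illustrative only.

class Crossbar:
    def __init__(self, n):
        self.n = n                                 # N row + N column lines
        self.cells = [[0] * n for _ in range(n)]   # N^2 cells

    def write(self, row, col, value):
        # Only the cell at the intersection of the two driven lines
        # sees enough drive to switch; neighbours are half-selected.
        self.cells[row][col] = value

    def read(self, row, col):
        return self.cells[row][col]

xb = Crossbar(4)       # 8 address lines, 16 cells
xb.write(2, 3, 1)
print(xb.read(2, 3))   # -> 1
```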

"The two-terminal FeD devices show a diode-like self-rectifying behavior with ... endurance over 10^4 cycles,"

Do they mean 10^4 cycles before it needs to be refreshed, like DRAM, or before it's worn out, like flash?

Search:

Search is more digital. Again, it's a crossbar thing, and all the rows are tested simultaneously.
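Something like this, conceptually (a list comprehension standing in for the simultaneous analog comparison; the code is illustrative, not from the paper):

```python
# Toy model of in-memory parallel search: every stored row is compared
# against the search key "at once"; in hardware each row raises or drops
# its own match line. Purely illustrative.

def parallel_search(rows, key):
    # Return the match-line state of every row in one pass.
    return [row == key for row in rows]

stored = [[1, 0, 1, 1],
          [0, 1, 1, 0],
          [1, 0, 1, 1]]
print(parallel_search(stored, [1, 0, 1, 1]))   # -> [True, False, True]
```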

Multiplication:

"This allows mapping the matrix multiplication operation, a key kernel in neural-network computation, into reading the accumulated currents at each bitline of a FeD device by encoding an input vector into analog voltage amplitudes and the matrix elements into conductances of an array of FeD devices."

Huh. That is an analog multiplier. Unclear how many bits of product you get. Probably 4. Do you get accurate results, after quantizing to 16 levels, or just ones that are reasonably close?
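Numerically, the scheme they describe works out to something like this. The 16-level quantization and all the names are my assumptions, not the paper's:

```python
# Sketch of the analog crossbar multiply: the input vector becomes
# voltages, the matrix entries become 4-bit-quantized conductances, and
# each bitline current sums V * G down its column (Ohm + Kirchhoff),
# i.e. an analog matrix-vector product. Illustrative numbers.

LEVELS = 16  # 4-bit conductance states

def quantize(w, w_max):
    # Snap a weight to one of 16 conductance levels in [0, w_max].
    step = w_max / (LEVELS - 1)
    return round(w / step) * step

def crossbar_matvec(matrix, voltages, w_max=1.0):
    q = [[quantize(w, w_max) for w in row] for row in matrix]
    cols = len(matrix[0])
    # Current on bitline c = sum over rows r of V_r * G_rc.
    return [sum(v * q[r][c] for r, v in enumerate(voltages))
            for c in range(cols)]

result = crossbar_matvec([[8, 15], [15, 0]], [1.0, 2.0], w_max=15)
print(result)   # -> [38.0, 15.0]
```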

This is fascinating. It's a hybrid computer, something seldom seen in recent decades. What can be done with this architecture is limited, but apparently well matched to what ML people want to do, which is very low-precision multiply, add, and search.


Sounds like you should be working in this field?


No, I'm not qualified. I'm just trying to figure out what the article really had to say.

Scientific PR articles are often deliberately hard to read, especially when some minor improvement is being inflated into a world-changing discovery. They have to be read with an eye to finding what they're not telling you. This one is describing something that might be important, if they can fab the things and if the noise level and lifetime are acceptable.

It's not clear that it gets the same result every time. Analog computers usually don't. But if the number of distinct levels is small enough, and the results are quantized to a small number of bits after the operation, maybe it can. That's much like the way modems with multiple signal levels work.
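The modem analogy in miniature (the numbers are made up for illustration): as long as the analog error stays under half a level, quantizing afterwards gives back an exact, repeatable result.

```python
# If analog noise stays below half a quantization step, snapping the
# result to the nearest level recovers the exact answer every time --
# the same trick multi-level modems use. Illustrative numbers only.

import random

def quantize_to_levels(x, step=1.0):
    return round(x / step) * step

random.seed(0)
true_value = 7.0
for _ in range(5):
    noisy = true_value + random.uniform(-0.4, 0.4)   # noise < step/2
    assert quantize_to_levels(noisy) == true_value   # always recovers 7.0
print("all noisy reads quantized back to", true_value)
```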

Or exact repeatability might not matter. People have proposed error-tolerant computing before. Devices can be smaller and lower power if they only have to be 99% right. It's enough of a headache that it never caught on; it's hell for debugging. But it might work for ML, which is somewhat noise tolerant.

Things could get strange if this catches on.


https://knowm.org/oxide-memristors-have-shelf-life-problems/

Not related to the Aluminum Scandium Nitride memristors from the current paper, but I just found this article fascinating. I've been a stealth fan of memristors for a long time, and it finally answers why HP's efforts failed in such a crazy fashion: they were using metal-oxide memristors, which rust easily.

Maybe if the devices from the current paper turn out to be good, they could finally act as the fuel for the next AI wave after deep learning? The paper looks good to my eyes, but I am not a materials expert.


I am still trying to figure out how CIM is supposed to work out.

The highest-density storage shares a single access structure among many bits (see SRAM design). This is antithetical to real CIM, because you fundamentally have to read and write the whole structure iteratively, at which point it's just regular compute.

On the other hand, if you build memory that does not depend on a shared access structure, the density is inherently poor. We already know how to build these today in CMOS, and the reason they have not taken the world by storm is the crummy density. (A CAM, commonly built with flip-flops, is a simple form of CIM.)
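To make the CAM point concrete, here's a tiny software model (illustrative, not a hardware description): every stored bit needs its own storage *and* its own compare logic, which is exactly the per-cell overhead that kills density. `'*'` marks a ternary don't-care bit, as in a TCAM.

```python
# Toy ternary CAM: every entry is compared against the key in parallel
# in hardware; here we just return the indices of all matching rows.
# '*' is a don't-care bit. Each bit cell = storage + comparator,
# which is why CAM density is so poor.

def cam_lookup(entries, key):
    def matches(entry):
        return all(e in ('*', k) for e, k in zip(entry, key))
    return [i for i, entry in enumerate(entries) if matches(entry)]

table = ["10*1", "0110", "1**1"]
print(cam_lookup(table, "1011"))   # -> [0, 2]
```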

There’s a good chance I’m just not creative enough to see the light, but I haven’t found my way past this basic axiom.


Transistor-free compute in memory? Am I hopelessly behind the times in thinking that memory is, like, flip-flops in CMOS?

I get that memory could be anything which holds an encoded state, for instance magnetically. But how is the compute part of it achieved?


Yeah, the linked article seems to be dumbed down to the point of being information-free. It does contain a link to this report, which has a bit more information.

https://pubs.acs.org/doi/full/10.1021/acs.nanolett.2c03169?c...

Still not totally clear, but my understanding is that they are using these Ferroelectric Diodes for both memory and "compute", and that this is a memristor-type device.

The "compute" claim seems a bit exaggerated, but seems to be saying that given the response curve of the device it can effectively "multiply" by a pre-programmed weight value as part of memory recall.

Maybe someone can confirm my take, or correct me if I'm wrong.


Dealing with the exponential increase in data is driving some massive architectural changes. This Univ of Penn research is interesting. Chip companies are currently working a number of strategies. Related: https://semiengineering.com/ic-architectures-shift-as-oems-n...


Actual info is paywalled. Is there another source?



