I was under the impression that, for any FHE scheme with "good" security, (a) there was a finite and not very large limit to the number of operations you could do on encrypted data before the result became undecryptable, and (b) each operation on the encrypted side was a lot more expensive than the corresponding operation on plaintext numbers or whatever.
Am I wrong? I freely admit I don't know how it's supposed to work inside, because I've never taken the time to learn, because I believed those limitations made it unusable for most purposes.
Yet the abstract suggests that FHE is useful for running machine learning models, and I assume that means models of significant size.
The difference between merely homomorphic schemes and fully homomorphic schemes is that FHE can be bootstrapped: there's a circuit, itself evaluable homomorphically, that removes the noise from an encrypted value. That means the result of any homomorphic calculation can have its noise removed and be fed into further computation.
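A toy model of the noise/bootstrapping idea above (not real cryptography, all numbers invented): each ciphertext carries a noise level, operations grow it, decryption fails past a budget, and bootstrapping resets the noise without decrypting.

```python
# Toy illustration of FHE noise growth and bootstrapping. Nothing here
# is cryptographic; it only models the bookkeeping described above.

NOISE_BUDGET = 100

class Ciphertext:
    def __init__(self, value, noise=2):
        self.value = value
        self.noise = noise

def he_add(a, b):
    # Addition grows noise slowly (roughly additive in real schemes).
    return Ciphertext(a.value + b.value, a.noise + b.noise)

def he_mul(a, b):
    # Multiplication grows noise fast (roughly multiplicative).
    return Ciphertext(a.value * b.value, a.noise * b.noise)

def bootstrap(ct):
    # Homomorphically evaluate the decryption circuit, producing a fresh
    # encryption of the same value with low noise. In real FHE this is
    # by far the most expensive operation.
    return Ciphertext(ct.value, noise=10)

def decrypt(ct):
    if ct.noise > NOISE_BUDGET:
        raise ValueError("too noisy: undecryptable")
    return ct.value

x = Ciphertext(3)
y = Ciphertext(5)
t = he_mul(x, y)             # value 15,  noise 4
t = he_mul(t, t)             # value 225, noise 16
too_noisy = he_mul(t, t)     # noise 256: over budget, undecryptable
fresh = bootstrap(t)         # noise reset to 10 without decrypting
ok = he_mul(fresh, fresh)    # noise 100: still within budget
print(decrypt(ok))           # 50625
```

Without the `bootstrap` call, the final multiply would push the noise past the budget and `decrypt` would fail, which is the "finite operation limit" the schemes without bootstrapping have.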
My understanding is largely ten years old and high level and only for one kind of fully homomorphic encryption. Things have changed and there is more than one kind.
I heard it described as a system that encrypts each bit individually and then evaluates those "encrypted bits" in a virtual gate-based circuit implementing whatever operation one wants applied to the plaintext. The key material needed to (en|de)crypt can run to a gigabyte or more. Processing this enormously inflated data is why FHE based on the system I've described is so slow.
So, if you wanted to, say, add numbers, that would involve implementing a full adder [0] circuit in the FHE system.
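To make that concrete, here is a sketch of what "adding numbers" looks like at the gate level. The homomorphic gates below are just plain boolean operations on clear bits for illustration; in a real gate-based scheme each call would operate on ciphertexts and cost milliseconds rather than nanoseconds.

```python
# Full adder and ripple-carry addition expressed purely as boolean
# gates, the form a gate-based FHE scheme can evaluate. he_xor/he_and
# are stand-ins for the scheme's homomorphic gates.

def he_xor(a, b):  # stand-in for a homomorphic XOR gate
    return a ^ b

def he_and(a, b):  # stand-in for a homomorphic AND gate
    return a & b

def full_adder(a, b, carry_in):
    # sum = a XOR b XOR c_in; carry = ab OR (a XOR b)c_in. The OR is
    # written as XOR here, which is safe because the two terms can
    # never both be 1.
    s = he_xor(he_xor(a, b), carry_in)
    c = he_xor(he_and(a, b), he_and(he_xor(a, b), carry_in))
    return s, c

def ripple_add(bits_a, bits_b):
    # Add two little-endian bit lists by chaining full adders.
    carry, out = 0, []
    for a, b in zip(bits_a, bits_b):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

# 6 + 3 = 9, as little-endian bits:
print(ripple_add([0, 1, 1], [1, 1, 0]))  # [1, 0, 0, 1]
```

Every single bit of the sum costs several homomorphic gate evaluations, which is where the huge constant-factor slowdown comes from.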
Both of these are correct-ish. You can do a renormalization that resets the operation counter without decrypting in FHE schemes, so in that sense there is no strict limit on operation count. However, FHE operations are still about 6 orders of magnitude more expensive than normal, so you are not going to be running an LLM, for instance, any time soon. A small classifier, maybe.
LLMs are at the current forefront of FHE research. There are a few papers doing some tweaked versions of BERT in <1 minute per token. Which is only ~4 orders of magnitude slower than cleartext.
This paper uses a very heavily modified version of an encoder-only BERT model. Forward pass on a single 4090 is cited there at 13 seconds after switching softmax out for a different kernel (21 seconds with softmax). They are missing a non-FHE baseline, but judging from its size, that model has only about 35 million parameters. At FP16, you would expect this to be about 100x faster than a normal BERT because it's so damn small. On a 4090, that model's forward pass probably runs at something like 100k-1M tokens per second given some batching. So 6 orders of magnitude still sounds about right.
Given that individual LLM parameters are not easily interpreted, naturally obfuscated by the diffuse nature of their impact, I would think leaning into that property would be a more efficient route.
Obfuscating input and output formats could be very effective.
Obfuscation layers can be incorporated into training: an input (output) layer that passes information forward, but whose output (input) is optimized to have statistically flat characteristics, resistant to attempts at interpretation.
Nothing like apparent pure noise for obfuscation!
The core of the model would then be trained, and infer, on the obfuscated data.
When used, the core model would publicly operate on obfuscated data. While the obfuscation/de-obfuscation layers would be used privately.
In addition to obfuscating, the pre- and post-layers could also reduce data dimensionality, naturally increasing obfuscation and reducing data transfer costs. It is a really good fit.
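A rough sketch of the split being proposed, with made-up shapes and names: the client keeps private input/output projections, and the server only ever sees the projected representations. Nothing here is cryptographic; it only illustrates the data flow, including the dimensionality reduction.

```python
# Split inference sketch: private projections stay with the client, the
# public core runs on "obfuscated" (and smaller) activations. In the
# actual proposal these projections would be trained jointly with the
# core so the intermediate activations look statistically flat.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_obf, d_out = 512, 128, 512   # hypothetical sizes; d_obf < d_in

# Private to the client:
W_enc = rng.standard_normal((d_in, d_obf)) / np.sqrt(d_in)
W_dec = rng.standard_normal((d_obf, d_out)) / np.sqrt(d_obf)

def server_core(z):
    # Stand-in for the public core model operating on obfuscated data.
    return np.tanh(z)

# Client side:
x = rng.standard_normal(d_in)
z = x @ W_enc              # obfuscate + reduce 512 -> 128 before upload
y = server_core(z)         # runs remotely, sees only z
out = y @ W_dec            # de-obfuscate locally
```

The server never sees `x`, `out`, or the projection matrices, only the 128-dimensional intermediate, which is also what shrinks the transfer cost.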
Even the most elaborate obfuscation layers will be orders and orders of magnitude faster than today's homomorphic approaches.
(Given the natural level parameter obfuscation, and the highly limited set of operations for most deep models, I wouldn't be surprised if efficient homomorphic approaches were found in the future.)
Moore's Law roughly states that we get a doubling of speed every 2 years.
If we're 6 orders of magnitude off, then we need to double our speed 20 times (2^20 = 1,048,576), which would give us speeds approximately in line with 40 years ago. Unless my understanding is completely off.
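The back-of-the-envelope arithmetic above checks out:

```python
# A 10^6x slowdown is about 20 doublings; at one doubling per ~2 years
# (the Moore's law rule of thumb used above), that corresponds to
# hardware from roughly 40 years ago.
import math

slowdown = 1e6
doublings = math.log2(slowdown)   # ~19.93
years_back = 2 * doublings        # ~39.9

print(round(doublings), round(years_back))  # 20 40
```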
The rule of thumb is "about a 100,000x slowdown". With Moore's law doubling every 2 years, that means it operates at roughly the speed of computers from about 35-40 years ago. Although really, that still makes it seem faster than it is; direct comparisons are hard.