nmfisher on June 23, 2022 \| on: YaLM-100B: Pretrained language model with 100B par...

That's pretty much what SLIDE [0] does. The motivation there was achieving performance parity with GPUs when training on CPUs, but presumably the same could apply to running inference on models too large to load into consumer GPU memory.

[0] https://github.com/RUSH-LAB/SLIDE