nmfisher on June 23, 2022 | on: YaLM-100B: Pretrained language model with 100B par...
That's pretty much what SLIDE [0] does. The main driver there was achieving performance parity with GPUs for CPU training, but presumably the same approach could apply to running inference on models too large to fit into consumer GPU memory.
[0] https://github.com/RUSH-LAB/SLIDE
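
For illustration, here's a rough sketch (Python/NumPy, not SLIDE's actual C++ code) of the LSH idea behind it: hash the neurons' weight vectors once, then for each input compute only the neurons that land in the same buckets as the input. The SimHash scheme, sizes, and names here are all illustrative assumptions, not SLIDE's real API.

    import numpy as np

    # Sketch of the SLIDE idea: use locality-sensitive hashing (SimHash here)
    # to pick a small set of neurons likely to have large activations, and
    # compute only those instead of the whole dense layer.

    rng = np.random.default_rng(0)

    d_in, n_neurons = 256, 10_000       # input dim, neurons in the layer
    n_bits, n_tables = 8, 4             # hash length and number of hash tables

    W = rng.standard_normal((n_neurons, d_in))               # one weight row per neuron
    planes = rng.standard_normal((n_tables, n_bits, d_in))   # random hyperplanes for SimHash

    def simhash(vecs, planes_t):
        # Sign-based hash: one bit per hyperplane, packed into an integer bucket id.
        bits = (vecs @ planes_t.T) > 0              # (num_vecs, n_bits) booleans
        return bits @ (1 << np.arange(n_bits))      # pack bits into an integer

    # Build hash tables once (offline, re-built occasionally): bucket -> neuron ids.
    tables = []
    for t in range(n_tables):
        codes = simhash(W, planes[t])
        buckets = {}
        for neuron_id, code in enumerate(codes):
            buckets.setdefault(int(code), []).append(neuron_id)
        tables.append(buckets)

    def sparse_forward(x):
        # Query each table with the input, union the candidate neurons, and
        # compute activations only for those (a small fraction of the layer).
        candidates = set()
        for t in range(n_tables):
            code = int(simhash(x[None, :], planes[t])[0])
            candidates.update(tables[t].get(code, []))
        idx = np.fromiter(candidates, dtype=int)
        acts = W[idx] @ x                           # only len(idx) dot products, not n_neurons
        return idx, acts

    x = rng.standard_normal(d_in)
    active_ids, active_vals = sparse_forward(x)
    print(f"computed {len(active_ids)} of {n_neurons} neurons")

The memory-saving angle mentioned above would follow from the same mechanism: since only the neurons retrieved from the hash tables are touched per input, most of the weight matrix never has to be resident on the accelerator.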