You (currently) need a GPU to run any of the useful models. I haven't really see...

hereonout2 · on Sept 12, 2023

> once you start needing to horizontally scale it'll get messy fast.

It gets expensive fast, but not messy; these things scale horizontally really well. All the state is encapsulated in the request: no replication, synchronisation, or user data to worry about. I'd rather have the job of horizontally scaling llama2 than a relational database.
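
To make the "all the state is in the request" point concrete, here's a rough sketch of why routing is easy. Everything in it is made up for illustration: `Request`, `Replica`, and `RoundRobinBalancer` are hypothetical names, and `generate` just echoes the prompt instead of running a model. The only point is that any replica can serve any request, so the balancer needs no session affinity or shared storage.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical request shape: everything the model needs travels with it.
    prompt: str
    max_tokens: int = 64


class Replica:
    """Stand-in for one GPU-backed inference server; keeps no per-user state."""

    def __init__(self, name):
        self.name = name

    def generate(self, request):
        # A real replica would run the model here; this placeholder echoes.
        return f"[{self.name}] completion for: {request.prompt}"


class RoundRobinBalancer:
    """Hands each request to the next replica in turn."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("need at least one replica")
        self._cycle = itertools.cycle(replicas)

    def handle(self, request):
        # No lookup of prior state: any replica can take any request.
        return next(self._cycle).generate(request)


if __name__ == "__main__":
    balancer = RoundRobinBalancer([Replica("gpu-0"), Replica("gpu-1")])
    for _ in range(3):
        print(balancer.handle(Request(prompt="hello")))
```

Adding capacity in this toy version is just building the balancer with a longer replica list. The expensive part is the GPUs behind each replica, not the coordination.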

thewataccount · on Sept 12, 2023

For sure, and yeah, it wouldn't be terrible, you're right. You'd just need the API servers + a load balancer.

My thing is that doing that dynamically is still a lot of work compared to just calling a single endpoint where all of that is handled for you.

But for sure, this is a very decent use case for horizontal scaling.