
I was looking for a proxy that could maximize throughput to each LLM provider based on its rate limits: basically capping requests per second and input/output tokens per second per model.

I couldn't find anything, so I rolled my own based on Redis and job queues. It works decently well, but I'd prefer to use something better if it exists.
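For context, a minimal sketch of the shape I mean, assuming a Redis-backed fixed-window budget per model for both requests/sec and tokens/sec. The limits and helper names (MODEL_LIMITS, acquire_capacity) are hypothetical, not from any existing library:

    import time
    import redis

    r = redis.Redis()

    # Assumed per-model limits: (requests per second, tokens per second).
    MODEL_LIMITS = {
        "gpt-4o": (50, 90_000),
        "claude-3-5-sonnet": (40, 80_000),
    }

    def acquire_capacity(model: str, est_tokens: int, timeout: float = 30.0) -> bool:
        """Block until the model has budget for one request and est_tokens tokens."""
        req_limit, tok_limit = MODEL_LIMITS[model]
        deadline = time.time() + timeout
        while time.time() < deadline:
            window = int(time.time())  # 1-second fixed window
            req_key = f"rl:{model}:{window}:req"
            tok_key = f"rl:{model}:{window}:tok"
            pipe = r.pipeline()
            pipe.incr(req_key)
            pipe.expire(req_key, 2)
            pipe.incrby(tok_key, est_tokens)
            pipe.expire(tok_key, 2)
            reqs, _, toks, _ = pipe.execute()
            if reqs <= req_limit and toks <= tok_limit:
                return True
            # Over budget: undo our reservation and wait for the next window.
            r.decr(req_key)
            r.decrby(tok_key, est_tokens)
            time.sleep(0.05)
        return False

Workers pulling from the job queue call something like this before dispatching each request. A sliding window or a Lua script would make the check atomic and smoother, but this is the general idea.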

Does anyone know of something like this that isn't completely over-engineered or over-abstracted?


