Doesn't this depend a lot on your application though? Not every workload needs low latency and massive horizontal scalability.
Take their example of running the llm over the 2 million recipes and saving $23k over GPT 4. That could easily be 2 million documents in some back end system running in a batch. Many people would wait a few days or weeks for a job like that to finish if it offered significant savings.
Take their example of running the llm over the 2 million recipes and saving $23k over GPT 4. That could easily be 2 million documents in some back end system running in a batch. Many people would wait a few days or weeks for a job like that to finish if it offered significant savings.