That may or may not be true for use cases that require asynchronous, bulk inference _and_ some task-specific post-training.
FWIW, my approach to tasks like the above is to
1. start with an off-the-shelf LM API,
2. figure out (using evals that capture product intent) what the failure modes are (there always are some), and then
3. post-train against those (using the same evals)
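The eval step above can be sketched roughly like this: cases encode product intent as checks, and tallying which checks fail tells you what to post-train against. Everything here is hypothetical scaffolding (`call_model` stands in for whatever off-the-shelf LM API you start with, and the cases are toy examples), not a real harness.

```python
from collections import Counter

def call_model(prompt: str) -> str:
    # Stand-in for an off-the-shelf LM API call (hypothetical);
    # here it just upper-cases the prompt so the sketch is runnable.
    return prompt.upper()

def run_evals(cases):
    """Each case is (prompt, check, failure_mode_label).

    Returns a Counter of failure-mode labels, i.e. which
    product-intent checks the model is currently failing.
    """
    failures = Counter()
    for prompt, check, label in cases:
        output = call_model(prompt)
        if not check(output):
            failures[label] += 1
    return failures

# Toy cases encoding product intent as executable checks.
cases = [
    ("hello", lambda out: out == "HELLO", "casing"),
    ("bye", lambda out: out.endswith("!"), "missing punctuation"),
]

failures = run_evals(cases)
# The failure-mode counts then drive what to post-train against,
# and the same cases double as the regression suite afterwards.
```

The point of keeping checks executable is that the same eval set serves both roles: it surfaces the failure modes in step 2 and measures whether step 3's post-training actually fixed them.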