Looks really well executed, nice! I'd shared this idea with a few people. OpenAI and the other LLM providers don't allow you to use their output to train competing models, but the implication is that it's fine to use that output to train your own internal alternative models. So you can't sell access to the output as an API, but you can use it to replace your own GPT API calls.
My other thought to extend this: you could make it seamless. To start, it simply pipes the user's requests through to OpenAI or their existing model, so it's a drop-in replacement. Then every so often it offers the user something like: "hey, we think there's enough data now that a fine-tune might save you approximately $x/month based on your current calls; click the button to start the fine-tune and we'll email you once we have the results." The user then gets the email: "here are the results; based on them we recommend switching, click here to switch to calling your fine-tuned model." Helicone and the other monitoring platforms could also offer something similar. (Side note: I'm working on an "AI infra handbook" aimed at technical people in software orgs who are looking to deploy unspecified "AI" features and trying to figure out what to do and what resources they'll need. It's a 20+ page Google Doc; if anyone can help me review what I have so far, please let me know and I'll add you.)
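On the drop-in part: swapping the backend usually just means pointing the existing OpenAI client at a proxy base URL that logs calls and forwards them upstream. A minimal sketch with the OpenAI Python SDK; the proxy URL here is a hypothetical placeholder, not a real endpoint:

```python
# Sketch only: the proxy URL below is a made-up placeholder.
from openai import OpenAI

# Before: OpenAI(api_key="sk-...") talks straight to api.openai.com.
# After: the same application code, but requests flow through a logging proxy
# that forwards to OpenAI until a fine-tuned replacement model is ready.
client = OpenAI(
    api_key="sk-...",                         # existing key, or one issued by the proxy
    base_url="https://proxy.example.com/v1",  # hypothetical drop-in proxy endpoint
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(resp.choices[0].message.content)
```

The application code doesn't change at all; only the base URL does, which is what makes the "switch to your fine-tuned model later" step a config change rather than a migration.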
If it's competitive on latency, error rate, and throughput, and it's cheaper at equivalent accuracy, then for anyone doing production-scale LLM API usage something like this makes sense: either the fine-tune is worse, so you keep using the regular API, or the fine-tune reaches parity with a cost and/or speed advantage, so you switch. (It wouldn't make sense at prototyping scale, because the added complexity of the switch wouldn't be worth it unless it could save you four or five figures or more a year in API costs, I'd think.)
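That decision rule is simple enough to write down. A rough sketch of the check being described, where every threshold is illustrative rather than anything the product actually uses:

```python
def should_switch_to_finetune(eval_accuracy_delta: float,
                              latency_delta_ms: float,
                              annual_savings_usd: float,
                              min_annual_savings_usd: float = 10_000.0) -> bool:
    """Rough version of the rule above; all thresholds are illustrative.

    Keep calling the hosted API unless the fine-tune is at least at parity on
    eval quality and latency, AND the savings clear the bar that justifies the
    extra moving parts of running your own model.
    """
    quality_parity = eval_accuracy_delta >= 0.0  # fine-tune scores no worse on your evals
    speed_parity = latency_delta_ms <= 0.0       # fine-tune is no slower
    worth_it = annual_savings_usd >= min_annual_savings_usd
    return quality_parity and speed_parity and worth_it

# e.g. parity on quality, 120 ms faster, $48k/year saved -> switch
print(should_switch_to_finetune(0.01, -120.0, 48_000.0))  # True
```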
> My other thought to extend this: you could make it seamless. To start, it simply pipes the user's requests through to OpenAI or their existing model, so it's a drop-in replacement. Then every so often it offers the user something like: "hey, we think there's enough data now that a fine-tune might save you approximately $x/month based on your current calls; click the button to start the fine-tune and we'll email you once we have the results." The user then gets the email: "here are the results; based on them we recommend switching, click here to switch to calling your fine-tuned model."
You just described our short-term roadmap. :) Currently an OpenPipe user has to explicitly kick off a fine-tuning job, but the jobs are so cheap to run that we're planning to let users opt in to having them run proactively once there's enough data, so we can provide exactly that experience.
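The "enough data" trigger could be as simple as a scheduled check over each project's logged calls. A generic sketch of that opt-in logic; none of this is OpenPipe's actual API, and the threshold and field names are made up:

```python
from dataclasses import dataclass
from typing import Callable, Optional

MIN_EXAMPLES = 1_000  # illustrative "enough data" threshold

@dataclass
class Project:
    id: str
    proactive_finetunes_enabled: bool  # the opt-in flag described above
    logged_request_count: int

def maybe_start_finetune(project: Project,
                         start_job: Callable[[str], str]) -> Optional[str]:
    """Run on a schedule; returns a job id if a proactive fine-tune was kicked off."""
    if project.proactive_finetunes_enabled and project.logged_request_count >= MIN_EXAMPLES:
        return start_job(project.id)  # start_job is whatever actually launches the training run
    return None
```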
Also, this seems a bit like how cryptocurrency folks assumed their transactions were anonymous: it's an API, so the provider could log the calls (maybe not the contents).
> Side note: I'm working on an "AI infra handbook" aimed at technical people in software orgs who are looking to deploy unspecified "AI" features and trying to figure out what to do and what resources they'll need. It's a 20+ page Google Doc; if anyone can help me review what I have so far, please let me know and I'll add you.
I would be interested in reviewing your handbook too. I am technical, but have not deployed any AI-related tooling so far. Keen to know whether this is targeted at AI noobs as well.