
I agree with you that the market exists and, as a result, solutions to this problem also exist in abundance. The most difficult part about building a product like the one presented here is making something generic enough to work for a wide swath of use cases. If you simplify the stack to a more bespoke/custom approach, the build burden drops dramatically.

For folks who are already technical in this vertical, especially those who run a low-cardinality architecture (one or two models, a small subset of tasks, etc.), this type of thing is quite easy to build yourself as a working prototype, and only slightly more difficult to productionize and automate.

I have some in-house infra that does similar work: it monitors inputs and outputs from models, puts them in a UI for a human to score/rank, preps a DPO dataset for training, and kicks off a training run. Going from prototype to production took roughly two person-weeks in total. Swapping the human intervention step for an automated reward function would be an hour or two's worth of work. If I had to make this work for all types of users, tasks, and models, there's no shot I'd have the time personally to pull that off with any reasonable velocity.
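To give a rough idea of why the dataset-prep and "swap in a reward function" steps are small, here's a minimal sketch of the middle of such a pipeline. It is not my actual code: `LoggedCompletion`, `score_unscored`, `build_dpo_pairs`, and `to_jsonl` are hypothetical names, and it assumes the monitoring layer already hands you logged (prompt, completion) records. Scores come either from a human in the review UI or from any function you pass in, and same-prompt completions are turned into `prompt`/`chosen`/`rejected` pairs, the layout commonly used for DPO preference data.

```python
import json
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Optional


@dataclass
class LoggedCompletion:
    """One model input/output pair captured by a (hypothetical) monitoring hook."""
    prompt: str
    completion: str
    score: Optional[float] = None  # set by a human in the review UI, or by a reward function


# A scorer maps (prompt, completion) -> score. Replacing the human step with an
# automated reward function amounts to passing a different scorer here.
Scorer = Callable[[str, str], float]


def score_unscored(records: List[LoggedCompletion], scorer: Scorer) -> List[LoggedCompletion]:
    """Fill in any missing scores in place using the supplied scorer."""
    for r in records:
        if r.score is None:
            r.score = scorer(r.prompt, r.completion)
    return records


def build_dpo_pairs(records: List[LoggedCompletion], min_margin: float = 0.0) -> List[Dict[str, str]]:
    """Turn scored completions into (prompt, chosen, rejected) preference pairs."""
    by_prompt: Dict[str, List[LoggedCompletion]] = {}
    for r in records:
        if r.score is not None:  # unscored items carry no preference signal yet
            by_prompt.setdefault(r.prompt, []).append(r)

    pairs = []
    for prompt, group in by_prompt.items():
        # Compare every pair of completions that share the same prompt.
        for a, b in combinations(group, 2):
            if abs(a.score - b.score) <= min_margin:
                continue  # skip ties and near-ties
            chosen, rejected = (a, b) if a.score > b.score else (b, a)
            pairs.append({"prompt": prompt, "chosen": chosen.completion, "rejected": rejected.completion})
    return pairs


def to_jsonl(pairs: List[Dict[str, str]]) -> str:
    """Serialize pairs as JSON Lines, a common input format for training jobs."""
    return "\n".join(json.dumps(p) for p in pairs)


if __name__ == "__main__":
    logged = [
        LoggedCompletion("Summarize X", "A tight two-line summary."),
        LoggedCompletion("Summarize X", "Meh"),
    ]
    # Toy automated reward, purely illustrative: prefer longer completions.
    score_unscored(logged, lambda prompt, completion: float(len(completion)))
    print(to_jsonl(build_dpo_pairs(logged)))
```

The human-vs-automated choice only touches the scorer; the pairing and serialization stay the same, which is why that swap is cheap when the stack is this narrow. Making it generic (arbitrary tasks, multi-turn data, different trainers' formats) is where the real effort goes.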

With that said, having a nice UI with great observability into the whole process is a pretty big value-add to get out of the box as well.

(EDIT: for clarity, not affiliated at all with the OP project/org)