I think we will see something like this in the near future, but there are two very bold claims here. The first is the hope that RL will lead to generalization in other domains, which seems far-fetched based on current evidence. The second is that existing software and specs make good "RL-gym" data. The whole idea behind RL is that the model explores to find the best paths, but if the software is suboptimal for the agent interaction paradigm (it was written for humans, not agents), there is a high chance that the resulting model would be suboptimal too, even with the compute. There is also a parallel trend in which current AI systems are abstracting away entire workflows. Not accounting for that would lead to outcomes that don't reflect current requirements.