The problem for them is that this may be a fairly fundamental limitation. We already know that RLHF tends to make models dumber (the so-called alignment tax). It is entirely possible that the amount of forceful training required to make a model buy fully into what those people are peddling would crater its overall performance.