I don't think slow/fast is an appropriate analogy. Yes there are safety concerns - you don't want the model advising you how to do mass killing or something - but I also get the sense that the raw model is unpredictable, behaves weird, and generally has its own problems. So I don't see RLHF as reducing capability so much as altering capability. My suspicion is that the raw model would have other major flaws, and RLHF is just trading one set of flaws for another. Which is to say, the limitations introduced by RLHF are indeed artificial, but the raw model itself has limitations too.