One wrinkle is that it is now common to fine-tune on previously derived RL data...

jacobr1 on June 18, 2024, on: What Is ChatGPT Doing and Why Does It Work? (2023)

One wrinkle is that it is now common to fine-tune on previously derived RL datasets, using the tested inputs and the preferred sample outputs as the training data.
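To make that concrete, here is a rough sketch of how preference-style RL data might be flattened into ordinary supervised fine-tuning pairs. The record layout (`prompt`, `chosen`, `rejected`), the helper name `to_sft_examples`, and the toy rows are all assumptions for illustration, not any particular library's format.

```python
from dataclasses import dataclass


@dataclass
class PreferenceRecord:
    """One row of a hypothetical RL-derived preference dataset."""
    prompt: str    # the tested input
    chosen: str    # the preferred sample output
    rejected: str  # the dispreferred sample output


def to_sft_examples(records):
    """Keep only (input, preferred output) pairs; the rejected sample is dropped."""
    return [{"input": r.prompt, "target": r.chosen} for r in records]


# Toy rows, made up purely for illustration.
records = [
    PreferenceRecord("What is 2 + 2?", "4", "5"),
    PreferenceRecord("Capital of France?", "Paris", "Lyon"),
]

sft_examples = to_sft_examples(records)
for example in sft_examples:
    print(example)
```

The point of the sketch is that the comparison signal is thrown away: only the winning output survives, so the result can be fed to a plain supervised trainer without any reward model in the loop.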