Does RL Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (arxiv.org)
2 points by Anon84 2 days ago | 2 comments






Funny how this title follows Betteridge's law of headlines: the answer is no. In this case, the paper shows that RLVR (Reinforcement Learning with Verifiable Rewards) doesn't help the model generalize but rather seems to overfit it, reducing its overall reasoning capacity.


