Does RL Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Hacker News

Does RL Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (arxiv.org)
2 points by Anon84 2 days ago | 2 comments

yorwba 2 days ago

Discussed two days ago: https://news.ycombinator.com/item?id=43760625

falcor84 2 days ago

Funny how this title follows Betteridge's law of headlines, with the paper showing that RLVR (Reinforcement Learning with Verifiable Rewards) doesn't help the model generalize but instead seems to overfit it, reducing its overall reasoning capacity.
