From: https://x.com/zzlccc/status/1903162768083259703 DeepSeek-V3-Base already e... | Hacker News

delifue 61 days ago | on: Understanding R1-Zero-Like Training: A Critical Pe...

From: https://x.com/zzlccc/status/1903162768083259703

DeepSeek-V3-Base already exhibits an "Aha moment" before RL-tuning

The ever-increasing output length in RL-tuning might be due to a BIAS in GRPO
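The comment doesn't say which bias it means. One plausible candidate, assumed here rather than confirmed by the tweet, is GRPO's per-response length normalization. Each response's token losses are averaged over that response's own length |o_i|, so every token of response i is weighted by A_i / |o_i|. For a wrong answer (A_i < 0), a longer response gets a smaller per-token penalty, which could push incorrect answers to grow longer. The sketch below only illustrates that arithmetic. The function names and the constant-normalizer comparison are hypothetical, and it uses the population std for the group normalization, which is a simplifying choice.

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)  # population std (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_token_weights(advantages, lengths, length_normalize=True, max_len=None):
    """Gradient weight each token of response i receives.

    length_normalize=True divides by the response's own length |o_i|.
    length_normalize=False divides by a shared constant (hypothetical comparison).
    """
    if not length_normalize and max_len is None:
        raise ValueError("max_len is required when length_normalize=False")
    weights = []
    for adv, n in zip(advantages, lengths):
        denom = n if length_normalize else max_len
        weights.append(adv / denom)
    return weights


# One correct answer and two wrong ones; the second wrong answer is 4x longer.
adv = group_advantages([1.0, 0.0, 0.0])
print(per_token_weights(adv, [100, 100, 400]))
print(per_token_weights(adv, [100, 100, 400], length_normalize=False, max_len=400))
```

With length normalization, the 400-token wrong answer is penalized four times less per token than the 100-token wrong answer. With a shared constant denominator, both wrong answers receive the same per-token penalty.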
