espadrine | 11 months ago | on: Using reinforcement learning and $4.80 of GPU time...
That makes me wonder, though, what the best loss function was. I assume you used MSE on the log score. I wonder if a sigmoid loss on which of two articles has the higher score (a pairwise comparison) would yield better results for the downstream RLHF task.
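
To make the comparison concrete, here is a rough sketch of the two losses I have in mind. This is not the article's code: the function names are made up for illustration, and using log(1 + score) as the regression target is my assumption about what "log score" means (it keeps zero-score articles finite).

```python
import math

def mse_log_score_loss(pred_log_scores, true_scores):
    """Pointwise option: mean squared error against log(1 + score).

    Assumption: the target is log1p(score); the original may use a
    different transform.
    """
    targets = [math.log1p(s) for s in true_scores]
    return sum((p - t) ** 2 for p, t in zip(pred_log_scores, targets)) / len(targets)

def pairwise_sigmoid_loss(pred_a, pred_b, a_wins):
    """Pairwise option: -log(sigmoid(margin)), Bradley-Terry style.

    margin is the predicted score of the article that actually scored
    higher minus the predicted score of the other one.
    """
    margin = pred_a - pred_b if a_wins else pred_b - pred_a
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

if __name__ == "__main__":
    # Hypothetical scores for two articles, purely for illustration.
    print(mse_log_score_loss([3.0, 1.0], [20, 2]))
    print(pairwise_sigmoid_loss(3.0, 1.0, a_wins=True))
```

The pairwise version depends only on the difference between the two predictions, so it trains the model to rank articles instead of matching absolute scores. That is the same shape of objective commonly used for reward models trained on preference pairs, which is why I suspect it might transfer better to the RLHF step.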