I think this is on the right track, but I think it's a byproduct of the reinforc... | Hacker News

CGamesPlay 3 months ago | on: I'm absolutely right

I think this is on the right track, but I think it's a byproduct of the reinforcement learning, rather than something hard-coded. Basically, the model has to train itself to follow the user's instruction, so by starting a response with "You're absolutely right!", it puts the model into the thought pattern of doing whatever the user said.

layer8 3 months ago

"Thought pattern" might be overstating it. The fact that "You're absolutely right!" is statistically more likely to precede something consistent with the user's intent than something that isn't might be enough of an explanation.
