
How does one just use tokens and train a model? I thought you needed to label each one with what it means?


You are thinking of supervised learning, in which a model is trained to generate labels for inputs.

In self-supervised learning, the training target is a modified version of the input.
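
To make the contrast concrete (illustrative only, the field names here are invented):

    supervised_example = {"text": "great movie", "label": "positive"}  # label written by a person
    self_supervised_example = {"text": "the cat sat on the mat"}       # target derived from the text itself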

Transformers are trained in a self-supervised manner. The problem they solve is "given a sequence of N tokens, what is the most likely next token?"

No labels required :)
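
Roughly, assuming PyTorch and a toy model standing in for a real transformer, one training step looks something like this; the targets are just the input tokens shifted by one:

    import torch
    import torch.nn.functional as F

    tokens = torch.tensor([[5, 42, 7, 19, 3]])       # one tokenized sentence; token IDs made up
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens 0..t

    vocab_size = 100
    model = torch.nn.Sequential(                     # stand-in for a real transformer
        torch.nn.Embedding(vocab_size, 32),
        torch.nn.Linear(32, vocab_size),
    )

    logits = model(inputs)                           # shape: (batch, seq_len, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                  # an optimizer step would follow

The "labels" fall straight out of the raw text, which is why no human annotation is needed.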


What about the human reinforcement part of it?


There's no traditional human reinforcement.

Models like GPT-3 get turned into models like ChatGPT through RLHF (reinforcement learning from human feedback), by fine-tuning the model further on prompts in the style we'd like it to respond in, typically:

User: question

Bot: Response

This is done by handcrafting data or by adapting data from places like Stack Exchange.
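
As a rough sketch (the helper and example data below are invented, not anyone's actual pipeline), preparing that fine-tuning data is little more than string formatting:

    def format_example(question, answer):
        return f"User: {question}\nBot: {answer}"

    raw_pairs = [
        ("How do I reverse a list in Python?", "Use reversed(xs) or xs[::-1]."),
    ]

    finetune_data = [format_example(q, a) for q, a in raw_pairs]
    print(finetune_data[0])

The formatted strings are then used as ordinary next-token training examples, so the model learns to continue a "User:" prompt with a "Bot:" style answer.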



