
How does one just use tokens and train a model? I thought you needed to label each one with what it means?


You are thinking of supervised learning, in which a model is trained to generate labels for inputs.

In self-supervised learning, the training target is a modified version of the input.
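
To make the contrast concrete (illustrative only, the field names here are invented):

    supervised_example = {"text": "great movie", "label": "positive"}  # label written by a person
    self_supervised_example = {"text": "the cat sat on the mat"}       # target derived from the text itself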

Transformers are trained in a self-supervised manner. The problem they solve is "given a sequence of N tokens, what is the most likely next token?"

No labels required :)
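
Roughly, assuming PyTorch and a toy model standing in for a real transformer, one training step looks something like this; the targets are just the input tokens shifted by one:

    import torch
    import torch.nn.functional as F

    tokens = torch.tensor([[5, 42, 7, 19, 3]])       # one tokenized sentence; token IDs made up
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens 0..t

    vocab_size = 100
    model = torch.nn.Sequential(                     # stand-in for a real transformer
        torch.nn.Embedding(vocab_size, 32),
        torch.nn.Linear(32, vocab_size),
    )

    logits = model(inputs)                           # shape: (batch, seq_len, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                  # an optimizer step would follow

The "labels" fall straight out of the raw text, which is why no human annotation is needed.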


What about the human reinforcement part of it?


There's no traditional human reinforcement.

Models like GPT-3 get turned into models like ChatGPT through RLHF (reinforcement learning from human feedback), by fine-tuning the model further on prompts in the style we'd like it to respond in, typically:

User: question

Bot: Response

This is done by handcrafting data or by adapting data from places like Stack Exchange.
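
As a rough sketch (the helper and example data below are invented, not anyone's actual pipeline), preparing that fine-tuning data is little more than string formatting:

    def format_example(question, answer):
        return f"User: {question}\nBot: {answer}"

    raw_pairs = [
        ("How do I reverse a list in Python?", "Use reversed(xs) or xs[::-1]."),
    ]

    finetune_data = [format_example(q, a) for q, a in raw_pairs]
    print(finetune_data[0])

The formatted strings are then used as ordinary next-token training examples, so the model learns to continue a "User:" prompt with a "Bot:" style answer.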



