Next-token prediction

Predict the next token, given all previous tokens.

That’s the self-supervised objective behind pretraining. The model outputs a probability distribution over its entire vocabulary. The actual next token in the training data is the answer. Adjust the parameters to make the right token more probable. Repeat, billions of times. From this single objective, the model learns grammar, facts, reasoning, code. Not because it’s told to, but because predicting well requires understanding.
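A minimal sketch of one training step, assuming a PyTorch-style model that maps a batch of token IDs to logits of shape (batch, seq_len, vocab_size); `model`, `tokens`, and `optimizer` here are hypothetical placeholders, not any particular library's API:

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # Predict from all previous tokens: positions 0..n-2 predict 1..n-1.
    logits = model(tokens[:, :-1])   # (batch, seq_len-1, vocab_size)
    targets = tokens[:, 1:]          # the actual next token is the answer

    # Cross-entropy pushes probability toward the right token.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

# One step of "adjust the parameters, repeat":
# loss = next_token_loss(model, batch)
# loss.backward()
# optimizer.step()
# optimizer.zero_grad()
```

Shifting the sequence by one position is what makes the objective self-supervised: every position in every training document supplies a labeled example for free, with no human annotation.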
