
Instruction data

Structured input-output pairs for supervised fine-tuning (SFT).

A pretrained model only predicts the next token; it has not been trained to follow instructions. Instruction data fixes that: each example is a prompt (“Summarize this article”) paired with the desired response. Humans write these examples, or they’re generated synthetically.
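As a rough illustration of what one of these prompt-response pairs can look like on disk, here is a minimal Python sketch. The field names (`prompt`, `response`), the `### Instruction:` / `### Response:` template, and the helper names are assumptions made for this example, not a standard; real datasets and training frameworks each define their own schema and chat template.

```python
import json

# Illustrative records; the field names "prompt" and "response" are assumptions.
EXAMPLES = [
    {
        "prompt": "Summarize this article: The city council voted to extend library hours.",
        "response": "The council approved longer library hours.",
    },
    {
        "prompt": "Translate to French: Good morning.",
        "response": "Bonjour.",
    },
]


def format_example(example: dict) -> str:
    """Render one pair as a single training string using a made-up template."""
    return (
        f"### Instruction:\n{example['prompt']}\n\n"
        f"### Response:\n{example['response']}"
    )


def to_jsonl(examples: list[dict]) -> str:
    """Serialize records as JSON Lines, one example per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


if __name__ == "__main__":
    print(format_example(EXAMPLES[0]))
    print(to_jsonl(EXAMPLES))
```

During SFT, the loss is commonly computed only on the response tokens, so keeping the prompt and the response as separate fields makes it easy to tell the two apart later.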

InstructGPT used ~13,000 human-written demonstrations. The Flan collection compiled hundreds of thousands from existing academic datasets, reformatted as instructions. Quality and diversity tend to matter more than raw volume.
