Structured input-output pairs for SFT.
A pretrained model predicts the next token. It doesn’t know how to follow instructions. Instruction data fixes that: each example is a prompt (“Summarize this article”) paired with the desired response. Humans write these, or they’re generated synthetically.
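The pair structure can be sketched concretely. A minimal example, assuming a JSONL on-disk format and illustrative field names (`prompt`, `response` are not a standard):

```python
import json

def make_example(instruction: str, input_text: str, response: str) -> dict:
    # One SFT example: the instruction (plus any input text it operates on)
    # becomes the prompt; the desired output becomes the response.
    prompt = instruction if not input_text else f"{instruction}\n\n{input_text}"
    return {"prompt": prompt, "response": response}

examples = [
    make_example(
        "Summarize this article",
        "The city council voted 7-2 on Tuesday to fund the new bridge project.",
        "The council approved bridge funding in a 7-2 vote.",
    ),
    # Some instructions need no separate input text.
    make_example("Write a haiku about rain", "", "Soft drops on the roof"),
]

# Serialize one example per line (JSONL), a common format for SFT datasets.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.splitlines()[0])
```

At training time each pair is typically concatenated into a single sequence; the details of templating and loss masking vary by framework.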
InstructGPT used ~13,000 human-written demonstrations. The Flan collection compiled hundreds of thousands from existing academic datasets, reformatted as instructions. Quality and diversity tend to matter more than raw volume.
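Flan-style reformatting can be sketched as template application. The toy sentiment record and the templates below are invented for illustration; the actual Flan collection uses many templates per task:

```python
# Hypothetical instruction templates for a sentiment-classification task.
TEMPLATES = [
    "Is the sentiment of this review positive or negative?\n\n{text}",
    "Review: {text}\n\nWhat is the sentiment? Answer positive or negative.",
]

def to_instruction(record: dict, template: str) -> dict:
    # Recast an academic-dataset record (text + label) as an
    # instruction-following prompt/response pair.
    return {
        "prompt": template.format(text=record["text"]),
        "response": record["label"],
    }

record = {"text": "The plot dragged, but the acting was superb.", "label": "positive"}
pair = to_instruction(record, TEMPLATES[0])
print(pair["prompt"])
```

Applying several templates to each record is one way the same underlying dataset yields diverse instruction phrasings.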