What models learn from. Different training phases need different data.
Pretraining consumes raw text at massive scale; supervised fine-tuning (SFT) requires structured input-output pairs; and models can also generate their own synthetic training data, which is filtered and fed back into training.
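A minimal sketch of the data shapes involved — the sample texts, field names (`prompt`, `response`), and the `to_sft_pair` helper are illustrative assumptions, not a fixed format:

```python
# Pretraining: a raw text stream; the model learns next-token prediction.
pretrain_sample = "The mitochondria is the powerhouse of the cell."

# SFT: structured input-output pairs (commonly stored as JSONL records).
sft_sample = {
    "prompt": "Summarize: photosynthesis converts light into chemical energy.",
    "response": "Plants turn sunlight into stored chemical energy.",
}

# Self-generated data: a model's own output, after filtering, can be
# packaged into the same structured form and re-enter training.
def to_sft_pair(prompt: str, model_output: str) -> dict:
    """Wrap a (prompt, model output) pair as a hypothetical SFT record."""
    return {"prompt": prompt, "response": model_output}

synthetic = to_sft_pair("Explain gravity briefly.", "Gravity pulls masses together.")
print(sorted(synthetic.keys()))  # → ['prompt', 'response']
```

The key contrast is structural: pretraining data is an undifferentiated token stream, while SFT and synthetic data carry an explicit input/output boundary the loss can be masked against.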