How do you know if the model works?
The things that matter most (helpfulness, safety, honesty) are hard to measure. The things that are easy to measure (benchmark scores, perplexity) don’t reliably predict real-world quality.