
Evaluation

How do you know if the model works?

The things that matter most (helpfulness, safety, honesty) are hard to measure. The things that are easy to measure (benchmark scores, perplexity) don’t always predict real-world quality.
