Scaling laws

More compute + more data + more parameters = predictably better models.

The improvement follows a smooth, predictable curve: double the compute and the loss falls by a roughly fixed fraction, and the same holds for data and for parameters. These relationships, called power laws, let researchers forecast a model's performance before training it (Kaplan et al., 2020) and choose the compute-optimal split between model size and training data (Hoffmann et al., 2022).
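As a concrete instance, Hoffmann et al. (2022) model the final training loss of a model with N parameters trained on D tokens as

  L(N, D) = E + A/N^α + B/D^β

where E is the irreducible loss of the data and A, B, α, β are constants fitted from training runs. Below is a minimal sketch of that forecast in Python, using the fitted values reported in the paper; treat the exact numbers as illustrative rather than definitive:

  # Chinchilla-style parametric loss fit (Hoffmann et al., 2022).
  E, A, B = 1.69, 406.4, 410.7    # irreducible loss and fitted coefficients
  alpha, beta = 0.34, 0.28        # fitted exponents for parameters and data

  def predicted_loss(n_params: float, n_tokens: float) -> float:
      """Forecast final training loss from model size and token count."""
      return E + A / n_params**alpha + B / n_tokens**beta

  # Example: roughly Chinchilla's budget, 70B parameters on 1.4T tokens
  print(predicted_loss(70e9, 1.4e12))   # ≈ 1.94

Because each term is a power law, doubling any single input buys a fixed proportional gain: with a compute exponent of roughly 0.05 (Kaplan et al., 2020), doubling compute multiplies the loss by about 2^-0.05, a reduction of roughly 3%.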

References
  1. Scaling laws for neural language models. Kaplan et al., 2020.
  2. Training compute-optimal large language models. Hoffmann et al., 2022.