What happens when the model runs.
Training builds the model; inference is where it generates output. Inference has two dimensions worth optimizing: getting better answers by spending more compute per request, and getting answers faster by using hardware more efficiently.
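The first dimension can be made concrete with best-of-N sampling: draw several candidate answers and keep the highest-scoring one, trading extra compute for quality. This is a minimal sketch; `generate` is a hypothetical stand-in for a real model call, and the random score stands in for a real quality signal.

```python
import random

def generate(prompt, seed):
    # Hypothetical stand-in for a model call: returns (answer, quality score).
    rng = random.Random(seed)
    return f"answer-{seed}", rng.random()

def best_of_n(prompt, n):
    # More compute per request: sample n candidates, keep the best-scoring one.
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda c: c[1])

# With a fixed seed sequence, the 16-sample pool contains the 1-sample pool,
# so the best of 16 scores at least as well as the best of 1.
_, q1 = best_of_n("2+2?", 1)
_, q16 = best_of_n("2+2?", 16)
assert q16 >= q1
```

The point is the shape of the trade, not the toy scorer: each extra sample costs a full forward pass, and quality gains come only from spending more inference compute on the same request.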