Train a small adapter instead of the full model.
Full SFT updates every parameter in the model, which is expensive for a 7B model. LoRA freezes the original parameters and adds a tiny set of new ones to specific layers, typically less than 1% of the model's parameter count. Train those, leave everything else untouched. The result is a lightweight adapter that can be swapped in and out: one base model, many specializations.
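The core trick is a low-rank update: instead of learning a full d×k weight delta, learn two thin matrices whose product approximates it. A minimal NumPy sketch of one adapted layer (dimensions, rank, and scaling are illustrative, not from any particular model):

```python
import numpy as np

d, k = 4096, 4096   # shape of one attention projection (illustrative)
r = 8               # LoRA rank: the adapter's bottleneck dimension

# Frozen base weight: never updated during training.
W = (np.random.randn(d, k) * 0.02).astype(np.float32)

# Trainable adapter factors. B starts at zero, so at step 0 the
# adapted layer behaves exactly like the frozen base layer.
A = (np.random.randn(r, k) * 0.01).astype(np.float32)
B = np.zeros((d, r), dtype=np.float32)
alpha = 16          # scaling hyperparameter from the LoRA paper

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x): base output plus low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

base_params = W.size
adapter_params = A.size + B.size
print(f"adapter is {adapter_params / base_params:.2%} of this layer")  # 0.39%
```

Only `A` and `B` receive gradients; the merged update `B @ A` can be folded into `W` at inference time, or kept separate so one base model serves many adapters.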
LoRA still loads the full frozen model into memory. QLoRA shrinks it first: aggressively quantize the base model, typically to 4-bit, then train the adapter on top in higher precision. It uses a fraction of the memory and makes fine-tuning possible on hardware that couldn’t hold the original model.
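The idea can be sketched with simple absmax quantization (real QLoRA uses a 4-bit NF4 data type with double quantization; int8 keeps this sketch short, and all dimensions are illustrative):

```python
import numpy as np

d, k, r = 1024, 1024, 8

W = (np.random.randn(d, k) * 0.02).astype(np.float32)

# Quantize the frozen base weight: per-row absmax scaling to int8.
# Stored cost drops from 4 bytes/weight to 1, plus a small scale vector.
scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_q = np.round(W / scale).astype(np.int8)

# The adapter stays in float32 and is the only thing trained.
A = (np.random.randn(r, k) * 0.01).astype(np.float32)
B = np.zeros((d, r), dtype=np.float32)
alpha = 16

def qlora_forward(x):
    # Dequantize on the fly for the forward pass; gradients flow
    # only into A and B, never into the quantized base weight.
    W_deq = W_q.astype(np.float32) * scale
    return W_deq @ x + (alpha / r) * (B @ (A @ x))

mem_base = W.nbytes
mem_quant = W_q.nbytes + scale.nbytes
print(f"base weight memory: {mem_base} -> {mem_quant} bytes")
```

The quantized weights are read-only, so the quantization error is a fixed perturbation of the base model; the trainable adapter can partially compensate for it during fine-tuning.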