Large language models generate text by predicting the next token (roughly, the next word or word fragment). Every capability, from translation to code to conversation, emerges from that one prediction task applied at scale.
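The core loop can be sketched with a toy model. This is not an LLM, just a bigram table over a tiny made-up corpus, but the generation procedure is the same shape: predict the most likely next token, append it, repeat.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration only).
corpus = "the cat sat on the mat the cat ate the food".split()

# Count which token follows each token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the token most often seen after `token` in the corpus."""
    return follows[token].most_common(1)[0][0]

# Generate by repeatedly predicting the next token (greedy decoding).
out = ["the"]
for _ in range(4):
    out.append(predict_next(out[-1]))
print(" ".join(out))
```

A real model replaces the lookup table with a neural network that scores every token in its vocabulary, but the generate-one-token-and-loop structure is unchanged.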
What makes them “large” is parameters: the numbers inside the model that get adjusted during training. A 7B model has 7 billion of them. More parameters, more capacity to learn patterns from data.
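To see where the billions come from, it helps to count the weight matrices in one transformer layer. The sizes below are assumptions loosely modeled on a 7B-class model (hidden size 4096, 32 layers); real architectures differ in detail, but the arithmetic is representative.

```python
hidden = 4096           # model (embedding) dimension, an assumed size
ffn = 4 * hidden        # feed-forward inner dimension (common convention)
vocab = 32_000          # assumed vocabulary size
layers = 32             # assumed layer count

attention = 4 * hidden * hidden   # Q, K, V, and output projection matrices
feed_forward = 2 * hidden * ffn   # up-projection and down-projection matrices
per_layer = attention + feed_forward

embeddings = vocab * hidden       # one vector per vocabulary token
total = layers * per_layer + embeddings

print(f"{total:,} parameters")    # lands in the billions
```

Most of the count lives in the per-layer matrices, which is why widening the hidden dimension or adding layers grows the model so quickly.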
An LLM before any specialization is called a base or foundation model. By itself it only predicts the next token. Everything else gets built on top of that.