
Large Language Model (LLM)

Also known as: LLM, Large Language Models

A deep learning model with billions of parameters, trained on massive amounts of text to understand and generate natural language.

Updated: 2026-01-03

Definition

A Large Language Model (LLM) is a deep learning model with billions of parameters, trained on massive text corpora to predict the next token in a sequence. This predictive capacity emerges as the ability to understand, generate, and manipulate natural language.

Modern LLMs are based on the Transformer architecture and undergo two training phases: pre-training on web-scale data (hundreds of billions to trillions of tokens) and subsequent alignment via RLHF (reinforcement learning from human feedback) or similar techniques.

Key Characteristics

Scale: frontier models have 100B-1T+ parameters. GPT-4 is estimated around 1.7T parameters (not officially confirmed). Smaller models (7B-70B) offer a favorable balance of capability, latency, and cost for many workloads.

Emergent abilities: capabilities that only appear beyond certain scale thresholds, such as multi-step reasoning, in-context learning, and complex instruction-following. The phenomenon is documented but not fully understood.

Context window: the amount of text the model can process in a single call. Ranges from 4K tokens (legacy models) to 128K-1M+ tokens (Claude, Gemini). Directly influences possible use cases.
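In practice this means prompts have to be measured in tokens, not characters, before each call. A minimal sketch using the tiktoken tokenizer library (the 128K limit and the output reservation below are illustrative assumptions, not any specific model's real values):

```python
import tiktoken  # OpenAI's open-source BPE tokenizer library

CONTEXT_WINDOW = 128_000      # assumed limit; check the actual model's documentation
RESERVED_FOR_OUTPUT = 4_000   # leave room for the model's reply

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several recent OpenAI models

def fits_in_context(prompt: str) -> bool:
    """Return True if the prompt leaves enough room for the response."""
    n_tokens = len(enc.encode(prompt))
    return n_tokens + RESERVED_FOR_OUTPUT <= CONTEXT_WINDOW

print(fits_in_context("Summarize the following document: ..."))  # True for short prompts
```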

How It Works

The base architecture involves:

  1. Tokenization: text is converted to tokens (sub-word units) via algorithms like BPE or SentencePiece
  2. Embedding: each token becomes a dense vector
  3. Transformer layers: attention mechanisms process the sequence, capturing long-range dependencies
  4. Output: probability distribution over the vocabulary for the next token

Training occurs on next-token prediction: given a prefix, predict the next token. This seemingly simple objective, at sufficient scale, produces surprisingly general capabilities.
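A minimal sketch of these four steps, using the Hugging Face transformers library and GPT-2 (chosen only because it is small and openly available; any causal language model would do):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")           # 1. tokenization (BPE)
model = AutoModelForCausalLM.from_pretrained("gpt2")  # 2-3. embeddings + Transformer layers

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                   # 4. one score per vocabulary entry, per position

next_token_probs = torch.softmax(logits[0, -1], dim=-1)  # distribution for the next token
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tok.decode([int(token_id)]):>10}  p={prob.item():.3f}")

# Training reuses the same forward pass: labels are the input ids shifted by one,
# and the loss is cross-entropy between predicted and actual next tokens.
loss = model(**inputs, labels=inputs["input_ids"]).loss
print("next-token prediction loss:", loss.item())
```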

Main Models (2025)

Closed-source: GPT-4/4o (OpenAI), Claude 3.5 (Anthropic), Gemini 1.5 (Google). Accessible only via API, with per-token costs.

Open-weights: Llama 3 (Meta), Mistral, Qwen, DeepSeek. Public weights, local deployment possible. Variable licenses (some with commercial restrictions).

Reference benchmarks: MMLU for general knowledge, HumanEval for coding, GPQA for scientific reasoning.

Practical Considerations

Costs: per-token prices vary by 10-100x between models. GPT-4o costs ~$5/million input tokens, GPT-4o-mini ~$0.15. Model choice significantly impacts an application's total cost of ownership (TCO).
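As a back-of-the-envelope illustration of how those prices compound (the rates below simply reuse the figures above and should be checked against current provider pricing):

```python
# USD per million input tokens, per the figures quoted above (verify before relying on them)
PRICE_PER_M_INPUT = {"gpt-4o": 5.00, "gpt-4o-mini": 0.15}

def monthly_input_cost(model: str, requests_per_day: int, tokens_per_request: int) -> float:
    """Estimate monthly input-token spend for a steady workload."""
    tokens_per_month = requests_per_day * 30 * tokens_per_request
    return tokens_per_month / 1_000_000 * PRICE_PER_M_INPUT[model]

# 10,000 requests/day with 2,000-token prompts:
print(monthly_input_cost("gpt-4o", 10_000, 2_000))       # ~$3,000/month
print(monthly_input_cost("gpt-4o-mini", 10_000, 2_000))  # ~$90/month
```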

Latency: time-to-first-token (TTFT) and tokens/second vary by provider and model. For real-time applications, latency can be more constraining than cost.
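TTFT can be measured empirically by timing a streaming response. A rough sketch with the OpenAI Python SDK (the model name is an illustrative choice, and counting chunks only approximates tokens):

```python
import time
from openai import OpenAI  # assumes the OpenAI Python SDK and an API key in the environment

client = OpenAI()
start = time.perf_counter()
first_token_at = None
n_chunks = 0

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    messages=[{"role": "user", "content": "Explain TTFT in one sentence."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time-to-first-token
        n_chunks += 1  # roughly one token per chunk

elapsed = time.perf_counter() - start
print(f"TTFT: {first_token_at - start:.2f}s, throughput: ~{n_chunks / elapsed:.1f} tok/s")
```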

Rate limits: APIs have limits on requests/minute and tokens/minute. At scale, these become an architectural constraint.
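The usual mitigation is to retry with exponential backoff when the API signals a rate limit. A provider-agnostic sketch (the call_llm callable and the broad exception handling are placeholders; real code should catch the provider's specific rate-limit error):

```python
import random
import time

def call_with_backoff(call_llm, max_retries: int = 5):
    """Retry a rate-limited API call with exponential backoff plus jitter.

    call_llm is a placeholder for whatever function performs the request;
    it is expected to raise on HTTP 429 responses.
    """
    for attempt in range(max_retries):
        try:
            return call_llm()
        except Exception:  # in real code, catch the provider's RateLimitError
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter
    raise RuntimeError("rate limit: retries exhausted")
```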

Common Misconceptions

“LLMs understand what they say”

No. They produce statistically plausible output based on learned patterns. They have no model of the world, beliefs, or understanding in the cognitive sense, which is one reason they hallucinate: output can be fluent and plausible yet factually wrong.

“The largest model is always best”

It depends on the task. For many use cases, a 7B-70B model fine-tuned for the task outperforms models 10x larger on the relevant metrics, at a fraction of the cost.

“LLMs remember previous conversations”

No. Each API call is stateless. “Memory” is simulated by resending the conversation history in the prompt, which consumes context window space.
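A sketch of how that simulated memory usually works: the application keeps the transcript itself and resends it in full on every call (the chat argument below is a placeholder for any chat-completion API that accepts a list of messages):

```python
# The application, not the model, stores the conversation; each call resends all of it.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_message: str, chat) -> str:
    """chat is a placeholder for any chat-completion call taking a message list."""
    history.append({"role": "user", "content": user_message})
    reply = chat(history)  # the entire history is fed into the context window each time
    history.append({"role": "assistant", "content": reply})
    return reply
```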
