Definition
Generative AI is a category of machine learning models that learn to reproduce the probability distribution of training data and can generate new, statistically plausible content. Instead of predicting labels (classification) or values (regression), generative models produce complex, structured outputs.
Generation occurs by sampling from the learned distribution. The process is conditioned on an input (the prompt), which steers the generated content.
Main Categories
Language Models (LLM): generate text. Dominant architecture: autoregressive Transformer. Examples: GPT-4, Claude, Llama. Training objective: predict the next token.
Image Generation Models: generate images. Main architectures:
- GAN (Generative Adversarial Networks): adversarial training between generator and discriminator
- Diffusion Models: iterative denoising of Gaussian noise. Current SOTA. Examples: Stable Diffusion, DALL-E 3, Midjourney
- Autoregressive: ImageGPT, VQ-VAE-2 (autoregressive prior over discrete latents); legacy approaches, largely superseded by diffusion models
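The iterative denoising that diffusion models perform inverts a fixed forward process that gradually corrupts data with Gaussian noise. A minimal NumPy sketch of that forward step (the linear beta schedule and the toy "image" are illustrative assumptions, not any specific model's configuration):

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).
    As t grows, the signal fades and x_t approaches pure Gaussian noise."""
    noise = np.random.randn(*x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

# Linear beta schedule over T steps (a common illustrative choice)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

x0 = np.random.randn(8, 8)            # a toy "image"
xt, eps = forward_diffuse(x0, T - 1, alpha_bar)
# At t = T-1, alpha_bar is near zero: almost all of x0's signal is gone
```

Training teaches a network to predict `eps` from `xt`; generation then runs this process in reverse, denoising step by step.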
Multimodal Models: generate or understand text + images. Examples: GPT-4o, Claude 3.5, Gemini 1.5. Architecture: a shared embedding space for text tokens and image representations.
Models for Other Domains: audio (VALL-E, TTS systems), video (Sora, Runway), code (CodeLlama), protein structure (AlphaFold).
How It Works
Training: the model learns to predict the next element (token, pixel, frame) given the preceding ones, minimizing a loss function that measures the divergence between the model's distribution and the real data distribution.
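For autoregressive models, this divergence-minimizing objective reduces to cross-entropy on the next token. A minimal NumPy sketch, with random logits standing in for a real network's outputs (shapes and vocabulary size are toy values):

```python
import numpy as np

def next_token_loss(logits, targets):
    """Mean cross-entropy between predicted next-token distributions and the
    actual next tokens. Minimizing this is equivalent to minimizing the KL
    divergence between the data distribution and the model distribution.
    logits: (seq_len, vocab) unnormalized scores; targets: (seq_len,) ids."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))       # toy outputs: 5 positions, vocab of 10
targets = rng.integers(0, 10, size=5)   # the "real" next tokens
loss = next_token_loss(logits, targets)
# A uniform model scores ln(10) ~ 2.30; training pushes the loss below that
```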
Generation: the model iterates autoregressively, feeding each output back into its context:
- Takes the previous output (or initial conditioning)
- Uses the neural network to compute the probability distribution of the next element
- Samples from the probabilistic model
- Repeats until reaching a stopping criterion
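The four steps above can be sketched as a loop. The `toy_model` below is a hypothetical stand-in for a trained network: it just returns a hand-built distribution, where a real system would run a forward pass:

```python
import numpy as np

def generate(model, prompt, eos_id, max_len=20, rng=None):
    """Autoregressive generation: condition on the sequence so far, compute a
    distribution over the next token, sample, repeat until EOS or max_len."""
    rng = rng or np.random.default_rng()
    seq = list(prompt)
    while len(seq) < max_len:
        probs = model(seq)                        # next-token distribution
        token = rng.choice(len(probs), p=probs)   # sample from it
        seq.append(int(token))
        if token == eos_id:                       # stopping criterion
            break
    return seq

def toy_model(seq, vocab=5):
    """Hypothetical stand-in: strongly prefers (last + 1) mod vocab,
    with a little probability mass left on every other token."""
    probs = np.full(vocab, 0.02)
    probs[(seq[-1] + 1) % vocab] = 1.0 - 0.02 * (vocab - 1)
    return probs

out = generate(toy_model, prompt=[1], eos_id=0, max_len=10,
               rng=np.random.default_rng(0))
```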
Decoding strategies:
- Greedy: selects the most probable token. Fast, but repetitive
- Sampling: samples from the distribution. Diverse, but sometimes incoherent
- Beam search: maintains k parallel hypotheses and keeps the highest-scoring overall sequence (an approximate global search)
- Top-k / Top-p: samples from subset of most probable tokens, tuning diversity
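Top-k and top-p (nucleus) filtering can be implemented in a few lines of NumPy. This is a sketch of the idea, not a production decoder:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, renormalize to sum to 1."""
    out = np.zeros_like(probs)
    idx = np.argsort(probs)[-k:]          # indices of the k largest
    out[idx] = probs[idx]
    return out / out.sum()

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches p, renormalize."""
    order = np.argsort(probs)[::-1]       # most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # include the token that crosses p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
tk = top_k_filter(probs, 2)    # only the two most probable tokens survive
tp = top_p_filter(probs, 0.8)  # smallest set reaching 80% cumulative mass
```

Sampling then proceeds from the filtered distribution instead of the full one; `k` and `p` trade diversity against coherence.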
Use Cases
Content creation: text generation (articles, creativity), images (design, illustrations), music.
Code generation: specialized models (CodeLlama, Copilot) generate working code, reducing development time.
Data augmentation: generate synthetic data for training discriminative models, especially when real data is scarce.
Question answering and assistance: generative LLMs provide conversational responses, document summarization, etc.
Search and recommendation: generative models rank or rerank candidates based on semantic relevance.
Simulation and forecasting: generation of plausible scenarios, time series prediction.
Practical Considerations
Quality vs. speed: beam search can produce better output, but its cost grows roughly in proportion to the beam width compared with greedy decoding. The decoding strategy is a critical tuning parameter for production.
Memory: generation requires maintaining a KV-cache for the entire generated sequence. At long lengths (2K+ tokens), this cache becomes a bottleneck.
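A back-of-envelope for KV-cache size: two tensors (keys and values) per layer, per position, per attention head. The configuration below is an illustrative 7B-class transformer (32 layers, 32 heads, head dimension 128, fp16), not a measurement of any specific model:

```python
def kv_cache_bytes(seq_len, n_layers, n_heads, head_dim, dtype_bytes=2, batch=1):
    """Memory for keys and values cached at every layer for every position:
    2 tensors x batch x layers x seq_len x heads x head_dim x bytes/element."""
    return 2 * batch * n_layers * seq_len * n_heads * head_dim * dtype_bytes

# Illustrative 7B-class configuration in fp16
mib = kv_cache_bytes(seq_len=2048, n_layers=32, n_heads=32, head_dim=128) / 2**20
# ~1 GiB per sequence at 2K tokens; grows linearly with length and batch size
```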
Latency: token-by-token generation has intrinsic latency. With a TTFT (time to first token) of 100ms and roughly 10ms per subsequent token, generating 100 tokens takes about 1.1 seconds. Not suitable for applications requiring sub-100ms response.
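The latency arithmetic can be made explicit; the TTFT and per-token times below are illustrative assumptions, not benchmarks of any model:

```python
def total_latency_ms(n_tokens, ttft_ms, per_token_ms):
    """End-to-end generation latency: time-to-first-token, plus the
    inter-token time for each of the remaining tokens."""
    return ttft_ms + (n_tokens - 1) * per_token_ms

# 100 tokens, 100 ms TTFT, 10 ms per token -> about 1.1 seconds total
latency = total_latency_ms(100, ttft_ms=100, per_token_ms=10)
```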
Evaluation: traditional metrics (BLEU, ROUGE) have weak correlation with human quality. Evaluation often remains manual or with LLM-as-judge (with distortions).
Cost: inference cost is proportional to generated tokens. For high-volume applications, generation cost can exceed training cost.
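A sketch of the cost arithmetic; the per-token price and request volume are placeholders, not real quotes:

```python
def inference_cost_usd(n_requests, tokens_per_request, usd_per_1k_tokens):
    """Generation cost scales linearly with the number of generated tokens."""
    return n_requests * tokens_per_request * usd_per_1k_tokens / 1000

# 1M requests/day x 500 generated tokens at a hypothetical $0.002 per 1K tokens
daily = inference_cost_usd(1_000_000, 500, 0.002)   # $1,000 per day
```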
Common Misconceptions
“Generative AI creates from nothing”
False. Generative models recombine patterns from their training data: every output is a consequence of the conditioning plus the learned distribution, so apparent novelty arises from recombination rather than creation from nothing.
“Generation is random”
Depends on temperature and decoding strategy. With greedy decoding (temperature → 0), generation is fully deterministic; at higher temperatures it acquires a stochastic component. It is not “random” in the uncontrolled sense.
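The effect of temperature can be shown directly: it rescales the logits before the softmax, with the greedy limit as temperature approaches zero. A small NumPy sketch with toy logits:

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Turn logits into a sampling distribution. T -> 0 approaches greedy
    (all mass on the argmax); T > 1 flattens the distribution, increasing
    randomness in the subsequent sampling step."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        out = np.zeros_like(logits)
        out[np.argmax(logits)] = 1.0          # greedy: deterministic choice
        return out
    scaled = logits / temperature
    scaled -= scaled.max()                    # numerical stability
    e = np.exp(scaled)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
cold = apply_temperature(logits, 0)    # deterministic: [1, 0, 0]
warm = apply_temperature(logits, 1.0)  # the model's native distribution
hot = apply_temperature(logits, 5.0)   # flatter, closer to uniform
```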
“Generative AI is always accurate if confident”
No. Model confidence does not correlate with accuracy. A model can assign high probability to false output (hallucination). External validation is necessary.
Related Terms
- LLM: specific instance of generative model for language
- Foundation Model: large model pre-trained at web scale, serving as a base for many downstream tasks
- Prompt Engineering: art of formulating input to control generated output
- Hallucination: false output that generative models produce with confidence
- Diffusion Model: alternative architecture for image generation
Sources
- Goodfellow, I. et al. (2014). Generative Adversarial Networks. NeurIPS
- Ho, J. et al. (2020). Denoising Diffusion Probabilistic Models. NeurIPS
- Radford, A. et al. (2019). Language Models are Unsupervised Multitask Learners
- Stanford AI Index Report 2024: annual benchmark on AI progress