Definition
Generative AI is a category of machine learning models that learn to reproduce the probability distribution of training data and can generate new, statistically plausible content. Instead of predicting labels (classification) or values (regression), generative models produce complex, structured outputs.
Generation occurs by sampling from the learned distribution. The process is conditioned on an input (the prompt), which steers the generated content.
Main Categories
Language Models (LLM): generate text. Dominant architecture: autoregressive Transformer. Examples: GPT-4, Claude, Llama. Training objective: predict the next token.
Image Generation Models: generate images. Main architectures:
- GAN (Generative Adversarial Networks): adversarial training between generator and discriminator
- Diffusion Models: iterative denoising of Gaussian noise. Current SOTA. Examples: Stable Diffusion, DALL-E 3, Midjourney
- Autoregressive: ImageGPT, VQ-VAE-2 (autoregressive prior over discrete latents); legacy approaches, largely superseded by diffusion models
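The iterative denoising that diffusion models perform inverts a fixed forward process that gradually corrupts data with Gaussian noise. A minimal NumPy sketch of that forward step (the linear beta schedule and the toy "image" are illustrative assumptions, not any specific model's configuration):

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).
    As t grows, the signal fades and x_t approaches pure Gaussian noise."""
    noise = np.random.randn(*x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

# Linear beta schedule over T steps (a common illustrative choice)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

x0 = np.random.randn(8, 8)            # a toy "image"
xt, eps = forward_diffuse(x0, T - 1, alpha_bar)
# At t = T-1, alpha_bar is near zero: almost all of x0's signal is gone
```

Training teaches a network to predict `eps` from `xt`; generation then runs this process in reverse, denoising step by step.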
Multimodal Models: generate or understand text + images. Examples: GPT-4o, Claude 3.5, Gemini 1.5. Architecture: a shared embedding space for text tokens and image representations.
Models for Other Domains: audio (VALL-E, TTS systems), video (Sora, Runway), code (CodeLlama), protein structure (AlphaFold).
How It Works
Training: the model learns to predict the next element (token, pixel, frame) given the preceding ones, minimizing a loss function that measures the divergence between the model's distribution and the real data distribution.
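For autoregressive models, this divergence-minimizing objective reduces to cross-entropy on the next token. A minimal NumPy sketch, with random logits standing in for a real network's outputs (shapes and vocabulary size are toy values):

```python
import numpy as np

def next_token_loss(logits, targets):
    """Mean cross-entropy between predicted next-token distributions and the
    actual next tokens. Minimizing this is equivalent to minimizing the KL
    divergence between the data distribution and the model distribution.
    logits: (seq_len, vocab) unnormalized scores; targets: (seq_len,) ids."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))       # toy outputs: 5 positions, vocab of 10
targets = rng.integers(0, 10, size=5)   # the "real" next tokens
loss = next_token_loss(logits, targets)
# A uniform model scores ln(10) ~ 2.30; training pushes the loss below that
```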
Generation: the model iterates autoregressively, feeding each output back into its context:
- Takes the previous output (or initial conditioning)
- Uses the neural network to compute the probability distribution of the next element
- Samples from the probabilistic model
- Repeats until reaching a stopping criterion
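The four steps above can be sketched as a loop. The `toy_model` below is a hypothetical stand-in for a trained network: it just returns a hand-built distribution, where a real system would run a forward pass:

```python
import numpy as np

def generate(model, prompt, eos_id, max_len=20, rng=None):
    """Autoregressive generation: condition on the sequence so far, compute a
    distribution over the next token, sample, repeat until EOS or max_len."""
    rng = rng or np.random.default_rng()
    seq = list(prompt)
    while len(seq) < max_len:
        probs = model(seq)                        # next-token distribution
        token = rng.choice(len(probs), p=probs)   # sample from it
        seq.append(int(token))
        if token == eos_id:                       # stopping criterion
            break
    return seq

def toy_model(seq, vocab=5):
    """Hypothetical stand-in: strongly prefers (last + 1) mod vocab,
    with a little probability mass left on every other token."""
    probs = np.full(vocab, 0.02)
    probs[(seq[-1] + 1) % vocab] = 1.0 - 0.02 * (vocab - 1)
    return probs

out = generate(toy_model, prompt=[1], eos_id=0, max_len=10,
               rng=np.random.default_rng(0))
```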
Decoding strategies:
- Greedy: selects the most probable token. Fast, but repetitive
- Sampling: samples from the distribution. Diverse, but sometimes incoherent
- Beam search: maintains k parallel hypotheses and keeps the highest-scoring overall sequence (an approximate global search)
- Top-k / Top-p: samples from subset of most probable tokens, tuning diversity
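Top-k and top-p (nucleus) filtering can be implemented in a few lines of NumPy. This is a sketch of the idea, not a production decoder:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, renormalize to sum to 1."""
    out = np.zeros_like(probs)
    idx = np.argsort(probs)[-k:]          # indices of the k largest
    out[idx] = probs[idx]
    return out / out.sum()

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches p, renormalize."""
    order = np.argsort(probs)[::-1]       # most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # include the token that crosses p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
tk = top_k_filter(probs, 2)    # only the two most probable tokens survive
tp = top_p_filter(probs, 0.8)  # smallest set reaching 80% cumulative mass
```

Sampling then proceeds from the filtered distribution instead of the full one; `k` and `p` trade diversity against coherence.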
Use Cases
Content creation: text generation (articles, creativity), images (design, illustrations), music.
Code generation: specialized models (CodeLlama, Copilot) generate working code, reducing development time.
Data augmentation: generate synthetic data for training discriminative models, especially when real data is scarce.
Question answering and assistance: generative LLMs provide conversational responses, document summarization, etc.
Search and recommendation: generative models rank or rerank candidates based on semantic relevance.
Simulation and forecasting: generation of plausible scenarios, time series prediction.
Practical Considerations
Quality vs. speed: beam search can produce better output, but its cost grows roughly in proportion to the beam width compared with greedy decoding. The decoding strategy is a critical tuning parameter for production.
Memory: generation requires maintaining a KV-cache for the entire generated sequence. At long lengths (2K+ tokens), this cache becomes a bottleneck.
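A back-of-envelope for KV-cache size: two tensors (keys and values) per layer, per position, per attention head. The configuration below is an illustrative 7B-class transformer (32 layers, 32 heads, head dimension 128, fp16), not a measurement of any specific model:

```python
def kv_cache_bytes(seq_len, n_layers, n_heads, head_dim, dtype_bytes=2, batch=1):
    """Memory for keys and values cached at every layer for every position:
    2 tensors x batch x layers x seq_len x heads x head_dim x bytes/element."""
    return 2 * batch * n_layers * seq_len * n_heads * head_dim * dtype_bytes

# Illustrative 7B-class configuration in fp16
mib = kv_cache_bytes(seq_len=2048, n_layers=32, n_heads=32, head_dim=128) / 2**20
# ~1 GiB per sequence at 2K tokens; grows linearly with length and batch size
```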
Latency: token-by-token generation has intrinsic latency. With a TTFT (time to first token) of 100ms and roughly 10ms per subsequent token, generating 100 tokens takes about 1.1 seconds. Not suitable for applications requiring sub-100ms response.
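The latency arithmetic can be made explicit; the TTFT and per-token times below are illustrative assumptions, not benchmarks of any model:

```python
def total_latency_ms(n_tokens, ttft_ms, per_token_ms):
    """End-to-end generation latency: time-to-first-token, plus the
    inter-token time for each of the remaining tokens."""
    return ttft_ms + (n_tokens - 1) * per_token_ms

# 100 tokens, 100 ms TTFT, 10 ms per token -> about 1.1 seconds total
latency = total_latency_ms(100, ttft_ms=100, per_token_ms=10)
```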
Evaluation: traditional metrics (BLEU, ROUGE) have weak correlation with human quality. Evaluation often remains manual or with LLM-as-judge (with distortions).
Cost: inference cost is proportional to generated tokens. For high-volume applications, generation cost can exceed training cost.
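A sketch of the cost arithmetic; the per-token price and request volume are placeholders, not real quotes:

```python
def inference_cost_usd(n_requests, tokens_per_request, usd_per_1k_tokens):
    """Generation cost scales linearly with the number of generated tokens."""
    return n_requests * tokens_per_request * usd_per_1k_tokens / 1000

# 1M requests/day x 500 generated tokens at a hypothetical $0.002 per 1K tokens
daily = inference_cost_usd(1_000_000, 500, 0.002)   # $1,000 per day
```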
Common Misconceptions
“Generative AI creates from nothing”
False. Generative models recombine patterns from their training data: every output is a consequence of the conditioning plus the learned distribution, so apparent novelty arises from recombination rather than creation from nothing.
“Generation is random”
Depends on temperature and decoding strategy. With greedy decoding (temperature → 0), generation is fully deterministic; at higher temperatures it acquires a stochastic component. It is not “random” in the uncontrolled sense.
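The effect of temperature can be shown directly: it rescales the logits before the softmax, with the greedy limit as temperature approaches zero. A small NumPy sketch with toy logits:

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Turn logits into a sampling distribution. T -> 0 approaches greedy
    (all mass on the argmax); T > 1 flattens the distribution, increasing
    randomness in the subsequent sampling step."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        out = np.zeros_like(logits)
        out[np.argmax(logits)] = 1.0          # greedy: deterministic choice
        return out
    scaled = logits / temperature
    scaled -= scaled.max()                    # numerical stability
    e = np.exp(scaled)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
cold = apply_temperature(logits, 0)    # deterministic: [1, 0, 0]
warm = apply_temperature(logits, 1.0)  # the model's native distribution
hot = apply_temperature(logits, 5.0)   # flatter, closer to uniform
```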
“Generative AI is always accurate if confident”
No. Model confidence does not correlate with accuracy. A model can assign high probability to false output (hallucination). External validation is necessary.
Related Terms
- LLM: specific instance of generative model for language
- Foundation Model: large model pre-trained at web scale, serving as a base for many downstream tasks
- Prompt Engineering: art of formulating input to control generated output
- Hallucination: false output that generative models produce with confidence
- Diffusion Model: alternative architecture for image generation
Sources
- Goodfellow, I. et al. (2014). Generative Adversarial Networks. NeurIPS
- Ho, J. et al. (2020). Denoising Diffusion Probabilistic Models. NeurIPS
- Radford, A. et al. (2019). Language Models are Unsupervised Multitask Learners
- Stanford AI Index Report 2024: annual benchmark on AI progress