Definition
A hallucination is an output generated by an AI model that appears linguistically fluent and confident but contains false, invented, or unsupported information.
The term borrows from psychology, though “confabulation” is technically more accurate: the model does not perceive things that are not there; it generates plausible-sounding output from statistical patterns without verifying factual correctness.
Types
Intrinsic hallucination: output directly contradicts the provided input or context. Example: in a summarization task, the summary states a date or figure that conflicts with the one given in the original document.
Extrinsic hallucination: output includes information that cannot be verified against the provided input; it may or may not be true. Example: the model adds unrequested details that are not grounded in the context.
Factual hallucination: invented facts presented as true. Non-existent citations, false statistics, events that never occurred.
Faithfulness hallucination: in tasks requiring fidelity to a source (RAG, summarization), output diverges from source content.
Why They Occur
LLMs are trained to predict the most probable token given a sequence. They lack an internal model of “truth” or fact-checking mechanisms. If a statistical pattern produces plausible output, the model generates it, regardless of factual correctness.
Factors that increase hallucinations:
- Questions about facts underrepresented in the training data
- Requests for specific details (names, dates, numbers)
- High temperature (more randomness in generation; see the sketch after this list)
- Insufficient or ambiguous context
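Since temperature appears in the list above, a small worked example helps make the mechanism concrete. The sketch below is a minimal illustration with an invented vocabulary and invented logits (they do not come from any real model): it shows how the softmax temperature reshapes the next-token distribution, so that higher temperature makes low-probability, and potentially wrong, continuations more likely to be sampled.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw next-token logits into probabilities; higher temperature flattens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical continuation candidates after "The capital of Australia is ..."
vocab = ["Canberra", "Sydney", "Melbourne", "Vienna"]
logits = [4.0, 2.5, 1.5, 0.2]  # invented scores standing in for what a model might assign

for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", {w: round(p, 3) for w, p in zip(vocab, probs)})

# Sampling, not truth-checking, decides the output: at higher temperature,
# plausible-but-wrong continuations are drawn noticeably more often.
print(random.choices(vocab, weights=softmax_with_temperature(logits, 1.5), k=5))
```

Nothing in this process consults a source of truth; the choice between continuations is purely statistical, which is exactly why hallucinations arise.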
Mitigation
RAG: provide reference documents in context. Reduces but does not eliminate hallucinations (see the sketch at the end of this section).
Grounding: constrain output to specific sources, require verifiable citations.
Low temperature: reduces randomness, producing more conservative and repeatable output.
Prompt engineering: explicit instructions (“answer only based on provided context”, “if unsure, say so”).
Validation pipeline: automatic verification of output against authoritative sources or business rules.
Human-in-the-loop: human review for critical applications.
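As a rough illustration of how several of these techniques fit together, here is a minimal sketch; the prompt wording, the `[n]` citation format, and the placeholder `call_llm` are assumptions for illustration, not a standard API. It builds a grounded, RAG-style prompt with explicit instructions, then runs a simple validation step that rejects answers citing sources that were never provided.

```python
import re

def build_grounded_prompt(question, documents):
    """Assemble a prompt that constrains the model to the supplied documents
    and asks for citations (grounding + prompt engineering)."""
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer only based on the context below and cite sources as [n]. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def validate_citations(answer, documents):
    """Validation-pipeline step: flag answers with no citations
    or with citations pointing outside the provided documents."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    valid = set(range(1, len(documents) + 1))
    if not cited:
        return False, "no citations found"
    if not cited <= valid:
        return False, f"unknown sources cited: {sorted(cited - valid)}"
    return True, "all citations refer to provided documents"

docs = ["The Eiffel Tower is 330 metres tall.", "It was completed in 1889."]
prompt = build_grounded_prompt("How tall is the Eiffel Tower?", docs)
# answer = call_llm(prompt)  # placeholder for a real model call
answer = "The Eiffel Tower is 330 metres tall [1]."
print(validate_citations(answer, docs))  # (True, 'all citations refer to provided documents')
```

This check is deliberately shallow: it confirms that citations point at real documents, not that the cited text actually supports the claim. For critical applications, stronger checks against authoritative sources and human review remain necessary.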
Common Misconceptions
“Hallucinations are a bug that will be fixed”
No. They are a consequence of the architecture itself. They can be mitigated, not eliminated. Models generate probabilistically plausible output, not verified output.
“If the model is confident, it’s correct”
Linguistic confidence does not correlate with accuracy. Models are trained to produce fluent output, not to be calibrated about their own uncertainty.
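One way to make this concrete is a calibration check: group answers by the confidence the model reports and compare that confidence with how often the answers are actually correct. The sketch below uses invented confidence scores and correctness labels purely for illustration; a real evaluation would use logged model outputs graded against ground truth.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Average gap between stated confidence and observed accuracy per bin;
    a well-calibrated model would score close to 0."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += (len(bucket) / len(confidences)) * abs(avg_conf - accuracy)
    return ece

# Invented data: the model "sounds" sure (high confidence) but is often wrong.
confidences = [0.95, 0.92, 0.90, 0.88, 0.60, 0.97]
correct     = [True, False, False, False, True, True]
print(f"ECE: {expected_calibration_error(confidences, correct):.2f}")  # -> ECE: 0.50
```

A large gap between stated confidence and observed accuracy is exactly why fluent, assertive phrasing should not be read as evidence of correctness.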
“RAG solves the problem”
RAG reduces hallucinations by providing factual context, but the model can still ignore the context, misread it, or mix it with invented information.
Related Terms
- LLM: models subject to hallucinations
- RAG: technique for partial mitigation
- Prompt Engineering: techniques to reduce hallucinations
Sources
- Huang, L. et al. (2023). A Survey on Hallucination in Large Language Models. arXiv
- Ji, Z. et al. (2023). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys