Hallucination

Hallucination is a fundamental property of LLMs (not a bug) — covering why it happens, six types, detection methods (faithfulness checks, self-consistency sampling), and six mitigation strategies with RAG as the most effective.

Hallucination is when a model generates confident, fluent, plausible-sounding output that is factually wrong. The model isn't lying; it has no concept of truth. It's pattern-matching on training data and producing statistically likely continuations. Hallucination is a fundamental property of how LLMs work, not a bug to be fixed.


Why It Happens

LLMs are trained to produce probable next tokens, not true statements. During training, the model learns patterns like "the capital of France is ___" → "Paris", but also learns to complete sentences confidently even when it has no reliable signal.

Root causes:

  • Knowledge gaps: the training data didn't contain the fact, so the model interpolates from related patterns
  • Outdated knowledge: training cutoff means recent events, prices, versions are unknown
  • Long-tail facts: rare facts appear few times in training data — the signal is weak
  • Retrieval failure in context: even with facts in the context window, models sometimes ignore them and generate from parametric memory
  • Sycophancy: models trained on human feedback learn to say what sounds good, not what's true

Types of Hallucination

Type | Example | Detection
Factual fabrication | Wrong date, made-up statistic | Cross-reference source
Citation fabrication | Real author, fake paper title | Check the citation exists
Entity confusion | Mixing up two similar people/companies | Named entity verification
Temporal error | Stating a future event as past | Date validation
Numeric error | Wrong calculation, wrong figure | Independent calculation
Context ignore | Ignoring provided document, answering from memory | Faithfulness check vs source

Detection

LLM-as-Judge Faithfulness Check

For RAG systems: check whether the answer is grounded in the retrieved context.

import json

import anthropic

client = anthropic.Anthropic()

def check_faithfulness(context: str, answer: str) -> dict:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap for bulk checking
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"""Is the following answer fully supported by the provided context?
Answer only with JSON: {{"faithful": true/false, "unsupported_claims": ["list any claims not in context"]}}

Context:
{context}

Answer:
{answer}"""
        }],
    )
    return json.loads(response.content[0].text)

result = check_faithfulness(
    context="Anthropic was founded in 2021 by Dario Amodei and others.",
    answer="Anthropic was founded in 2021 by Dario Amodei and Sam Altman.",
)
# {"faithful": false, "unsupported_claims": ["Sam Altman co-founded Anthropic"]}

Self-Consistency Sampling

Run the same query multiple times with temperature > 0. If answers disagree, the model is uncertain.

from collections import Counter

def self_consistency_check(prompt: str, n: int = 5) -> dict:
    answers = []
    for _ in range(n):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=100,
            # default temperature (1.0) keeps sampling varied across runs
            messages=[{"role": "user", "content": prompt}],
        )
        answers.append(response.content[0].text.strip())

    counts = Counter(answers)
    most_common, freq = counts.most_common(1)[0]
    confidence = freq / n

    return {
        "answer": most_common,
        "confidence": confidence,   # 1.0 = all identical, 0.2 = all different
        "all_answers": answers,
    }

result = self_consistency_check("What year was the Eiffel Tower built?")
# confidence 1.0 → reliable. confidence 0.4 → uncertain, verify.
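
Note that exact string matching is brittle: "1889" and "The Eiffel Tower was completed in 1889" count as disagreement. A light normalisation pass before counting helps; a minimal sketch (the normalise helper is illustrative, not part of the check above):

import string

def normalise(answer: str) -> str:
    """Lowercase and strip punctuation/whitespace so trivially different phrasings vote together."""
    return answer.lower().translate(str.maketrans("", "", string.punctuation)).strip()

# inside self_consistency_check, count votes over normalised answers instead:
#   counts = Counter(normalise(a) for a in answers)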

Uncertainty Probing

Ask the model if it's sure:

def probe_uncertainty(claim: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=150,
        system="Be honest about uncertainty. If you're not sure, say so explicitly.",
        messages=[{
            "role": "user",
            "content": f"How confident are you in this claim, and why? Claim: {claim}"
        }],
    )
    return response.content[0].text

Claude is better calibrated than most models. It will often express genuine uncertainty when it has it. Don't suppress this with prompts like "answer confidently".


Mitigation

1. RAG — Ground Every Answer in Retrieved Sources

The single most effective mitigation for factual hallucination. Force the model to answer from retrieved documents, not parametric memory.

def grounded_answer(question: str) -> dict:
    # Retrieve relevant documents (vector_store is whatever retrieval layer you use;
    # it is not defined in this snippet)
    docs = vector_store.search(question, k=5)
    context = "\n\n".join(d.content for d in docs)

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system="""Answer using ONLY the provided context.
If the answer is not in the context, say "I don't have information about that in the provided documents."
Do not use any knowledge outside the context.""",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }],
    )
    return {
        "answer": response.content[0].text,
        "sources": [d.metadata["source"] for d in docs],
    }

2. Citations — Force the Model to Point to Its Sources

SYSTEM_PROMPT = """You are a research assistant.
When you make a factual claim, cite the source document using [1], [2], etc.
At the end, list your sources.
If you cannot cite a source for a claim, do not make that claim."""

Citations serve two purposes: they let humans verify the claims, and they force the model to stay grounded; if it can't cite something, it shouldn't say it.
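
A minimal post-check sketch for numbered citations (the check_citation_markers helper is an assumption, and it presumes the sources were passed to the model as a numbered list [1]..[n]): extract the markers from the answer and confirm each one maps to a source that was actually provided.

import re

def check_citation_markers(answer: str, sources: list[str]) -> dict:
    """Verify that every [n] marker in the answer points at a provided source."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    valid = set(range(1, len(sources) + 1))
    return {
        "dangling_citations": sorted(cited - valid),  # cited but never provided
        "unused_sources": sorted(valid - cited),      # provided but never cited
        "ok": bool(cited) and cited <= valid,
    }

This only checks that the markers resolve to provided sources; it does not check that the claim is actually supported, so pair it with the faithfulness check above.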

3. Constrain the Output Space

Reduce hallucination by reducing degrees of freedom:

# Instead of open-ended generation, use structured output
from pydantic import BaseModel

class FactualResponse(BaseModel):
    answer: str
    confidence: float    # 0-1
    source_quote: str    # direct quote from context supporting the answer
    caveat: str | None   # any uncertainty to flag

# The model must find a source_quote — can't fabricate without one
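
One way to wire this up, sketched under assumptions (the answer_with_quote helper and prompt wording are illustrative, Pydantic v2 is assumed, and the reply is assumed to be bare JSON): request the fields as JSON, then reject the response when the claimed supporting quote does not appear verbatim in the context.

def answer_with_quote(question: str, context: str) -> FactualResponse:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"""Answer from the context only. Reply with JSON containing
"answer", "confidence" (0-1), "source_quote" (verbatim from the context), and "caveat" (or null).

Context:
{context}

Question: {question}"""
        }],
    )
    parsed = FactualResponse.model_validate_json(response.content[0].text)
    # If the model cannot quote real supporting text, treat the answer as ungrounded
    if parsed.source_quote not in context:
        raise ValueError("source_quote not found verbatim in context")
    return parsed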

4. Temperature = 0 for Factual Tasks

Higher temperature = more creative = more hallucination. For factual Q&A, use temperature 0.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    temperature=0,  # minimise sampling randomness for factual Q&A
    messages=[{"role": "user", "content": factual_question}],
)
# The Messages API accepts temperature in the range 0 to 1; the default is 1.0,
# so set it to 0 explicitly for factual tasks.

5. Decompose Complex Queries

Long, multi-part queries increase hallucination risk. Break them down:

# Instead of: "Compare the funding, team size, safety approach, and benchmark performance of Anthropic and OpenAI"
# Do:
questions = [
    "What is Anthropic's total funding?",
    "What is OpenAI's total funding?",
    "How does Anthropic approach AI safety?",
    "How does OpenAI approach AI safety?",
]
answers = [grounded_answer(q) for q in questions]
# Then synthesise
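
A sketch of the synthesis step (the synthesise helper and prompt wording are assumptions): feed the grounded sub-answers back in as labelled notes and instruct the model to compare using only those notes.

def synthesise(task: str, questions: list[str], answers: list[dict]) -> str:
    notes = "\n\n".join(f"Q: {q}\nA: {a['answer']}" for q, a in zip(questions, answers))
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system="Write the comparison using ONLY the sub-answers provided. Do not add outside facts.",
        messages=[{
            "role": "user",
            "content": f"Sub-answers:\n{notes}\n\nTask: {task}"
        }],
    )
    return response.content[0].text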

6. Post-Generation Verification

For high-stakes outputs, verify facts automatically:

import json

def extract_and_verify_claims(text: str) -> list[dict]:
    """Extract factual claims and verify each one."""
    # Extract claims using Claude
    claims_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"List all factual claims in this text as a JSON array of strings:\n\n{text}"
        }],
    )
    claims = json.loads(claims_response.content[0].text)

    results = []
    for claim in claims:
        # Verify each claim against your knowledge base
        # (knowledge_base_text is whatever reference corpus you trust)
        verification = check_faithfulness(knowledge_base_text, claim)
        results.append({"claim": claim, "verified": verification["faithful"]})

    return results

Hallucination by Task Type

Task | Hallucination risk | Mitigation
Summarising a provided document | Low | Faithfulness check
Q&A with RAG context | Low-medium | Citations, faithfulness check
Creative writing | N/A — factual accuracy irrelevant | N/A
Code generation | Medium (wrong APIs, wrong syntax) | Run the code, type check
Factual Q&A (no context) | High | RAG, self-consistency
Dates, numbers, names | High | Always verify externally
Citations / bibliography | Very high | Never trust, always verify

Claude Specifics

Claude is better calibrated than average for expressing uncertainty. It tends to say "I'm not certain but..." rather than fabricating confidently. This is partly a result of Constitutional AI training, where honesty is an explicit value.

Claude's hallucination rate drops significantly on tasks where you do the following (combined into a single prompt sketch after the list):

  • Provide the context it should answer from
  • Ask it to cite specific quotes
  • Explicitly tell it to say "I don't know" when uncertain
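
Combining all three, a minimal grounded-Q&A system prompt might look like this (the wording is illustrative, not a canonical prompt):

GROUNDED_QA_SYSTEM = """Answer using ONLY the provided context.
For each factual claim, quote the supporting passage from the context.
If the context does not contain the answer, say "I don't know"; do not guess."""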

Do not prompt Claude to "always give a confident answer". This actively increases hallucination.


Key Facts

  • Self-consistency confidence: 1.0 = all 5 runs identical (reliable); 0.2-0.4 = uncertain (verify externally)
  • Temperature 0 for factual Q&A tasks; default temperature is calibrated for general tasks
  • Citations as mitigation: requiring citations keeps the model grounded, but fabricated citations still occur, so verify that cited sources exist
  • Never prompt Claude to "always give a confident answer" — actively increases hallucination rate
  • Citation fabrication (real author, fake paper) is very high risk — never trust LLM-generated citations
  • Claude expresses genuine uncertainty more than average models (Constitutional AI honesty training)

Common Failure Cases

RAG system returns retrieved context but the model ignores it and answers from parametric memory
Why: if the retrieved context and the model's parametric knowledge conflict, the model may default to what it "knows", especially when the context is long or the key fact is buried; without an explicit grounding instruction, the model treats context as optional.
Detect: the faithfulness check shows faithful: false and flags claims not supported by the retrieved documents; the answer matches common training-data patterns rather than the provided source.
Fix: add an explicit system instruction ("Answer using ONLY the provided context. If the answer is not in the context, say so.") and test with a faithfulness checker on every production RAG response.

Self-consistency check reports high confidence (1.0) but the answer is still wrong
Why: self-consistency detects uncertainty, not factual error; if the training data consistently contained the same wrong fact (e.g., a widely repeated misconception), all five samples will agree on the wrong answer.
Detect: self-consistency confidence is 1.0 but an external ground-truth check or human review shows the answer is incorrect; the claim is a commonly repeated misconception.
Fix: self-consistency cannot substitute for external verification on high-stakes facts; always pair it with a source-grounded check or a Perplexity/web lookup for claims where being wrong has consequences.

LLM-as-judge faithfulness checker itself hallucinates, classifying unfaithful responses as faithful
Why: the judge model is subject to the same hallucination tendencies as the target model; on short or poorly specified contexts, the judge may convince itself that a fabricated claim is implicit in the document.
Detect: manual audit of a random 10% sample of judge outputs shows disagreement rate above 10%; the judge systematically misses false citations or numeric errors.
Fix: calibrate the judge on a gold set of known faithful and unfaithful examples before deploying it; use Haiku for cost, but spot-check Haiku judge outputs against Sonnet on a sample to confirm calibration.

Citation-forcing instruction causes the model to fabricate citations rather than admit it cannot cite
Why: when instructed to always cite sources, models under pressure to provide an answer may generate plausible-sounding but non-existent citations (real author name, invented paper title) rather than refusing to answer.
Detect: the citation URLs or DOIs return 404 errors; the paper title does not appear in Google Scholar or Semantic Scholar; the model cited a real researcher for a paper they did not write.
Fix: instruct the model explicitly that "I cannot cite a source for this claim" is a valid and required response; add a post-generation citation verification step that checks cited URLs before the response is returned to the user.
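
A minimal sketch of that verification step (assuming the citations carry URLs; the url_resolves helper is illustrative, and a successful status code is only a weak signal that the source is real):

import requests

def url_resolves(url: str) -> bool:
    """Cheap existence check for a cited URL before the response is shown to the user."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=5)
        return resp.status_code < 400
    except requests.RequestException:
        return False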

Connections

Open Questions

  • Does extended thinking mode increase or decrease hallucination rate on factual tasks?
  • Is there a reliable way to quantify hallucination rate per task domain without human labelling?
  • How does RAG faithfulness change when the retrieved context itself contains inaccurate information?