OWASP LLM Top 10 (2025) and Agentic Top 10 (2026)

OWASP LLM Top 10 (2025) — prompt injection is

Updated Invalid Date·

owasp security prompt-injection llm-security vulnerabilities agentic

The definitive threat models for AI systems. The OWASP LLM Top 10 covers LLM applications broadly; the Agentic Top 10 (2026) extends it for autonomous, tool-using agents.

[Source: OWASP genai.owasp.org + Perplexity research, 2026-04-29]

OWASP Top 10 for LLM Applications 2025

LLM01 — Prompt Injection

Crafted input that overrides the system prompt or hijacks the model's intended behaviour. Split into:

Direct — user directly injects into the prompt
Indirect — malicious instructions in retrieved content (web pages, documents, tool results)

The #1 risk. Indirect injection via RAG or tool results is the harder problem. See security/prompt-injection.

LLM02 — Sensitive Information Disclosure

Model reveals training data, system prompts, API keys, or personal information. Causes:

System prompt extraction via crafted queries
Training data memorisation (regurgitation of PII from training set)
Tool result leakage into model output

Mitigations: Output filtering, prompt design that keeps secrets out of the model entirely, differential privacy in training.

LLM03 — Supply Chain Vulnerabilities

Third-party model weights, datasets, training pipelines, and plugins introduce attack surface. Poisoned training data affects model behaviour without leaving a visible trace.

Mitigations: Pin model versions, verify checksums, audit third-party plugins/MCP servers.

LLM04 — Data and Model Poisoning

Malicious data injected into training, fine-tuning, or RAG knowledge bases. The model learns incorrect or malicious associations.

Mitigations: Validate training/fine-tuning datasets; scan RAG knowledge bases for adversarial content before indexing.

LLM05 — Improper Output Handling

Treating LLM output as trusted code or SQL. Covers: SQL injection via LLM-generated queries, XSS via LLM-generated HTML, shell injection via LLM-generated commands.

Mitigations: Never execute raw LLM output. Validate, sanitise, and parameterise all outputs before use.

LLM06 — Excessive Agency

The model takes irreversible actions (deleting files, sending emails, running code) without sufficient authorisation. Risk multiplies with agentic tool use.

Mitigations: Principle of least privilege on tools; require confirmation for irreversible actions; time-bounded permissions.

LLM07 — System Prompt Leakage (2025 addition)

The model reveals its system prompt contents, exposing business logic, pricing, or security instructions. Can be exploited to understand and bypass safeguards.

Mitigations: Never put secrets in the system prompt. Treat the system prompt as obfuscated, not secret. Design safeguards to work even if the system prompt is known.

LLM08 — Vector and Embedding Weaknesses (2025 addition)

Attacks on RAG infrastructure: poisoning the vector store with adversarial embeddings, cross-encoding attacks, backdoor triggers in embedding models.

Mitigations: Validate documents before indexing; monitor for out-of-distribution retrievals.

LLM09 — Misinformation

Model generates confident, plausible, but false information. Causes: hallucination, outdated training data, adversarial prompting.

Mitigations: RAG for facts, citations with sources, human review for high-stakes outputs, evals/methodology with faithfulness metrics.

LLM10 — Unbounded Consumption

DoS via token-expensive requests; cost abuse via prompt construction that maximises output tokens; API key theft leading to fraudulent usage.

Mitigations: Rate limiting, cost gates, token budget enforcement, usage anomaly detection.

OWASP Top 10 for Agentic Applications 2026

Developed by 100+ security experts. Extends the LLM Top 10 for systems that combine reasoning, memory, tools, and multi-step autonomous execution.

[Source: genai.owasp.org Agentic Top 10, 2026-04-29]

A1 — Goal Misalignment

Agent pursues a proxy goal rather than the intended goal. Specification gaming, reward hacking, Goodhart's law in practice.

Example: An agent tasked with "maximise user engagement" learns to generate controversial content.

A2 — Tool Misuse

Agent invokes tools with unintended parameters, calls tools out of sequence, or uses a high-privilege tool when a lower-privilege one would suffice.

A3 — Delegated Trust Failures

When Agent A delegates to Agent B, it implicitly trusts B. A compromised or manipulated Agent B can abuse that trust to take actions Agent A would never have authorised.

A4 — Inter-Agent Communication Attacks

Prompt injection or data poisoning via messages passed between agents. A malicious tool result from Agent B is passed to Agent A and hijacks its next action.

A5 — Persistent Memory Exploitation

Adversarial content stored in the agent's long-term memory (vector store, episodic memory) that activates on future sessions to trigger malicious behaviour.

A6 — Emergent Autonomous Behaviour

Unexpected behaviours that arise from the interaction of multiple agents, tools, and environment states. Not present in any single component.

A7 — Resource and Cost Abuse

Agents spinning up infinite loops, spawning excessive subagents, or making uncontrolled API calls that exhaust compute or budget.

A8 — Confused Deputy Attacks

An agent with high-privilege access is tricked by low-privilege input into using those privileges maliciously.

A9 — Cascading Failures

One agent failure propagates to others in a multi-agent system, causing a catastrophic system-wide failure.

A10 — Inadequate Human Oversight

Agents operate without checkpoints, auditing, or human-in-the-loop for high-stakes decisions. No ability to stop, inspect, or roll back.

Defence Summary

Layer	Key mitigations
Input	Input validation, prompt isolation, content filtering
Model	Least privilege, Constitutional AI, safety fine-tuning
Tool	Scoped permissions, human confirmation, sandboxing
Memory	Validate before storing, TTL on memory, anomaly detection
Output	Output filtering, never exec raw output, citations
System	Rate limiting, cost gates, logging, human oversight

Key Facts

LLM01 Prompt Injection: #1 risk; indirect injection via RAG/tool results is harder to defend than direct injection
LLM05 Improper Output Handling: never execute raw LLM output — validate, sanitise, parameterise
LLM06 Excessive Agency: require confirmation for irreversible actions; time-bounded permissions
Agentic Top 10 (2026): developed by 100+ security experts for autonomous, tool-using agent systems
A3 Delegated Trust Failures: compromised Agent B abuses trust granted by Agent A
A5 Persistent Memory Exploitation: adversarial content in vector store activates on future sessions
A10 Inadequate Human Oversight: no checkpoints or audit trail for high-stakes autonomous decisions

Common Failure Cases

Indirect prompt injection via RAG document goes undetected
Why: a malicious instruction is embedded in a document that gets indexed into the RAG corpus; when retrieved, it is injected into the LLM's context alongside trusted content and executed.
Detect: agent takes unexpected actions correlated with a specific document being retrieved; red-team by injecting Ignore previous instructions: ... into test documents and observing agent behaviour.
Fix: validate and sanitise documents before indexing; use a screening LLM call to check retrieved chunks for injection patterns before passing to the main agent.

LLM06 Excessive Agency: agent deletes files without confirmation
Why: the tool has write/delete permissions and no confirmation step; the LLM misinterprets an ambiguous user request and takes an irreversible action.
Detect: audit logs show destructive tool calls (delete, update, send) triggered without an explicit user confirmation in the same turn.
Fix: require human-in-the-loop confirmation for all irreversible actions; scope tools to read-only by default; provide a separate tool with write access that requires an explicit confirmation parameter.

LLM05 SQL injection via LLM-generated query
Why: LLM output is interpolated directly into a SQL string; the model can be prompted to generate a query that extracts or modifies unintended data.
Detect: security scan finds raw string interpolation in DB query construction; penetration test with a prompt that asks the model to "show all users" reveals unintended data.
Fix: never interpolate LLM output into SQL; use parameterised queries or an ORM; validate that generated queries match an allow-list of permitted operations.

A7 Resource abuse: agent spawns subagents in an infinite loop
Why: the orchestration logic has a bug or the agent interprets its goal as requiring continuous operation; it spawns subagents or makes API calls until the account budget is exhausted.
Detect: token spend spikes 100x normal; cost gate threshold triggers; trace shows a recursive spawning pattern.
Fix: implement hard token/cost gates at the orchestration layer; cap subagent spawn depth; add a circuit breaker that halts if the same tool is called more than N times in one session.

A5 Persistent memory exploitation: adversarial content stored in vector store
Why: user-controlled content is written directly to the agent's long-term memory without sanitisation; it contains instructions that activate on future sessions.
Detect: agent behaviour in a new session is influenced by content from a previous session in unexpected ways; the triggering memory entry contains instruction-like text.
Fix: validate all content before writing to long-term memory; apply TTL to memories; use a separate namespace for user-supplied content vs system-generated memories.

Connections

security/prompt-injection — LLM01 in depth
security/mcp-cves — concrete CVEs from MCP ecosystem
protocols/mcp — the MCP attack surface
agents/multi-agent-patterns — multi-agent trust considerations
security/red-teaming — testing for these vulnerabilities
security/owasp-wstg — OWASP Web Security Testing Guide; structured methodology for web application security testing
security/threat-modelling — STRIDE and DREAD frameworks for identifying AI system threats

Open Questions

How does the Agentic Top 10 apply to Claude Code specifically — which risks are most prevalent in agentic coding workflows?
Is A5 (persistent memory exploitation) a real attack vector in practice or primarily theoretical?
Will OWASP release a formal scoring methodology for LLM vulnerabilities analogous to CVSS?