Guardrails and Output Validation
Runtime enforcement of LLM output contracts — ensuring models return valid structure, safe content, and correct format before that output reaches your users or downstream systems.
LLM Red Teaming Tools
Five automated tools for adversarially testing LLM applications: Garak (pre-deployment scanner), PyRIT (enterprise multi-turn attack framework), Promptfoo (eval + security combined), NeMo Guardrails (runtime filtering), and DeepTeam (DeepEval-integrated red teaming).
MCP Security CVEs
MCP is the largest new AI attack surface of 2026 — a systemic STDIO RCE vulnerability affects all official SDKs, with 6+ named CVEs and up to 200,000 vulnerable instances in the wild.
OAuth 2.0 Boundary Testing
OAuth boundary testing verifies that scoped tokens can't exceed their declared scope, PKCE can't be downgraded, and tokens can't be replayed or audience-swapped. Test the negative cases, not just the happy path.
OWASP LLM Top 10 (2025) and Agentic Top 10 (2026)
OWASP LLM Top 10 (2025) — prompt injection is
OWASP Web Security Testing Guide (WSTG)
WSTG is a structured, test-ID-driven methodology for web application security testing — not a risk list. Use it to scope engagements, document findings against test IDs, and produce defensible deliverables.
Prompt Injection
OWASP LLM01 — indirect injection via RAG/tool results is the hard problem; XML privilege separation, "flag injection attempts" instructions, and least-privilege tools are the primary defences; no complete solution exists.
Red Teaming LLM Systems
Red teaming finds failure modes evals miss — four jailbreak types, automated variant generation, LLM-as-judge scoring, four severity tiers (critical/high/medium/low), and CI regression suites.
Security Scorecard Methodology
How to design a defensible, reproducible composite security score — covering OpenSSF Scorecard's weighted 0-10 model, CVSS's base/temporal/environmental split, graduated vs pass/fail tradeoffs, and category weighting conventions.
Snyk
Snyk is a developer-first security platform that replaces raw CVSS severity with a composite Priority Score (0-1000) incorporating reachability, exploit maturity, and an ML-based Risk Score layer to cut noise and direct remediation effort to vulnerabilities that can actually be reached and exploited.
Socket.dev
Socket.dev is a supply chain security scanner that detects malicious package behavior (install scripts, network access, obfuscation) through static analysis — catching attacks that CVE-only tools miss.
Threat Modelling
Structured methodology for identifying and prioritising security threats at design time — covers STRIDE categorisation, DREAD scoring, PASTA risk analysis, workshop facilitation, and mapping threats to test cases.