AI Application Architecture Patterns
Seven blueprints (RAG chatbot, document pipeline, classification routing, agentic loop, multi-agent, eval pipeline, hybrid human-AI) cover 90% of production AI applications — real systems combine two or three.
AI Engineering Learning Path
Four-stage curriculum for software engineers entering AI engineering — Foundations (1-2 weeks), Building (2-3 weeks), Production (2-3 weeks), Advanced (ongoing) — each stage has a concrete project to build.
Data as a System
Data as a first-class system concern — lineage, contracts, freshness, ownership, and consistency across services. Most production bugs are data bugs, not code bugs.
Debug: Agent Loop Not Terminating
Runbook for diagnosing agents that spin indefinitely, hit max iterations, or get stuck on a tool call.
Debug: Agent Not Using Tools
Runbook for diagnosing LLM agents that ignore available tools and answer from memory instead.
Debug: Alert Firing Incorrectly
Runbook for diagnosing alerts that fire when nothing is wrong, or fail to fire when something is.
Debug: API Timeout
Runbook for diagnosing intermittent or consistent API timeouts under load.
Debug: Auth Failing
Runbook for diagnosing 401 and 403 errors, JWT failures, token expiry, and OAuth misconfiguration.
Debug: Cache Inconsistency
Runbook for diagnosing stale, wrong, or missing data caused by cache inconsistency.
Debug: CI Pipeline Failing
Runbook for diagnosing CI pipelines that fail in CI but pass locally.
Debug: Cloud Cost Spike
Runbook for diagnosing unexpected cloud bill spikes and identifying the source of runaway spend.
Debug: CORS Error
Runbook for diagnosing CORS errors blocking browser requests to an API.
Debug: Data Pipeline Failing
Runbook for diagnosing silent data drops, stale outputs, or broken ingestion in data pipelines.
Debug: Database Migration Failing
Runbook for diagnosing DB migrations that stall, lock tables, or break running services.
Debug: Deadlock
Runbook for diagnosing DB or code deadlocks where requests hang indefinitely waiting for locks.
Debug: DNS Resolution Failing
Runbook for diagnosing DNS resolution failures causing services to be unreachable by name.
Debug: Duplicate Writes from Retries
Runbook for diagnosing duplicate records or side effects caused by retries or at-least-once delivery.
Debug: Embedding Quality Degraded
Runbook for diagnosing RAG accuracy drops caused by embedding model changes, stale indexes, or distribution shift.
Debug: Fine-Tuned Model Worse Than Base
Runbook for diagnosing a fine-tuned model that performs worse than the base model or regresses after training.
Debug: Flaky Test in CI
Runbook for diagnosing tests that pass locally but fail intermittently in CI.
Debug: Hallucination in Production
Runbook for diagnosing an LLM confidently returning wrong, fabricated, or outdated information in production.
Debug: High CPU
Runbook for diagnosing a process with CPU pegged at 100% or consistently high under load.
Debug: High Error Rate After Deploy
Runbook for diagnosing a spike in errors or 5xx responses immediately after a deployment.
Debug: Kubernetes Pod Not Starting
Runbook for diagnosing pods stuck in Pending, CrashLoopBackOff, or ImagePullBackOff.
Debug: LLM High Latency
Runbook for diagnosing slow LLM responses, stalling streams, or high time-to-first-token.
Debug: Memory Leak
Runbook for diagnosing process memory that climbs over time and never releases.
Debug: No Logs in Production
Runbook for diagnosing missing or incomplete logs in production when they work locally.
Debug: Prompt Injection Detected
Runbook for diagnosing and responding to prompt injection attacks in LLM applications and agents.
Debug: RAG Pipeline Slow
Runbook for diagnosing RAG pipelines where retrieval is slow, adding seconds of latency before the LLM even starts.
Debug: RAG Returning Wrong Context
Runbook for diagnosing RAG pipelines returning irrelevant, incomplete, or hallucinated answers.
Debug: Scaling Not Triggering
Runbook for diagnosing autoscalers that fail to scale up under load or to scale down after load drops.
Debug: Secret Leaked
Runbook for the first 30 minutes after discovering a leaked credential or secret in a public location.
Debug: Slow Query
Runbook for diagnosing slow database queries causing API latency or timeouts.
Debug: SSL Certificate Error
Runbook for diagnosing TLS handshake failures, expired certificates, and chain errors.
Debug: WebSocket Connection Dropping
Runbook for diagnosing WebSocket connections that drop, fail to connect, or cause reconnect storms.
Debugging Runbooks
Index of 32 production debugging runbooks organised by failure domain. Each runbook is a step-by-step guide for isolating and fixing a specific class of production failure.
Engineering Tradeoffs
The decisions that separate senior engineers — when to cache, when to use RAG vs fine-tuning, when to scale up vs out, when to accept inconsistency.
Getting Started with AI Engineering
First working Anthropic API call in under 10 minutes — API key setup, SDK install, stateless multi-turn conversation pattern, streaming, and the nine most common beginner mistakes.
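The stateless multi-turn pattern that page covers can be sketched in a few lines. This is a minimal illustration, not the page's exact code: the `add_turn` helper is an assumption, though the role/content message shape matches the Anthropic Messages API, which keeps no server-side session state.

```python
# Stateless multi-turn: the API keeps no conversation state, so the
# client resends the full history as `messages` on every request.
def add_turn(history, role, text):
    """Return a new history list with one more message appended."""
    return history + [{"role": role, "content": text}]

history = []
history = add_turn(history, "user", "What is a context window?")
history = add_turn(history, "assistant", "The context window is the ...")
history = add_turn(history, "user", "And why does it matter?")

# `history` is exactly what goes in the `messages` field of the next call.
print(len(history))
```

Forgetting to resend prior turns (and so losing all context) is one of the beginner mistakes the page lists.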
Graph Health Report — 2026-05-08 (vault:lint)
Link density audit across 467 content pages — score 100/100. 3 orphans (broken-link root cause), 81 under-linked, 137 hub pages.
Knowledge Gap Report — 2026-05-08
Ranked list of knowledge gaps relative to active areas — what to research next.
LLM Cost Optimisation
Seven cost levers (prompt caching, model routing, Batch API, prompt compression, output token control, semantic caching, streaming) applied together typically reduce LLM API costs by 60-90% without quality loss.
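One of those levers, semantic caching, can be sketched as below. This is a toy sketch under stated assumptions: real systems compare embedding vectors, while word-overlap (Jaccard) similarity and the 0.6 threshold here are placeholders for illustration.

```python
# Toy semantic cache: reuse a previous answer when a new query is
# "close enough" to a cached one, avoiding a paid API call entirely.
def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold
        self.entries: list[tuple[str, str]] = []  # (query, answer)

    def get(self, query: str):
        for cached_query, answer in self.entries:
            if similarity(query, cached_query) >= self.threshold:
                return answer  # cache hit: zero token cost
        return None  # cache miss: caller falls through to the LLM

    def put(self, query: str, answer: str):
        self.entries.append((query, answer))

cache = SemanticCache()
cache.put("what is prompt caching", "Prompt caching reuses ...")
print(cache.get("what is prompt caching?"))  # near-duplicate query
```

The threshold is the key tuning knob: too loose and users get answers to someone else's question, too tight and the hit rate (and the savings) collapses.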
LLM Decision Guide
Opinionated decision tables for every major AI engineering choice — model (Sonnet 4.6 default), embedding (Cohere 65.2 MTEB), vector store (pgvector if on Postgres), agent framework (LangGraph for stateful Python), observability (Langfuse self-hosted), fine-tuning (Axolotl), and the prompting → RAG → fine-tune → agents escalation order.
RAG vs Fine-Tuning
RAG manages knowledge (external, updateable, citable); fine-tuning shapes behaviour (style, terminology, format) — 57% of LLM-deploying organisations use RAG without fine-tuning; the most powerful pattern combines both.
Reasoning Model Patterns
A production decision framework for when to use reasoning models and extended thinking — covering task fit, budget_tokens selection, cross-provider comparison, and cost/latency tradeoffs.
Request Flow Anatomy
The full anatomy of a web request from user to database and back — where latency accumulates, where failures happen, and where retries apply.
Software Engineer to AI Engineer
The fastest path to AI engineering runs through software engineering fundamentals — debugging, APIs, testing, and data thinking transfer directly; the gap is knowing which AI primitives to reach for when.
System Architecture: Three Ms and Four Cs
The Three Ms and Four Cs framework describing the Nexus/Axiom system architecture and workflow.
Technical Communication
The communication layer that separates senior engineers from lead engineers — technical writing, stakeholder translation, ADRs, RFCs, and explaining tradeoffs without losing precision.
Vault Audit Report — 2026-05-03
Vault audit report — rule-based and semantic checks across all 377 pages. 2 broken links, 32 debug runbooks missing from index, 0 frontmatter issues, 3 complementary pairs (all cross-linked), 1 accuracy issue.