AI Engineer
Prompt engineering, RAG, agents, evals, and production operations. The full curriculum for building AI products that hold up under load.
LLMs
Claude
The Claude 4.x model family (Opus 4.7, Sonnet 4.6, Haiku 4.5) — model selection guide, extended thinking, prompt caching, and the RSP safety framework underlying all Claude deployments.
↳ Start with the model you will use most
LLMs
Tokenisation
LLMs read tokens not text — BPE algorithm, tiktoken and Anthropic tokenisers, non-English cost penalty, and context window budgeting at production scale.
↳ What the model actually sees — tokens, embeddings, context mechanics
Prompting
Prompt Engineering
Claude-specific XML structuring outperforms Markdown, 2-5 few-shot examples in example tags, CoT for reasoning tasks but not with Extended Thinking, and DSPy for automated optimisation at scale.
RAG
RAG — Retrieval-Augmented Generation
Production RAG pipeline — hybrid BM25+dense retrieval, Cohere reranking (10-25% precision gain), RAGAS evaluation (faithfulness >0.9, context precision >0.8), and when GraphRAG beats standard retrieval.
↳ Retrieval-augmented generation end-to-end
Data
Data Pipelines for AI
Data pipelines for AI (dbt, Airflow, Prefect, DVC) differ from traditional ETL because data quality bugs silently degrade model quality, making validation checkpoints and eval-as-a-pipeline-stage mandatory.
↳ Ingestion, chunking, freshness, versioning — bad data kills RAG
Agents
LangGraph
LangGraph v1.0 is the production standard for stateful multi-agent orchestration in Python, offering fine-grained graph-based control with built-in checkpointing and human-in-the-loop support.
↳ Stateful agent loops
Evals
LLM Evaluation
LLM evaluation methodology — only 52% of AI orgs have evals in place, making this the most common gap; covers offline/online/agent/RAG eval types, framework selection, golden set construction, and CI integration.
↳ The only way to know if it is actually working
LLMs
Hallucination
Hallucination is a fundamental property of LLMs (not a bug) — covering why it happens, six types, detection methods (faithfulness checks, self-consistency sampling), and six mitigation strategies with RAG as the most effective.
↳ Failure modes: confabulation patterns and how to detect them
Security
Prompt Injection
OWASP LLM01 — indirect injection via RAG/tool results is the hard problem; XML privilege separation, "flag injection attempts" instructions, and least-privilege tools are the primary defences; no complete solution exists.
↳ Failure modes: injection, data leakage, tool misuse
Observability
LLM Tracing with OpenTelemetry
OTel GenAI semantic conventions, manual and auto-instrumentation for Anthropic/LangChain, Langfuse native SDK patterns, cost tracking per trace, and Prometheus alerting thresholds.
↳ Trace every prompt, tool call, and latency spike in production
Security
OWASP LLM Top 10 (2025) and Agentic Top 10 (2026)
OWASP LLM Top 10 (2025) — prompt injection is
↳ The full threat surface for AI systems
Protocols
MCP Server Development (Python)
Building MCP servers in Python uses FastMCP (now part of the official mcp SDK). Decorator-based API auto-generates JSON Schema from type hints. stdio is the default transport for Claude Desktop/Claude Code integration; streamable-HTTP for production. Tool descriptions are an attack surface — keep them minimal.
↳ Build tool integrations
12 pages · ~9h estimated reading time
← Browse all topics