The Axiom
Learning Path12 topics · ~9h

AI Engineer

Prompt engineering, RAG, agents, evals, and production operations. The full curriculum for building AI products that hold up under load.

  1. LLMs

    Claude

    The Claude 4.x model family (Opus 4.7, Sonnet 4.6, Haiku 4.5) — model selection guide, extended thinking, prompt caching, and the RSP safety framework underlying all Claude deployments.

    Start with the model you will use most

  2. LLMs

    Tokenisation

    LLMs read tokens not text — BPE algorithm, tiktoken and Anthropic tokenisers, non-English cost penalty, and context window budgeting at production scale.

    What the model actually sees — tokens, embeddings, context mechanics

  3. Prompting

    Prompt Engineering

    Claude-specific XML structuring outperforms Markdown, 2-5 few-shot examples in example tags, CoT for reasoning tasks but not with Extended Thinking, and DSPy for automated optimisation at scale.

  4. RAG

    RAG — Retrieval-Augmented Generation

    Production RAG pipeline — hybrid BM25+dense retrieval, Cohere reranking (10-25% precision gain), RAGAS evaluation (faithfulness >0.9, context precision >0.8), and when GraphRAG beats standard retrieval.

    Retrieval-augmented generation end-to-end

  5. Data

    Data Pipelines for AI

    Data pipelines for AI (dbt, Airflow, Prefect, DVC) differ from traditional ETL because data quality bugs silently degrade model quality, making validation checkpoints and eval-as-a-pipeline-stage mandatory.

    Ingestion, chunking, freshness, versioning — bad data kills RAG

  6. Agents

    LangGraph

    LangGraph v1.0 is the production standard for stateful multi-agent orchestration in Python, offering fine-grained graph-based control with built-in checkpointing and human-in-the-loop support.

    Stateful agent loops

  7. Evals

    LLM Evaluation

    LLM evaluation methodology — only 52% of AI orgs have evals in place, making this the most common gap; covers offline/online/agent/RAG eval types, framework selection, golden set construction, and CI integration.

    The only way to know if it is actually working

  8. LLMs

    Hallucination

    Hallucination is a fundamental property of LLMs (not a bug) — covering why it happens, six types, detection methods (faithfulness checks, self-consistency sampling), and six mitigation strategies with RAG as the most effective.

    Failure modes: confabulation patterns and how to detect them

  9. Security

    Prompt Injection

    OWASP LLM01 — indirect injection via RAG/tool results is the hard problem; XML privilege separation, "flag injection attempts" instructions, and least-privilege tools are the primary defences; no complete solution exists.

    Failure modes: injection, data leakage, tool misuse

  10. Observability

    LLM Tracing with OpenTelemetry

    OTel GenAI semantic conventions, manual and auto-instrumentation for Anthropic/LangChain, Langfuse native SDK patterns, cost tracking per trace, and Prometheus alerting thresholds.

    Trace every prompt, tool call, and latency spike in production

  11. Security

    OWASP LLM Top 10 (2025) and Agentic Top 10 (2026)

    OWASP LLM Top 10 (2025) — prompt injection is

    The full threat surface for AI systems

  12. Protocols

    MCP Server Development (Python)

    Building MCP servers in Python uses FastMCP (now part of the official mcp SDK). Decorator-based API auto-generates JSON Schema from type hints. stdio is the default transport for Claude Desktop/Claude Code integration; streamable-HTTP for production. Tool descriptions are an attack surface — keep them minimal.

    Build tool integrations

12 pages · ~9h estimated reading time

← Browse all topics