Claude
The Claude 4.x model family (Opus 4.7, Sonnet 4.6, Haiku 4.5) — model selection guide, extended thinking, prompt caching, and the Responsible Scaling Policy (RSP) safety framework underlying all Claude deployments.
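As a concrete illustration of the prompt-caching point above, the sketch below marks a long, stable system block as cacheable using the Anthropic Python SDK; the model ID and document contents are placeholders, not a recommendation.

```python
# Hedged sketch: prompt caching with the Anthropic Python SDK.
# Model ID and document contents are illustrative placeholders.
import anthropic

LONG_REFERENCE_DOC = "<several thousand tokens of stable reference material>"

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",          # placeholder; use whichever Claude model you target
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_REFERENCE_DOC,              # large, unchanging prefix worth caching
            "cache_control": {"type": "ephemeral"},  # marks a cache breakpoint at this block
        }
    ],
    messages=[{"role": "user", "content": "Summarise the key risks in this document."}],
)

# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens on later calls that reuse the cached prefix
print(response.usage)
```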
DeepSeek R1 / R2
DeepSeek-R1 is a 671B-parameter MoE reasoning model trained largely via reinforcement learning (GRPO, which drops PPO's learned critic in favour of group-relative baselines; only the R1-Zero variant skips supervised fine-tuning entirely) that matched OpenAI o1 on AIME and MATH-500 at 96% lower API cost with MIT-licensed open weights — the most disruptive open model release since Llama.
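A minimal sketch of the group-relative advantage at the heart of GRPO, assuming rule-based scalar rewards for a group of completions sampled from one prompt; function names are illustrative.

```python
# Minimal sketch of GRPO's group-relative advantage (no learned critic).
# For one prompt, sample a group of completions, score each with a rule-based
# reward, and normalise within the group; the normalised score is the advantage
# applied to every token of that completion in the policy-gradient update.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# e.g. 4 sampled answers to one maths problem, rewarded 1.0 if the final answer is correct
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```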
Foundation Models
Foundation models are large neural networks pretrained on massive datasets that can be adapted to many tasks via prompting or fine-tuning — the paradigm shift underlying modern AI engineering.
Hallucination
Hallucination is a fundamental property of LLMs (not a bug) — covering why it happens, six types, detection methods (faithfulness checks, self-consistency sampling), and six mitigation strategies with RAG as the most effective.
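A hedged sketch of self-consistency sampling as a detection signal: ask the same question several times at nonzero temperature and treat low answer agreement as a hallucination warning. `ask_model` is a hypothetical wrapper around whatever LLM API is in use.

```python
# Self-consistency as a hallucination signal: sample the same question several
# times and measure how often the most common answer appears.
from collections import Counter

def self_consistency(ask_model, question: str, n: int = 5) -> tuple[str, float]:
    answers = [ask_model(question, temperature=0.8) for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n          # low agreement suggests the model is guessing
    return top_answer, agreement

# usage: answer, score = self_consistency(ask_model, "In which year was X founded?")
# flag for review (or fall back to RAG) when the score drops below a threshold, e.g. 0.6
```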
Inference-Time Scaling (Test-Time Compute)
Allocating more compute at inference time — through sampling, search, or extended reasoning traces — produces quality gains that compound independently of training compute, with math and code tasks benefiting most.
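A hedged sketch of the simplest inference-time scaling recipe, best-of-N sampling against a verifier; `generate` and `score` are hypothetical stand-ins for a temperature-sampled LLM call and a task-specific checker (unit tests, exact-match grading, or a reward model).

```python
# Best-of-N sampling: spend more inference compute by drawing more candidates,
# then keep the one the verifier scores highest.
def best_of_n(generate, score, prompt: str, n: int = 16) -> str:
    candidates = [generate(prompt) for _ in range(n)]   # more compute: more samples
    return max(candidates, key=score)                   # keep the highest-scoring candidate

# quality typically improves with n on verifiable tasks (maths, code), at n times the cost
```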
LLM Model Families
The eight major LLM families (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Qwen, Cohere) compared by capability tier, licensing, and best use case.
ML Fundamentals
Traditional ML foundations — supervised learning (regression, classification), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning — with key algorithms, evaluation metrics, and the ML lifecycle. Core material for AIF-C01 Domain 1.
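For orientation, a minimal supervised-learning round trip with scikit-learn (assumed available): split the data, fit a classifier, and evaluate with the kinds of metrics this entry covers.

```python
# Minimal supervised-learning example: train/test split, fit a classifier,
# report accuracy and a confusion matrix.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
```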
Multi-head Latent Attention (MLA)
MLA compresses K and V into a single low-rank latent vector per token that is cached instead of the full K/V tensors, cutting KV cache size by roughly 93% vs standard MHA while preserving model quality — enabling 128K-context inference at scale.
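Back-of-the-envelope sizing shows where the saving comes from; the dimensions below are illustrative, not any published DeepSeek configuration.

```python
# KV-cache elements per token per layer: MHA caches full K and V for every head;
# MLA caches one shared low-rank latent (plus a small decoupled RoPE key).
def mha_cache_per_token(n_heads: int, head_dim: int) -> int:
    return 2 * n_heads * head_dim            # K and V elements per layer

def mla_cache_per_token(latent_dim: int, rope_dim: int) -> int:
    return latent_dim + rope_dim             # compressed latent (+ RoPE key) per layer

mha = mha_cache_per_token(n_heads=64, head_dim=128)      # 16384 elements
mla = mla_cache_per_token(latent_dim=512, rope_dim=64)   #   576 elements
print(f"MLA cache is {mla / mha:.1%} of MHA per token per layer")
```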
Small Language Models (SLMs)
Small language models (1B–14B parameters) run on consumer hardware and mobile devices; an SLM fine-tuned for a narrow task often beats frontier models at 1/100th the serving cost.
Tokenisation
LLMs read tokens, not text — the BPE algorithm, tiktoken and Anthropic tokenisers, the non-English cost penalty, and context-window budgeting at production scale.
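A short tiktoken sketch that makes the token-vs-character distinction and the non-English cost penalty concrete; the encoding name is the BPE vocabulary used by GPT-4-class models, and the Japanese sentence is an illustrative translation.

```python
# Token counts, not character counts, drive API cost and context budgeting.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["The quick brown fox jumps over the lazy dog.",
             "素早い茶色の狐がのろまな犬を飛び越える。"]:   # roughly the same sentence in Japanese
    tokens = enc.encode(text)
    print(f"{len(text):3d} chars -> {len(tokens):3d} tokens: {text}")
```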
Transformer Architecture
The transformer's core operations — scaled dot-product attention (O(n²) in sequence length), the KV cache, RoPE positional encoding, MoE routing, and Chinchilla scaling laws — and why each matters operationally.
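A minimal NumPy rendering of scaled dot-product attention, showing where the O(n²) term comes from; the dimensions are illustrative.

```python
# Scaled dot-product attention: the (n, n) score matrix is the quadratic term,
# while the cached K and V grow only linearly with context length.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n, n) — the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # (n, d_v)

n, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
print(scaled_dot_product_attention(Q, K, V).shape)       # (8, 16)
```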