Attention Is All You Need (Vaswani et al., 2017)
The 2017 paper that replaced RNNs with parallel self-attention — enabling BERT, GPT, and every LLM since; also covers the key architectural changes from 2017 to 2026 (RoPE, Pre-LN, SwiGLU, GQA).
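The core mechanism fits in a few lines. A minimal pure-Python sketch of scaled dot-product attention — illustrative only; real implementations are batched matrix multiplications with masking:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V are lists of vectors (lists of floats)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        # Weighted mixture of the value vectors.
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out
```

When a query aligns strongly with one key, the softmax puts nearly all weight on that key's value, which is the "soft lookup" intuition behind the architecture.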
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)
Adding intermediate reasoning steps to few-shot examples — "chain-of-thought" — dramatically improves LLM performance on multi-step reasoning tasks, but only emerges at large model scale (~100B parameters).
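The technique itself is pure prompt construction. A sketch, where the exemplar is adapted from the paper's canonical tennis-ball example and the answer-extraction regex is my own assumption, not from the paper:

```python
import re

# One worked exemplar showing intermediate reasoning before the answer.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question):
    # Prepend the worked exemplar so the model imitates step-by-step reasoning.
    return COT_EXEMPLAR + f"Q: {question}\nA:"

def extract_answer(completion):
    # Pull the final integer after the "The answer is" marker (assumed format).
    m = re.search(r"The answer is (-?\d+)", completion)
    return int(m.group(1)) if m else None
```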
Constitutional AI: Harmlessness from AI Feedback (Bai et al., Anthropic, 2022)
Instead of collecting human labels for harmful outputs, train a model to critique and revise its own responses using a written set of principles (a "constitution"), then use those AI-generated preference labels to train the final model.
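The critique-and-revise loop can be sketched as follows; the `llm` callable is a hypothetical text-in/text-out interface, not Anthropic's actual pipeline:

```python
def constitutional_revision(llm, principles, response):
    """One critique-then-revise pass per constitutional principle.
    `llm` is any callable mapping a prompt string to a completion string."""
    for principle in principles:
        # Ask the model to critique its own response against the principle.
        critique = llm(
            f"Critique the response below against this principle: {principle}\n"
            f"Response: {response}"
        )
        # Then ask it to rewrite the response to address the critique.
        response = llm(
            f"Rewrite the response to address the critique.\n"
            f"Response: {response}\nCritique: {critique}"
        )
    return response
```

The original and revised responses then form an AI-generated preference pair for training, replacing human harmlessness labels.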
Direct Preference Optimization (Rafailov et al., 2023)
DPO shows that the RLHF reward model and PPO optimisation loop can be eliminated — the LLM itself encodes an implicit reward function, allowing direct optimisation on preference pairs with a simple classification-style loss.
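The loss is simple enough to state inline. A per-pair sketch in pure Python; variable names are mine, and the paper works with batched sequence-level log-probabilities:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)]).
    logp_* are log-probs of the chosen (w) and rejected (l) responses under
    the policy; ref_logp_* are the same under the frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The implicit-reward insight is visible here: the log-ratio against the frozen reference plays the role of the reward, so no separate reward model or PPO loop is needed.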
GPT-3: Language Models are Few-Shot Learners (Brown et al., 2020)
Scaling a decoder-only Transformer to 175B parameters with 300B tokens of training data produced a model that could perform new tasks from a handful of examples in the prompt — without any gradient updates.
GPT-4 Technical Report
OpenAI (2023) — GPT-4 is a large-scale multimodal model trained with RLHF. Passes the bar exam in the top 10%, demonstrates emergent capabilities, and introduces a systematic safety evaluation methodology with a published system card. The template for how frontier labs now report model capabilities.
Key Papers Reading List
Curated reading list for senior AI engineers — 22 papers across architecture, alignment, reasoning, RAG, efficient training, safety, and scaling, with a one-day and one-week priority order.
Llama 2: Open Foundation and Fine-Tuned Chat Models (Touvron et al., 2023)
Llama 2 (Touvron et al., Meta + Microsoft, July 2023) adds RLHF-tuned chat models (7B–70B), doubles the pretraining budget to 2T tokens, extends context to 4096 tokens, and introduces Ghost Attention for multi-turn consistency — with a commercial licence covering up to 700M monthly users.
LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023)
LLaMA (Touvron et al., Meta AI, Feb 2023) proved that a 13B model trained on public data only can outperform GPT-3 (175B) on most benchmarks — igniting the open-source LLM ecosystem.
LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
Instead of fine-tuning all model weights, freeze the original weights and inject trainable low-rank decomposition matrices into the attention layers — achieving up to 10,000× fewer trainable parameters than full fine-tuning of GPT-3, with no added inference latency because the low-rank update can be merged back into the frozen weights.
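A minimal pure-Python sketch of the forward pass; class and variable names are mine, and real implementations operate on tensors:

```python
import random

def matvec(M, x):
    # Dense matrix-vector product over nested lists.
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, W, r=2, alpha=4):
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen, never updated
        # A starts small-random, B starts at zero, so the update is exactly
        # zero at initialisation and training begins from the base model.
        self.A = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        low = matvec(self.A, x)      # project down to the r-dim bottleneck
        delta = matvec(self.B, low)  # project back up to d_out
        return [b + self.scale * d for b, d in zip(base, delta)]
```

Only `A` and `B` would receive gradients; after training, `scale * B @ A` can be added into `W` once, which is why inference costs nothing extra.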
Mechanistic Interpretability — Core Papers
The research programme of understanding what computations neural networks actually implement.
Mistral 7B and Mixtral 8x7B
Mistral 7B (Oct 2023) combined sliding-window attention (SWA) and grouped-query attention (GQA) to beat Llama 2 13B at 7B parameters; Mixtral 8x7B (Dec 2023) applied sparse mixture-of-experts (MoE) — 8 experts, 2 active per token — to match GPT-3.5 Turbo with 12.9B active from 46.7B total parameters.
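The top-2 routing that makes sparse MoE cheap can be sketched in a few lines; this is a simplified per-token view with my own names, omitting load balancing:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, k=2):
    """Sparse MoE layer for one token: run only the top-k experts and mix
    their outputs by router weights renormalised over the selected k.
    `experts` is a list of callables; only k of them are ever invoked."""
    topk = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]
    gate = softmax([router_logits[i] for i in topk])
    outs = [experts[i](x) for i in topk]
    return [sum(g * o[j] for g, o in zip(gate, outs)) for j in range(len(outs[0]))]
```

Compute per token scales with `k`, not with the number of experts, which is how 46.7B total parameters run at a 12.9B-active cost.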
ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
Interleave chain-of-thought reasoning with tool-use actions in a single generation loop — the model reasons about what to do, takes an action, observes the result, reasons again — enabling LLMs to complete tasks requiring external information retrieval.
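The loop itself is a small driver around the model. A sketch, where the `llm` and tool interfaces and the `Action:`/`Answer:` line format are assumptions for illustration:

```python
def react_loop(llm, tools, question, max_steps=5):
    """Minimal ReAct driver. The model is assumed to emit one line per call:
    either 'Action: tool[input]' or a final 'Answer: ...'."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        if step.startswith("Action:"):
            # Parse 'Action: name[argument]', run the tool, feed back the result.
            name, _, arg = step[len("Action:"):].strip().partition("[")
            observation = tools[name](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted without an answer
```

The key design point is that observations are appended to the transcript, so the next reasoning step conditions on what the tool actually returned.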
RLHF: Reinforcement Learning from Human Feedback
Two papers that define RLHF as an alignment technique: Stiennon et al. (2020) demonstrated it at scale for summarisation; Ouyang et al. (2022) applied it to general instruction following with InstructGPT, the direct precursor to ChatGPT.
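The reward model at the centre of the pipeline is trained with a Bradley-Terry pairwise loss on human comparisons; a per-pair sketch with my own variable names:

```python
import math

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected),
    where r_* are scalar rewards the model assigns to the human-preferred
    and dispreferred responses."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The trained reward model then scores rollouts during the PPO stage, standing in for a human rater.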
Scaling Laws for Neural Language Models (Kaplan et al., 2020) + Chinchilla (Hoffmann et al., 2022)
Two papers that define how LLM performance scales with compute, parameters, and data. Chinchilla corrected Kaplan's key conclusion — compute-optimal models need far more data than Kaplan's law implied, roughly 20 tokens per parameter — and changed how all subsequent models are trained.
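The Chinchilla rule of thumb makes a handy back-of-envelope calculator. A sketch using the standard approximations C ≈ 6·N·D and D ≈ 20·N (the exact exponents in the paper differ slightly):

```python
def chinchilla_optimal(compute_flops):
    """Compute-optimal parameter/token split under the rule of thumb:
    C = 6 * N * D with D = 20 * N  =>  N = sqrt(C / 120), D = 20 * N."""
    n_params = (compute_flops / 120.0) ** 0.5
    n_tokens = 20.0 * n_params
    return n_params, n_tokens
```

Plugging in Chinchilla's own budget (6 × 70B params × 1.4T tokens of FLOPs) recovers roughly 70B parameters and 1.4T tokens, the configuration the paper trained.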
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (Jimenez et al., 2024)
A benchmark of 2,294 real GitHub issues from 12 popular Python repositories — to resolve each issue, a model must understand a full codebase, write a patch, and pass the existing test suite.
Toolformer: Language Models Can Teach Themselves to Use Tools
Schick et al. (Meta, 2023) — language models can teach themselves to call external APIs by self-generating training data. The conceptual origin of tool use in LLMs before ChatGPT plugins or function calling.
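The self-supervision hinges on a filtering rule: a self-generated API call is kept for training only if conditioning on its result makes the following tokens easier to predict. A simplified sketch of that criterion (parameter names are mine):

```python
def keep_api_call(loss_without, loss_with, tau=1.0):
    """Simplified Toolformer filter: keep a self-generated API call only if
    inserting the call's result reduces the LM loss on the subsequent
    tokens by at least the threshold tau."""
    return (loss_without - loss_with) >= tau
```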