Chunking Strategies
512-token fixed-size chunking with 50-token overlap is the default starting point; semantic chunking improves retrieval on complex documents at 5-10x the ingestion cost; parent-child retrieval separates retrieval precision from context richness.
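The fixed-size strategy can be sketched in a few lines. `chunk_tokens` is an illustrative name, not a library function, and a real pipeline would slide over tokenizer output (e.g. tiktoken IDs) rather than an arbitrary list:

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Fixed-size chunking with overlap.

    `tokens` is any sequence (token IDs in practice); each chunk is `size`
    long and shares `overlap` tokens with the previous chunk, so the window
    advances by `size - overlap` each step.
    """
    chunks = []
    step = size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

The overlap exists so that a sentence split at a chunk boundary still appears whole in at least one chunk.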
Embedding Models
MTEB leaderboard comparison — Cohere embed-v4 (65.2, best managed), BGE-M3 (63.0, best open), Matryoshka truncation for 12x storage reduction, and Cohere input_type for 5% retrieval gain.
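Matryoshka truncation is a client-side operation on embeddings from a Matryoshka-trained model: keep the leading dimensions and L2-renormalize (e.g. 1536 dims truncated to 128 gives the 12x storage reduction). A minimal sketch, with `truncate_matryoshka` as a hypothetical helper name:

```python
import math

def truncate_matryoshka(vec, dims):
    """Keep the first `dims` components of an embedding, then L2-renormalize.

    Only valid for Matryoshka-trained models, whose leading dimensions are
    trained to carry most of the signal; truncating an ordinary embedding
    this way degrades quality badly.
    """
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard zero vector
    return [x / norm for x in head]
```

Renormalizing matters because most vector stores assume unit-length vectors when using cosine/dot-product similarity.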
GraphRAG
GraphRAG builds entity/relationship knowledge graphs with Leiden community detection for multi-hop synthesis queries; LazyGraphRAG (2024) achieves 0.1% of full GraphRAG indexing cost at 70-80% quality.
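The core data structure is a graph of LLM-extracted (head, relation, tail) triples, partitioned into communities that get summarized. A toy sketch of that shape, using connected components as a deliberately simplified stand-in for Leiden (the triples and names here are made up for illustration):

```python
from collections import defaultdict

# Toy entity/relationship store; in GraphRAG these triples come from an
# LLM extraction pass over the corpus (not shown here).
triples = [
    ("Alice", "works_at", "AcmeCorp"),
    ("AcmeCorp", "acquired", "BetaInc"),
    ("Carol", "founded", "GammaLLC"),
]

def communities(triples):
    """Group entities into connected components.

    A stand-in for Leiden community detection: real Leiden also splits
    dense components by modularity, which this sketch does not attempt.
    """
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # iterative DFS over the undirected entity graph
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

Each community would then be summarized once at index time, and multi-hop queries answered over those summaries rather than raw chunks.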
Hybrid Retrieval
BM25+dense hybrid with RRF (k=60) is the production standard — BM25 catches exact-match keywords that dense retrieval misses; dense retrieval catches semantic matches that BM25 misses; Qdrant and Weaviate have native hybrid support.
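Reciprocal Rank Fusion itself is tiny: each document scores the sum of 1/(k + rank) across the result lists it appears in, with k=60 as the conventional constant. A self-contained sketch (`rrf_fuse` is an illustrative name, not a specific library's API):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over ranked result lists.

    `rankings` is a list of ranked doc-id lists (e.g. [bm25_ids, dense_ids]);
    score(d) = sum over lists containing d of 1 / (k + rank_in_that_list),
    so documents ranked well by multiple retrievers float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity — the main reason it beats weighted score mixing in practice.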
Query Expansion
Query expansion fixes retrieval failures by rewriting the query before searching. HyDE generates a hypothetical answer and embeds that; multi-query generates alternative phrasings; step-back prompts for a more general principle. Each fixes a different failure mode.
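All three strategies share one shape: a prompt template applied to the query, with the LLM's output fed into retrieval instead of (or alongside) the raw query. A minimal dispatcher sketch — the prompt wordings and the `expand_query` name are assumptions for illustration:

```python
def expand_query(query, mode, llm):
    """Rewrite a query before retrieval; `llm` is any callable prompt -> text.

    - "hyde": generate a hypothetical answer passage, then embed THAT
      instead of the query.
    - "multi": generate alternative phrasings, retrieve with each, merge.
    - "step_back": ask the more general question behind the specific one.
    """
    prompts = {
        "hyde": f"Write a short passage that answers: {query}",
        "multi": f"Rewrite this question three different ways: {query}",
        "step_back": f"What more general question underlies: {query}",
    }
    return llm(prompts[mode])
```

The key HyDE insight is that a hypothetical answer lives in the same embedding neighborhood as real answer passages, even when the question itself does not.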
RAG — Retrieval-Augmented Generation
Production RAG pipeline — hybrid BM25+dense retrieval, Cohere reranking (10-25% precision gain), RAGAS evaluation (faithfulness >0.9, context precision >0.8), and when GraphRAG beats standard retrieval.
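That pipeline reduces to three injected stages. A minimal glue sketch — the function names and interfaces are assumptions, not any library's API; in production `retrieve` would be the hybrid retriever, `rerank` a cross-encoder, and `generate` an LLM call over the top-k context:

```python
def answer(query, retrieve, rerank, generate, top_k=5):
    """Minimal RAG pipeline: retrieve -> rerank -> generate.

    Retrieval casts a wide net (e.g. ~50 candidates from hybrid search);
    the reranker reorders them by joint query-document relevance; only the
    top_k survivors reach the generator's context window.
    """
    candidates = retrieve(query)
    context = rerank(query, candidates)[:top_k]
    return generate(query, context)
```

Keeping the stages as plain callables also makes each one swappable for evaluation — e.g. scoring the same `generate` against different retrievers with RAGAS.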
Reranking
Cross-encoder reranking is the single biggest precision lever in RAG — 10-25% NDCG improvement; Cohere Rerank v3.5 (API), Jina v2 (137M self-hosted), BGE v2-m3 (568M highest quality open).
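The cross-encoder pattern: score each (query, document) pair jointly, then sort by score — as opposed to bi-encoders, which embed query and document separately. A sketch with a toy word-overlap scorer standing in for a real model like BGE v2-m3 (both function names here are illustrative):

```python
def rerank(query, docs, score):
    """Sort docs by a joint (query, doc) relevance score, best first.

    `score` stands in for a cross-encoder forward pass; any callable
    taking (query, doc) -> float fits the pattern.
    """
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

def overlap_score(query, doc):
    """Toy stand-in scorer: shared-word count instead of a learned model."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words)
```

Scoring every pair jointly is what makes cross-encoders both more precise and too slow for first-stage retrieval — hence the retrieve-then-rerank split.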