Chunking Strategies
512-token fixed-size chunking with 50-token overlap is the default starting point; semantic chunking improves retrieval on complex documents at 5-10x the ingestion cost; parent-child retrieval separates retrieval precision from context richness.
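The fixed-size strategy can be sketched in a few lines. `chunk_tokens` is an illustrative name, not a library function, and a real pipeline would slide over tokenizer output (e.g. tiktoken IDs) rather than an arbitrary list:

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Fixed-size chunking with overlap.

    `tokens` is any sequence (token IDs in practice); each chunk is `size`
    long and shares `overlap` tokens with the previous chunk, so the window
    advances by `size - overlap` each step.
    """
    chunks = []
    step = size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

The overlap exists so that a sentence split at a chunk boundary still appears whole in at least one chunk.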
Embedding Models
MTEB leaderboard comparison — Cohere embed-v4 (65.2, best managed), BGE-M3 (63.0, best open), Matryoshka truncation for 12x storage reduction, and Cohere input_type for 5% retrieval gain.
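Matryoshka truncation is a client-side operation on embeddings from a Matryoshka-trained model: keep the leading dimensions and L2-renormalize (e.g. 1536 dims truncated to 128 gives the 12x storage reduction). A minimal sketch, with `truncate_matryoshka` as a hypothetical helper name:

```python
import math

def truncate_matryoshka(vec, dims):
    """Keep the first `dims` components of an embedding, then L2-renormalize.

    Only valid for Matryoshka-trained models, whose leading dimensions are
    trained to carry most of the signal; truncating an ordinary embedding
    this way degrades quality badly.
    """
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard zero vector
    return [x / norm for x in head]
```

Renormalizing matters because most vector stores assume unit-length vectors when using cosine/dot-product similarity.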
GraphRAG
GraphRAG builds entity/relationship knowledge graphs with Leiden community detection for multi-hop synthesis queries; LazyGraphRAG (2024) achieves 0.1% of full GraphRAG indexing cost at 70-80% quality.
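The core data structure is a graph of LLM-extracted (head, relation, tail) triples, partitioned into communities that get summarized. A toy sketch of that shape, using connected components as a deliberately simplified stand-in for Leiden (the triples and names here are made up for illustration):

```python
from collections import defaultdict

# Toy entity/relationship store; in GraphRAG these triples come from an
# LLM extraction pass over the corpus (not shown here).
triples = [
    ("Alice", "works_at", "AcmeCorp"),
    ("AcmeCorp", "acquired", "BetaInc"),
    ("Carol", "founded", "GammaLLC"),
]

def communities(triples):
    """Group entities into connected components.

    A stand-in for Leiden community detection: real Leiden also splits
    dense components by modularity, which this sketch does not attempt.
    """
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # iterative DFS over the undirected entity graph
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

Each community would then be summarized once at index time, and multi-hop queries answered over those summaries rather than raw chunks.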
Hybrid Retrieval
BM25+dense hybrid with RRF (k=60) is the production standard — BM25 catches exact-match keywords that dense retrieval misses; dense retrieval catches semantic matches that BM25 misses; Qdrant and Weaviate have native hybrid support.
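Reciprocal Rank Fusion itself is tiny: each document scores the sum of 1/(k + rank) across the result lists it appears in, with k=60 as the conventional constant. A self-contained sketch (`rrf_fuse` is an illustrative name, not a specific library's API):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over ranked result lists.

    `rankings` is a list of ranked doc-id lists (e.g. [bm25_ids, dense_ids]);
    score(d) = sum over lists containing d of 1 / (k + rank_in_that_list),
    so documents ranked well by multiple retrievers float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity — the main reason it beats weighted score mixing in practice.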
Query Expansion
Query expansion fixes retrieval failures by rewriting the query before searching. HyDE generates a hypothetical answer and embeds that; multi-query generates alternative phrasings; step-back prompts for a more general principle. Each fixes a different failure mode.
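All three strategies share one shape: a prompt template applied to the query, with the LLM's output fed into retrieval instead of (or alongside) the raw query. A minimal dispatcher sketch — the prompt wordings and the `expand_query` name are assumptions for illustration:

```python
def expand_query(query, mode, llm):
    """Rewrite a query before retrieval; `llm` is any callable prompt -> text.

    - "hyde": generate a hypothetical answer passage, then embed THAT
      instead of the query.
    - "multi": generate alternative phrasings, retrieve with each, merge.
    - "step_back": ask the more general question behind the specific one.
    """
    prompts = {
        "hyde": f"Write a short passage that answers: {query}",
        "multi": f"Rewrite this question three different ways: {query}",
        "step_back": f"What more general question underlies: {query}",
    }
    return llm(prompts[mode])
```

The key HyDE insight is that a hypothetical answer lives in the same embedding neighborhood as real answer passages, even when the question itself does not.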
RAG — Retrieval-Augmented Generation
Production RAG pipeline — hybrid BM25+dense retrieval, Cohere reranking (10-25% precision gain), RAGAS evaluation (faithfulness >0.9, context precision >0.8), and when GraphRAG beats standard retrieval.
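That pipeline reduces to three injected stages. A minimal glue sketch — the function names and interfaces are assumptions, not any library's API; in production `retrieve` would be the hybrid retriever, `rerank` a cross-encoder, and `generate` an LLM call over the top-k context:

```python
def answer(query, retrieve, rerank, generate, top_k=5):
    """Minimal RAG pipeline: retrieve -> rerank -> generate.

    Retrieval casts a wide net (e.g. ~50 candidates from hybrid search);
    the reranker reorders them by joint query-document relevance; only the
    top_k survivors reach the generator's context window.
    """
    candidates = retrieve(query)
    context = rerank(query, candidates)[:top_k]
    return generate(query, context)
```

Keeping the stages as plain callables also makes each one swappable for evaluation — e.g. scoring the same `generate` against different retrievers with RAGAS.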
Reranking
Cross-encoder reranking is the single biggest precision lever in RAG — 10-25% NDCG improvement; Cohere Rerank v3.5 (API), Jina v2 (137M self-hosted), BGE v2-m3 (568M highest quality open).
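The cross-encoder pattern: score each (query, document) pair jointly, then sort by score — as opposed to bi-encoders, which embed query and document separately. A sketch with a toy word-overlap scorer standing in for a real model like BGE v2-m3 (both function names here are illustrative):

```python
def rerank(query, docs, score):
    """Sort docs by a joint (query, doc) relevance score, best first.

    `score` stands in for a cross-encoder forward pass; any callable
    taking (query, doc) -> float fits the pattern.
    """
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

def overlap_score(query, doc):
    """Toy stand-in scorer: shared-word count instead of a learned model."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words)
```

Scoring every pair jointly is what makes cross-encoders both more precise and too slow for first-stage retrieval — hence the retrieve-then-rerank split.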