SQL for AI Engineering
SQL patterns for AI engineering — vector similarity queries with pgvector, feature extraction for ML, logging LLM calls, and building lightweight feature stores.
pgvector: Vector Similarity in Postgres
infra/pgvector adds a vector type and similarity operators directly into PostgreSQL — no separate vector database required for most use cases.
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Embeddings table
CREATE TABLE document_embeddings (
id BIGSERIAL PRIMARY KEY,
doc_id BIGINT REFERENCES documents(id),
chunk_idx INTEGER,
content TEXT,
embedding vector(1536) -- OpenAI ada-002 dimension
);
-- HNSW index for fast approximate nearest-neighbor search
CREATE INDEX ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);Similarity queries
-- Cosine similarity (1 - cosine_distance): higher = more similar
SELECT doc_id, content,
1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM document_embeddings
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
-- Operators:
-- <=> cosine distance
-- <-> L2 (Euclidean) distance
-- <#> negative inner product
-- Hybrid: combine vector search with metadata filter
SELECT d.title, de.content, de.embedding <=> $1::vector AS dist
FROM document_embeddings de
JOIN documents d ON d.id = de.doc_id
WHERE d.category = 'engineering'
AND d.created_at > NOW() - INTERVAL '90 days'
ORDER BY dist
LIMIT 10;LLM Call Logging
Logging every LLM call to Postgres enables cost tracking, latency monitoring, and regression testing without an external observability platform.
CREATE TABLE llm_calls (
id BIGSERIAL PRIMARY KEY,
created_at TIMESTAMPTZ DEFAULT NOW(),
model TEXT NOT NULL,
input_tokens INTEGER,
output_tokens INTEGER,
latency_ms INTEGER,
cost_usd NUMERIC(10, 6),
prompt_hash TEXT, -- hash of system + user prompt for dedup
cache_hit BOOLEAN,
error TEXT,
metadata JSONB
);
-- Daily cost report
SELECT
DATE(created_at) AS day,
model,
COUNT(*) AS calls,
SUM(input_tokens + output_tokens) AS total_tokens,
SUM(cost_usd) AS total_cost,
AVG(latency_ms) AS avg_latency_ms,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95_latency
FROM llm_calls
WHERE created_at > NOW() - INTERVAL '30 days'
AND error IS NULL
GROUP BY DATE(created_at), model
ORDER BY day DESC, total_cost DESC;
-- Cache hit rate
SELECT
model,
COUNT(*) FILTER (WHERE cache_hit) * 100.0 / COUNT(*) AS cache_hit_pct,
SUM(cost_usd) FILTER (WHERE NOT cache_hit) AS actual_cost
FROM llm_calls
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY model;Feature Store Pattern
A feature store provides pre-computed, point-in-time correct ML features. A simple SQL implementation using PostgreSQL.
-- Feature definitions
CREATE TABLE feature_definitions (
feature_name TEXT PRIMARY KEY,
entity_type TEXT, -- 'user', 'product', etc.
sql_query TEXT, -- query that computes it
ttl_hours INTEGER,
updated_at TIMESTAMPTZ
);
-- Feature values store
CREATE TABLE feature_values (
entity_id TEXT,
feature_name TEXT REFERENCES feature_definitions(feature_name),
value JSONB,
computed_at TIMESTAMPTZ,
PRIMARY KEY (entity_id, feature_name)
);
-- Retrieve feature vector for inference
SELECT feature_name, value
FROM feature_values
WHERE entity_id = 'user_123'
AND computed_at > NOW() - INTERVAL '1 hour' -- TTL check
ORDER BY feature_name;Eval Results Store
Storing evaluation results in SQL enables systematic comparison and regression detection.
CREATE TABLE eval_runs (
id BIGSERIAL PRIMARY KEY,
run_id TEXT UNIQUE, -- git SHA or run name
created_at TIMESTAMPTZ DEFAULT NOW(),
model TEXT,
dataset TEXT,
config JSONB
);
CREATE TABLE eval_results (
id BIGSERIAL PRIMARY KEY,
run_id TEXT REFERENCES eval_runs(run_id),
question_id TEXT,
prompt TEXT,
response TEXT,
score NUMERIC(4, 3), -- 0.0 to 1.0
passed BOOLEAN,
metadata JSONB
);
-- Compare two runs
SELECT
a.question_id,
a.score AS score_v1,
b.score AS score_v2,
b.score - a.score AS delta
FROM eval_results a
JOIN eval_results b USING (question_id)
WHERE a.run_id = 'v1.0' AND b.run_id = 'v1.1'
ORDER BY delta ASC; -- worst regressions firstJSONB for Flexible AI Metadata
-- Query nested JSONB fields
SELECT id, metadata -> 'usage' ->> 'total_tokens' AS tokens
FROM llm_calls
WHERE metadata @> '{"stop_reason": "max_tokens"}'::jsonb;
-- Aggregate JSONB values
SELECT
metadata ->> 'model' AS model,
AVG((metadata -> 'usage' ->> 'total_tokens')::int) AS avg_tokens
FROM llm_calls
GROUP BY metadata ->> 'model';
-- GIN index for JSONB containment queries
CREATE INDEX idx_llm_calls_metadata ON llm_calls USING GIN (metadata);Connections
- infra/pgvector — vector index types and configuration in depth
- rag/chunking — chunking strategies that feed the embeddings table
- evals/methodology — eval frameworks that write results to stores like this
- observability/datadog — alternative for LLM call monitoring
- sql/query-optimization — keeping these queries fast under load
- sql/window-functions — analytical patterns for cost/latency reporting
Open Questions
- Is a time-series database (TimescaleDB) a better fit than plain PostgreSQL for the eval results store pattern at high eval throughput?
- How do you model multi-turn conversation logs in SQL while keeping per-turn retrieval efficient?
- What's the practical row count ceiling for pgvector similarity search before a dedicated vector DB becomes necessary?
Related reading