Write a window function ranking query
Write a SQL query that ranks customers by total spend in each product category using RANK() OVER (PARTITION BY category ORDER BY total_spend DESC). Then extend it to find customers who ranked in the top 10 in the prior month but dropped out this month; a cohort analysis pattern used in every real analytics stack.
Why this matters
Window functions are the dividing line between SQL users and SQL engineers. They eliminate the self-joins and correlated subqueries that make queries unreadable and slow, and they unlock analytical patterns that are simply impossible without them. Ranking, running totals, lead/lag comparisons; these are the queries that turn raw data into business insight.
Before you start
- A SQL database with sample data (DuckDB with a CSV works; see the DuckDB exercise)
- Understanding of GROUP BY and aggregate functions (SUM, COUNT)
- Basic understanding of what a subquery is
- No prior window function experience required
Step-by-step guide
- 1
Set up the sample data
Create two months of order data: customer_id, product_category, order_amount, order_month. You need at least 20 customers and 3 categories to make the ranking meaningful. Seed data where a handful of customers change their spending significantly between months; these are the interesting cases.
- 2
Write the monthly spend aggregation
Aggregate to: customer_id, product_category, order_month, total_spend using GROUP BY. This is the input to your window function. Verify the aggregation is correct before adding the window; debugging window function errors on top of aggregation errors is painful.
- 3
Add the RANK() window function
Add RANK() OVER (PARTITION BY product_category, order_month ORDER BY total_spend DESC) AS category_rank. The PARTITION BY resets the rank for each combination of category and month. Verify that the top spender in each category in each month has rank 1.
- 4
Find top-10 customers per category per month
Wrap the previous query in a CTE or subquery and filter WHERE category_rank <= 10. You now have the top 10 customers per category for each month in your dataset. Print this result and verify it makes sense before building the comparison.
- 5
Identify customers who dropped out
Self-join the top-10 result on customer_id and product_category, matching the prior month to the current month. Customers where the prior month row exists but the current month row is NULL are the dropouts. Use LAG() as an alternative approach; compare both and understand when each is cleaner.