Implement Type 2 slowly changing dimensions in SQL
Implement a Type 2 SCD for a customer dimension where email address and subscription plan change over time. You will write the initial load, the incremental merge that expires changed rows and inserts new ones, and the point-in-time query that returns what a customer's record looked like on any given historical date.
Why this matters
Every analytics system eventually needs to answer 'what did the customer's plan look like when they made this purchase?' If you built a simple dimension with current values, that question is unanswerable for historical data. SCD Type 2 preserves history by versioning rows with effective dates rather than overwriting; and understanding it is the difference between a data model that supports historical analysis and one that only works for reports about today.
Before you start
- PostgreSQL or DuckDB running locally
- Basic SQL (INSERT, UPDATE, SELECT with JOINs)
- Understanding of what a dimension table is from the star schema exercise
Step-by-step guide
- 1
Create the dimension table
Write CREATE TABLE dim_customer with: surrogate_key SERIAL, customer_id INT (natural key), email VARCHAR, subscription_plan VARCHAR, effective_from DATE, effective_to DATE (NULL for the current row), is_current BOOLEAN. The surrogate key plus effective dates is the SCD Type 2 pattern.
- 2
Load the initial data
INSERT 10 customer rows with effective_from = today and effective_to = NULL, is_current = TRUE. Write a query that returns only current rows (WHERE is_current = TRUE); this is the view most reports will use.
- 3
Write the incremental merge
Write the SQL that handles a changed row: UPDATE the existing row to set effective_to = today minus 1 day, is_current = FALSE, then INSERT a new row for the same customer_id with the new attributes and effective_from = today. Wrap both in a transaction. Verify 2 rows per changed customer.
- 4
Handle new and unchanged customers
Extend the merge: for new customer_ids, INSERT with is_current = TRUE; for changed customers, expire and insert; for unchanged customers, do nothing. Write a staging query that classifies each incoming row as new, changed, or unchanged using a LEFT JOIN.
- 5
Query point-in-time state
Write a query with a date parameter that returns the dimension as it existed on that date: WHERE effective_from <= target_date AND (effective_to IS NULL OR effective_to > target_date). Test with 3 historical dates and verify the correct row is returned for customers who changed during those periods.
- 6
Join the SCD dimension to fact data
Create a fact_orders table with customer_id, order_date, amount. Join to dim_customer using the point-in-time pattern: match on customer_id where order_date falls within the effective window. Verify an order placed before a plan change shows the old plan, not the current one.