IntermediateAnalytics Engineer

Design a star schema

Design a star schema for e-commerce analytics: a fact_orders table at the order-line grain, with dim_customer, dim_product, and dim_date dimension tables. Write the CREATE TABLE statements with appropriate indexes, define the grain explicitly, and explain your choice of surrogate keys versus natural keys.

Why this matters

The star schema is the most widely deployed data warehousing pattern for a reason: it makes analytical queries fast and readable. Understanding grain, surrogate keys, and slowly changing dimensions is the foundation that makes everything in an analytics stack work correctly; dbt models, BI tool queries, and executive dashboards all depend on getting this right.

Before you start

Step-by-step guide

  1. 1

    Define the grain

    Write one sentence defining the grain of the fact table: 'Each row in fact_orders represents one line item from one order placed by one customer for one product on one date'. The grain determines which columns belong in the fact table and which belong in dimensions. Write this sentence before touching SQL.

  2. 2

    Design the dimension tables

    Write CREATE TABLE statements for dim_customer (customer_sk surrogate key, customer_id natural key, name, email, country, valid_from, valid_to for SCD2), dim_product (product_sk, product_id, name, category, unit_price), and dim_date (date_sk integer YYYYMMDD, date, year, quarter, month, day_of_week, is_weekend).

  3. 3

    Design the fact table

    Write CREATE TABLE for fact_orders (order_line_sk surrogate key, order_id, customer_sk FK, product_sk FK, date_sk FK, quantity, unit_price, discount_amount, net_revenue). Net_revenue is a derived metric worth pre-computing; explain why you chose to store it rather than calculating it at query time.

  4. 4

    Add indexes

    Add indexes on every foreign key column in fact_orders (customer_sk, product_sk, date_sk). Add a composite index on (date_sk, customer_sk) for the most common query pattern: 'revenue by customer for a date range'. Explain why you would not add more indexes on a fact table that receives millions of inserts per day.

  5. 5

    Write three analytical queries

    Write: monthly revenue by product category, top 10 customers by revenue in the last 90 days, and revenue for new customers (first purchase in the last 30 days) vs returning customers. All three should run without subqueries; if they require subqueries, your schema has a structural issue.

Relevant Axiom pages

What to do next

Back to Practice Lab