Dbt Models: Subqueries, CTEs, and Macros

Goal: unlock the power of modular SQL with dbt Models, manage static data with Seeds, and write DRY code with Macros.


1. The Power of {{ ref() }}

In standard SQL, you reference tables by their name (FROM schema.table). In dbt, you use {{ ref('model_name') }}.

Why?

  1. Environment Agnostic: dbt handles the schema.table resolution. It works in Prod, Dev, and CI without changing code.
  2. Dependency Graph: dbt parses these refs to build a DAG (Directed Acyclic Graph). It knows exactly which order to run models in.
1
2
3
-- models/marts/fct_trips.sql
select *
from {{ ref('stg_green_tripdata') }} -- dbt knows to run staging first

2. Common Table Expressions (CTEs)

dbt Analytics Engineers love CTEs. They make code readable and modular. A common pattern is Import CTEs:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
with green_data as (
    select * from {{ ref('stg_green_tripdata') }}
),

yellow_data as (
    select * from {{ ref('stg_yellow_tripdata') }}
),

unioned as (
    select * from green_data
    union all
    select * from yellow_data
)

select * from unioned

Best Practice: Define all your references at the top. It makes the model’s inputs clear immediately.


3. Seeds: Version Controlled Static Data

Sometimes you need a lookup table that doesn’t exist in your database.

  • Country codes
  • Employee IDs
  • Taxi Zone Names

Instead of INSERT INTO statements, use Seeds.

  1. Save a CSV file in the seeds/ directory (e.g., taxi_zone_lookup.csv).
  2. Run dbt seed.
  3. dbt creates a table and inspects the data to infer types.
  4. Reference it like any model: {{ ref('taxi_zone_lookup') }}.

4. Macros: SQL with Superpowers

SQL is great, but it lacks abstraction. If you have complex logic (e.g., converting cents to dollars, or mapping integer codes to names), you often copy-paste logical blocks.

Macros (using Jinja) solve this. They are like Functions for SQL.

Example: get_payment_type_description

Defined in macros/get_payment_type_description.sql:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
{% macro get_payment_type_description(payment_type) %}

    case {{ payment_type }}
        when 1 then 'Credit card'
        when 2 then 'Cash'
        when 3 then 'No charge'
        when 4 then 'Dispute'
        when 5 then 'Unknown'
        when 6 then 'Voided trip'
        else 'EMPTY'
    end

{% endmacro %}

Used in a Model:

1
2
3
4
select
    payment_type,
    {{ get_payment_type_description('payment_type') }} as payment_description
from {{ ref('stg_green_tripdata') }}

Result: dbt compiles the macro into the full CASE WHEN statement at runtime. Your source code remains clean and readable.


In the next post, we will look at Testing and Documentation to ensure our data quality.