Dbt Models: Subqueries, CTEs, and Macros
Goal: unlock the power of modular SQL with dbt Models, manage static data with Seeds, and write DRY code with Macros.
1. The Power of {{ ref() }}
In standard SQL, you reference tables by their name (FROM schema.table). In dbt, you use {{ ref('model_name') }}.
Why?
- Environment Agnostic: dbt handles the
schema.tableresolution. It works in Prod, Dev, and CI without changing code. - Dependency Graph: dbt parses these refs to build a DAG (Directed Acyclic Graph). It knows exactly which order to run models in.
|
|
2. Common Table Expressions (CTEs)
dbt Analytics Engineers love CTEs. They make code readable and modular. A common pattern is Import CTEs:
|
|
Best Practice: Define all your references at the top. It makes the model’s inputs clear immediately.
3. Seeds: Version Controlled Static Data
Sometimes you need a lookup table that doesn’t exist in your database.
- Country codes
- Employee IDs
- Taxi Zone Names
Instead of INSERT INTO statements, use Seeds.
- Save a CSV file in the
seeds/directory (e.g.,taxi_zone_lookup.csv). - Run
dbt seed. - dbt creates a table and inspects the data to infer types.
- Reference it like any model:
{{ ref('taxi_zone_lookup') }}.
4. Macros: SQL with Superpowers
SQL is great, but it lacks abstraction. If you have complex logic (e.g., converting cents to dollars, or mapping integer codes to names), you often copy-paste logical blocks.
Macros (using Jinja) solve this. They are like Functions for SQL.
Example: get_payment_type_description
Defined in macros/get_payment_type_description.sql:
|
|
Used in a Model:
|
|
Result:
dbt compiles the macro into the full CASE WHEN statement at runtime. Your source code remains clean and readable.
In the next post, we will look at Testing and Documentation to ensure our data quality.