Dbt Testing, Documentation, and Packages

Goal: ensure data quality with Tests, generate beautiful Documentation, and leverage the community with Packages.


1. Testing: Sleep Well at Night

dbt makes testing a core part of the development workflow, not an afterthought.

Generic Tests

You can add these directly to your schema.yml. dbt ships with 4 built-in tests:

  1. unique: No duplicates.
  2. not_null: No missing values.
  3. accepted_values: Ensure a column is one of ['A', 'B', 'C'].
  4. relationships: Referential integrity (Foreign Key check).
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
models:
  - name: stg_green_tripdata
    columns:
      - name: trip_id
        tests:
          - unique
          - not_null
      - name: verify_status
        tests:
          - accepted_values:
              values: ['Verified', 'Not Verified']

Singular Tests

For specific business logic, write a SQL query in tests/. If the query returns rows, the test fails.

1
2
3
4
-- tests/assert_positive_payment.sql
select *
from {{ ref('stg_green_tripdata') }}
where total_amount < 0

2. Documentation: Your Project Website

dbt parses your project and generates a static website with:

  • Lineage Graphs: See dependencies between models.
  • Column Descriptions: Pulled from schema.yml.
  • SQL Code: View compiled SQL.

How to Generate

1
2
dbt docs generate
dbt docs serve

Doc Blocks

For long descriptions, use doc blocks in Markdown/Jinja:

1
2
3
4
5
6
{% docs table_description %}
This table contains **all** taxi trips.
It is filtered to exclude:
* Test trips
* Zero-distance trips
{% enddocs %}

Refer to it in YAML: description: '{{ doc("table_description") }}'


3. Packages: Don’t Reinvent the Wheel

dbt has a massive library of open-source packages. The most essential is dbt-utils.

Installation (packages.yml)

1
2
3
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

Run dbt deps to install.

Usage Example: Surrogate Keys

Instead of manually concatenating strings to create a primary key, use a robust macro:

1
{{ dbt_utils.generate_surrogate_key(['vendor_id', 'pickup_datetime']) }} as trip_id

This handles nulls and data types correctly across different warehouses (BigQuery/Snowflake/DuckDB).


In the final post of this series, we will look at Deploying dbt Projects.