Using dbt with Dagster, part one: Set up the dbt project#

This is part one of the Using dbt with Dagster tutorial.

First, you'll download Dagster's tutorial_dbt_dagster example and configure the dbt project it contains. This example uses dbt's jaffle shop project, but you can follow along with a different project as well.

In this part, you'll:


Step 1: Download the tutorial_dbt_dagster example#

Let's get started by downloading the tutorial_dbt_dagster example.

  1. In the terminal, run the following to download the example:

    dagster project from-example --name tutorial_dbt_dagster --example tutorial_dbt_dagster
    
  2. Navigate into the tutorial_dbt_dagster folder:

    cd tutorial_dbt_dagster
    
  3. Run the following to install the project's dependencies:

    pip install -e ".[dev]"
    

Step 2: Create a profile#

We assume you use DuckDB as your data warehouse in order to run the tutorial entirely locally. If you plan on using a different data warehouse in production, that's totally fine: Dagster can orchestrate dbt regardless of what profile is configured. For example, we support dbt profiles for BigQuery, Redshift, Snowflake, and more.

In this step, you'll add a profile to the dbt project. Profiles in dbt allow you to connect dbt to a database - in this case, that's DuckDB.

  1. Open the profiles.yml file, located in /tutorial_template/jaffle_shop/config.

  2. Add the following code to the file:

    jaffle_shop:
      target: local
      outputs:
        local:
          type: duckdb
          path: tutorial.duckdb
          schema: jaffle_shop
    

Step 3: Configure dbt model data sources#

Next, you'll tell the dbt models where their source data will be located. There isn't any data yet - we'll create Dagster assets that fetch and provide data to the dbt models later in the tutorial.

In dbt, sources are declared using sources.yml files. We've provided this file for you in /tutorial_template/jaffle_shop/models, which defines the location of tables containing source data:

version: 2

sources:
  - name: jaffle_shop
    tables:
      - name: orders_raw
      - name: customers_raw

The Dagster assets you build in part three of this tutorial will create orders_raw and customers_raw tables when materialized. To have the models select data from these tables, you'll need to update the models.

In /tutorial_template/jaffle_shop/models, update the stg_customers.sql and stg_orders.sql models to use the {{ source()}} function. This function tells the dbt model to select data from the defined source. In this example, that's the customers_raw and orders_raw tables in the DuckDB database:.

  1. In stg_customers.sql, replace its contents with the following:

    select
        id as customer_id,
        first_name,
        last_name
    from {{ source('jaffle_shop', 'customers_raw') }}
    
  2. In stg_orders.sql, replace its contents with the following:

    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status
    from {{ source('jaffle_shop', 'orders_raw') }}
    

What's next?#

At this point, you should have a fully-configured dbt project that's ready to work with Dagster. The next step is to load the dbt models into Dagster as assets.