Build your first models
With our starter project setup, it's time to get to the fun part — building models! We're going to take the query from the Setting up instructions, and turn it into a model in our dbt project.
Checkout a new git branch
dbt Cloud
- Ensure you're in the Develop interface. If you're not, click the hamburger menu, and then
Develop
. The master branch should now be set to "read only" mode, so you'll need to create a new branch. - Click the "create new branch" button, and name your branch
add-customers-model
.
dbt CLI
It's a good idea to check out a new git branch when working on new code.
- Run
git checkout -b add-customers-model
to create a new branch.
Build your first model
dbt Cloud
- Ensure you're in the Develop interface. If you're not, click the hamburger menu, and then
Develop
. - Create a new file in the
models
directory namedmodels/customers.sql
. - Paste the query from the Setting up instructions into the file.
- Execute
dbt run
in the command prompt at the bottom of the screen. You should get a successful run, like so:
If you switch back to the BigQuery console you'll be able to select
from this model.
dbt CLI
- Open your project in a code editor
- Create a new SQL file in the
models
directory, namedmodels/customers.sql
. - Paste the query from the Setting up instructions into the file.
- From the command line, execute
dbt run
. Your output should look like this:
If you switch back to the BigQuery console you'll be able to select
from this model.
FAQs
Change the way your model is materialized
One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. Let's see this in action.
Using the +
sign in your dbt_project.yml
These videos were recorded with a slightly older version of dbt (dbt v0.15.0), which did not use the +
sign in the dbt_project.yml
file (this was introduced in dbt v0.17.0).
We'll try to update the videos soon, but for now, take extra note of the +
signs in the code samples below, under the models:
key.
- dbt Cloud
- dbt CLI
- Edit the following in your
dbt_project.yml
file:
models:jaffle_shop:+materialized: tableexample:+materialized: view
- Execute
dbt run
. Your model,customers
should now be built as a table!info
To do this, dbt had to first run a
drop view
statement (or API call on BigQuery), then acreate table as
statement.
- Edit
models/customers.sql
to have the following snippet at the top:
{{config(materialized='view')}}with customers as (selectid as customer_id...)
- Execute
dbt run
. Your model,customers
should be built as a view. You may need to rundbt run --full-refresh
for this to take effect on BigQuery.
FAQs
Delete the example models
We don't need the sample files that dbt created for us anymore! Let's delete them.
- dbt Cloud
- dbt CLI
- Delete the
models/example/
directory - Delete the
example:
key from yourdbt_project.yml
file, and any configurations that are listed under it
# beforemodels:jaffle_shop:+materialized: tableexample:+materialized: view
# aftermodels:jaffle_shop:+materialized: table
FAQs
Build models on top of other models
Often, it's a good idea to clean your data in one place, before doing additional transformations downstream. Our query already uses CTEs to this effect, but now we're going to experiment with using the ref function to separate this clean-up into upstream models, like so:
- dbt Cloud
- dbt CLI
- Create a new SQL file,
models/stg_customers.sql
, with the SQL from thecustomers
CTE in our original query:
selectid as customer_id,first_name,last_namefrom `dbt-tutorial`.jaffle_shop.customers
- Create a second new SQL file,
models/stg_orders.sql
, with the SQL from theorders
CTE in our original query:
selectid as order_id,user_id as customer_id,order_date,statusfrom `dbt-tutorial`.jaffle_shop.orders
- Edit the SQL in your
models/customers.sql
file as follows:
with customers as (select * from {{ ref('stg_customers') }}),orders as (select * from {{ ref('stg_orders') }}),customer_orders as (selectcustomer_id,min(order_date) as first_order_date,max(order_date) as most_recent_order_date,count(order_id) as number_of_ordersfrom ordersgroup by 1),final as (selectcustomers.customer_id,customers.first_name,customers.last_name,customer_orders.first_order_date,customer_orders.most_recent_order_date,coalesce(customer_orders.number_of_orders, 0) as number_of_ordersfrom customersleft join customer_orders using (customer_id))select * from final
- Execute
dbt run
This time when dbt ran, separate views/tables were created for stg_customers
, stg_orders
and customers
. dbt was able to infer the order in which to run these models in — customers
depends on stg_customers
and stg_orders
, so gets built last. There's no need to explicitly define these dependencies.
FAQs
Extra exercises
- Write some bad SQL to cause an error — can you debug this error?
- Run only a single model at a time (docs)
- Group your models with a
stg_
prefix into astaging
subdirectory (i.e.models/staging/stg_customers.sql
)- Configure your
staging
models to be views - Run only the
staging
models
- Configure your
- dbt Cloud
- dbt CLI