freshness
```yml
version: 2

sources:
  - name: <source_name>
    freshness:
      warn_after:
      error_after:
    tables:
      - name: <table_name>
        freshness:
          warn_after:
          error_after:
        ...
```
Definition
A freshness block is used to define the acceptable amount of time between the most recent record and now, for a table to be considered "fresh".

In the `freshness` block, one or both of `warn_after` and `error_after` can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source.

Additionally, the `loaded_at_field` is required to calculate freshness for a table. If a `loaded_at_field` is not provided, then dbt will not calculate freshness for the table.
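For example, a minimal configuration might look like the following (the source, table, and column names here are illustrative, not part of any real project):

```yml
version: 2

sources:
  - name: raw_events        # illustrative source name
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: page_views
```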
Freshness blocks are applied hierarchically:

- a `freshness` and `loaded_at_field` property added to a source will be applied to all tables defined in that source
- a `freshness` and `loaded_at_field` property added to a source table will override any properties applied to the source

This is useful when all of the tables in a source have the same `loaded_at_field`, as is often the case.
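As a sketch of this override behavior (the source and table names below are hypothetical), a table-level block takes precedence over the source-level default:

```yml
version: 2

sources:
  - name: app_db                              # illustrative names
    loaded_at_field: updated_at
    freshness:
      warn_after: {count: 1, period: day}     # default for every table below
    tables:
      - name: users                           # inherits the 1-day warn_after
      - name: sessions
        freshness:
          warn_after: {count: 1, period: hour}  # overrides the source default
```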
loaded_at_field (Required)
A column name (or expression) that returns a timestamp indicating freshness.
If using a date field, you may have to cast it to a timestamp:

```yml
loaded_at_field: "completed_date::timestamp"
```

Or, depending on your SQL variant:

```yml
loaded_at_field: "CAST(completed_date AS TIMESTAMP)"
```

If using a non-UTC timestamp, cast it to UTC first:

```yml
loaded_at_field: "convert_timezone('UTC', 'Australia/Sydney', created_at_local)"
```
count (Required)
A positive integer for the number of periods where a data source is still considered "fresh".
period (Required)

The time period used in the freshness calculation. One of `minute`, `hour`, or `day`.
filter (optional)

Add a `where` clause to the query run by `dbt source snapshot-freshness` in order to limit the data scanned.
This filter only applies to dbt's source freshness queries - it will not impact other uses of the source table.
This is particularly useful if:
- You are using BigQuery and your source tables are partitioned tables
- You are using Snowflake or Spark with large tables, and this results in a performance benefit
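For instance, on a BigQuery table partitioned by date, a filter can restrict the freshness query to recent partitions (the `_partition_date` column name here is a hypothetical example):

```yml
freshness:
  warn_after: {count: 24, period: hour}
  # Only scan the last 3 days of partitions when checking freshness
  filter: _partition_date >= date_sub(current_date, interval 3 day)
```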
Examples
Complete example
```yml
version: 2

sources:
  - name: jaffle_shop
    database: raw
    freshness: # default freshness
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _etl_loaded_at

    tables:
      - name: customers # this will use the freshness defined above

      - name: orders
        freshness: # make this a little more strict
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
          # Apply a where clause in the freshness query
          filter: datediff('day', _etl_loaded_at, current_timestamp) < 2

      - name: product_skus
        freshness: # do not check freshness for this table
```
When running `dbt source snapshot-freshness`, the following query will be run:
Compiled SQL:
```sql
select
  max(_etl_loaded_at) as max_loaded_at,
  convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders
where datediff('day', _etl_loaded_at, current_timestamp) < 2
```
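Conceptually, dbt then compares the age implied by `max_loaded_at` and `snapshotted_at` against the configured thresholds. A rough sketch of that comparison in Python (illustrative only, not dbt's actual implementation; the function name and inputs are assumptions):

```python
from datetime import datetime, timedelta

# Map a freshness "period" to the matching timedelta keyword argument.
PERIODS = {"minute": "minutes", "hour": "hours", "day": "days"}

def to_timedelta(threshold):
    """Convert a {count, period} block into a timedelta."""
    return timedelta(**{PERIODS[threshold["period"]]: threshold["count"]})

def freshness_status(max_loaded_at, snapshotted_at, warn_after=None, error_after=None):
    """Return 'pass', 'warn', or 'error' for a source table's freshness.

    Hypothetical helper: sketches the threshold logic, not dbt's real code.
    """
    age = snapshotted_at - max_loaded_at
    if error_after and age > to_timedelta(error_after):
        return "error"
    if warn_after and age > to_timedelta(warn_after):
        return "warn"
    return "pass"

# Example: the most recent record is 8 hours old, so it exceeds a 6-hour
# warn_after threshold but not a 12-hour error_after threshold.
status = freshness_status(
    max_loaded_at=datetime(2024, 1, 1, 0, 0),
    snapshotted_at=datetime(2024, 1, 1, 8, 0),
    warn_after={"count": 6, "period": "hour"},
    error_after={"count": 12, "period": "hour"},
)
```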