
Commit b6d584c: Merge branch 'current' into nfiann-gitlabedit
2 parents cc5db0e + 1f2834d

File tree

17 files changed: +234 -83 lines changed


website/docs/docs/build/data-tests.md

Lines changed: 20 additions & 20 deletions
@@ -32,14 +32,14 @@ Data tests are assertions you make about your models and other resources in your
 
 You can use data tests to improve the integrity of the SQL in each model by making assertions about the results generated. Out of the box, you can test whether a specified column in a model only contains non-null values, unique values, or values that have a corresponding value in another model (for example, a `customer_id` for an `order` corresponds to an `id` in the `customers` model), and values from a specified list. You can extend data tests to suit business logic specific to your organization – any assertion that you can make about your model in the form of a select query can be turned into a data test.
 
-Data tests return a set of failing records. Generic data tests (a.k.a. schema tests) are defined using `test` blocks.
+Data tests return a set of failing records. Generic data tests (also known as schema tests) are defined using `test` blocks.
 
-Like almost everything in dbt, data tests are SQL queries. In particular, they are `select` statements that seek to grab "failing" records, ones that disprove your assertion. If you assert that a column is unique in a model, the test query selects for duplicates; if you assert that a column is never null, the test seeks after nulls. If the data test returns zero failing rows, it passes, and your assertion has been validated.
+Like almost everything in dbt, data tests are SQL queries. In particular, they are `select` statements that seek to grab "failing" records, ones that disprove your assertion. If you assert that a column is unique in a model, the test query selects for duplicates; if you assert that a column is never null, the test seeks nulls. If the data test returns zero failing rows, it passes, and your assertion has been validated.
 
 There are two ways of defining data tests in dbt:
 
-- A **singular** data test is testing in its simplest form: If you can write a SQL query that returns failing rows, you can save that query in a `.sql` file within your [test directory](/reference/project-configs/test-paths). It's now a data test, and it will be executed by the `dbt test` command.
-- A **generic** data test is a parameterized query that accepts arguments. The test query is defined in a special `test` block (like a [macro](jinja-macros)). Once defined, you can reference the generic test by name throughout your `.yml` files—define it on models, columns, sources, snapshots, and seeds. dbt ships with four generic data tests built in, and we think you should use them!
+- A **singular** data test is testing in its simplest form: write a SQL query that returns failing rows and save it in a `.sql` file within your [test directory](/reference/project-configs/test-paths). It's now a data test, and it will be executed by the `dbt test` command.
+- A **generic** data test is a parameterized query that accepts arguments. The test query is defined in a special `test` block (like a [macro](/docs/build/jinja-macros)). Once defined, you can reference the generic test by name throughout your `.yml` files—define it on models, columns, sources, snapshots, and seeds. dbt ships with four generic data tests built in, and we think you should use them!
 
 Defining data tests is a great way to confirm that your outputs and inputs are as expected, and helps prevent regressions when your code changes. Because you can use them over and over again, making similar assertions with minor variations, generic data tests tend to be much more common—they should make up the bulk of your dbt data testing suite. That said, both ways of defining data tests have their time and place.

@@ -51,7 +51,7 @@ If you're new to dbt, we recommend that you check out our [online dbt Fundamenta
 
 The simplest way to define a data test is by writing the exact SQL that will return failing records. We call these "singular" data tests, because they're one-off assertions usable for a single purpose.
 
-These tests are defined in `.sql` files, typically in your `tests` directory (as defined by your [`test-paths` config](/reference/project-configs/test-paths)). You can use Jinja (including `ref` and `source`) in the test definition, just like you can when creating models. Each `.sql` file contains one `select` statement, and it defines one data test:
+These tests are defined in `.sql` files, typically in your `tests` directory (as defined by your `test-paths` config). **Note:** The `tests/` directory (`test-paths`) is reserved for singular and generic data tests (SQL). Unit test YAML definitions must live under your project's `model-paths` (for example, in the `models/` directory), not in `tests/`. You can use Jinja (including `ref` and `source`) in the test definition, just like you can when creating models. Each `.sql` file contains one `select` statement, and it defines one data test:
 
 <File name='tests/assert_total_payment_amount_is_positive.sql'>
 
@@ -68,7 +68,7 @@ having total_amount < 0
 
 </File>
 
-The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.
+The test name is the file name: `assert_total_payment_amount_is_positive`.
 
 Note:
 - Omit semicolons (;) at the end of the SQL statement in your singular test files, as they can cause your data test to fail.
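The body of `assert_total_payment_amount_is_positive` is elided between the hunks above. A sketch consistent with the `having total_amount < 0` fragment visible in the next hunk header; the `payments` model and its column names are assumptions based on the surrounding text:

```sql
-- Hypothetical sketch of the singular test referenced above; `payments`,
-- `order_id`, and `amount` are assumed names, not confirmed by this diff.
-- Rows returned here are failures, so zero rows means the test passes.
select
    order_id,
    sum(amount) as total_amount
from {{ ref('payments') }}
group by 1
having total_amount < 0
```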
@@ -92,7 +92,7 @@ data_tests:
 Singular data tests are so easy that you may find yourself writing the same basic structure repeatedly, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend generic data tests.
 
 ## Generic data tests
-Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a `test` block, which contains a parametrized query and accepts arguments. It might look like:
+Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a `test` block, which contains a parameterized query and accepts arguments. It might look like:
 
 ```sql
 {% test not_null(model, column_name) %}
@@ -104,15 +104,15 @@ Certain data tests are generic: they can be reused over and over again. A generi
 {% endtest %}
 ```
 
-You'll notice that there are two arguments, `model` and `column_name`, which are then templated into the query. This is what makes the data test "generic": it can be defined on as many columns as you like, across as many models as you like, and dbt will pass the values of `model` and `column_name` accordingly. Once that generic test has been defined, it can be added as a _property_ on any existing model (or source, seed, or snapshot). These properties are added in `.yml` files in the same directory as your resource.
+You'll notice that there are two arguments, `model` and `column_name`, which are then templated into the query. This is what makes the data test "generic": it can be defined on as many columns as you like, across as many models as you like, and dbt will pass the values of `model` and `column_name` accordingly. Once that generic test has been defined, it can be added as a _property_ on any existing model (or source, seed, or snapshot). These properties are added in `.yml` files in the same directory as your resource.
 
 :::info
 If this is your first time working with adding properties to a resource, check out the docs on [declaring properties](/reference/configs-and-properties).
 :::
 
-Out of the box, dbt ships with four generic data tests already defined: `unique`, `not_null`, `accepted_values` and `relationships`. Here's a full example using those tests on an `orders` model:
+Out of the box, dbt ships with four generic data tests already defined: `unique`, `not_null`, `accepted_values`, and `relationships`. Here's a full example using those tests on an `orders` model:
 
-```yml
+```yaml
 
 models:
   - name: orders
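The `not_null` definition is split across the two hunks above. Assembled, the generic test block reads as follows — a sketch consistent with the fragments shown in this diff, including the `where {{ column_name }} is null` line that surfaces in a later hunk header:

```sql
-- Assembled from the fragments visible in this diff; the body is the
-- standard not_null pattern: select every row where the column is null.
{% test not_null(model, column_name) %}

select *
from {{ model }}
where {{ column_name }} is null

{% endtest %}
```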
@@ -137,10 +137,10 @@ models:
 In plain English, these data tests translate to:
 * `unique`: the `order_id` column in the `orders` model should be unique
 * `not_null`: the `order_id` column in the `orders` model should not contain null values
-* `accepted_values`: the `status` column in the `orders` should be one of `'placed'`, `'shipped'`, `'completed'`, or `'returned'`
+* `accepted_values`: the `status` column in the `orders` model should be one of `'placed'`, `'shipped'`, `'completed'`, or `'returned'`
 * `relationships`: each `customer_id` in the `orders` model exists as an `id` in the `customers` <Term id="table" /> (also known as referential integrity)
 
-Behind the scenes, dbt constructs a `select` query for each data test, using the parametrized query from the generic test block. These queries return the rows where your assertion is _not_ true; if the test returns zero rows, your assertion passes.
+Behind the scenes, dbt constructs a `select` query for each data test, using the parameterized query from the generic test block. These queries return the rows where your assertion is _not_ true; if the test returns zero rows, your assertion passes.
 
 You can find more information about these data tests, and additional configurations (including [`severity`](/reference/resource-configs/severity) and [`tags`](/reference/resource-configs/tags)) in the [reference section](/reference/resource-properties/data-tests). You can also add descriptions to the Jinja macro that provides the core logic of a generic data test. Refer to the [Add description to generic data test logic](/best-practices/writing-custom-generic-tests#add-description-to-generic-data-test-logic) for more information.
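The `orders` property file is truncated by the hunk above. A sketch of its likely shape, built from the four bullets it describes; the layout follows the standard dbt pattern, and the exact file in the commit may differ:

```yaml
# Sketch of the truncated schema.yml example; model and column names are
# taken from the bullets above, the structure is the documented dbt pattern.
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        data_tests:
          - relationships:
              to: ref('customers')
              field: id
```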

@@ -274,27 +274,27 @@ where {{ column_name }} is null
 
 ## Storing data test failures
 
-Normally, a data test query will calculate failures as part of its execution. If you set the optional `--store-failures` flag, the [`store_failures`](/reference/resource-configs/store_failures), or the [`store_failures_as`](/reference/resource-configs/store_failures_as) configs, dbt will first save the results of a test query to a table in the database, and then query that table to calculate the number of failures.
+Normally, a data test query will calculate failures as part of its execution. If you set the optional `--store-failures` flag, the [`store_failures`](/reference/resource-configs/store_failures) config, or the [`store_failures_as`](/reference/resource-configs/store_failures_as) config, dbt will first save the results of a test query to a table in the database, and then query that table to calculate the number of failures.
 
 This workflow allows you to query and examine failing records much more quickly in development:
 
 <Lightbox src="/img/docs/building-a-dbt-project/test-store-failures.gif" title="Store test failures in the database for faster development-time debugging."/>
 
-Note that, if you select to store data test failures:
-* Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).)
+Note that, if you choose to store data test failures:
+- Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).)
 - A test's results will always **replace** previous failures for the same test.
 
 ## New `data_tests:` syntax
-
 Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests, the key was renamed from `tests:` to `data_tests:`.
 
-dbt still supports `tests:` in your YML configuration files for backwards-compatibility purposes, and you might see it used throughout our documentation. However, you can't have a `tests` and a `data_tests` key associated with the same resource (for example, a single model) at the same time.
+dbt still supports `tests:` in your YAML configuration files for backward-compatibility purposes, and you might see it used throughout our documentation. However, you can't have a `tests` and a `data_tests` key associated with the same resource (for example, a single model) at the same time.
 
 <File name='models/schema.yml'>
 
-```yml
+```yaml
 models:
   - name: orders
     columns:
@@ -308,7 +308,7 @@ models:
 
 <File name='dbt_project.yml'>
 
-```yml
+```yaml
 data_tests:
   +store_failures: true
 ```
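With either the project config above or the CLI flag, a typical development loop looks like this — a sketch; the audit schema name depends on your target schema and any `schema` config:

```shell
# Persist failing rows for every test in the run
# (same effect as the store_failures config shown above).
dbt test --store-failures

# Failing rows land in a schema suffixed `dbt_test__audit` by default;
# inspect them by querying the table named after the failing test.
```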

website/docs/docs/build/unit-tests.md

Lines changed: 64 additions & 14 deletions
@@ -12,19 +12,11 @@ keywords:
 
 Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed _after_ building a model.
 
-There is an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability.
+There is an additional type of test in dbt: unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability.
 
-## Before you begin
+import UnitTestsPrereqs from '/snippets/_unit-tests-prereqs.md';
 
-- We currently only support unit testing SQL models.
-- We currently only support adding unit tests to models in your _current_ project.
-- We currently _don't_ support unit testing models that use the [`materialized view`](/docs/build/materializations#materialized-view) materialization.
-- We currently _don't_ support unit testing models that use recursive SQL.
-- We currently _don't_ support unit testing models that use introspective queries.
-- If your model has multiple versions, by default the unit test will run on *all* versions of your model. Read [unit testing versioned models](/reference/resource-properties/unit-testing-versions) for more information.
-- Unit tests must be defined in a YML file in your [`models/` directory](/reference/project-configs/model-paths).
-- Table names must be aliased in order to unit test `join` logic.
-- Include all [`ref`](/reference/dbt-jinja-functions/ref) or [`source`](/reference/dbt-jinja-functions/source) model references in the unit test configuration as `input`s to avoid "node not found" errors during compilation.
+<UnitTestsPrereqs />
 
 #### Adapter-specific caveats
 - You must specify all fields in a BigQuery `STRUCT` in a unit test. You cannot use only a subset of fields in a `STRUCT`.
@@ -112,16 +104,19 @@ unit_tests:
     model: dim_customers
     given:
       - input: ref('stg_customers')
+        format: dict
         rows:
           - {email: cool@example.com, email_top_level_domain: example.com}
           - {email: cool@unknown.com, email_top_level_domain: unknown.com}
           - {email: badgmail.com, email_top_level_domain: gmail.com}
           - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
       - input: ref('top_level_email_domains')
+        format: dict
         rows:
           - {tld: example.com}
           - {tld: gmail.com}
     expect:
+      format: dict
       rows:
         - {email: cool@example.com, is_valid_email_address: true}
         - {email: cool@unknown.com, is_valid_email_address: false}
@@ -133,6 +128,61 @@ unit_tests:
 
 The previous example defines the mock data using the inline `dict` format, but you can also use `csv` or `sql` either inline or in a separate fixture file. Store your fixture files in a `fixtures` subdirectory in any of your [test paths](/reference/project-configs/test-paths). For example, `tests/fixtures/my_unit_test_fixture.sql`.
 
+The following examples show how to define mock data and expected output using `csv` and `sql`.
+
+<File name='models/schema.yml'>
+
+```yaml
+unit_tests:
+  - name: test_is_valid_email_address__csv
+    model: dim_customers
+    given:
+      - input: ref('stg_customers')
+        format: dict
+        rows:
+          - {email: cool@example.com, email_top_level_domain: example.com}
+          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
+          - {email: badgmail.com, email_top_level_domain: gmail.com}
+          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
+      - input: ref('top_level_email_domains')
+        format: csv
+        rows: |
+          tld
+          example.com
+          gmail.com
+    expect:
+      format: csv
+      fixture: valid_email_address_fixture_output
+```
+
+</File>
+
+<File name='models/schema.yml'>
+
+```yaml
+unit_tests:
+  - name: test_is_valid_email_address__sql
+    model: dim_customers
+    given:
+      - input: ref('stg_customers')
+        format: dict
+        rows:
+          - {email: cool@example.com, email_top_level_domain: example.com}
+          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
+          - {email: badgmail.com, email_top_level_domain: gmail.com}
+          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
+      - input: ref('top_level_email_domains')
+        format: sql
+        rows: |
+          select 'example.com' as tld union all
+          select 'gmail.com' as tld
+    expect:
+      format: sql
+      fixture: valid_email_address_fixture_output
+```
+
+</File>
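Both examples above point `expect` at a fixture named `valid_email_address_fixture_output`, whose contents are not part of this diff. A plausible CSV version, stored under a test path such as `tests/fixtures/valid_email_address_fixture_output.csv`, might look like this — the last two expected values are inferred from the earlier `dict` example's validation logic and are an assumption:

```csv
email,is_valid_email_address
cool@example.com,true
cool@unknown.com,false
badgmail.com,false
missingdot@gmailcom,false
```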
 
 When using the `dict` or `csv` format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and _specific_ unit tests.
 
 :::note
@@ -226,7 +276,7 @@ Your model is now ready for production! Adding this unit test helped catch an is
 When configuring your unit test, you can override the output of macros, vars, or environment variables. This enables you to unit test your incremental models in "full refresh" and "incremental" modes.
 
 :::note
-Incremental models need to exist in the database first before running unit tests or doing a `dbt build`. Use the [`--empty` flag](/reference/commands/build#the---empty-flag) to build an empty version of the models to save warehouse spend. You can also optionally select only your incremental models using the [`--select` flag](/reference/node-selection/syntax#shorthand).
+Incremental models need to exist in the database before running unit tests or doing a `dbt build`. Use the [`--empty` flag](/reference/commands/build#the---empty-flag) to build an empty version of the models to save warehouse spend. You can also optionally select only your incremental models using the [`--select` flag](/reference/node-selection/syntax#shorthand).
 
 ```shell
 dbt run --select "config.materialized:incremental" --empty
@@ -260,7 +310,7 @@ where event_time > (select max(event_time) from {{ this }})
 
 You can define unit tests on `my_incremental_model` to ensure your incremental logic is working as expected:
 
-```yml
+```yaml
 
 unit_tests:
   - name: my_incremental_model_full_refresh_mode
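The unit test YAML above is truncated by the hunk. The pattern it introduces pins the `is_incremental()` macro through an `overrides` block; a sketch, where the `events` input and its rows are assumptions for illustration:

```yaml
# Sketch of the full-refresh-mode unit test; the `overrides` key follows
# the documented unit-test spec, while the input rows are illustrative.
unit_tests:
  - name: my_incremental_model_full_refresh_mode
    model: my_incremental_model
    overrides:
      macros:
        # run the unit test as if --full-refresh were passed
        is_incremental: false
    given:
      - input: ref('events')
        rows:
          - {event_id: 1, event_time: 2020-01-01}
    expect:
      rows:
        - {event_id: 1, event_time: 2020-01-01}
```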
@@ -307,7 +357,7 @@ There is currently no way to unit test whether the dbt framework inserted/merged
 
 If you want to unit test a model that depends on an ephemeral model, you must use `format: sql` for that input.
 
-```yml
+```yaml
 unit_tests:
   - name: my_unit_test
     model: dim_customers
