website/docs/docs/build/data-tests.md
Lines changed: 20 additions & 20 deletions
@@ -32,14 +32,14 @@ Data tests are assertions you make about your models and other resources in your
32
32
33
33
You can use data tests to improve the integrity of the SQL in each model by making assertions about the results generated. Out of the box, you can test whether a specified column in a model only contains non-null values, unique values, values that have a corresponding value in another model (for example, a `customer_id` for an `order` corresponds to an `id` in the `customers` model), or values from a specified list. You can extend data tests to suit business logic specific to your organization – any assertion that you can make about your model in the form of a select query can be turned into a data test.
34
34
35
-
Data tests return a set of failing records. Generic data tests (a.k.a. schema tests) are defined using `test` blocks.
35
+
Data tests return a set of failing records. Generic data tests (also known as schema tests) are defined using `test` blocks.
36
36
37
-
Like almost everything in dbt, data tests are SQL queries. In particular, they are `select` statements that seek to grab "failing" records, ones that disprove your assertion. If you assert that a column is unique in a model, the test query selects for duplicates; if you assert that a column is never null, the test seeks after nulls. If the data test returns zero failing rows, it passes, and your assertion has been validated.
37
+
Like almost everything in dbt, data tests are SQL queries. In particular, they are `select` statements that seek to grab "failing" records, ones that disprove your assertion. If you assert that a column is unique in a model, the test query selects for duplicates; if you assert that a column is never null, the test seeks nulls. If the data test returns zero failing rows, it passes, and your assertion has been validated.
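As a rough illustration of the uniqueness case, a test query could look like the sketch below. The `orders` model and `order_id` column are assumed names for the example, and this is not necessarily the exact SQL that dbt compiles for its built-in `unique` test.

```sql
-- Sketch only: returns one row per duplicated order_id.
-- `orders` and `order_id` are assumed names for illustration.
select
    order_id,
    count(*) as n_records
from {{ ref('orders') }}
where order_id is not null
group by order_id
having count(*) > 1
```

If no `order_id` appears more than once, the query returns zero rows and the test passes.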
38
38
39
39
There are two ways of defining data tests in dbt:
40
40
41
-
- A **singular** data test is testing in its simplest form: If you can write a SQL query that returns failing rows, you can save that query in a `.sql` file within your [test directory](/reference/project-configs/test-paths). It's now a data test, and it will be executed by the `dbt test` command.
42
-
- A **generic** data test is a parameterized query that accepts arguments. The test query is defined in a special `test` block (like a [macro](jinja-macros)). Once defined, you can reference the generic test by name throughout your `.yml` files—define it on models, columns, sources, snapshots, and seeds. dbt ships with four generic data tests built in, and we think you should use them!
41
+
- A **singular** data test is testing in its simplest form: if you can write a SQL query that returns failing rows, you can save that query in a `.sql` file within your [test directory](/reference/project-configs/test-paths). It's now a data test, and it will be executed by the `dbt test` command.
42
+
- A **generic** data test is a parameterized query that accepts arguments. The test query is defined in a special `test` block (like a [macro](/docs/build/jinja-macros)). Once defined, you can reference the generic test by name throughout your `.yml` files—define it on models, columns, sources, snapshots, and seeds. dbt ships with four generic data tests built in, and we think you should use them!
43
43
44
44
Defining data tests is a great way to confirm that your outputs and inputs are as expected, and helps prevent regressions when your code changes. Because you can use them over and over again, making similar assertions with minor variations, generic data tests tend to be much more common—they should make up the bulk of your dbt data testing suite. That said, both ways of defining data tests have their time and place.
45
45
@@ -51,7 +51,7 @@ If you're new to dbt, we recommend that you check out our [online dbt Fundamenta
51
51
52
52
The simplest way to define a data test is by writing the exact SQL that will return failing records. We call these "singular" data tests, because they're one-off assertions usable for a single purpose.
53
53
54
-
These tests are defined in `.sql` files, typically in your `tests` directory (as defined by your [`test-paths` config](/reference/project-configs/test-paths)). You can use Jinja (including `ref` and `source`) in the test definition, just like you can when creating models. Each `.sql` file contains one `select` statement, and it defines one data test:
54
+
These tests are defined in `.sql` files, typically in your `tests` directory (as defined by your `test-paths` config). **Note:** The `tests/` directory (`test-paths`) is reserved for singular and generic data tests (SQL). Unit test YAML definitions must live under your project’s `model-paths` (for example, in the `models/` directory), not in `tests/`. You can use Jinja (including `ref` and `source`) in the test definition, just like you can when creating models. Each `.sql` file contains one `select` statement, and it defines one data test:
The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.
71
+
The test name is the file name: `assert_total_payment_amount_is_positive`.
72
72
73
73
Note:
74
74
- Omit semicolons (;) at the end of the SQL statement in your singular test files, as they can cause your data test to fail.
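For reference, a singular test named `assert_total_payment_amount_is_positive` might contain a query along these lines. The `fct_payments` model and its `order_id` and `amount` columns are assumptions made for this sketch.

```sql
-- Sketch of tests/assert_total_payment_amount_is_positive.sql
-- `fct_payments`, `order_id`, and `amount` are assumed names.
-- Returns orders whose total payment amount is negative; any returned row is a failure.
select
    order_id,
    sum(amount) as total_amount
from {{ ref('fct_payments') }}
group by order_id
having sum(amount) < 0
```

The statement deliberately has no trailing semicolon.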
@@ -92,7 +92,7 @@ data_tests:
92
92
Singular data tests are so easy that you may find yourself writing the same basic structure repeatedly, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend generic data tests.
93
93
94
94
## Generic data tests
95
-
Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a `test` block, which contains a parametrized query and accepts arguments. It might look like:
95
+
Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a `test` block, which contains a parameterized query and accepts arguments. It might look like:
96
96
97
97
```sql
98
98
{% test not_null(model, column_name) %}
@@ -104,15 +104,15 @@ Certain data tests are generic: they can be reused over and over again. A generi
104
104
{% endtest %}
105
105
```
106
106
107
-
You'll notice that there are two arguments, `model` and `column_name`, which are then templated into the query. This is what makes the data test "generic": it can be defined on as many columns as you like, across as many models as you like, and dbt will pass the values of `model` and `column_name` accordingly. Once that generic test has been defined, it can be added as a _property_ on any existing model (or source, seed, or snapshot). These properties are added in `.yml` files in the same directory as your resource.
107
+
You'll notice that there are two arguments, `model` and `column_name`, which are then templated into the query. This is what makes the data test "generic": it can be defined on as many columns as you like, across as many models as you like, and dbt will pass the values of `model` and `column_name` accordingly. Once that generic test has been defined, it can be added as a _property_ on any existing model (or source, seed, or snapshot). These properties are added in `.yml` files in the same directory as your resource.
108
108
109
109
:::info
110
110
If this is your first time working with adding properties to a resource, check out the docs on [declaring properties](/reference/configs-and-properties).
111
111
:::
112
112
113
-
Out of the box, dbt ships with four generic data tests already defined: `unique`, `not_null`, `accepted_values` and `relationships`. Here's a full example using those tests on an `orders` model:
113
+
Out of the box, dbt ships with four generic data tests already defined: `unique`, `not_null`, `accepted_values`, and `relationships`. Here's a full example using those tests on an `orders` model:
114
114
115
-
```yml
115
+
```yaml
116
116
117
117
models:
118
118
- name: orders
@@ -137,10 +137,10 @@ models:
137
137
In plain English, these data tests translate to:
138
138
* `unique`: the `order_id` column in the `orders` model should be unique
139
139
* `not_null`: the `order_id` column in the `orders` model should not contain null values
140
-
* `accepted_values`: the `status` column in the `orders` should be one of `'placed'`, `'shipped'`, `'completed'`, or `'returned'`
140
+
* `accepted_values`: the `status` column in the `orders` model should be one of `'placed'`, `'shipped'`, `'completed'`, or `'returned'`
141
141
* `relationships`: each `customer_id` in the `orders` model exists as an `id` in the `customers` <Term id="table" /> (also known as referential integrity)
142
142
143
-
Behind the scenes, dbt constructs a `select` query for each data test, using the parametrized query from the generic test block. These queries return the rows where your assertion is _not_ true; if the test returns zero rows, your assertion passes.
143
+
Behind the scenes, dbt constructs a `select` query for each data test, using the parameterized query from the generic test block. These queries return the rows where your assertion is _not_ true; if the test returns zero rows, your assertion passes.
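To make the idea concrete, the `relationships` assertion on `orders.customer_id` could be expressed as a query like the one below. This is a hand-written sketch of the logic, not the SQL that dbt actually generates, and the table aliases `o` and `c` are introduced only for the example.

```sql
-- Sketch only: returns orders whose customer_id has no match in customers.
-- Aliases `o` and `c` are illustrative.
select
    o.customer_id
from {{ ref('orders') }} as o
left join {{ ref('customers') }} as c
    on o.customer_id = c.id
where o.customer_id is not null
  and c.id is null
```

Every row returned is an order that points at a missing customer, so zero rows means referential integrity holds.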
144
144
145
145
You can find more information about these data tests, and additional configurations (including [`severity`](/reference/resource-configs/severity) and [`tags`](/reference/resource-configs/tags)) in the [reference section](/reference/resource-properties/data-tests). You can also add descriptions to the Jinja macro that provides the core logic of a generic data test. Refer to the [Add description to generic data test logic](/best-practices/writing-custom-generic-tests#add-description-to-generic-data-test-logic) for more information.
146
146
@@ -274,27 +274,27 @@ where {{ column_name }} is null
274
274
275
275
## Storing data test failures
276
276
277
-
Normally, a data test query will calculate failures as part of its execution. If you set the optional `--store-failures` flag, the [`store_failures`](/reference/resource-configs/store_failures), or the [`store_failures_as`](/reference/resource-configs/store_failures_as) configs, dbt will first save the results of a test query to a table in the database, and then query that table to calculate the number of failures.
277
+
Normally, a data test query will calculate failures as part of its execution. If you set the optional `--store-failures` flag, the [`store_failures`](/reference/resource-configs/store_failures), or the [`store_failures_as`](/reference/resource-configs/store_failures_as) configs, dbt will first save the results of a test query to a table in the database, and then query that table to calculate the number of failures.
278
278
279
279
This workflow allows you to query and examine failing records much more quickly in development:
280
280
281
281
<Lightbox src="/img/docs/building-a-dbt-project/test-store-failures.gif" title="Store test failures in the database for faster development-time debugging."/>
282
282
283
-
Note that, if you select to store data test failures:
284
-
* Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).)
283
+
Note that, if you choose to store data test failures:
284
+
- Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).)
285
285
- A test's results will always **replace** previous failures for the same test.
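As a minimal sketch, a singular data test could opt in to storing its failures with an in-file `config` block. The `orders` model and `amount` column are assumed names for illustration.

```sql
-- Sketch: a singular data test that stores its failing rows in the database.
-- `orders` and `amount` are assumed names for illustration.
{{ config(store_failures = true) }}

select *
from {{ ref('orders') }}
where amount < 0
```

With this config set, the failing rows are written to a table in the audit schema, where they can be queried directly.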
286
286
287
287
288
288
289
289
## New `data_tests:` syntax
290
-
291
-
Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests, the key was renamed from `tests:` to `data_tests:`.
292
290
293
-
dbt still supports `tests:` in your YML configuration files for backwards-compatibility purposes, and you might see it used throughout our documentation. However, you can't have a `tests` and a `data_tests` key associated with the same resource (for example, a single model) at the same time.
291
+
Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests, the key was renamed from `tests:` to `data_tests:`.
292
+
293
+
dbt still supports `tests:` in your YAML configuration files for backward-compatibility purposes, and you might see it used throughout our documentation. However, you can't have a `tests` and a `data_tests` key associated with the same resource (for example, a single model) at the same time.
website/docs/docs/build/unit-tests.md
Lines changed: 64 additions & 14 deletions
@@ -12,19 +12,11 @@ keywords:
12
12
13
13
Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed _after_ building a model.
14
14
15
-
There is an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability.
15
+
There is an additional type of test in dbt: unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability.
16
16
17
-
## Before you begin
17
+
import UnitTestsPrereqs from '/snippets/_unit-tests-prereqs.md';
18
18
19
-
- We currently only support unit testing SQL models.
20
-
- We currently only support adding unit tests to models in your _current_ project.
21
-
- We currently _don't_ support unit testing models that use the [`materialized view`](/docs/build/materializations#materialized-view) materialization.
22
-
- We currently _don't_ support unit testing models that use recursive SQL.
23
-
- We currently _don't_ support unit testing models that use introspective queries.
24
-
- If your model has multiple versions, by default the unit test will run on *all* versions of your model. Read [unit testing versioned models](/reference/resource-properties/unit-testing-versions) for more information.
25
-
- Unit tests must be defined in a YML file in your [`models/` directory](/reference/project-configs/model-paths).
26
-
- Table names must be aliased in order to unit test `join` logic.
27
-
- Include all [`ref`](/reference/dbt-jinja-functions/ref) or [`source`](/reference/dbt-jinja-functions/source) model references in the unit test configuration as `input`s to avoid "node not found" errors during compilation.
19
+
<UnitTestsPrereqs />
28
20
29
21
#### Adapter-specific caveats
30
22
- You must specify all fields in a BigQuery `STRUCT` in a unit test. You cannot use only a subset of fields in a `STRUCT`.
The previous example defines the mock data using the inline `dict` format, but you can also use `csv` or `sql` either inline or in a separate fixture file. Store your fixture files in a `fixtures` subdirectory in any of your [test paths](/reference/project-configs/test-paths). For example, `tests/fixtures/my_unit_test_fixture.sql`.
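A SQL fixture file is an ordinary `select` statement that produces the mock rows. The sketch below shows what `tests/fixtures/my_unit_test_fixture.sql` might contain. The column names and values are hypothetical.

```sql
-- Sketch of tests/fixtures/my_unit_test_fixture.sql
-- Column names and values are hypothetical mock data.
select 1 as id, 'cool@example.com' as email
union all
select 2 as id, 'missingdot@gmailcom' as email
```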
135
130
131
+
The following examples show how to define mock data and expected output using `csv` and `sql`.
When using the `dict` or `csv` format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and _specific_ unit tests.
137
187
138
188
:::note
@@ -226,7 +276,7 @@ Your model is now ready for production! Adding this unit test helped catch an is
226
276
When configuring your unit test, you can override the output of macros, vars, or environment variables. This enables you to unit test your incremental models in "full refresh" and "incremental" modes.
227
277
228
278
:::note
229
-
Incremental models need to exist in the database first before running unit tests or doing a `dbt build`. Use the [`--empty` flag](/reference/commands/build#the---empty-flag) to build an empty version of the models to save warehouse spend. You can also optionally select only your incremental models using the [`--select` flag](/reference/node-selection/syntax#shorthand).
279
+
Incremental models need to exist in the database before running unit tests or doing a `dbt build`. Use the [`--empty` flag](/reference/commands/build#the---empty-flag) to build an empty version of the models to save warehouse spend. You can also optionally select only your incremental models using the [`--select` flag](/reference/node-selection/syntax#shorthand).
230
280
231
281
```shell
232
282
dbt run --select "config.materialized:incremental" --empty
@@ -260,7 +310,7 @@ where event_time > (select max(event_time) from {{ this }})
260
310
261
311
You can define unit tests on `my_incremental_model` to ensure your incremental logic is working as expected:
262
312
263
-
```yml
313
+
```yaml
264
314
265
315
unit_tests:
266
316
- name: my_incremental_model_full_refresh_mode
@@ -307,7 +357,7 @@ There is currently no way to unit test whether the dbt framework inserted/merged
307
357
308
358
If you want to unit test a model that depends on an ephemeral model, you must use `format: sql` for that input.