SAO doc improvements #8234
Conversation
| You can use the following optional parameters to customize your state-aware orchestration: |
| - `loaded_at_query`: Define a custom freshness condition in SQL to account for partial loading or streaming data. |
| | Parameter | Description | Allowed values | Supports Jinja | |
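For context, a minimal sketch of where these parameters live, assuming the standard dbt sources YAML; the source, table, and column names (`raw_app`, `events`, `_etl_loaded_at`) are hypothetical:

```yaml
# Sketch only: source/table/column names are made up for illustration.
sources:
  - name: raw_app
    tables:
      - name: events
        # Point freshness at a specific timestamp column...
        loaded_at_field: _etl_loaded_at
        # ...or supply custom SQL instead, e.g. for partial loads or streaming:
        # loaded_at_query: select max(_etl_loaded_at) from {{ this }}
```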
Converted the parameter descriptions to a table format
reubenmc left a comment
Thanks @luna-bianca! @evabgood and I just added some feedback. Things are starting to look great!
| - 🕐 For example, if most of our records for `2022-01-30` come in the raw schema of our warehouse on the morning of `2022-01-31`, but a handful don’t get loaded til `2022-02-02`, how might we tackle that? There will already be `max(updated_at)` timestamps of `2022-01-31` in the warehouse, filtering out those late records. **They’ll never make it to our model.** |
| - 🪟 To mitigate this, we can add a **lookback window** to our **cutoff** point. By **subtracting a few days** from the `max(updated_at)`, we would capture any late data within the window of what we subtracted. |
| - 👯 As long as we have a **`unique_key` defined in our config**, we’ll simply update existing rows and avoid duplication. We process more data this way, but in a fixed way, and it keeps our model hewing closer to the source data. |
| - If you're using state-aware orchestration, make sure its freshness detection logic accounts for late-arriving data. By default, dbt uses warehouse metadata, which is updated whenever new rows arrive, even if their event timestamps are in the past. However, if you configure a `loaded_at_field` or `loaded_at_query` that uses an event timestamp (for example, `event_date`), late-arriving data may not increase the `loaded_at` value. In this case, state-aware orchestration may skip rebuilding the incremental model, even though your lookback window would normally pick up those records. To ensure late-arriving data is detected, configure your `loaded_at_field` or `loaded_at_query` to align with the same lookback window used in your incremental filter. |
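As an illustration of the lookback pattern described above, a hedged sketch of an incremental model; the model and column names are hypothetical, and `dateadd` is a warehouse-specific date function:

```sql
-- Sketch of a lookback window on an incremental model; names are hypothetical.
{{ config(materialized='incremental', unique_key='order_id') }}

select order_id, customer_id, updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- Pull the cutoff back 3 days so late-arriving rows are reprocessed;
  -- with unique_key set, reprocessed rows update in place rather than duplicate.
  where updated_at > (select dateadd(day, -3, max(updated_at)) from {{ this }})
{% endif %}
```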
I feel like this needs to be split into three cases as it's currently confusing. These are suggestions so please edit!
Using state-aware orchestration with incremental models
- By default, SAO uses warehouse metadata to determine source freshness. This means dbt considers a source to have new data whenever a new row arrives, which can lead to running your models more often than ideal.
- To avoid this, you can instead tell dbt exactly which field to look at for freshness by configuring a `loaded_at_field` for a specific column or a `loaded_at_query` with custom SQL (LINK TO DOCS ON LOADED AT OPTIONS).
- Even with a `loaded_at_field` or `loaded_at_query`, late-arriving records may have an earlier event timestamp. To ensure late-arriving data is detected, configure your `loaded_at_field` or `loaded_at_query` to align with the same lookback window used in your incremental filter (a sketch follows below).
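One possible way to express that last bullet, hedged heavily since the right query depends on your load pattern: restrict the freshness signal to the same window the incremental filter scans, so a late row landing inside the window still bumps the `loaded_at` value. The names and the 3-day window are hypothetical:

```yaml
# Sketch: count the source as having new data when rows whose event dates fall
# inside the 3-day lookback window have recently landed; names are hypothetical.
sources:
  - name: raw_app
    tables:
      - name: events
        loaded_at_query: |
          select max(_etl_loaded_at)
          from {{ this }}
          where event_date >= dateadd(day, -3, current_date)
```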
| - Every macro, variable, or templated logic is resolved before state-aware orchestration checks for changes. |
| - If you use dynamic content (for example, `{{ run_started_at }}`), state-aware orchestration may detect that as a change even if the “static” SQL template hasn’t changed. This may result in more frequent model rebuilds. |
| - Any change to a macro definition or templated logic will be treated as a code change, even if the underlying data or SQL structure remains the same. |
| - If you want to leave comments in your source code but don’t want to trigger rebuilds, it is recommended to use regular SQL comments (for example, `-- This is a single-line comment in SQL`) in your query. State-aware orchestration ignores comment-only changes; such annotations will not force model rebuilds across the DAG. |
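To make the distinction concrete, a small sketch (model names hypothetical):

```sql
-- Editing this comment alone does not force a rebuild: state-aware
-- orchestration ignores comment-only changes.
select
    order_id,
    -- This renders to a new literal on every invocation, so the compiled SQL
    -- differs run-to-run and may be detected as a code change:
    '{{ run_started_at }}' as compiled_at
from {{ ref('stg_orders') }}
```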
This is currently true; however, it should change in a couple of weeks, so it's probably not worth updating right now. Instead, this should be added (once it goes out) to reflect the new behavior.
Detecting code changes
- We first look for changes in the pre-rendered SQL (like Mantle/Core does)
- If, and only if, there is a change, we look at the post-compiled SQL (with whitespace and comments stripped out, as we currently do for Fusion)
Removed the "Detecting code changes" section for now
| ### Handling concurrent jobs |
| If two separate jobs both depend on the same downstream model (for example, `model_ab`), and both jobs detect upstream changes (`updates_on = any`), then `model_ab` may run twice, once per job. |
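For reference, a sketch of the config that produces this behavior, assuming the model-level freshness shape from the state-aware orchestration docs; the counts and periods are arbitrary examples:

```yaml
# Sketch: model_ab rebuilds when any upstream has new data, so two concurrent
# jobs that each detect an upstream change may each build it once.
models:
  - name: model_ab
    config:
      freshness:
        build_after:
          count: 1
          period: hour
          updates_on: any  # vs. "all": wait until every upstream has changed
```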
Clarify: only if something has changed, though. If nothing has changed, then the second job will simply reuse `model_ab`.
| Under state-aware orchestration, each job independently evaluates whether a model needs rebuilding based on the model’s compiled code and upstream data state. It does not enforce a single build per model across different jobs. |
I don't like this. This is really more like:
Under state-aware orchestration, all jobs read and write from the same shared state and build a model only when either the code or data state has changed. This means that each job individually evaluates whether a model needs rebuilding based on the model’s compiled code and upstream data state.
Could also add: If you want to prevent a model from being rebuilt too frequently even when the code or data state has changed, you can slow down any model by using the build_after config (LINK TO DOCS ON HOW TO DO THIS).
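A possible shape for that `build_after` throttle, hedged since the exact keys should be confirmed against the linked docs; the values are arbitrary examples:

```yaml
# Sketch: model_ab is not considered due for a rebuild until at least 6 hours
# after its last build, even if code or upstream data changed sooner.
models:
  - name: model_ab
    config:
      freshness:
        build_after:
          count: 6
          period: hour
```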
And added the `build_after` paragraph here: https://github.com/dbt-labs/docs.getdbt.com/pull/8234/changes#diff-ad798a159c003c98c28f29456ba1d0e295b58d33c976f5ed18c07c567f822080R54
| - Upstream data changes at runtime and model-level freshness settings |
| - Shared state across jobs |
| This helps avoid unnecessary rebuilds when underlying source files changed without changing the compiled logic, while still rebuilding when upstream data changes require it. |
To add: While Core did these checks for a single run in a single job, SAO with Fusion does this in real time across every job in the environment to manage state and ensure you're not building any models when things haven't changed, no matter which job a model is built in.
What are you changing in this pull request and why?
Slack thread 1
Slack thread 2