Fix time period end boundaries for time granularities finer than day#1957

Open
akuttig-block wants to merge 2 commits into dbt-labs:main from akuttig-block:akuttig/time_grain_fix

Conversation

@akuttig-block

Summary

Fixes a critical bug in time period boundary calculations that caused generated SQL queries to exclude most of the end date when filtering data at DAY granularity and above.

Problem

The adjust_to_end_of_period method in DateutilTimePeriodAdjuster was incorrectly truncating end-of-period timestamps to 00:00:00 (midnight) for requested granularities of DAY and coarser. As a result, when the underlying dimension had a granularity finer than day, the generated SQL captured only the first instant of the end date rather than the entire day.

Example of Incorrect Behavior

For a query requesting data grouped by day for 01/01/2025 through 01/31/2025 on a second-granularity time dimension, the generated SQL would be:

WHERE DATE_TRUNC('second', occurred_at) BETWEEN '2025-01-01 00:00:00' AND '2025-01-31 00:00:00'

This would only include records at exactly midnight on January 31st, excluding the remaining 23:59:59 of that day.
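The effect of the boundary choice can be sketched with plain datetimes (variable names here are illustrative, not from the MetricFlow codebase):

```python
from datetime import datetime

start = datetime(2025, 1, 1, 0, 0, 0)

# End boundary truncated to midnight (the buggy behavior)
end_midnight = datetime(2025, 1, 31, 0, 0, 0)

# End boundary set to the last second of the day (the fix)
end_of_day = datetime(2025, 1, 31, 23, 59, 59)

# A record at noon on Jan 31 is inside the intended range,
# but outside the buggy BETWEEN range.
record = datetime(2025, 1, 31, 12, 0, 0)

print(start <= record <= end_midnight)  # False: excluded by the bug
print(start <= record <= end_of_day)    # True: captured after the fix
```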

Solution

Updated the end-of-period calculation to set the time to 23:59:59 when requesting DAY granularity and above, ensuring the full period is captured:

WHERE DATE_TRUNC('second', occurred_at) BETWEEN '2025-01-01 00:00:00' AND '2025-01-31 23:59:59'

Changes

Core Fix

  • dateutil_adjuster.py: Changed end-of-period time from hour=0, minute=0, second=0 to hour=23, minute=59, second=59 for granularities >= DAY

Test Improvements

  • Added comprehensive test coverage for sub-day granularities (SECOND, MINUTE, HOUR)
  • Added test cases with various times throughout the day (midnight, noon, 23:59:59, etc.)
  • Added validation assertions to ensure:
    • End-of-period timestamps are always >= start-of-period
    • Correct time components for each granularity:
      • MINUTE+: second=59
      • HOUR+: minute=59, second=59
      • DAY+: hour=23, minute=59, second=59
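The per-granularity assertions above can be sketched as a standalone check; the grain strings and this helper are illustrative, not MetricFlow's actual test code:

```python
from datetime import datetime

GRAIN_ORDER = ["second", "minute", "hour", "day"]

def check_end_of_period(grain: str, start: datetime, end: datetime) -> None:
    # Validate the time components expected of an adjusted end-of-period
    # timestamp at a given requested grain.
    assert end >= start, "end-of-period must not precede start-of-period"
    rank = GRAIN_ORDER.index(grain)
    if rank >= GRAIN_ORDER.index("minute"):
        assert end.second == 59
    if rank >= GRAIN_ORDER.index("hour"):
        assert end.minute == 59
    if rank >= GRAIN_ORDER.index("day"):
        assert end.hour == 23

# A DAY-grain end boundary adjusted by the fix passes all checks.
check_end_of_period("day", datetime(2025, 1, 1), datetime(2025, 1, 31, 23, 59, 59))
```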

Impact

This fix ensures that:

  • ✅ Time range constraints properly capture the full period when filtering data
  • ✅ Queries no longer have off-by-one errors that exclude most of the end date
  • ✅ Data analysis results are accurate and complete for the requested time ranges

Testing

All existing tests pass with updated snapshots. The new validation assertions provide ongoing protection against regression of this issue.

Previously, the adjust_to_end_of_period method incorrectly set the time
to 00:00:00 (start of day) for granularities >= DAY. This caused generated
SQL queries to exclude most of the end date. For example, a query for
January 2025 would generate:
  WHERE DATE_TRUNC('second', occurred_at) BETWEEN '2025-01-01' AND '2025-01-31'
This would only include records at exactly midnight on Jan 31, excluding
the remaining 23:59:59 of that day.

Changes:
- Update dateutil_adjuster.py to set hour=23, minute=59, second=59 for
  end-of-period calculations at DAY granularity and above
- Add comprehensive test coverage for sub-day granularities (SECOND,
  MINUTE, HOUR) including various times throughout the day
- Add validation assertions to ensure end-of-period timestamps are
  always >= start-of-period and have correct time components:
  * MINUTE+: second=59
  * HOUR+: minute=59, second=59
  * DAY+: hour=23, minute=59, second=59
- Update test snapshots to reflect corrected period boundaries
- Fix affected query parsing test snapshots

This ensures that time range constraints properly capture the full
period when filtering data, preventing off-by-one errors in queries.
@akuttig-block akuttig-block requested a review from a team as a code owner January 7, 2026 23:27
@cla-bot

cla-bot bot commented Jan 7, 2026

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @akuttig-block


Contributor

@tlento tlento left a comment


Sorry for the delay. Can you provide more details on the scenario here? Is the time dimension itself defined with a granularity less than DAY? Or is the time dimension defined with a granularity of DAY but the underlying data column has a finer grain?

  date_to_adjust = date_to_adjust.replace(microsecond=0)
  if time_granularity.to_int() >= TimeGranularity.DAY.to_int():
-     date_to_adjust = date_to_adjust.replace(hour=0, minute=0, second=0)
+     date_to_adjust = date_to_adjust.replace(hour=23, minute=59, second=59)
Contributor


The addition of these extra values is causing the rendered SQL to change in cases where the output expression was already correct (i.e., for anything >= DAY grain). We'll need to consider the best approach here. This may be it, although I'll need to recreate a bunch of snapshots and run further tests before we can get this merged. There are also some other options that might be less sensitive to the underlying column granularity (i.e., handled in the warehouse rather than in the MetricFlow model spec).

@akuttig-block
Author

akuttig-block commented Jan 28, 2026

Sorry for the delay. Can you provide more details on the scenario here? Is the time dimension itself defined with a granularity less than DAY? Or is the time dimension defined with a granularity of DAY but the underlying data column has a finer grain?

The dimension is defined with granularity less than day, but the SQL request is at day or higher.

    - name: occurred_at
      type: time
      type_params:
        time_granularity: second

@tlento
Contributor

tlento commented Feb 2, 2026

@akuttig-block ok, I see.

I think the right answer is to stop using BETWEEN and update the adjustment on end date to be adjusted_end_time + 1 granularity_unit. So the rendered SQL would become metric_time >= adjusted_start_time and metric_time < adjusted_end_time and the adjusted end time in this example would be period_end_date + 1 day.
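A minimal sketch of that half-open-interval rendering, assuming a hypothetical render_filter helper and offset table (not MetricFlow's actual API; coarser grains such as month or year would use dateutil.relativedelta instead of timedelta):

```python
from datetime import datetime, timedelta

# Illustrative mapping from grain name to one granularity unit.
OFFSETS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}

def render_filter(column: str, start: datetime, end: datetime, grain: str) -> str:
    # Advance the end boundary by one granularity unit and use a strict
    # upper bound, so the full final period is included at any column grain.
    adjusted_end = end + OFFSETS[grain]
    return f"{column} >= '{start}' AND {column} < '{adjusted_end}'"

print(render_filter("metric_time", datetime(2025, 1, 1), datetime(2025, 1, 31), "day"))
# metric_time >= '2025-01-01 00:00:00' AND metric_time < '2025-02-01 00:00:00'
```

Because the comparison is strict at the upper bound, no end-of-day timestamp juggling (23:59:59, microseconds, etc.) is needed regardless of how fine the underlying column's grain is.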

If you're willing to take a shot at that please sign the CLA and update the PR and we can work with you to get the snapshots updated and make CI happy. Otherwise one of us will work on this at some point. The time_constraint property isn't fully supported - it's actually a legacy hook that we would like to remove in favor of more holistic predicate pushdown - but this seems like a good thing to clean up.
