Fix time period end boundaries for time granularities finer than day#1957
Fix time period end boundaries for time granularities finer than day#1957akuttig-block wants to merge 2 commits intodbt-labs:mainfrom
Conversation
Previously, the adjust_to_end_of_period method incorrectly set the time
to 00:00:00 (start of day) for granularities >= DAY. This caused generated
SQL queries to exclude most of the end date. For example, a query for
January 2025 would generate:
WHERE DATE_TRUNC('second', occurred_at) BETWEEN '2025-01-01' AND '2025-01-31'
This would only include records at exactly midnight on Jan 31, excluding
the remaining 23:59:59 of that day.
Changes:
- Update dateutil_adjuster.py to set hour=23, minute=59, second=59 for
end-of-period calculations at DAY granularity and above
- Add comprehensive test coverage for sub-day granularities (SECOND,
MINUTE, HOUR) including various times throughout the day
- Add validation assertions to ensure end-of-period timestamps are
always >= start-of-period and have correct time components:
* MINUTE+: second=59
* HOUR+: minute=59, second=59
* DAY+: hour=23, minute=59, second=59
- Update test snapshots to reflect corrected period boundaries
- Fix affected query parsing test snapshots
This ensures that time range constraints properly capture the full
period when filtering data, preventing off-by-one errors in queries.
|
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR. CLA has not been signed by users: @akuttig-block |
|
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR. CLA has not been signed by users: @akuttig-block |
tlento
left a comment
There was a problem hiding this comment.
Sorry for the delay. Can you provide more details on the scenario here? Is the time dimension itself defined with a granularity less than DAY? Or is the time dimension defined with a granularity of DAY but the underlying data column has a finer grain?
| date_to_adjust = date_to_adjust.replace(microsecond=0) | ||
| if time_granularity.to_int() >= TimeGranularity.DAY.to_int(): | ||
| date_to_adjust = date_to_adjust.replace(hour=0, minute=0, second=0) | ||
| date_to_adjust = date_to_adjust.replace(hour=23, minute=59, second=59) |
There was a problem hiding this comment.
The addition of these extra values is causing the rendered SQL to change in cases where the output expression was correct (i.e., for anything >= DAY grain). We'll need to consider the best approach here. This may be it, although I'll need to recreate a bunch of snapshots and run further tests before we can get this merged, but there are some other options which might be less sensitive to the underlying column granularity (i.e., in the warehouse rather than in the MetricFlow model spec).
The dimension is defined with granularity less than day, but the SQL request is at day or higher. |
|
@akuttig-block ok, I see. I think the right answer is to stop using If you're willing to take a shot at that please sign the CLA and update the PR and we can work with you to get the snapshots updated and make CI happy. Otherwise one of us will work on this at some point. The time_constraint property isn't fully supported - it's actually a legacy hook that we would like to remove in favor of more holistic predicate pushdown - but this seems like a good thing to clean up. |
Summary
Fixes a critical bug in time period boundary calculations that caused generated SQL queries to exclude most of the end date when filtering data at DAY granularity and above.
Problem
The
adjust_to_end_of_periodmethod inDateutilTimePeriodAdjusterwas incorrectly truncating end-of-period timestamps to00:00:00(midnight) for requested granularities coarser thanDAY. This resulted in SQL queries that only captured the first portion of the end date rather than the entire day when the underlying dimension had a granularity finer than day.Example of Incorrect Behavior
For a query requesting data grouped by day for 01/01/2025 through 01/31/2025 on a second-granularity time dimension, the generated SQL would be:
This would only include records at exactly midnight on January 31st, excluding the remaining 23:59:59 of that day.
Solution
Updated the end-of-period calculation to set the time to
23:59:59when requesting DAY granularity and above, ensuring the full period is captured:Changes
Core Fix
dateutil_adjuster.py: Changed end-of-period time fromhour=0, minute=0, second=0tohour=23, minute=59, second=59for granularities >= DAYTest Improvements
second=59minute=59, second=59hour=23, minute=59, second=59Impact
This fix ensures that:
Testing
All existing tests pass with updated snapshots. The new validation assertions provide ongoing protection against regression of this issue.