Conversation

@pochedls
Collaborator

Description

This PR adds more sophisticated handling of bounds for temporal operations. In particular, it compares the dataset time bounds with the target period. For example, if you have a time point with arbitrary bounds of ["2019-12-28 00:00", "2020-01-03 00:00"] and want to create an average value for January 2020 (i.e., the period ["2020-01-01", "2020-02-01"]), this PR will correctly assign 2 days of weight in that January average (since only 2 days of the interval fall in January).
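
The overlap arithmetic described here can be sketched in a few lines (a minimal illustration only — `overlap_days` is a hypothetical helper, not the PR's implementation):

```python
import pandas as pd

def overlap_days(lower, upper, period_start, period_end):
    """Days of overlap between a time-bounds interval and a target period."""
    start = max(pd.Timestamp(lower), pd.Timestamp(period_start))
    end = min(pd.Timestamp(upper), pd.Timestamp(period_end))
    # A negative span means the interval and period do not overlap at all.
    return max((end - start) / pd.Timedelta(days=1), 0.0)

# The example interval contributes 2 of its 6 days to January 2020.
print(overlap_days("2019-12-28", "2020-01-03", "2020-01-01", "2020-02-01"))
```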

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@github-actions github-actions bot added the type: enhancement New enhancement request label Feb 10, 2025
@codecov

codecov bot commented Feb 10, 2025

Codecov Report

❌ Patch coverage is 11.29032% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.98%. Comparing base (0d7e112) to head (b62383e).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
xcdat/temporal.py 11.29% 55 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main     #735      +/-   ##
===========================================
- Coverage   100.00%   96.98%   -3.02%     
===========================================
  Files           16       16              
  Lines         1767     1827      +60     
===========================================
+ Hits          1767     1772       +5     
- Misses           0       55      +55     



@pochedls pochedls left a comment


A very small start on this issue, but worth going over carefully: if we get the workflow right, we can use this PR (with modifications) as a template for more temporal operations. This mainly handles group-averaging operations (we'd need to think more about how to apply this to climatologies, for example).

return ds_departs


def compute_monthly_average(self, data_var):
Collaborator Author

This function wraps several steps/functions (that are defined below) in order to compute monthly averages (e.g., from hourly/daily/pentad data to monthly means):

  • ensure_bounds_order: ensures that dataset bounds are ordered [earlier time, later time] (since the PR logic depends on this)
  • generate_monthly_bounds: creates the monthly target bounds
  • get_temporal_weights: computes weights for averaging the source dataset into the targeted time periods
  • _experimental_averager: uses the temporal weights to average data into the targeted time periods
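
The four steps above can be sketched end-to-end with plain NumPy/pandas (all names and the array-based interface are hypothetical — the PR itself works on xarray objects):

```python
import numpy as np
import pandas as pd

def monthly_average(data, time_bnds, start, end):
    """Average per-time-step values into calendar months via overlap weights.

    data:      1-D array of values, one per source time step
    time_bnds: (n, 2) array-like of time-bound datetimes/strings
    start/end: range of target months (assumed to fall on month starts)
    """
    # Step 1: ensure each bounds pair reads [earlier time, later time].
    src = np.sort(np.asarray(time_bnds, dtype="datetime64[h]"), axis=1)
    # Step 2: generate monthly target bounds.
    starts = pd.date_range(start, end, freq="MS")
    tgt = np.column_stack([starts, starts + pd.offsets.MonthBegin(1)])
    # Step 3: weights proportional to source/target bounds overlap.
    s = src.astype("int64")
    t = tgt.astype("datetime64[h]").astype("int64")
    lo = np.maximum(s[None, :, 0], t[:, None, 0])
    hi = np.minimum(s[None, :, 1], t[:, None, 1])
    w = np.clip(hi - lo, 0, None).astype(float)
    w /= w.sum(axis=1, keepdims=True)  # normalize within each target month
    # Step 4: weighted average into each target month.
    return w @ np.asarray(data, dtype=float)
```

For the pentad example in the description, two source steps with bounds ["2019-12-28", "2020-01-03"] and ["2020-01-03", "2020-01-08"] would receive January weights of 2/7 and 5/7 respectively.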

Collaborator Author

I think we could generalize this function (kind of like .temporal.group_average()) by having it call different functions to generate target bounds (e.g., generate_daily_bounds, generate_seasonal_bounds, generate_yearly_bounds). The other steps would work as-is.
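
That dispatch pattern might look like this (a hypothetical sketch with simplified array-based generators, not the PR's API):

```python
import numpy as np
import pandas as pd

def _monthly_bounds(start, end):
    # Month-start edges; upper bound is the next month's start.
    s = pd.date_range(start, end, freq="MS")
    return np.column_stack([s, s + pd.offsets.MonthBegin(1)])

def _yearly_bounds(start, end):
    # Year-start edges; upper bound is the next year's start.
    s = pd.date_range(start, end, freq="YS")
    return np.column_stack([s, s + pd.offsets.YearBegin(1)])

# Map a frequency keyword to its target-bounds generator.
GENERATORS = {"month": _monthly_bounds, "year": _yearly_bounds}

def generate_target_bounds(freq, start, end):
    return GENERATORS[freq](start, end)
```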

return ds.temporal._experimental_averager(data_var, weights, target_bnds)


def _experimental_averager(self, data_var, weights, target_bnds):
Collaborator Author

This is intended to be a generic averager that averages data variable values into the targeted time periods (using the supplied weights).
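
A generic weighted averager of this sort reduces to a matrix product (a sketch only; the PR version operates on xarray objects and preserves metadata):

```python
import numpy as np

def experimental_averager(data, weights):
    """Collapse source time steps into target periods.

    data:    (time, ...) array of the data variable
    weights: (target, time) matrix; each row sums to 1 over source steps
    """
    return np.tensordot(weights, np.asarray(data, dtype=float), axes=([1], [0]))

# Two source steps averaged into one target period with weights 0.25/0.75.
print(experimental_averager([0.0, 4.0], [[0.25, 0.75]]))  # [3.]
```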

return dsmean


def get_temporal_weights(self, target_bnds):
Collaborator Author

This function gets the intersection between the dataset's own time bounds and the targeted time bounds (i.e., the averaging periods). Each time step is assigned weight proportional to the duration for which it falls within a given averaging period.

Collaborator Author

Note that this PR is ~10x slower than existing functionality. The slowdown is almost entirely in this function. If we could speed this step up, that would be great (but we likely can tolerate this slowdown, since the approach in this PR should be more robust/accurate).
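
One way to attack the slowdown may be to compute all (target period, time step) intersections at once with NumPy broadcasting rather than looping over time steps — a sketch, assuming the dense (m, n) overlap matrix fits in memory:

```python
import numpy as np

def get_temporal_weights(src_bnds, target_bnds):
    """Overlap duration of each source interval with each target period,
    normalized so each target period's weights sum to 1."""
    src = np.asarray(src_bnds, dtype="datetime64[m]").astype("int64")     # (n, 2)
    tgt = np.asarray(target_bnds, dtype="datetime64[m]").astype("int64")  # (m, 2)
    lo = np.maximum(src[None, :, 0], tgt[:, None, 0])  # (m, n) overlap starts
    hi = np.minimum(src[None, :, 1], tgt[:, None, 1])  # (m, n) overlap ends
    overlap = np.clip(hi - lo, 0, None).astype(float)
    return overlap / overlap.sum(axis=1, keepdims=True)

w = get_temporal_weights(
    [["2019-12-28", "2020-01-03"], ["2020-01-03", "2020-01-08"]],
    [["2020-01-01", "2020-02-01"]],
)
# w[0] is [2/7, 5/7]: 2 of 7 overlapping days from the first pentad, 5 from the second.
```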

return weights


def generate_monthly_bounds(self):
Collaborator Author

Prototype function for generating target bounds (i.e., what bins do you want to average your source data into). We could make other functions for other frequencies (e.g., daily, seasonal, yearly).
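
A pandas-based sketch of such a generator (assuming start/end fall on month boundaries; names hypothetical):

```python
import numpy as np
import pandas as pd

def generate_monthly_bounds(start, end):
    """Monthly target bounds plus center time coordinates."""
    starts = pd.date_range(start, end, freq="MS")  # month starts in range
    ends = starts + pd.offsets.MonthBegin(1)       # exclusive upper bounds
    monthly_bnds = np.column_stack([starts, ends])
    monthly_time = starts + (ends - starts) / 2    # interval midpoints
    return monthly_time, monthly_bnds

monthly_time, monthly_bnds = generate_monthly_bounds("2020-01-01", "2020-03-01")
```

An analogous generate_yearly_bounds could swap freq="MS" and MonthBegin for freq="YS" and YearBegin.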

return monthly_time, monthly_bnds


def ensure_bounds_order(self):
Collaborator Author

This function just makes sure the bounds are ordered as expected.
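
For plain bounds arrays this amounts to a row-wise sort (a sketch; the PR operates on the dataset's bounds variable):

```python
import numpy as np

def ensure_bounds_order(time_bnds):
    """Return bounds with each pair ordered [earlier time, later time]."""
    return np.sort(np.asarray(time_bnds, dtype="datetime64[h]"), axis=1)

# A reversed pair is flipped back into [earlier, later] order.
fixed = ensure_bounds_order([["2020-01-03", "2020-01-01"]])
```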

@tomvothecoder tomvothecoder moved this from Todo to In Progress in xCDAT Development Jul 16, 2025
@tomvothecoder tomvothecoder self-assigned this Jul 16, 2025
@tomvothecoder tomvothecoder marked this pull request as draft July 16, 2025 19:46
@tomvothecoder tomvothecoder force-pushed the feature/594-enhance-temporal-averaging branch from 2dba031 to b62383e Compare July 16, 2025 19:59
@tomvothecoder
Collaborator

tomvothecoder commented Jul 17, 2025

For context, the current logic for generating temporal weights uses time bounds to compute the center time coordinate, which is then used to assign group labels. Grouping is done with Xarray’s groupby(). For example, when computing a monthly climatology, each data point is labeled based on the month of its center timestamp. If the time bounds are [2000-01-01, 2000-01-31], the center is 2000-01-16, which would be labeled as "Jan." This has been the expected behavior for users and should remain the default.
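
The default center-of-bounds labeling described here reduces to:

```python
import pandas as pd

lower, upper = pd.Timestamp("2000-01-01"), pd.Timestamp("2000-01-31")
center = lower + (upper - lower) / 2  # midpoint of the time bounds
print(center, center.strftime("%b"))  # 2000-01-16 00:00:00 Jan
```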

@pochedls we could introduce an optional bounds_aware parameter to the temporal APIs to support more sophisticated weighting based on the actual overlap between time bounds and group periods. As you've mentioned, it would cover instances like pentad data, where each time slice covers a 5-day span and some intervals may straddle two calendar months.

I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap for more accurate results.

Also, sorry for editing my messages after posting; the email notification you receive may look different from the final comment. I have a habit of doing this.

@tomvothecoder tomvothecoder changed the title More sophisticated bounds handling for temporal averaging More sophisticated temporal weight generation with bounds Jul 17, 2025
@tomvothecoder tomvothecoder changed the title More sophisticated temporal weight generation with bounds More sophisticated bounds handling for temporal averaging Jul 17, 2025
@pochedls
Collaborator Author

> For context, the current logic for generating temporal weights uses time bounds to compute the center time coordinate, which is then used to assign group labels. Grouping is done with Xarray's groupby(). For example, when computing a monthly climatology, each data point is labeled based on the month of its center timestamp. If the time bounds are [2000-01-01, 2000-01-31], the center is 2000-01-16, which would be labeled as "Jan." This has been the expected behavior for users and should remain the default.
>
> @pochedls we could introduce an optional bounds_aware parameter to the temporal APIs to support more sophisticated weighting based on the actual overlap between time bounds and group periods. As you've mentioned, it would cover instances like pentad data, where each time slice covers a 5-day span and some intervals may straddle two calendar months.
>
> I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap for more accurate results.
>
> Also, sorry for editing my messages after posting; the email notification you receive may look different from the final comment. I have a habit of doing this.

I don't have an immediate opinion on whether this more sophisticated logic should be kicked off with an optional argument.

Re: other packages: Yes...but are these other packages purporting to apply temporal weights? I think what we are doing is slightly different (and requires more sophisticated logic to ensure that we do it correctly).

Also...I edit/update GitHub all the time after posting...

@tomvothecoder
Collaborator

> I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap for more accurate results.
>
> Re: other packages: Yes...but are these other packages purporting to apply temporal weights? I think what we are doing is slightly different (and requires more sophisticated logic to ensure that we do it correctly).

I'm not sure I'm following you here. What I meant was that I tried to find whether any other package offered #594, just to see if anybody has tried implementing it so far (and to analyze existing logic), but it doesn't look like there are any. Does CDAT have this feature? This is a great opportunity for xCDAT to provide this capability.

@pochedls
Collaborator Author

> I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap for more accurate results.
>
> Re: other packages: Yes...but are these other packages purporting to apply temporal weights? I think what we are doing is slightly different (and requires more sophisticated logic to ensure that we do it correctly).
>
> I'm not sure I'm following you here. What I meant was that I tried to find whether any other package offered #594, just to see if anybody has tried implementing it so far (and to analyze existing logic), but it doesn't look like there are any. Does CDAT have this feature? Great opportunity for xCDAT to provide this capability.

Other packages may do similar calculations, but typically without bound information or temporal weights (in which case they are probably doing what they advertise). Our temporal operations are explicitly weighted using the bounds to compute the weights. But the current logic is insufficient to do this generically – so we may be providing incorrect calculations in some cases (albeit, probably edge cases).

I realize this post does something similar to xcdat, but it is an example that isn't generalized (not a package).


Labels

type: enhancement New enhancement request

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

[Feature]: More Sophisticated Bounds Handling for Temporal Averaging Operations

3 participants