Conversation

@pochedls
Collaborator

Description

This PR adds more sophisticated handling of bounds for temporal operations. In particular, it compares the dataset time bounds with the target period. For example, if you have a time point with arbitrary bounds of ["2019-12-28 00:00", "2020-01-03 00:00"] and want to create an average value for January 2020 (i.e., the period ["2020-01-01", "2020-02-01"]), this PR will correctly assign 2 days of weight in that January average (since only 2 days of the interval fall in January).
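
The overlap arithmetic described here can be sketched in a few lines (a minimal illustration only — `overlap_days` is a hypothetical helper, not the PR's implementation):

```python
import pandas as pd

def overlap_days(lower, upper, period_start, period_end):
    """Days of overlap between a time-bounds interval and a target period."""
    start = max(pd.Timestamp(lower), pd.Timestamp(period_start))
    end = min(pd.Timestamp(upper), pd.Timestamp(period_end))
    # A negative span means the interval and period do not overlap at all.
    return max((end - start) / pd.Timedelta(days=1), 0.0)

# The example interval contributes 2 of its 6 days to January 2020.
print(overlap_days("2019-12-28", "2020-01-03", "2020-01-01", "2020-02-01"))
```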

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@github-actions github-actions bot added the type: enhancement New enhancement request label Feb 10, 2025
@codecov

codecov bot commented Feb 10, 2025

Codecov Report

❌ Patch coverage is 11.29032% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.98%. Comparing base (0d7e112) to head (b62383e).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
xcdat/temporal.py 11.29% 55 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main     #735      +/-   ##
===========================================
- Coverage   100.00%   96.98%   -3.02%     
===========================================
  Files           16       16              
  Lines         1767     1827      +60     
===========================================
+ Hits          1767     1772       +5     
- Misses           0       55      +55     



@pochedls pochedls left a comment


A very small start on this issue, but worth going over carefully: if we get the workflow right, we can use this PR (with modifications) as a template for more temporal operations. This mainly handles group-averaging operations (we'd need to think more about how to apply this to climatologies, for example).

return ds_departs


def compute_monthly_average(self, data_var):
Collaborator Author

This function wraps several steps/functions (that are defined below) in order to compute monthly averages (e.g., from hourly/daily/pentad data to monthly means):

  • ensure_bounds_order: ensures that dataset bounds are ordered [earlier time, later time] (since the PR logic depends on this)
  • generate_monthly_bounds: creates the monthly target bounds
  • get_temporal_weights: computes weights for averaging the source dataset into the targeted time periods
  • _experimental_averager: uses the temporal weights to average data into the targeted time periods
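
The four steps above can be sketched end-to-end with plain NumPy/pandas (all names and the array-based interface are hypothetical — the PR itself works on xarray objects):

```python
import numpy as np
import pandas as pd

def monthly_average(data, time_bnds, start, end):
    """Average per-time-step values into calendar months via overlap weights.

    data:      1-D array of values, one per source time step
    time_bnds: (n, 2) array-like of time-bound datetimes/strings
    start/end: range of target months (assumed to fall on month starts)
    """
    # Step 1: ensure each bounds pair reads [earlier time, later time].
    src = np.sort(np.asarray(time_bnds, dtype="datetime64[h]"), axis=1)
    # Step 2: generate monthly target bounds.
    starts = pd.date_range(start, end, freq="MS")
    tgt = np.column_stack([starts, starts + pd.offsets.MonthBegin(1)])
    # Step 3: weights proportional to source/target bounds overlap.
    s = src.astype("int64")
    t = tgt.astype("datetime64[h]").astype("int64")
    lo = np.maximum(s[None, :, 0], t[:, None, 0])
    hi = np.minimum(s[None, :, 1], t[:, None, 1])
    w = np.clip(hi - lo, 0, None).astype(float)
    w /= w.sum(axis=1, keepdims=True)  # normalize within each target month
    # Step 4: weighted average into each target month.
    return w @ np.asarray(data, dtype=float)
```

For the pentad example in the description, two source steps with bounds ["2019-12-28", "2020-01-03"] and ["2020-01-03", "2020-01-08"] would receive January weights of 2/7 and 5/7 respectively.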

Collaborator Author

I think we could generalize this function (kind of like .temporal.group_average()) by having it call different functions to generate target bounds (e.g., generate_daily_bounds, generate_seasonal_bounds, generate_yearly_bounds). The other steps would work as-is.
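
That dispatch pattern might look like this (a hypothetical sketch with simplified array-based generators, not the PR's API):

```python
import numpy as np
import pandas as pd

def _monthly_bounds(start, end):
    # Month-start edges; upper bound is the next month's start.
    s = pd.date_range(start, end, freq="MS")
    return np.column_stack([s, s + pd.offsets.MonthBegin(1)])

def _yearly_bounds(start, end):
    # Year-start edges; upper bound is the next year's start.
    s = pd.date_range(start, end, freq="YS")
    return np.column_stack([s, s + pd.offsets.YearBegin(1)])

# Map a frequency keyword to its target-bounds generator.
GENERATORS = {"month": _monthly_bounds, "year": _yearly_bounds}

def generate_target_bounds(freq, start, end):
    return GENERATORS[freq](start, end)
```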

return ds.temporal._experimental_averager(data_var, weights, target_bnds)


def _experimental_averager(self, data_var, weights, target_bnds):
Collaborator Author

This is intended to be a generic averager that averages data variable values into the targeted time periods (using the supplied weights).
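
A generic weighted averager of this sort reduces to a matrix product (a sketch only; the PR version operates on xarray objects and preserves metadata):

```python
import numpy as np

def experimental_averager(data, weights):
    """Collapse source time steps into target periods.

    data:    (time, ...) array of the data variable
    weights: (target, time) matrix; each row sums to 1 over source steps
    """
    return np.tensordot(weights, np.asarray(data, dtype=float), axes=([1], [0]))

# Two source steps averaged into one target period with weights 0.25/0.75.
print(experimental_averager([0.0, 4.0], [[0.25, 0.75]]))  # [3.]
```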

return dsmean


def get_temporal_weights(self, target_bnds):
Collaborator Author

This function gets the intersection between the dataset's own time bounds and the targeted time bounds (i.e., the averaging periods). Each time step is assigned weight proportional to the duration for which it falls within a given averaging period.

Collaborator Author

Note that this PR is ~10x slower than existing functionality. The slowdown is almost entirely in this function. If we could speed this step up, that would be great (but we likely can tolerate this slowdown, since the approach in this PR should be more robust/accurate).
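
One way to attack the slowdown may be to compute all (target period, time step) intersections at once with NumPy broadcasting rather than looping over time steps — a sketch, assuming the dense (m, n) overlap matrix fits in memory:

```python
import numpy as np

def get_temporal_weights(src_bnds, target_bnds):
    """Overlap duration of each source interval with each target period,
    normalized so each target period's weights sum to 1."""
    src = np.asarray(src_bnds, dtype="datetime64[m]").astype("int64")     # (n, 2)
    tgt = np.asarray(target_bnds, dtype="datetime64[m]").astype("int64")  # (m, 2)
    lo = np.maximum(src[None, :, 0], tgt[:, None, 0])  # (m, n) overlap starts
    hi = np.minimum(src[None, :, 1], tgt[:, None, 1])  # (m, n) overlap ends
    overlap = np.clip(hi - lo, 0, None).astype(float)
    return overlap / overlap.sum(axis=1, keepdims=True)

w = get_temporal_weights(
    [["2019-12-28", "2020-01-03"], ["2020-01-03", "2020-01-08"]],
    [["2020-01-01", "2020-02-01"]],
)
# w[0] is [2/7, 5/7]: 2 of 7 overlapping days from the first pentad, 5 from the second.
```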

return weights


def generate_monthly_bounds(self):
Collaborator Author

Prototype function for generating target bounds (i.e., what bins do you want to average your source data into). We could make other functions for other frequencies (e.g., daily, seasonal, yearly).
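
A pandas-based sketch of such a generator (assuming start/end fall on month boundaries; names hypothetical):

```python
import numpy as np
import pandas as pd

def generate_monthly_bounds(start, end):
    """Monthly target bounds plus center time coordinates."""
    starts = pd.date_range(start, end, freq="MS")  # month starts in range
    ends = starts + pd.offsets.MonthBegin(1)       # exclusive upper bounds
    monthly_bnds = np.column_stack([starts, ends])
    monthly_time = starts + (ends - starts) / 2    # interval midpoints
    return monthly_time, monthly_bnds

monthly_time, monthly_bnds = generate_monthly_bounds("2020-01-01", "2020-03-01")
```

An analogous generate_yearly_bounds could swap freq="MS" and MonthBegin for freq="YS" and YearBegin.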

return monthly_time, monthly_bnds


def ensure_bounds_order(self):
Collaborator Author

This function just makes sure the bounds are ordered as expected.
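
For plain bounds arrays this amounts to a row-wise sort (a sketch; the PR operates on the dataset's bounds variable):

```python
import numpy as np

def ensure_bounds_order(time_bnds):
    """Return bounds with each pair ordered [earlier time, later time]."""
    return np.sort(np.asarray(time_bnds, dtype="datetime64[h]"), axis=1)

# A reversed pair is flipped back into [earlier, later] order.
fixed = ensure_bounds_order([["2020-01-03", "2020-01-01"]])
```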

@tomvothecoder tomvothecoder moved this from Todo to In Progress in xCDAT Development Jul 16, 2025
@tomvothecoder tomvothecoder self-assigned this Jul 16, 2025
@tomvothecoder tomvothecoder marked this pull request as draft July 16, 2025 19:46
@tomvothecoder tomvothecoder force-pushed the feature/594-enhance-temporal-averaging branch from 2dba031 to b62383e Compare July 16, 2025 19:59
@tomvothecoder
Collaborator

tomvothecoder commented Jul 17, 2025

For context, the current logic for generating temporal weights uses time bounds to compute the center time coordinate, which is then used to assign group labels. Grouping is done with Xarray’s groupby(). For example, when computing a monthly climatology, each data point is labeled based on the month of its center timestamp. If the time bounds are [2000-01-01, 2000-01-31], the center is 2000-01-16, which would be labeled as "Jan." This has been the expected behavior for users and should remain the default.
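
The default center-of-bounds labeling described here reduces to:

```python
import pandas as pd

lower, upper = pd.Timestamp("2000-01-01"), pd.Timestamp("2000-01-31")
center = lower + (upper - lower) / 2  # midpoint of the time bounds
print(center, center.strftime("%b"))  # 2000-01-16 00:00:00 Jan
```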

@pochedls we could introduce an optional bounds_aware parameter to the temporal APIs to support more sophisticated weighting based on the actual overlap between time bounds and group periods. As you've mentioned, it would cover instances like pentad data, where each time slice covers a 5-day span and some intervals may straddle two calendar months.

I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap for more accurate results.

Also, sorry for editing my messages after posting; the email notification you receive may look different from the final comment. I have a habit of doing this.

@tomvothecoder tomvothecoder changed the title More sophisticated bounds handling for temporal averaging More sophisticated temporal weight generation with bounds Jul 17, 2025
@tomvothecoder tomvothecoder changed the title More sophisticated temporal weight generation with bounds More sophisticated bounds handling for temporal averaging Jul 17, 2025
@pochedls
Collaborator Author

> For context, the current logic for generating temporal weights uses time bounds to compute the center time coordinate, which is then used to assign group labels. Grouping is done with Xarray's groupby(). For example, when computing a monthly climatology, each data point is labeled based on the month of its center timestamp. If the time bounds are [2000-01-01, 2000-01-31], the center is 2000-01-16, which would be labeled as "Jan." This has been the expected behavior for users and should remain the default.
>
> @pochedls we could introduce an optional bounds_aware parameter to the temporal APIs to support more sophisticated weighting based on the actual overlap between time bounds and group periods. As you've mentioned, it would cover instances like pentad data, where each time slice covers a 5-day span and some intervals may straddle two calendar months.
>
> I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap for more accurate results.
>
> Also, sorry for editing my messages after posting; the email notification you receive may look different from the final comment. I have a habit of doing this.

I don't have an immediate opinion on whether this more sophisticated logic should be kicked off with an optional argument.

Re: other packages: Yes...but are these other packages purporting to apply temporal weights? I think what we are doing is slightly different (and requires more sophisticated logic to ensure that we do it correctly).

Also...I edit/update GitHub all the time after posting...

@tomvothecoder
Collaborator

> I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap for more accurate results.
>
> Re: other packages: Yes...but are these other packages purporting to apply temporal weights? I think what we are doing is slightly different (and requires more sophisticated logic to ensure that we do it correctly).

I'm not sure I'm following you here. What I meant was that I tried to find whether any other package offered #594, just to see if anybody has tried implementing it so far (and to analyze existing logic), but it doesn't look like there are any. Does CDAT have this feature? This is a great opportunity for xCDAT to provide this capability.

@pochedls
Collaborator Author

> I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap for more accurate results.
>
> Re: other packages: Yes...but are these other packages purporting to apply temporal weights? I think what we are doing is slightly different (and requires more sophisticated logic to ensure that we do it correctly).
>
> I'm not sure I'm following you here. What I meant was that I tried to find whether any other package offered #594, just to see if anybody has tried implementing it so far (and to analyze existing logic), but it doesn't look like there are any. Does CDAT have this feature? Great opportunity for xCDAT to provide this capability.

Other packages may do similar calculations, but typically without bound information or temporal weights (in which case they are probably doing what they advertise). Our temporal operations are explicitly weighted using the bounds to compute the weights. But the current logic is insufficient to do this generically – so we may be providing incorrect calculations in some cases (albeit, probably edge cases).

I realize this post does something similar to xcdat, but it is an example that isn't generalized (not a package).


Labels

type: enhancement New enhancement request

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

[Feature]: More Sophisticated Bounds Handling for Temporal Averaging Operations

3 participants