More sophisticated bounds handling for temporal averaging #735
Conversation
Codecov Report ❌

Coverage diff vs. main (#735):
- Coverage: 100.00% → 96.98% (-3.02%)
- Files: 16 → 16
- Lines: 1767 → 1827 (+60)
- Hits: 1767 → 1772 (+5)
- Misses: 0 → 55 (+55)
A very small start on this issue, but worth going over carefully because if we get the workflow right, we can use this PR (and modifications) as a template for more temporal operations. This mainly handles group averaging operations (we'd need to think more about how to apply this to climatologies, for example).
return ds_departs
...
def compute_monthly_average(self, data_var):
This function wraps several steps/functions (that are defined below) in order to compute monthly averages (e.g., from hourly/daily/pentad data to monthly means):
- ensure_bounds_order: ensures that the dataset bounds are ordered [earlier time, later time] (the PR logic depends on this)
- generate_monthly_bounds: creates the monthly target bounds
- get_temporal_weights: computes the weights for averaging the source dataset into the targeted time periods
- _experimental_averager: uses the temporal weights to average the data into the targeted time periods
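
As a rough sketch, the wrapper could chain those steps like so (the method names come from the PR's diff; the exact signatures, return values, and accessor plumbing here are assumptions, not the actual implementation):

```python
def compute_monthly_average(ds, data_var):
    # Hypothetical flow only: the PR implements these as .temporal accessor methods,
    # and their real signatures/returns may differ from what is assumed here.
    ds = ds.temporal.ensure_bounds_order()                              # 1. enforce [earlier, later] bounds
    monthly_time, monthly_bnds = ds.temporal.generate_monthly_bounds()  # 2. build target bins
    weights = ds.temporal.get_temporal_weights(monthly_bnds)            # 3. overlap-based weights
    return ds.temporal._experimental_averager(data_var, weights, monthly_bnds)  # 4. weighted average
```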
I think we could generalize this function (kind of like .temporal.group_average()) by having it call different functions to generate target bounds (e.g., generate_daily_bounds, generate_seasonal_bounds, generate_yearly_bounds). The other steps would work as-is.
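
For example, a generalized entry point might dispatch on frequency. In this sketch only generate_monthly_bounds exists in the PR; the daily/seasonal/yearly generators and the wrapper itself are hypothetical:

```python
def compute_group_average(ds, data_var, freq="month"):
    # Hypothetical dispatch: each frequency supplies its own target-bounds generator,
    # while the weighting and averaging steps stay identical for every frequency.
    generator_names = {
        "day": "generate_daily_bounds",        # hypothetical
        "month": "generate_monthly_bounds",    # added in this PR
        "season": "generate_seasonal_bounds",  # hypothetical
        "year": "generate_yearly_bounds",      # hypothetical
    }
    target_time, target_bnds = getattr(ds.temporal, generator_names[freq])()
    weights = ds.temporal.get_temporal_weights(target_bnds)
    return ds.temporal._experimental_averager(data_var, weights, target_bnds)
```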
return ds.temporal._experimental_averager(data_var, weights, target_bnds)
...
def _experimental_averager(self, data_var, weights, target_bnds):
This is intended to be a generic averager, averaging the data variable into the targeted time periods (using the supplied weights).
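
Conceptually, this step reduces to a weighted mean along the source time axis. A minimal standalone sketch, assuming weights with dims ("target_time", "time") (this is not the PR's actual code):

```python
import xarray as xr


def weighted_time_average(data_var: xr.DataArray, weights: xr.DataArray) -> xr.DataArray:
    # weights[target_time, time] holds how much of each source step falls inside each
    # target period; normalizing by the total weight yields the period mean.
    weighted_sum = (data_var * weights).sum(dim="time")
    return weighted_sum / weights.sum(dim="time")
```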
return dsmean
...
def get_temporal_weights(self, target_bnds):
This function basically gets the intersection between the dataset's own time bounds and the targeted time bounds (i.e., the averaging periods). For a given time step, it assigns a weight proportional to the duration for which that time step falls within a given averaging period.
Note that this PR is ~10x slower than existing functionality. The slowdown is almost entirely in this function. If we could speed this step up, that would be great (but we likely can tolerate this slowdown, since the approach in this PR should be more robust/accurate).
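
A minimal NumPy sketch of the interval-intersection idea described above (illustrative signature and array layout; the PR's implementation is structured differently):

```python
import numpy as np


def overlap_weights(time_bnds: np.ndarray, target_bnds: np.ndarray) -> np.ndarray:
    # time_bnds: (n_time, 2) and target_bnds: (n_target, 2) datetime64 arrays.
    # The weight of source step j within target period i is the length of the
    # overlap between the two intervals (zero when they do not intersect).
    lower = np.maximum(time_bnds[None, :, 0], target_bnds[:, None, 0])
    upper = np.minimum(time_bnds[None, :, 1], target_bnds[:, None, 1])
    overlap = (upper - lower) / np.timedelta64(1, "s")  # overlap length in seconds
    return np.clip(overlap, 0.0, None)
```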
return weights
...
def generate_monthly_bounds(self):
Prototype function for generating target bounds (i.e., what bins do you want to average your source data into). We could make other functions for other frequencies (e.g., daily, seasonal, yearly).
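
For illustration, monthly target bounds could be generated along these lines (pandas-based sketch with placeholder dates; not the PR's implementation):

```python
import numpy as np
import pandas as pd


def monthly_bounds(start="2019-12-01", end="2020-03-01"):
    # Month starts spanning the requested range; each target bin is
    # [start of month, start of next month), with the bin midpoint as its time point.
    starts = pd.date_range(start=start, end=end, freq="MS")
    bnds = np.column_stack([starts[:-1], starts[1:]])
    time = starts[:-1] + (starts[1:] - starts[:-1]) / 2
    return time, bnds
```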
return monthly_time, monthly_bnds
...
def ensure_bounds_order(self):
This function just makes sure the bounds are ordered as expected.
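
A tiny sketch of that check, assuming bounds as a (time, 2) array (illustrative, not the PR's code):

```python
import numpy as np


def ensure_bounds_order(bnds: np.ndarray) -> np.ndarray:
    # Sort each [lower, upper] pair so the earlier time always comes first,
    # which the overlap/weighting logic relies on.
    return np.sort(bnds, axis=-1)
```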
For context, the current logic for generating temporal weights uses the time bounds to compute the center time coordinate, which is then used to assign group labels. Grouping is done with Xarray's built-in grouping.

@pochedls, we could introduce an optional argument to enable this more sophisticated behavior. I tried to find if there were other packages that provided this feature, but none do (even native Xarray doesn't). This is a great opportunity to fill a gap and produce more accurate results.

Also, sorry for editing my messages after you've already received the email of my GitHub comment, which then looks different from what's posted. I have a habit of doing this.
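
For reference, a sketch of that label-based approach (illustrative only, assuming a time_bnds variable with dims ("time", "bnds"); not xCDAT's internals):

```python
import xarray as xr


def month_labels(ds: xr.Dataset) -> xr.DataArray:
    # Compute the center of each time interval from its bounds, then derive one month
    # label per time step. A step whose bounds straddle a month boundary is credited
    # entirely to the month containing its midpoint; the overlap-based weights in this
    # PR split that contribution between the two months instead.
    center = ds["time_bnds"].mean(dim="bnds")  # xarray supports mean on datetime64
    return center.dt.month
```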
I don't have an immediate opinion on whether this more sophisticated logic should be kicked off with an optional argument. Re: other packages: yes...but are these other packages purporting to apply temporal weights? I think what we are doing is slightly different (and requires more sophisticated logic to ensure that we do it correctly). Also...I edit/update GitHub all the time after posting...
I'm not sure if I'm following you here. What I meant was that I tried to find whether any other package offered the feature requested in #594, just to see if anybody has tried implementing it so far (and to analyze existing logic), but it doesn't look like there are any. Does CDAT have this feature? It's a great opportunity for xCDAT to provide this capability.
Other packages may do similar calculations, but typically without bounds information or temporal weights (in which case they are probably doing what they advertise). Our temporal operations are explicitly weighted, using the bounds to compute the weights. But the current logic is insufficient to do this generically, so we may be providing incorrect calculations in some cases (albeit probably edge cases). I realize this post does something similar to xcdat, but it is an example that isn't generalized (not a package).
Description
This PR adds more sophisticated handling of bounds for temporal operations. In particular, it compares the dataset time bounds with the targeted averaging period. For example, if you have a time point with arbitrary bounds of ["2019-12-28 00:00", "2020-01-03 00:00"] and want to create an average value for January 2020 (i.e., ["2020-01-01", "2020-02-01"]), this PR will correctly include 2 days of weight in that January average (since only 2 days of data are in January).
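
To make the arithmetic in that example concrete (a standalone check, not the PR's code):

```python
import numpy as np

# Source step: 2019-12-28 00:00 to 2020-01-03 00:00. Target: January 2020.
step = (np.datetime64("2019-12-28T00:00"), np.datetime64("2020-01-03T00:00"))
january = (np.datetime64("2020-01-01"), np.datetime64("2020-02-01"))

# Overlap between the two intervals: only Jan 1 00:00 through Jan 3 00:00 counts.
overlap = min(step[1], january[1]) - max(step[0], january[0])
print(overlap / np.timedelta64(1, "D"))  # -> 2.0 days of weight in the January average
```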