Conversation
[pull] main from NCAR:main
[pull] main from NCAR:main
| - pandas | ||
| - xarray | ||
| - xskillscore>=0.0.17 | ||
| - uxarray>=2025.12.0 |
There was a problem hiding this comment.
I'd say yes. There were updates to UXarray's where and some wrappers that were previously roadblocks to this function. Perhaps conda will naturally resolve the environment to the latest version, but I'd like to let more time pass before trusting that.
erogluorhan
left a comment
There was a problem hiding this comment.
Great to see this in progress, thanks @jukent ! I've made a few comments below.
geocat/comp/meteorology.py
Outdated
| _generate_wrapper_docstring(dpres_plev, delta_pressure) | ||
|
|
||
|
|
||
| def zonal_mpsi(uxds, lat=list(range(-90, 91, 10))): |
There was a problem hiding this comment.
A couple questions:
- Is the name "zonal_mpsi" what we want to go forward with? Any alternatives discussed? I recognize finsing something fancy to replace the "mpsi" abbreviation wouldn't be easy though.
- Would it make sense to have
latargument default to a tuple of(start, step, end)and handle the list creation in the code rather than the func signature? Also, it would take tuple, array-like, or just a scalar, i.e. something similar to zonal_mean?
There was a problem hiding this comment.
- We haven't discussed alternatives yet. We aren't bound by character limit so
zonal_meridional_stream()could work (is including the word function redundant?) - Absolutely, I'll make that change right away.
geocat/comp/meteorology.py
Outdated
| g = 9.80665 # gravity (m/s^2) | ||
|
|
||
| # Basic input validation | ||
| if not hasattr(uxds, "V") or not hasattr(uxds, "PS"): |
There was a problem hiding this comment.
I am wondering if this is too limiting, i.e. if there was a meridional wind variable in a dataset without the name (or any attribute) "V", but having the standard_name being "northward_wind", w'd be missing it. Would it be a possible case? Similar applies to "PS".
There was a problem hiding this comment.
Fair. I had some checks in but it became very convoluted so I removed them. They are necessary, so I added them back in using 2 new helper functions in geocat.util that I imagine could be useful throughout the package. Unfortunately our dev and test data doesn't seem to be cf compliant for meridional wind or surface pressure. I tried using cf_xarray first.
There was a problem hiding this comment.
Would cf-xarray be of help on this?
geocat/comp/meteorology.py
Outdated
| ) | ||
|
|
||
| # Check if interpolation needs to be done | ||
| if "plev" in uxds["V"].dims: |
There was a problem hiding this comment.
Similar to above comment on "V" and "PS", maybe we need to look for some CF compliant checks that'd cover any case, e.g. which would use standard name, units, etc?
geocat/comp/meteorology.py
Outdated
| ] | ||
| hybridA_names = ['hyam', 'hybrid_A_midpoints'] | ||
| hybridB_names = ['hybm', 'hybrid_B_midpoints'] | ||
| plev_names = ['plev', 'pressure_lev', 'pressure_levels'] |
There was a problem hiding this comment.
This possible names-based scheme would probably cover majority of use cases, but I guess I still have some concerns about some easy misses it would have. Also some of these possible name values might not help as expected.
For instance, 'northward_wind' can mostly show up in dataarray.attrs.standard_name rather than being the name of the variable. If the wind variable is simply named "wind" (don't know if data creators would do that though) or anything other than those in this list and has the "standard_name" attribute being "northward_wind", I think _find_coord() would miss it.
So, there needs to be some kind of CF-compliant check mechanism here that'd look into not only the var names but also attrs like "standard_name", "units", etc.
anissa111
left a comment
There was a problem hiding this comment.
Okay, here's my initial review. Also, I'm aware that this is not exactly the place to bring this up, but I feel the need to lodge a comment that I think that geocat-comp ingesting uxarray objects is risky and possibly not in the best interest of geocat-comp.
That being said here's my comments:
- I still feel like we should make uxarray an optional dependency
pyproject.tomland other packaging should be updated to include the optional/required dependency- uxarray will need to be added to the benchmark environment
- uxarray should also probably be added to upstream testing (as in the upstream test should install uxarray from source in
ci/install-upstream.sh) - the helper functions should be added to the internal api documentation
- the helper functions should probably have their own tests
- uxarray specific test(s) for the functions in interpolation that will have uxarray objects passing through them from
zonal_meridional_psi - somewhere in the docstring (and also maybe in the code) there should be a citation for the calculation we’re preforming
- at least a handful more tests using inputs with different/missing names/descriptions/etc
| return var_name | ||
| error_parts.append(f"Tried units: {units}. ") | ||
|
|
||
| raise KeyError(f"Could not find {description} in dataset. {' '.join(error_parts)}") |
There was a problem hiding this comment.
| raise KeyError(f"Could not find {description} in dataset. {' '.join(error_parts)}") | |
| raise KeyError(f"Could not find {description} in dataset. {''.join(error_parts)}") |
| return name | ||
| error_parts.append(f"Tried names: {possible_names}. ") | ||
|
|
||
| # Finally try units match (less reliable) |
There was a problem hiding this comment.
Using units here still concerns me. I'm not convinced that units are a good way of determining a match.
| f"This is unreliable - multiple variables may share the same units. " | ||
| f"Please verify this is correct and add CF standard_name. {error_parts}", | ||
| UserWarning, | ||
| stacklevel=3, |
There was a problem hiding this comment.
Why stacklevel=3 here? If my understanding is correct, this would make sense for when _find_var is called by _find_optional_var, but not when _find_var is called directly
| assert out.sizes["latitudes"] == len(self.lat) | ||
| assert out.sizes["plev"] == len(uxds.V.plev) | ||
|
|
||
| # ---- numerical sanity ---- |
There was a problem hiding this comment.
| # ---- numerical sanity ---- | |
| # ---- numerical coherence ---- |
Two bits:
- small language best practice note
- I'm not necessarily objecting to having the isinfinite or allclose to 0 asserts, but I don't think I understand why are we doing them? Is there a risk that something will go wrong and we'll get inf/nan/0 values erroneously?
| lat = np.arange(36, 45, 1) | ||
|
|
||
| def test_zonal_meridional_psi_pressure_levels(self) -> None: | ||
| uxds = ux.open_dataset("test/grid_subset.nc", "test/plev_subset.nc") |
There was a problem hiding this comment.
This could be a fixture depending on the size/potential for reuse. Also, reading in the files should probably use the same format as the other file readings in our codebase with something like
try:
uxds = ux.open_dataset("grid_subset.nc", "plev_subset.nc")
except FileNotFoundError:
uxds = ux.open_dataset("test/grid_subset.nc", "test/plev_subset.nc")| return da_mpsi | ||
|
|
||
|
|
||
| def zonal_mpsi( |
There was a problem hiding this comment.
We have an entire utility function for exactly this that generates wrappers and copies the docstrings for ncl-like function names
| Returns | ||
| ------- | ||
| da_mpsi : xarray.DataArray | ||
| Zonal mean meridional streamf unction, scaled by Earth's geometry and gravity. |
There was a problem hiding this comment.
| Zonal mean meridional streamf unction, scaled by Earth's geometry and gravity. | |
| Zonal mean meridional stream function, scaled by Earth's geometry and gravity. |
| # apply scaling factor | ||
| da_scaling_factor = 2 * np.pi * a * np.cos(lats) / g | ||
| da_mpsi = da_mpsi * da_scaling_factor |
There was a problem hiding this comment.
This could be my misunderstanding, but isn't this scaling only valid for certain grid types? Like, specifically only structured ones with equally spaced latitudes?
@kafitzgerald, I think I'm remembering something that you flagged on my nmse PR, though completely possible I'm just not familiar enough with the calculation to know what I'm talking about here.
edit: sorry, coming back to say I get that the weighting is for the lat bands, not the unstructured grid itself, but wouldn't this still assume that the given latitudes are evenly spaced?
| except Exception: | ||
| pass |
There was a problem hiding this comment.
(nitpick) bare except makes me a bit nervous here, is there a reason?
Also, don't we create da_mpsi? Would updating the attributes ever raise an exception even if we didn't?
| if not hyam_coordname: | ||
| hyam_coordname = _find_optional_var( | ||
| uxds, | ||
| long_name='hybrid A coefficient at layer midpoints', | ||
| possible_names=['hyam', 'hya', 'hybrid_A_midpoints'], | ||
| description='coordinate,', | ||
| ) | ||
| if not hybm_coordname: | ||
| hybm_coordname = _find_optional_var( | ||
| uxds, | ||
| long_name='hybrid B coefficient at layer midpoints', | ||
| possible_names=['hybm', 'hyb', 'hybrid_B_midpoints'], | ||
| description='coordinate,', | ||
| ) | ||
| if not plev_coordname: | ||
| plev_coordname = _find_optional_var( | ||
| uxds, | ||
| standard_name='air_pressure', | ||
| possible_names=['plev', 'pressure_lev', 'pressure_levels'], | ||
| description='coordinate,', | ||
| ) |
There was a problem hiding this comment.
| if not hyam_coordname: | |
| hyam_coordname = _find_optional_var( | |
| uxds, | |
| long_name='hybrid A coefficient at layer midpoints', | |
| possible_names=['hyam', 'hya', 'hybrid_A_midpoints'], | |
| description='coordinate,', | |
| ) | |
| if not hybm_coordname: | |
| hybm_coordname = _find_optional_var( | |
| uxds, | |
| long_name='hybrid B coefficient at layer midpoints', | |
| possible_names=['hybm', 'hyb', 'hybrid_B_midpoints'], | |
| description='coordinate,', | |
| ) | |
| if not plev_coordname: | |
| plev_coordname = _find_optional_var( | |
| uxds, | |
| standard_name='air_pressure', | |
| possible_names=['plev', 'pressure_lev', 'pressure_levels'], | |
| description='coordinate,', | |
| ) | |
| if not hyam_coordname: | |
| hyam_coordname = _find_optional_var( | |
| uxds, | |
| long_name='hybrid A coefficient at layer midpoints', | |
| possible_names=['hyam', 'hya', 'hybrid_A_midpoints'], | |
| description='coordinate', | |
| ) | |
| if not hybm_coordname: | |
| hybm_coordname = _find_optional_var( | |
| uxds, | |
| long_name='hybrid B coefficient at layer midpoints', | |
| possible_names=['hybm', 'hyb', 'hybrid_B_midpoints'], | |
| description='coordinate', | |
| ) | |
| if not plev_coordname: | |
| plev_coordname = _find_optional_var( | |
| uxds, | |
| standard_name='air_pressure', | |
| possible_names=['plev', 'pressure_lev', 'pressure_levels'], | |
| description='coordinate', | |
| ) |
Given the way _find_var is written, say hya and hyb both have a description of coordinate and nothing else, wouldn't this mean that whichever one was found first would become the coord name for both?
Also, I don't have a better suggestion, but is just 'coordinate' the best default description to look for for all of these?
| - hyam : Hybrid A coefficients (if meridional_wind is on hybrid sigma-pressure levels) | ||
| - hybm : Hybrid B coefficients (if meridional_wind is on hybrid sigma-pressure levels) | ||
| - uxgrid : Grid information for uxarray | ||
| meridonal_wind_varname : str, optional |
There was a problem hiding this comment.
| meridonal_wind_varname : str, optional | |
| meridional_wind_varname : str, optional |
| # NCL NAME WRAPPER FUNCTIONS BELOW | ||
| def dpres_plev(pressure_lev, surface_pressure, pressure_top=None): | ||
| return delta_pressure(pressure_lev, surface_pressure, pressure_top) | ||
|
|
||
|
|
||
| _generate_wrapper_docstring(dpres_plev, delta_pressure) |
There was a problem hiding this comment.
This should stay at the bottom of the file (and be used to create the wrapper function made in this PR)
| ---------- | ||
| uxds : uxarray.UxDataset | ||
| Input dataset containing the following required fields: | ||
| - meridional_wind : CF Compliant Meridional wind component (on pressure or hybrid sigma-pressure levels) |
There was a problem hiding this comment.
What is the CF complaint name for meridional wind (is there one?)? Is it just northward_wind? That seems slightly strange to me.
I'm not finding something explicitly called meridional wind in the CF standard names, though, so maybe
PR Summary
Zonal_MPSI function for unstructured grid datasets.
Related Tickets & Documents
Closes #774
PR Checklist
General
docs/release-notes.rstin a relevant section for the upcoming release.Functionality
geocat/comp/__init__.pyfileTesting
Documentation
_) and have been added todocs/internal_api/index.rstdocs/user_api/index.rstunder their module