Skip to content

Zonal mpsi function#773

Open
jukent wants to merge 32 commits intoNCAR:mainfrom
jukent:zonal_mpsi
Open

Zonal mpsi function#773
jukent wants to merge 32 commits intoNCAR:mainfrom
jukent:zonal_mpsi

Conversation

@jukent
Copy link
Contributor

@jukent jukent commented Oct 9, 2025

PR Summary

Zonal_MPSI function for unstructured grid datasets.

Related Tickets & Documents

Closes #774

PR Checklist

General

  • PR includes a summary of changes
  • Link relevant issues, make one if none exist
  • Add a brief summary of changes to docs/release-notes.rst in a relevant section for the upcoming release.
  • Add appropriate labels to this PR
  • PR follows the Contributor's Guide

Functionality

  • New function(s) intended for public API added to geocat/comp/__init__.py file

Testing

  • Update or create tests in appropriate test file

Documentation

  • Docstrings have been created and/or updated in accordance with Documentation Standards.
  • Internal functions have a preceding underscore (_) and have been added to docs/internal_api/index.rst
  • User facing functions have been added to docs/user_api/index.rst under their module

@jukent jukent added dependencies Pull requests that update a dependency file feature a new feature that is going to be developed labels Oct 9, 2025
@jukent jukent marked this pull request as ready for review February 12, 2026 17:03
@jukent jukent requested review from anissa111, cyschneck and kafitzgerald and removed request for anissa111 and cyschneck February 12, 2026 17:03
@jukent jukent requested a review from cyschneck February 12, 2026 17:03
- pandas
- xarray
- xskillscore>=0.0.17
- uxarray>=2025.12.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say yes. There were updates to UXarray's where and some wrappers that were previously roadblocks to this function. Perhaps conda will naturally resolve the environment to the latest version, but I'd like to let more time pass before trusting that.

Copy link
Collaborator

@erogluorhan erogluorhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this in progress, thanks @jukent ! I've made a few comments below.

_generate_wrapper_docstring(dpres_plev, delta_pressure)


def zonal_mpsi(uxds, lat=list(range(-90, 91, 10))):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple questions:

  • Is the name "zonal_mpsi" what we want to go forward with? Any alternatives discussed? I recognize finsing something fancy to replace the "mpsi" abbreviation wouldn't be easy though.
  • Would it make sense to have lat argument default to a tuple of (start, step, end) and handle the list creation in the code rather than the func signature? Also, it would take tuple, array-like, or just a scalar, i.e. something similar to zonal_mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We haven't discussed alternatives yet. We aren't bound by character limit so zonal_meridional_stream() could work (is including the word function redundant?)
  2. Absolutely, I'll make that change right away.

g = 9.80665 # gravity (m/s^2)

# Basic input validation
if not hasattr(uxds, "V") or not hasattr(uxds, "PS"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if this is too limiting, i.e. if there was a meridional wind variable in a dataset without the name (or any attribute) "V", but having the standard_name being "northward_wind", w'd be missing it. Would it be a possible case? Similar applies to "PS".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. I had some checks in but it became very convoluted so I removed them. They are necessary, so I added them back in using 2 new helper functions in geocat.util that I imagine could be useful throughout the package. Unfortunately our dev and test data doesn't seem to be cf compliant for meridional wind or surface pressure. I tried using cf_xarray first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would cf-xarray be of help on this?

)

# Check if interpolation needs to be done
if "plev" in uxds["V"].dims:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above comment on "V" and "PS", maybe we need to look for some CF compliant checks that'd cover any case, e.g. which would use standard name, units, etc?

]
hybridA_names = ['hyam', 'hybrid_A_midpoints']
hybridB_names = ['hybm', 'hybrid_B_midpoints']
plev_names = ['plev', 'pressure_lev', 'pressure_levels']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This possible names-based scheme would probably cover majority of use cases, but I guess I still have some concerns about some easy misses it would have. Also some of these possible name values might not help as expected.

For instance, 'northward_wind' can mostly show up in dataarray.attrs.standard_name rather than being the name of the variable. If the wind variable is simply named "wind" (don't know if data creators would do that though) or anything other than those in this list and has the "standard_name" attribute being "northward_wind", I think _find_coord() would miss it.

So, there needs to be some kind of CF-compliant check mechanism here that'd look into not only the var names but also attrs like "standard_name", "units", etc.

@anissa111 anissa111 added run-benchmark Add tag to a PR to run ASV comparison on new commits and removed run-benchmark Add tag to a PR to run ASV comparison on new commits labels Feb 13, 2026
Copy link
Member

@anissa111 anissa111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, here's my initial review. Also, I'm aware that this is not exactly the place to bring this up, but I feel the need to lodge a comment that I think that geocat-comp ingesting uxarray objects is risky and possibly not in the best interest of geocat-comp.

That being said here's my comments:

  • I still feel like we should make uxarray an optional dependency
  • pyproject.toml and other packaging should be updated to include the optional/required dependency
  • uxarray will need to be added to the benchmark environment
  • uxarray should also probably be added to upstream testing (as in the upstream test should install uxarray from source in ci/install-upstream.sh)
  • the helper functions should be added to the internal api documentation
  • the helper functions should probably have their own tests
  • uxarray specific test(s) for the functions in interpolation that will have uxarray objects passing through them from zonal_meridional_psi
  • somewhere in the docstring (and also maybe in the code) there should be a citation for the calculation we’re preforming
  • at least a handful more tests using inputs with different/missing names/descriptions/etc

return var_name
error_parts.append(f"Tried units: {units}. ")

raise KeyError(f"Could not find {description} in dataset. {' '.join(error_parts)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise KeyError(f"Could not find {description} in dataset. {' '.join(error_parts)}")
raise KeyError(f"Could not find {description} in dataset. {''.join(error_parts)}")

return name
error_parts.append(f"Tried names: {possible_names}. ")

# Finally try units match (less reliable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using units here still concerns me. I'm not convinced that units are a good way of determining a match.

f"This is unreliable - multiple variables may share the same units. "
f"Please verify this is correct and add CF standard_name. {error_parts}",
UserWarning,
stacklevel=3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why stacklevel=3 here? If my understanding is correct, this would make sense for when _find_var is called by _find_optional_var, but not when _find_var is called directly

assert out.sizes["latitudes"] == len(self.lat)
assert out.sizes["plev"] == len(uxds.V.plev)

# ---- numerical sanity ----
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# ---- numerical sanity ----
# ---- numerical coherence ----

Two bits:

  • small language best practice note
  • I'm not necessarily objecting to having the isinfinite or allclose to 0 asserts, but I don't think I understand why are we doing them? Is there a risk that something will go wrong and we'll get inf/nan/0 values erroneously?

lat = np.arange(36, 45, 1)

def test_zonal_meridional_psi_pressure_levels(self) -> None:
uxds = ux.open_dataset("test/grid_subset.nc", "test/plev_subset.nc")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a fixture depending on the size/potential for reuse. Also, reading in the files should probably use the same format as the other file readings in our codebase with something like

try:
    uxds = ux.open_dataset("grid_subset.nc", "plev_subset.nc")
except FileNotFoundError:
    uxds = ux.open_dataset("test/grid_subset.nc", "test/plev_subset.nc")

return da_mpsi


def zonal_mpsi(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an entire utility function for exactly this that generates wrappers and copies the docstrings for ncl-like function names

Returns
-------
da_mpsi : xarray.DataArray
Zonal mean meridional streamf unction, scaled by Earth's geometry and gravity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Zonal mean meridional streamf unction, scaled by Earth's geometry and gravity.
Zonal mean meridional stream function, scaled by Earth's geometry and gravity.

Comment on lines +1490 to +1492
# apply scaling factor
da_scaling_factor = 2 * np.pi * a * np.cos(lats) / g
da_mpsi = da_mpsi * da_scaling_factor
Copy link
Member

@anissa111 anissa111 Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be my misunderstanding, but isn't this scaling only valid for certain grid types? Like, specifically only structured ones with equally spaced latitudes?

@kafitzgerald, I think I'm remembering something that you flagged on my nmse PR, though completely possible I'm just not familiar enough with the calculation to know what I'm talking about here.

edit: sorry, coming back to say I get that the weighting is for the lat bands, not the unstructured grid itself, but wouldn't this still assume that the given latitudes are evenly spaced?

Comment on lines +1503 to +1504
except Exception:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) bare except makes me a bit nervous here, is there a reason?

Also, don't we create da_mpsi? Would updating the attributes ever raise an exception even if we didn't?

Comment on lines +1427 to +1447
if not hyam_coordname:
hyam_coordname = _find_optional_var(
uxds,
long_name='hybrid A coefficient at layer midpoints',
possible_names=['hyam', 'hya', 'hybrid_A_midpoints'],
description='coordinate,',
)
if not hybm_coordname:
hybm_coordname = _find_optional_var(
uxds,
long_name='hybrid B coefficient at layer midpoints',
possible_names=['hybm', 'hyb', 'hybrid_B_midpoints'],
description='coordinate,',
)
if not plev_coordname:
plev_coordname = _find_optional_var(
uxds,
standard_name='air_pressure',
possible_names=['plev', 'pressure_lev', 'pressure_levels'],
description='coordinate,',
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not hyam_coordname:
hyam_coordname = _find_optional_var(
uxds,
long_name='hybrid A coefficient at layer midpoints',
possible_names=['hyam', 'hya', 'hybrid_A_midpoints'],
description='coordinate,',
)
if not hybm_coordname:
hybm_coordname = _find_optional_var(
uxds,
long_name='hybrid B coefficient at layer midpoints',
possible_names=['hybm', 'hyb', 'hybrid_B_midpoints'],
description='coordinate,',
)
if not plev_coordname:
plev_coordname = _find_optional_var(
uxds,
standard_name='air_pressure',
possible_names=['plev', 'pressure_lev', 'pressure_levels'],
description='coordinate,',
)
if not hyam_coordname:
hyam_coordname = _find_optional_var(
uxds,
long_name='hybrid A coefficient at layer midpoints',
possible_names=['hyam', 'hya', 'hybrid_A_midpoints'],
description='coordinate',
)
if not hybm_coordname:
hybm_coordname = _find_optional_var(
uxds,
long_name='hybrid B coefficient at layer midpoints',
possible_names=['hybm', 'hyb', 'hybrid_B_midpoints'],
description='coordinate',
)
if not plev_coordname:
plev_coordname = _find_optional_var(
uxds,
standard_name='air_pressure',
possible_names=['plev', 'pressure_lev', 'pressure_levels'],
description='coordinate',
)

Given the way _find_var is written, say hya and hyb both have a description of coordinate and nothing else, wouldn't this mean that whichever one was found first would become the coord name for both?

Also, I don't have a better suggestion, but is just 'coordinate' the best default description to look for for all of these?

- hyam : Hybrid A coefficients (if meridional_wind is on hybrid sigma-pressure levels)
- hybm : Hybrid B coefficients (if meridional_wind is on hybrid sigma-pressure levels)
- uxgrid : Grid information for uxarray
meridonal_wind_varname : str, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
meridonal_wind_varname : str, optional
meridional_wind_varname : str, optional

Comment on lines 1340 to 1345
# NCL NAME WRAPPER FUNCTIONS BELOW
def dpres_plev(pressure_lev, surface_pressure, pressure_top=None):
return delta_pressure(pressure_lev, surface_pressure, pressure_top)


_generate_wrapper_docstring(dpres_plev, delta_pressure)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should stay at the bottom of the file (and be used to create the wrapper function made in this PR)

----------
uxds : uxarray.UxDataset
Input dataset containing the following required fields:
- meridional_wind : CF Compliant Meridional wind component (on pressure or hybrid sigma-pressure levels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the CF complaint name for meridional wind (is there one?)? Is it just northward_wind? That seems slightly strange to me.

I'm not finding something explicitly called meridional wind in the CF standard names, though, so maybe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file feature a new feature that is going to be developed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Zonal_MPSI

3 participants