feat: add units to datasets #577

Open
b8raoult wants to merge 14 commits into main from feat/units

Collaborator

@b8raoult b8raoult commented Mar 14, 2026

Description

Add units to datasets, so we can check them when combining datasets, and in inference.

  • When building a dataset using concat: 💣 Variable cp has multiple units: m and kg m**-2
  • At run time, ValueError: Concat: Incompatible units: cp: m and kg m**-2, tp: m and kg m**-2
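
The run-time check can be pictured with a minimal sketch. It assumes each dataset exposes a plain mapping from variable name to unit string; the function name `check_concat_units` and that mapping are hypothetical and are not taken from this PR's diff. Only the error message format follows the example above.

```python
def check_concat_units(datasets_units):
    """Raise ValueError if a variable has different units across datasets.

    `datasets_units` is a list of dicts mapping variable name -> unit string
    (a hypothetical stand-in for the per-dataset metadata).
    """
    seen = {}  # variable -> distinct units, in order of first appearance
    for units in datasets_units:
        for name, unit in units.items():
            known = seen.setdefault(name, [])
            if unit not in known:
                known.append(unit)

    # Report every conflicting variable in a single message.
    conflicts = [
        f"{name}: {' and '.join(units)}"
        for name, units in sorted(seen.items())
        if len(units) > 1
    ]
    if conflicts:
        raise ValueError("Concat: Incompatible units: " + ", ".join(conflicts))
```

Collecting all conflicts before raising means one failed concat reports every mismatched variable at once, as in the `cp` and `tp` example.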

In addition to the actual unit, accumulation periods and their equivalents are added: m;accumulation(6:00:00) means meters, accumulated over 6 hours.
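
As an illustration of the string format only, here is a rough sketch of how such a value could be split into its parts. The function `parse_units` is hypothetical and is not part of this PR or of anemoi-transform; it assumes the qualifier always has the form `name(H:MM:SS)` and does not handle periods of a day or more.

```python
import re
from datetime import timedelta


def parse_units(text):
    """Split 'm;accumulation(6:00:00)' into (unit, process, period).

    Returns (unit, None, None) when there is no ';' qualifier.
    """
    unit, _, qualifier = text.partition(";")
    if not qualifier:
        return unit, None, None

    # Assumed qualifier layout: process name, then H:MM:SS in parentheses.
    match = re.fullmatch(r"(\w+)\((\d+):(\d{2}):(\d{2})\)", qualifier)
    if match is None:
        raise ValueError(f"Unrecognised qualifier: {qualifier!r}")

    process, hours, minutes, seconds = match.groups()
    period = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
    return unit, process, period
```

For example, `parse_units("m;accumulation(6:00:00)")` yields the unit `"m"`, the process `"accumulation"` and a six-hour `timedelta`.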

What problem does this change solve?

What issue or task does this change relate to?

Additional notes

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

@github-project-automation github-project-automation bot moved this to To be triaged in Anemoi-dev Mar 14, 2026
@github-actions github-actions bot added the enhancement New feature or request label Mar 14, 2026
@b8raoult b8raoult changed the title feat: add units to datasets, so we can check them when combining datasets, and in inference feat: add units to datasets Mar 14, 2026
@b8raoult b8raoult marked this pull request as ready for review March 15, 2026 12:23
Contributor

@aaron-hopkinson aaron-hopkinson left a comment

After the anemoi-transform PR has been merged, we will need a new release of that package and to update the pyproject.toml here.

Code generally looks fine - just a few typos and one missing method in anemoi-transform.

for field in fields:
    units = field.metadata("units", default=None)
    if units is not None:
        assert False, units

Should this line be removed? (Or the one below won't execute)

(18, "0-3/3-6/6-9/9-12/12-15/15-18"),
]

case ("e6", "oper", _):

Unrelated code change?

 def variables_metadata(self) -> dict[str, Any]:
     """Retrieve the metadata for the variables."""
-    return _fields_metatata(self.variables, self._cube)
+    return _fields_metatata(self.variables, self._cube, self._past_units)

Suggested change
-return _fields_metatata(self.variables, self._cube, self._past_units)
+return _fields_metadata(self.variables, self._cube, self._past_units)

dict
The metadata dictionary.
"""
def _fields_metatata(variables: tuple[str, ...], cube: Any, units_seen: dict) -> dict[str, Any]:

Suggested change
-def _fields_metatata(variables: tuple[str, ...], cube: Any, units_seen: dict) -> dict[str, Any]:
+def _fields_metadata(variables: tuple[str, ...], cube: Any, units_seen: dict) -> dict[str, Any]:

 if isinstance(ds, pd.DataFrame):
     raise ValueError(
-        "Did you forget meant to build a tabular dataset? Did you forget to specify 'format: tabular' in your recipe?"
+        "Did you forget meant to build a tabular dataset? Did you forget to specify 'layout: tabular' in your recipe?"

Suggested change
-"Did you forget meant to build a tabular dataset? Did you forget to specify 'layout: tabular' in your recipe?"
+"Did you mean to build a tabular dataset? Did you forget to specify 'layout: tabular' in your recipe?"

other[variables[i]]["process"] = process
other[variables[i]]["period"] = (startStep, endStep)

units = Units.to_canonical(f.metadata("units", default=None))

NB: anemoi-transform branch Unit class doesn't contain a to_canonical class method (yet?)

@github-project-automation github-project-automation bot moved this from To be triaged to Under Review in Anemoi-dev Mar 18, 2026
@github-actions github-actions bot added dependencies Pull requests that update a dependency file tests labels Mar 24, 2026

Labels

  • ATS Approval not needed
  • dependencies: Pull requests that update a dependency file
  • enhancement: New feature or request
  • tests

Projects

Status: Under Review

3 participants