Rename filter not consistent with rename-action in open_dataset #554

@mpvginde

Describe the bug
Let's say I want to build a dataset with both 3h and 6h accumulated precipitation:

# 3h accumulations
- dates:
    start: 2020-01-01 00:00:00
    end: 2021-01-01 00:00:00
    frequency: 3h
  accumulations:
    <<: *mars_request
    time: [0]
    accumulation_period: [0, 3]
    param:
      - tp    # total precipitation
# rename to allow 2 total precipitation fields
- rename:
    param: "{param}_3h"
# 6h accumulations
- dates:
    start: 2020-01-01 00:00:00
    end: 2021-01-01 00:00:00
    frequency: 3h
  accumulations:
    <<: *mars_request
    time: [0]
    accumulation_period: [0, 6]
    param:
      - tp    # total precipitation
# rename to allow 2 total precipitation fields
- rename:
    param: "{param}_6h"

The rename filter in the recipe updates the param key in the metadata, which I guess is then used as the variable name in the dataset.
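To make the effect concrete, here is a minimal sketch of what the rename step appears to do to a field's metadata. It is a toy model and not the anemoi-transform implementation: `rename_metadata` and the plain-dict metadata are hypothetical, and the only thing taken from the recipe is the `"{param}_3h"` style template.

```python
def rename_metadata(field: dict, rename: dict) -> dict:
    """Return a copy of `field` with each key in `rename` rebuilt from its template.

    Hypothetical helper: it only mimics the behaviour described in this issue.
    """
    updated = dict(field)
    for key, template in rename.items():
        # Templates are formatted against the original (unrenamed) metadata.
        updated[key] = template.format(**field)
    return updated


tp = {"param": "tp", "levtype": "sfc"}
tp_3h = rename_metadata(tp, {"param": "{param}_3h"})
tp_6h = rename_metadata(tp, {"param": "{param}_6h"})

print(tp_3h["param"], tp_6h["param"])  # tp_3h tp_6h
```

In this model the param key itself is overwritten, so nothing downstream can tell that both fields started out as tp.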

As far as I know there is currently no way of renaming a single variable (during dataset creation) without also changing the param metadata.

It is possible, however, when the naming structure of all variables is changed at once with remapping:

output:
  remapping:
    param_level: "{param}_{levelist}"

I use the above snippet to get rid of the _2 or _10 suffix in the names of surface-level fields such as 2t_2 or 10v_10.
In this case only the variable name is changed; the param metadata is left untouched.

Now I want to combine this dataset with another dataset using the cutout functionality.

The combined dataset now has a variable named tp whose param metadata is tp_6h.
This mismatch currently trips up the scalers during training.
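A quick way to spot this kind of mismatch is to compare each variable name with its param metadata. The sketch below works on a plain dict of per-variable metadata; that dict layout, the `find_mismatches` helper and the sample values are assumptions made for illustration, not the structure anemoi-datasets actually exposes.

```python
def find_mismatches(variables: dict) -> dict:
    """Map each inconsistent variable name to its `param` metadata."""
    mismatches = {}
    for name, meta in variables.items():
        param = meta["param"]
        levelist = meta.get("levelist")
        # A name counts as consistent if it is the bare param
        # or follows the "<param>_<levelist>" pattern.
        accepted = {param}
        if levelist is not None:
            accepted.add(f"{param}_{levelist}")
        if name not in accepted:
            mismatches[name] = param
    return mismatches


# Hypothetical metadata for the combined (cutout) dataset.
combined = {
    "tp": {"param": "tp_6h"},
    "2t": {"param": "2t", "levelist": 2},
    "t_600": {"param": "t", "levelist": 600},
}

print(find_mismatches(combined))  # {'tp': 'tp_6h'}
```

Only tp is reported: 2t and t_600 pass because their names can be derived from param and levelist.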

Version number
Building of CERRA:
anemoi-datasets branch abstracting-accumulation (8fb0a16)

Opening datasets with cutout:
anemoi-datasets: current main (c26a9d8)
anemoi-transform: current main (ecmwf/anemoi-transform@548e2fa)

Additional context
I see two possible solutions:

  1. Add functionality to specify the name of a variable during building without changing the param metadata.
  2. Let the rename action in open_dataset also rename the param metadata. This might cause other problems with pressure-level fields, which typically have name: t_600 and param: t.
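The trade-off in option 2 can be sketched on toy data. The `rename_variables` function, its `update_param` flag and the metadata dicts below are hypothetical and only model the behaviour discussed here; the example also assumes that the variable was renamed from tp_6h to tp when the dataset was opened.

```python
def rename_variables(variables: dict, mapping: dict, update_param: bool = False) -> dict:
    """Rename variables; optionally copy the new name into the `param` metadata."""
    renamed = {}
    for name, meta in variables.items():
        new_name = mapping.get(name, name)
        new_meta = dict(meta)
        if update_param and new_name != name:
            # Naive version of option 2: param simply follows the variable name.
            new_meta["param"] = new_name
        renamed[new_name] = new_meta
    return renamed


built = {
    "tp_6h": {"param": "tp_6h"},
    "t_600": {"param": "t", "levelist": 600},
}

# Behaviour described in this issue: only the name changes.
name_only = rename_variables(built, {"tp_6h": "tp"})
print(name_only["tp"]["param"])  # tp_6h

# Option 2: name and param stay in sync for surface fields.
with_param = rename_variables(built, {"tp_6h": "tp"}, update_param=True)
print(with_param["tp"]["param"])  # tp

# The pressure-level caveat: param picks up the level suffix.
pressure = rename_variables(built, {"t_600": "temp_600"}, update_param=True)
print(pressure["temp_600"]["param"])  # temp_600
```

The last case shows why option 2 would need extra handling for pressure levels: the naive update writes temp_600 into param, where a level-free value would be expected.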
