Skip to content

Conversation

b8raoult
Copy link
Collaborator

@b8raoult b8raoult commented Oct 6, 2025

Description

This is a PR to prepare for the observations support. This PR merely move files around, so a lot of files are affected, but there should not be any code change. This prepares for further PRs that may affect the code more. This should be kinder on reviewers.

Naming conventions proposed here:
Move from "create" -> "build"

Introduce two types of datasets to be built:
gridded & tabular

What problem does this change solve?

What issue or task does this change relate to?

Additional notes

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

@b8raoult b8raoult changed the title Feat/move code to prepare for observations feat: move code to prepare for observations Oct 7, 2025
Copy link
Contributor

@aaron-hopkinson aaron-hopkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new structure is clearer, but there are a few imports that were missed


from anemoi.datasets.create.sources.mars import mars
from anemoi.datasets.create.utils import to_datetime_list
from anemoi.datasets.build.utils import to_datetime_list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: from anemoi.datasets.build.gridded.utils import to_datetime_list in accumulations.py, but this is also valid (we have duplicated code). We should remove the duplicated code, but let's do that in another PR.


# from .mars import execute as mars
from anemoi.datasets.create.mars import execute as mars
from anemoi.datasets.build.mars import execute as mars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this in the legacy sources refactor, but this should be
from .mars import mars

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But I think this source might actually be broken – I need to test this. If it is, we can fix in another PR to keep this one simple)

from anemoi.datasets.build.gridded.result import Result
from anemoi.datasets.build.input.action import Action
from anemoi.datasets.build.input.action import action_factory
from anemoi.datasets.build.input.join import JoinResult
Copy link
Contributor

@aaron-hopkinson aaron-hopkinson Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might have been broken before (so fixing is outside of the scope of this PR), but JoinResult doesn't exist, neither does anemoi.datasets.build.input.join

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #444 to cover this

dataset = dataset._store.path

from ..stores import zarr_lookup
from anemoi.datasets.use.gridded.stores import zarr_lookup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer term maybe we move this up a level? (i.e. anemoi.datasets.use.stores.zarr_lookup)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe as part of the main obs dataset development, rather than in this PR

last_window_end = int(end.strftime("%Y%m%d%H%M%S"))

from .legacy_obs_dataset import ObsDataset
from anemoi.datasets.use.gridded.observations.legacy_obs_dataset import ObsDataset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from anemoi.datasets.use.gridded.observations.legacy_obs_dataset import ObsDataset
from anemoi.datasets.use.tabular.observations.legacy_obs_dataset import ObsDataset

from anemoi.datasets.data.misc import as_first_date
from anemoi.datasets.data.misc import as_last_date
from anemoi.datasets.use.gridded.misc import as_first_date
from anemoi.datasets.use.gridded.misc import as_last_date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, these can probably be moved to anemoi.datasets.use.misc, or just anemoi.datasets.misc, but can be done in another PR


def write_metadata(self, metadata):
from anemoi.datasets.create import json_tidy
from anemoi.datasets.build.gridded import json_tidy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this can be moved to misc in the future?

print(f"Converting dataset to {output} using new backend '{backend}'")

from anemoi.datasets.data.records.backends import writer_backend_factory
from anemoi.datasets.use.gridded.records.backends import writer_backend_factory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from anemoi.datasets.use.gridded.records.backends import writer_backend_factory
from anemoi.datasets.use.tabular.records.backends import writer_backend_factory

Comment on lines +22 to +23
from .source import Source
from .sources import source_registry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from .source import Source
from .sources import source_registry
from ..source import Source
from ..sources import source_registry

If we want to move to relative imports here

from anemoi.datasets.create.check import DatasetName
from anemoi.datasets.build.gridded.check import DatasetName

from . import Command
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub won't let me comment on the correct line, but line 93 should be

        from anemoi.datasets.misc.check import check_zarr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Under Review

Development

Successfully merging this pull request may close these issues.

3 participants