Skip to content

Conversation

floriankrb
Copy link
Member

@floriankrb floriankrb commented May 28, 2025

Description

Main addition: Implement flexible window API when reading an observation dataset : window='(-6h,3h]’
See context here: #342

Some additional minor changes :

The code doing open_dataset is modified to allow opening the new observations datasets, with ‘.vz’ extension.

A field dataset has no group, we introduce “set_group”, adding a group name to a field dataset.

The padding feature allow extending datasets of different time spans (such as observations of different instruments) to merge them into one unique dataset. It is adapted in this PR, fixing corner cases potential bugs.

Several backends have been implemented, mostly to make sure the architecture is flexible enough to test and benchmark other implementations and eventually choose the right one.

Presentation1.pdf

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

@floriankrb floriankrb force-pushed the feature/missing-features-for-observations branch from 2752749 to eeeedcd Compare June 4, 2025 14:42
@OpheliaMiralles OpheliaMiralles mentioned this pull request Jun 13, 2025
@FussyDuck
Copy link

FussyDuck commented Jun 17, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ floriankrb
✅ JPXKQX
❌ pinnstorm
You have signed the CLA already but the status is still pending? Let us recheck it.

@aaron-hopkinson aaron-hopkinson marked this pull request as ready for review August 14, 2025 09:48
Copy link
Contributor

@aaron-hopkinson aaron-hopkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with this code, so it might be a slightly "superficial" review, but hopefully some of the comments/questions are still useful


class ObservationsSource:
def __call__(self, window):
raise NotImplementedError("This method should be implemented by subclasses")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make an @abstractmethod?

class ObservationsFilter:
def __call__(self, df):
"""Filter the data based on the given window."""
check_dataframe(df)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call self._check(df)?


def _check(self, df):
check_dataframe(df)
return df
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and above, should we return df? (I don't mind – just wonder if it's surprising)

return self.mutate()

name = kwargs.pop("name", None)
name = kwargs.pop("set_group", None) # TODO(Florian)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO what? 😄

def check_dataframe(df):
"""Check the DataFrame for consistency."""
if df.empty:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if df.empty I guess you don't want to do the checks below? If so, should this be return?


dates = [datetime.datetime(2025, 1, 1, 0, 0) + datetime.timedelta(hours=i * 8) for i in range(3)]

source = MarsSource(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this code go into a pytest test?

log = logging.getLogger(__name__)


class DummpySource(ObservationsSource):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? Should it be DummySource?

self.dataset = dataset

# in this class anything with 1 refers to the original window/dataset
# and anything with 2 refers to the new window/dataset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_orig _new is more characters but might be clearer

)
self._dates = dates

before_span1 = self._window1.start / self.frequency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could probably halve the number of lines below

from anemoi.datasets.data.records import window_from_str


class DummpySource(ObservationsSource):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? Should be DummySource?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplication of sources in tests/create/test_observations_mars.py. Maybe have one copy and use in both sets of tests?

@github-project-automation github-project-automation bot moved this from Now In Progress to Under Review in Anemoi-dev Aug 15, 2025
@aaron-hopkinson
Copy link
Contributor

It might also be useful to change the description to highlight what missing features are being added here

@floriankrb floriankrb force-pushed the feature/missing-features-for-observations branch from 5346640 to 8b6b765 Compare September 26, 2025 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Under Review
Development

Successfully merging this pull request may close these issues.

7 participants