-
Notifications
You must be signed in to change notification settings - Fork 45
feat: missing features for observations #344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
2752749
to
eeeedcd
Compare
for more information, see https://pre-commit.ci
…ompatible with dataset created previously
for more information, see https://pre-commit.ci
|
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…ithub.com/ecmwf/anemoi-datasets into feature/missing-features-for-observations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar with this code, so it might be a slightly "superficial" review, but hopefully some of the comments/questions are still useful
|
||
class ObservationsSource: | ||
def __call__(self, window): | ||
raise NotImplementedError("This method should be implemented by subclasses") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make an @abstractmethod
?
class ObservationsFilter: | ||
def __call__(self, df): | ||
"""Filter the data based on the given window.""" | ||
check_dataframe(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call self._check(df)
?
|
||
def _check(self, df): | ||
check_dataframe(df) | ||
return df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and above, should we return df
? (I don't mind – just wonder if it's surprising)
src/anemoi/datasets/data/dataset.py
Outdated
return self.mutate() | ||
|
||
name = kwargs.pop("name", None) | ||
name = kwargs.pop("set_group", None) # TODO(Florian) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO what? 😄
def check_dataframe(df): | ||
"""Check the DataFrame for consistency.""" | ||
if df.empty: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if df.empty
I guess you don't want to do the checks below? If so, should this be return
?
|
||
dates = [datetime.datetime(2025, 1, 1, 0, 0) + datetime.timedelta(hours=i * 8) for i in range(3)] | ||
|
||
source = MarsSource( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this code go into a pytest test?
log = logging.getLogger(__name__) | ||
|
||
|
||
class DummpySource(ObservationsSource): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo? Should it be DummySource
?
self.dataset = dataset | ||
|
||
# in this class anything with 1 refers to the original window/dataset | ||
# and anything with 2 refers to the new window/dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_orig
_new
is more characters but might be clearer
) | ||
self._dates = dates | ||
|
||
before_span1 = self._window1.start / self.frequency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could probably halve the number of lines below
from anemoi.datasets.data.records import window_from_str | ||
|
||
|
||
class DummpySource(ObservationsSource): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo? Should be DummySource
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplication of sources in tests/create/test_observations_mars.py
. Maybe have one copy and use in both sets of tests?
It might also be useful to change the description to highlight what missing features are being added here |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
5346640
to
8b6b765
Compare
Description
Main addition: Implement flexible window API when reading an observation dataset : window='(-6h,3h]’
See context here: #342
Some additional minor changes :
The code doing open_dataset is modified to allow opening the new observations datasets, with ‘.vz’ extension.
A field dataset has no group, we introduce “set_group”, adding a group name to a field dataset.
The padding feature allow extending datasets of different time spans (such as observations of different instruments) to merge them into one unique dataset. It is adapted in this PR, fixing corner cases potential bugs.
Several backends have been implemented, mostly to make sure the architecture is flexible enough to test and benchmark other implementations and eventually choose the right one.
Presentation1.pdf
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/