-
Notifications
You must be signed in to change notification settings - Fork 45
feat: track origins of variable from dataset creation to inference #437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…o feat/recipe-generator
…o feat/recipe-generator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are quite a few unrelated changes in this PR. It would be good to isolate this to just the changes required for origin tracking in order to make it easier to review.
Some of the new classes, particularly the Origin
related ones and the Projection
related ones could do with some unit tests to help make clear how they're supposed to work. At the moment, the code is a bit hard to follow without the additional context. Informative type hints might also help too.
"anemoi-utils[provenance]>=0.4.32", | ||
"cfunits", | ||
"glom", | ||
"jsonschema", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being removed?
|
||
return GroupOfDates(sorted(set(group_of_dates) & set(filtering_dates)), group_of_dates.provider) | ||
|
||
def origin(self, data: Any, action: Any, action_arguments: Any) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to have type hints, can they be more specific? e.g. action
must be an object supporting the origin
method.
previous = fs.metadata("anemoi_origin", default=None) | ||
fall_through = fs.metadata("anemoi_fall_through", default=False) | ||
if fall_through: | ||
# The field has pass unchanges in a filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# The field has pass unchanges in a filter | |
# The field has passed unchanged through a filter |
return _data_request(self.datasource) | ||
|
||
@property | ||
def origins(self) -> dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def origins(self) -> dict[str, Any]: | |
def origins(self) -> dict[str, list]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Unless self._origins
eventually becomes a different object)
@property | ||
def origins(self) -> dict[str, Any]: | ||
"""Returns a dictionary with the parameters needed to retrieve the data.""" | ||
return {"version": 1, "origins": self._origins} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we anticipate that this will need to be a versioned object of some kind? Do we really need it, or will it just make things more complicated?
def from_slices(cls, slices): | ||
return Projection(slices) | ||
|
||
@classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both this and the method above don't use cls
. Would they be better as staticmethods
, or do you envisage these depending on the class in the future?
------- | ||
Iterator[datetime.datetime] | ||
An iterator of datetime objects. | ||
Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes seem unrelated to the origins work – move to a separate PR?
Returns: | ||
Union[str, List[str]]: The shortened list of dates. | ||
Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above - unrelated to this work
# TODO: Use the one from anemoi.utils.grids instead | ||
# from anemoi.utils.grids import ... | ||
from scipy.spatial import cKDTree | ||
from scipy.spatial import KDTree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to origins tracking
@zarr_tests | ||
@not_ready | ||
def test_class_missing_dates_fill(): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of empty tests here - maybe better to create an issue for them rather than having them in here where they'll be forgotten about
Description
This PR will allow tracking of the origins of variables, previously only the mars request was clearly stored:
The outcome is available calling
ds.metadata()
Example:
Will return for
tp
What problem does this change solve?
What issue or task does this change relate to?
Additional notes
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/
By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.
📚 Documentation preview 📚: https://anemoi-datasets--437.org.readthedocs.build/en/437/