98 commits
9cb8c52
feat: recipe generator
b8raoult May 8, 2025
01774dd
update
b8raoult May 8, 2025
33793a6
update
b8raoult May 8, 2025
920d523
update
b8raoult May 8, 2025
ed5d190
update
b8raoult May 8, 2025
2562baf
fix: better handling of xarray metadata
b8raoult May 10, 2025
f048a4b
update
b8raoult May 10, 2025
8ad5eb0
update
b8raoult May 10, 2025
5db7a97
Merge branch 'fix/better-handling-of-xarray-metadata' into feat/recip…
b8raoult May 10, 2025
b33f3ac
fix: support other keys that param in rename filter
b8raoult May 10, 2025
71d8180
Merge branch 'fix/support-other-keys-than-param-in-rename-filter' int…
b8raoult May 10, 2025
6d23027
typo
b8raoult May 10, 2025
6933574
Merge branch 'fix/support-other-keys-than-param-in-rename-filter' int…
b8raoult May 10, 2025
9179dae
add command line
b8raoult May 10, 2025
79a391b
update
b8raoult May 11, 2025
203e09b
update
b8raoult May 11, 2025
b4433bd
update
b8raoult May 11, 2025
6f3fdb0
update
b8raoult May 11, 2025
45365a1
upadte
b8raoult May 12, 2025
d7cc82c
update
b8raoult May 12, 2025
f381b00
update
b8raoult May 12, 2025
da93ad6
update
b8raoult May 14, 2025
3082edf
refactor missing
b8raoult Jul 9, 2025
ef4a5c9
add references
b8raoult Jul 9, 2025
93410d5
refactor
b8raoult Jul 9, 2025
1df0ef7
refactor
b8raoult Jul 9, 2025
18df4eb
refactor
b8raoult Jul 9, 2025
3341d4c
update
b8raoult Jul 9, 2025
58dc8a2
work on migrate
b8raoult Jul 10, 2025
3e180f9
work on migrate
b8raoult Jul 10, 2025
255c22d
merge
b8raoult Aug 11, 2025
83936f7
merge
b8raoult Aug 11, 2025
5209f26
update
b8raoult Aug 11, 2025
c7a0e5d
update
b8raoult Aug 12, 2025
b78a098
update
b8raoult Aug 12, 2025
d641ea7
update
b8raoult Aug 13, 2025
3754eb2
update
b8raoult Aug 14, 2025
39ebc13
update
b8raoult Aug 14, 2025
5d32745
update
b8raoult Aug 14, 2025
6c1f146
update
b8raoult Aug 14, 2025
37de369
update
b8raoult Aug 14, 2025
24f2c2a
update
b8raoult Aug 15, 2025
db4d895
add dumper
b8raoult Aug 15, 2025
55f740d
update
b8raoult Aug 15, 2025
014dbbc
update
b8raoult Aug 15, 2025
8ad9396
bug fix in path
b8raoult Aug 15, 2025
9756618
join recipe command
b8raoult Aug 15, 2025
a493a96
join recipe command
b8raoult Aug 15, 2025
1cde9f8
use ampersand
b8raoult Aug 15, 2025
cdb1a9a
use ampersand
b8raoult Aug 15, 2025
e69eb10
add settings
b8raoult Aug 16, 2025
92165b4
add settings
b8raoult Aug 16, 2025
a044e14
udpate
b8raoult Aug 16, 2025
cb9c576
use ruamel
b8raoult Aug 16, 2025
99a5fb7
fix source as parameters
b8raoult Aug 16, 2025
ce027f4
update
b8raoult Aug 18, 2025
3bf7c35
Merge branch 'feat/refactor-create' of github.com:ecmwf/anemoi-datase…
b8raoult Aug 22, 2025
96dfe3d
Merge branch 'feat/recipe-generator' into feat/refactor-create
b8raoult Aug 22, 2025
f68a11e
Merge remote-tracking branch 'origin/main' into feat/refactor-create
b8raoult Aug 22, 2025
3d5f0ef
tidy
b8raoult Aug 22, 2025
70272f6
update
b8raoult Aug 22, 2025
38ced18
update tests
b8raoult Aug 22, 2025
cb3847e
fix tests
b8raoult Aug 22, 2025
00477c9
add missing package
b8raoult Aug 25, 2025
b0508a9
update
b8raoult Aug 25, 2025
7d494b9
update
b8raoult Aug 26, 2025
dd62e77
fix icon grid test
b8raoult Aug 29, 2025
21208ad
review origins
b8raoult Aug 31, 2025
53f915c
work on components
b8raoult Sep 1, 2025
b0348bd
work on components
b8raoult Sep 1, 2025
28d6ffa
work on components
b8raoult Sep 1, 2025
1a6a3e4
add projections
b8raoult Sep 2, 2025
7b332b5
add projection
b8raoult Sep 3, 2025
c40025a
tidy code
b8raoult Sep 3, 2025
81e355b
tidy
b8raoult Sep 3, 2025
06850d8
add transformations
b8raoult Sep 3, 2025
e09ed7e
rename variables
b8raoult Sep 6, 2025
cd06c98
add origins test
b8raoult Sep 6, 2025
f9fd3a0
tidy
b8raoult Sep 8, 2025
c9259c6
fix skipped origins
b8raoult Sep 9, 2025
8df9cc5
work on origin
b8raoult Sep 10, 2025
4c06588
python recipes
b8raoult Sep 13, 2025
a917c49
add doc
b8raoult Sep 13, 2025
39bedac
add origins to metadata
b8raoult Sep 14, 2025
56ac3ca
Merge branch 'feat/origin' of github.com:ecmwf/anemoi-datasets into f…
b8raoult Sep 14, 2025
9309cfe
update
b8raoult Sep 14, 2025
437e4aa
docs
b8raoult Sep 14, 2025
79f1706
add filter.rst
b8raoult Sep 16, 2025
ef431f8
add docs
b8raoult Sep 21, 2025
48c9d07
compress origins
b8raoult Sep 25, 2025
caf5b90
Merge remote-tracking branch 'origin/main' into feat/origin
b8raoult Oct 7, 2025
9a7a8b7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 7, 2025
57901c5
tidy
b8raoult Oct 7, 2025
9343e66
tidy
b8raoult Oct 7, 2025
095d57d
tidy
b8raoult Oct 7, 2025
5148b61
tidy
b8raoult Oct 7, 2025
0cd3bf6
add origins to metadata
b8raoult Oct 7, 2025
ae1bb2f
remove exception
b8raoult Oct 8, 2025
2 changes: 1 addition & 1 deletion docs/cli/grib-index.rst
@@ -1,7 +1,7 @@
.. _grib-index_command:

Grib-index Command
============
==================

The `grib-index` command is used to create an index file for GRIB files. The index file is then used
by the `grib-index` :ref:`source <grib-index_source>`.
14 changes: 14 additions & 0 deletions docs/datasets/building/filters.rst
@@ -0,0 +1,14 @@
.. _filters:

#########
Filters
#########

.. warning::

   This is still a work-in-progress. Some of the filters may be renamed
   later.

Filters are used to modify the data or metadata in a dataset.

See :ref:`install <anemoi-transform:filters>` for more information.
2 changes: 1 addition & 1 deletion docs/datasets/building/sources/repeated-dates.rst
@@ -10,7 +10,7 @@ dates of the dataset.

The general format of the `repeated-dates` source is:

.. literalinclude:: yaml/repeated_dates1.yaml
.. literalinclude:: yaml/repeated-dates1.yaml
   :language: yaml

where ``source`` is any of the :ref:`operations <operations>` or
1 change: 0 additions & 1 deletion pyproject.toml
@@ -52,7 +52,6 @@ dependencies = [
    "anemoi-utils[provenance]>=0.4.32",
    "cfunits",
    "glom",
    "jsonschema",
Contributor

Why is this being removed?

    "numcodecs<0.16", # Until we move to zarr3
    "numpy",
    "pyyaml",
1 change: 1 addition & 0 deletions src/anemoi/datasets/create/__init__.py
@@ -743,6 +743,7 @@ def _run(self) -> int:
        metadata["end_date"] = dates[-1].isoformat()
        metadata["frequency"] = frequency
        metadata["missing_dates"] = [_.isoformat() for _ in missing]
        metadata["origins"] = self.minimal_input.origins

        metadata["version"] = VERSION

48 changes: 34 additions & 14 deletions src/anemoi/datasets/create/input/action.py
@@ -8,20 +8,15 @@
# nor does it submit to any jurisdiction.

import logging
from abc import ABC
from abc import abstractmethod

from anemoi.datasets.dates import DatesProvider

LOG = logging.getLogger(__name__)


class Action:
    """An "Action" represents a single operation described in the yaml configuration, e.g. a source, a filter,
Contributor

I thought this docstring was actually quite useful when I first went through the code. Unless there's a good reason, I think we should keep it (even if we don't want to expose it in the docs)

    pipe, join, etc.

    See :ref:`operations` for more details.

    """

class Action(ABC):
    def __init__(self, config, *path):
        self.config = config
        self.path = path
@@ -30,6 +25,13 @@ def __init__(self, config, *path):
            "data_sources",
        ), f"{self.__class__.__name__}: path must start with 'input' or 'data_sources': {path}"

    @abstractmethod
    def __call__(self, context, argument):
        pass

    def __repr__(self):
        return f"{self.__class__.__name__}({'.'.join(str(x) for x in self.path)}, {self.config})"
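The change from `class Action:` to `class Action(ABC)` with an `@abstractmethod` `__call__` means Python now refuses to instantiate any action class that does not implement `__call__`. The standalone sketch below shows that behaviour using a trimmed copy of the base class from the diff, keeping its path check and `__repr__`. `Echo` and `Forgetful` are hypothetical subclasses invented for the illustration.

```python
from abc import ABC, abstractmethod


class Action(ABC):
    # Trimmed copy of the Action base class shown in the diff.
    def __init__(self, config, *path):
        self.config = config
        self.path = path
        assert path[0] in (
            "input",
            "data_sources",
        ), f"{self.__class__.__name__}: path must start with 'input' or 'data_sources': {path}"

    @abstractmethod
    def __call__(self, context, argument):
        pass

    def __repr__(self):
        return f"{self.__class__.__name__}({'.'.join(str(x) for x in self.path)}, {self.config})"


class Echo(Action):
    # Hypothetical subclass that implements __call__, so it can be instantiated.
    def __call__(self, context, argument):
        return argument


class Forgetful(Action):
    # Hypothetical subclass that forgets __call__.
    pass


echo = Echo({"mars": {}}, "input", "join", 0)
print(repr(echo))  # Echo(input.join.0, {'mars': {}})

try:
    Forgetful({}, "input")  # rejected before __init__ even runs
except TypeError as exc:
    print("refused:", exc)
```

With the old plain base class, `Forgetful` would have been constructed without complaint and would only have failed later, when called.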


class Concat(Action):
    """The Concat contruct is used to concat different actions that are responsible
@@ -65,6 +67,7 @@ def __init__(self, config, *path):

        for i, item in enumerate(config):

            assert "dates" in item, f"Value must contain the key 'dates' {item}"
Contributor

Unrelated to tracking origin work?

            dates = item["dates"]
            filtering_dates = DatesProvider.from_config(**dates)
            action = action_factory({k: v for k, v in item.items() if k != "dates"}, *self.path, str(i))
@@ -186,7 +189,13 @@ def create_object(self, context, config):
        return create_datasets_source(context, config)

    def call_object(self, context, source, argument):
        return source.execute(context.source_argument(argument))
        result = source.execute(context.source_argument(argument))
        return context.origin(result, self, argument)
Contributor

Based on the parameter names, the method call looks correct - but I have no idea what call_object is actually supposed to do. Before it looked like it executed something and returned the result, which made sense – I don't know why it's now okay for it to return a context's origin...?


    def origin(self):
        from .origin import Source

        return Source(self.path[-1], self.config)
Contributor

Maybe it's better to have self.path[-1] as the action's name attribute?

Contributor

Also, looking in the origin module, it's not clear to me why Join and Pipe are distinct types of Origin, whereas this is using Source. Is there a reason they can't all be Sources, or should this have a specialised Concat type of Origin?

In other words, I think the link between Actions and corresponding Origins is a little unclear at the moment.



class TransformSourceMixin:
@@ -197,6 +206,15 @@ def create_object(self, context, config):

        return create_transform_source(context, config)

    def combine_origins(self, current, previous):
Contributor

Both the transform mixins have combine_origins, is there a reason we don't need it for the DatasetSourceMixin? Is this because they don't have a "previous" source feeding them?

        assert previous is None, f"Cannot combine origins, previous already exists: {previous}"
        return current

    def origin(self):
        from .origin import Source

        return Source(self.path[-1], self.config)


class TransformFilterMixin:
    """Mixin class for filters defined in anemoi-transform"""
@@ -207,14 +225,16 @@ def create_object(self, context, config):
        return create_transform_filter(context, config)

    def call_object(self, context, filter, argument):
        return filter.forward(context.filter_argument(argument))
        result = filter.forward(context.filter_argument(argument))
        return context.origin(result, self, argument)

    def origin(self):
        from .origin import Filter

class FilterFunction(Function):
    """Action to call a filter on the argument (e.g. rename, regrid, etc.)."""
        return Filter(self.path[-1], self.config)

    def __call__(self, context, argument):
        return self.call(context, argument, context.filter_argument)
    def combine_origins(self, current, previous):
        return {"_apply": current, **(previous or {})}
Contributor

Not sure I understand this one
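The two `combine_origins` methods above expect different things as `previous`. The transform-source version asserts that there is no previous origin and returns its own. The transform-filter version builds a dict whose `"_apply"` key holds the filter's own origin, merged with whatever mapping it received. The standalone sketch below copies only those two method bodies onto hypothetical stand-in classes (`SourceLike`, `FilterLike`), with plain strings standing in for `Origin` objects, so the resulting shapes can be inspected. It does not show how the pipeline actually uses the result.

```python
class SourceLike:
    # Body copied from TransformSourceMixin.combine_origins in the diff.
    def combine_origins(self, current, previous):
        assert previous is None, f"Cannot combine origins, previous already exists: {previous}"
        return current


class FilterLike:
    # Body copied from TransformFilterMixin.combine_origins in the diff.
    def combine_origins(self, current, previous):
        return {"_apply": current, **(previous or {})}


src = SourceLike().combine_origins("mars-origin", None)
print(src)  # mars-origin

first = FilterLike().combine_origins("rename-origin", None)
print(first)  # {'_apply': 'rename-origin'}

# In a dict display, later entries win: if `previous` already carries an
# "_apply" key, its value replaces the `current` origin just passed in.
second = FilterLike().combine_origins("regrid-origin", first)
print(second)  # {'_apply': 'rename-origin'}
```

The last call shows a property of the expression itself. When `previous` is a mapping that already contains `"_apply"`, the new `current` value is discarded.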



def _make_name(name, what):
2 changes: 1 addition & 1 deletion src/anemoi/datasets/create/input/context/__init__.py
@@ -34,7 +34,7 @@ def register(self, data: Any, path: list[str]) -> Any:

        assert path[0] in ("input", "data_sources"), path

        LOG.info(f"Registering data at path: {path}")
        LOG.info(f"Registering data at path: {'.'.join(str(x) for x in path)}")
        self.results[tuple(path)] = data
        return data

19 changes: 19 additions & 0 deletions src/anemoi/datasets/create/input/context/field.py
@@ -10,6 +10,8 @@

from typing import Any

from anemoi.transform.fields import new_field_with_metadata
from anemoi.transform.fields import new_fieldlist_from_list
from earthkit.data.core.order import build_remapping

from ..result.field import FieldResult
@@ -52,3 +54,20 @@ def matching_dates(self, filtering_dates, group_of_dates: Any) -> Any:
        from anemoi.datasets.dates.groups import GroupOfDates

        return GroupOfDates(sorted(set(group_of_dates) & set(filtering_dates)), group_of_dates.provider)

    def origin(self, data: Any, action: Any, action_arguments: Any) -> Any:
Contributor

If we're going to have type hints, can they be more specific? e.g. action must be an object supporting the origin method.


        origin = action.origin()

        result = []
        for fs in data:
            previous = fs.metadata("anemoi_origin", default=None)
            fall_through = fs.metadata("anemoi_fall_through", default=False)
            if fall_through:
                # The field has pass unchanges in a filter
Contributor

Suggested change
# The field has pass unchanges in a filter
# The field has passed unchanged through a filter

                result.append(fs)
            else:
                anemoi_origin = origin.combine(previous, action, action_arguments)
                result.append(new_field_with_metadata(fs, anemoi_origin=anemoi_origin))

        return new_fieldlist_from_list(result)
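`origin()` walks the field list returned by a source or filter and stamps each field with an `anemoi_origin` metadata entry. Fields flagged `anemoi_fall_through`, which passed through a filter unchanged, are left as they are. The sketch below mimics that loop without earthkit or anemoi-transform, under these assumptions:

- `StubField` and its dict-based metadata are hypothetical stand-ins for real fields.
- `StubField.with_metadata` stands in for `new_field_with_metadata`.
- `StubOrigin` replaces the real `Origin.combine` with a simplified "append to the previous origin" rule.

```python
class StubField:
    # Hypothetical stand-in for an earthkit field: metadata is a plain dict.
    def __init__(self, name, **metadata):
        self.name = name
        self._metadata = metadata

    def metadata(self, key, default=None):
        return self._metadata.get(key, default)

    def with_metadata(self, **extra):
        # Like new_field_with_metadata: returns a new field, leaves self untouched.
        return StubField(self.name, **{**self._metadata, **extra})


class StubOrigin:
    # Hypothetical origin whose combine() records what it was chained onto.
    def __init__(self, label):
        self.label = label

    def combine(self, previous, action, action_arguments):
        return self.label if previous is None else f"{previous} | {self.label}"


def tag_origins(data, origin, action=None, action_arguments=None):
    # Same control flow as Context.origin() in field.py.
    result = []
    for fs in data:
        previous = fs.metadata("anemoi_origin", default=None)
        if fs.metadata("anemoi_fall_through", default=False):
            result.append(fs)  # passed through unchanged: keep its existing origin
        else:
            new_origin = origin.combine(previous, action, action_arguments)
            result.append(fs.with_metadata(anemoi_origin=new_origin))
    return result


fields = [StubField("2t"), StubField("tp")]
fields = tag_origins(fields, StubOrigin("mars"))
# Pretend a filter only modified "2t" and flagged "tp" as fall-through.
fields = [fields[0], fields[1].with_metadata(anemoi_fall_through=True)]
fields = tag_origins(fields, StubOrigin("rename"))

for f in fields:
    print(f.name, f.metadata("anemoi_origin"))
# 2t mars | rename
# tp mars
```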
159 changes: 159 additions & 0 deletions src/anemoi/datasets/create/input/origin.py
@@ -0,0 +1,159 @@
# (C) Copyright 2025 Anemoi contributors.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.

import logging
from abc import ABC

LOG = logging.getLogger(__name__)


class Origin(ABC):
Contributor

If we're creating an abstract base class, are there any abstract methods or properties we should be requiring?


    def __init__(self, when="dataset-create"):
Contributor

Is it anticipated that other packages will create Origin objects (or subclasses of this)? If not, can when just be a class variable?

        self.when = when

    def __eq__(self, other):
        if not isinstance(other, Origin):
            return False
        return self is other
Contributor

If it's just doing object equality, isn't the type check above redundant?


    def __hash__(self):
        return id(self)
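Because `__eq__` compares with `is` and `__hash__` returns `id(self)`, two origins built from identical configurations are still different objects and different dictionary keys. This is what allows `Filter._cache` to key on the exact `previous` object. As the review notes, `self is other` is already `False` for any non-`Origin`, so the `isinstance` check does not change the result. Python's default `object` equality and hashing are also identity-based. The sketch below is a hypothetical tightened base class, not the PR's code. It drops the redundant check and, following the earlier review question, declares `combine` and `as_dict` abstract. `Leaf` is an invented subclass.

```python
from abc import ABC, abstractmethod


class Origin(ABC):
    # Hypothetical variant of the Origin base class from the diff.
    def __init__(self, when="dataset-create"):
        self.when = when

    @abstractmethod
    def combine(self, previous, action, action_arguments):
        """Return the origin produced by applying self after previous."""

    @abstractmethod
    def as_dict(self):
        """Return a JSON-serialisable description of this origin."""

    # Identity semantics as in the diff; no isinstance() check is needed.
    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return id(self)


class Leaf(Origin):
    # Invented minimal subclass used only to exercise the base class.
    def __init__(self, name, config, when="dataset-create"):
        super().__init__(when)
        self.name = name
        self.config = config

    def combine(self, previous, action, action_arguments):
        return self

    def as_dict(self):
        return {"type": "leaf", "name": self.name, "config": self.config, "when": self.when}


a = Leaf("mars", {"param": "2t"})
b = Leaf("mars", {"param": "2t"})

print(a == b)      # False: same config, different objects
print(a == a)      # True
cache = {a: "pipe-built-on-a"}
print(b in cache)  # False: b is a separate cache key
```

With the abstract methods declared, `Origin()` itself, or a subclass that forgets `as_dict`, raises `TypeError` at construction time instead of failing later during serialisation.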


def _un_dotdict(x):
Contributor

Isn't this already in anemoi-utils?

    if isinstance(x, dict):
        return {k: _un_dotdict(v) for k, v in x.items()}

    if isinstance(x, (list, tuple, set)):
        return [_un_dotdict(a) for a in x]

    return x
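`_un_dotdict` recursively rebuilds a configuration out of plain Python containers. Any `dict`, including subclasses such as an attribute-access dict, becomes a plain `dict`, and lists, tuples and sets all become lists. `Source.config` and `Filter.config` therefore end up as ordinary, serialisable structures. The example below copies the function unchanged and feeds it a hypothetical `DotDict` subclass, which is presumably the kind of object the name refers to. It assumes nothing about the helper in anemoi-utils that the review mentions.

```python
class DotDict(dict):
    # Hypothetical dict subclass with attribute access (config.grid.resolution).
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc


def _un_dotdict(x):
    # Copied from origin.py in the diff.
    if isinstance(x, dict):
        return {k: _un_dotdict(v) for k, v in x.items()}

    if isinstance(x, (list, tuple, set)):
        return [_un_dotdict(a) for a in x]

    return x


config = DotDict(param=("2t", "msl"), grid=DotDict(resolution=0.25), levels={500})
plain = _un_dotdict(config)

print(type(plain).__name__, type(plain["grid"]).__name__)  # dict dict
print(plain)  # {'param': ['2t', 'msl'], 'grid': {'resolution': 0.25}, 'levels': [500]}
```

Sets are converted in iteration order, so a multi-element set produces a list in no guaranteed order. The single-element set above is used to keep the output deterministic.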


class Pipe(Origin):
    def __init__(self, s1, s2, when="dataset-create"):
        super().__init__(when)
        self.steps = [s1, s2]

        assert s1 is not None, (s1, s2)
        assert s2 is not None, (s1, s2)

        if isinstance(s1, Pipe):
Contributor

What if s1 is not a Pipe, but s2 is?

            assert not isinstance(s2, Pipe), (s1, s2)
            self.steps = s1.steps + [s2]

    def combine(self, previous, action, action_arguments):
        assert False, (self, previous)
Contributor

It would be better to raise a more sensible exception than an AssertionError

Contributor

(This is true in many places in the code)


    def as_dict(self):
        return {
            "type": "pipe",
            "steps": [s.as_dict() for s in self.steps],
            "when": self.when,
        }

    def __repr__(self):
        return " | ".join(repr(s) for s in self.steps)
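`Pipe` flattens only on its left. When `s1` is already a `Pipe`, the new step is appended to its `steps`, so chaining filters one at a time, as `Filter.combine` does, produces a single flat pipe. A `Pipe` passed as `s2`, the case raised in the review, is kept as one nested step. The sketch below reproduces the constructor and `__repr__` from the diff, without the `Origin` base, together with a hypothetical `Step` stand-in so that both shapes can be compared.

```python
class Step:
    # Hypothetical stand-in for a Source or Filter origin.
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class Pipe:
    # Constructor and __repr__ copied from the diff (Origin base omitted).
    def __init__(self, s1, s2):
        self.steps = [s1, s2]

        assert s1 is not None, (s1, s2)
        assert s2 is not None, (s1, s2)

        if isinstance(s1, Pipe):
            assert not isinstance(s2, Pipe), (s1, s2)
            self.steps = s1.steps + [s2]

    def __repr__(self):
        return " | ".join(repr(s) for s in self.steps)


mars, rename, regrid = Step("mars"), Step("rename"), Step("regrid")

left = Pipe(Pipe(mars, rename), regrid)
print(left, len(left.steps))    # mars | rename | regrid 3

right = Pipe(mars, Pipe(rename, regrid))
print(right, len(right.steps))  # mars | rename | regrid 2
```

The two pipes print identically, but `as_dict()` on the second would emit a nested `"pipe"` entry inside `"steps"`. Passing a `Pipe` on both sides trips the inner assertion.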


class Join(Origin):
    def __init__(self, origins, when="dataset-create"):
        assert isinstance(origins, (list, tuple, set)), origins
        super().__init__(when)
        self.steps = list(origins)

        assert all(o is not None for o in origins), origins

    def combine(self, previous, action, action_arguments):
        assert False, (self, previous)
Contributor

As above – better to raise a more specific exception


    def as_dict(self):
        return {
            "type": "join",
            "steps": [s.as_dict() for s in self.steps],
            "when": self.when,
        }

    def __repr__(self):
        return " & ".join(repr(s) for s in self.steps)
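Each origin class implements `as_dict()`, so a whole origin graph serialises to nested plain dicts tagged `"source"`, `"filter"`, `"pipe"` or `"join"`. The sketch below uses trimmed copies of the four `as_dict()` methods from the diff. The `Origin` base and `_un_dotdict` are omitted, and `Filter` subclasses `Source` purely as a shortcut. It builds a hypothetical recipe, a `mars` source renamed by a filter and joined with a `grib` source, and dumps it as JSON. The names and config values are made up for illustration.

```python
import json


class Source:
    # Trimmed copy of origin.Source (no Origin base, no _un_dotdict).
    def __init__(self, name, config, when="dataset-create"):
        self.name = name
        self.config = config
        self.when = when

    def as_dict(self):
        return {"type": "source", "name": self.name, "config": self.config, "when": self.when}


class Filter(Source):
    # Shortcut for this sketch only: in the diff, Filter is its own Origin subclass.
    def as_dict(self):
        return {"type": "filter", "name": self.name, "config": self.config, "when": self.when}


class Pipe:
    def __init__(self, s1, s2, when="dataset-create"):
        self.steps = [s1, s2]
        self.when = when

    def as_dict(self):
        return {"type": "pipe", "steps": [s.as_dict() for s in self.steps], "when": self.when}


class Join:
    def __init__(self, origins, when="dataset-create"):
        self.steps = list(origins)
        self.when = when

    def as_dict(self):
        return {"type": "join", "steps": [s.as_dict() for s in self.steps], "when": self.when}


# Hypothetical recipe: "mars" renamed by a filter, joined with a "grib" source.
tree = Join(
    [
        Pipe(Source("mars", {"param": ["2t"]}), Filter("rename", {"param": {"2t": "t2m"}})),
        Source("grib", {"path": "data.grib"}),
    ]
)
print(json.dumps(tree.as_dict(), indent=2))
```

Unlike `__repr__`, which embeds `id(self)` for sources and filters, `as_dict()` is deterministic, so it is the form that can be compared or stored.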


class Source(Origin):
    def __init__(self, name, config, when="dataset-create"):
        super().__init__(when)
        assert isinstance(config, dict), f"Config must be a dictionary {config}"
        self.name = name
        self.config = _un_dotdict(config)

    def combine(self, previous, action, action_arguments):
        assert previous is None, f"Cannot combine origins, previous already exists: {previous}"
        return self

    def as_dict(self):
        return {
            "type": "source",
            "name": self.name,
            "config": self.config,
            "when": self.when,
        }

    def __repr__(self):
        return f"{self.name}({id(self)})"


class Filter(Origin):
    def __init__(self, name, config, when="dataset-create"):
        super().__init__(when)
        assert isinstance(config, dict), f"Config must be a dictionary {config}"
        self.name = name
        self.config = _un_dotdict(config)
        self._cache = {}

    def combine(self, previous, action, action_arguments):

        if previous is None:
            # This can happen if the filter does not tag its output with an origin
            # (e.g. a user plugin). In that case we try to get the origin from the action arguments
            key = (id(action), id(action_arguments))
            if key not in self._cache:

                LOG.warning(f"No previous origin to combine with: {self}. Action: {action}")
                LOG.warning(f"Connecting to action arguments {action_arguments}")
                origins = set()
                for k in action_arguments:
                    o = k.metadata("anemoi_origin", default=None)
                    if o is None:
                        raise ValueError(
                            f"Cannot combine origins, previous is None and action_arguments {action_arguments} has no origin"
                        )
                    origins.add(o)
                if len(origins) == 1:
                    self._cache[key] = origins.pop()
                else:
                    self._cache[key] = Join(origins)
            previous = self._cache[key]

        if previous in self._cache:
            # We use a cache to avoid recomputing the same combination
            return self._cache[previous]

        self._cache[previous] = Pipe(previous, self)
        return self._cache[previous]

    def as_dict(self):
        return {
            "type": "filter",
            "name": self.name,
            "config": self.config,
            "when": self.when,
        }

    def __repr__(self):
        return f"{self.name}({id(self)})"
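`Filter.combine` caches one `Pipe` per distinct `previous` origin object, so every field arriving from the same upstream origin shares a single `previous | filter` node. When `previous` is `None`, the filter falls back to the `anemoi_origin` of its input fields. A single distinct origin is used directly, several are wrapped in a `Join`, and a missing origin raises `ValueError`. The sketch below condenses that logic without the `LOG.warning` calls, using hypothetical stand-ins: `Source`, `Pipe`, `Join`, and a `StubField` that exposes only `anemoi_origin`. Its `__repr__` shows bare names rather than `name(id)` to keep the output readable.

```python
class Source:
    # Stand-in for origin.Source; only the name matters here.
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class Pipe:
    # Left-flattening constructor, as in the diff.
    def __init__(self, s1, s2):
        self.steps = (s1.steps + [s2]) if isinstance(s1, Pipe) else [s1, s2]

    def __repr__(self):
        return " | ".join(repr(s) for s in self.steps)


class Join:
    def __init__(self, origins):
        self.steps = list(origins)


class StubField:
    # Hypothetical field exposing only the anemoi_origin metadata key.
    def __init__(self, origin):
        self._origin = origin

    def metadata(self, key, default=None):
        return self._origin if key == "anemoi_origin" else default


class Filter:
    # Same logic as Filter.combine in the diff, condensed and without logging.
    def __init__(self, name):
        self.name = name
        self._cache = {}

    def __repr__(self):
        return self.name

    def combine(self, previous, action, action_arguments):
        if previous is None:
            # Untagged input: derive the origin from the fields given to the action.
            key = (id(action), id(action_arguments))
            if key not in self._cache:
                origins = set()
                for k in action_arguments:
                    o = k.metadata("anemoi_origin", default=None)
                    if o is None:
                        raise ValueError("previous is None and an argument has no origin")
                    origins.add(o)
                self._cache[key] = origins.pop() if len(origins) == 1 else Join(origins)
            previous = self._cache[key]

        # One Pipe per distinct previous origin object.
        if previous not in self._cache:
            self._cache[previous] = Pipe(previous, self)
        return self._cache[previous]


mars, grib = Source("mars"), Source("grib")
rename = Filter("rename")

p1 = rename.combine(mars, action=None, action_arguments=None)
p2 = rename.combine(mars, action=None, action_arguments=None)
print(p1 is p2, p1)  # True mars | rename

# Fallback: no previous origin, so it is read from the input fields.
args = [StubField(mars), StubField(grib)]
p3 = rename.combine(None, action="regrid-action", action_arguments=args)
print(type(p3.steps[0]).__name__)  # Join
```

Because `_cache` keys are `id()`-based tuples and identity-hashed origins, they are only meaningful while the keyed objects stay alive.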