
Commit 67b8939

MRG: Merge pull request #168 from octue/release/0.1.19
Release/0.1.19
2 parents 07f6e1b + bd726c6 commit 67b8939

51 files changed: +2102 -1147 lines

docs/source/analysis_objects.rst

Lines changed: 2 additions & 15 deletions
@@ -27,18 +27,5 @@ your app can always be verified. These hashes exist on the following attributes:
 - ``configuration_values_hash``
 - ``configuration_manifest_hash``
 
-If an input or configuration attribute is ``None``, so will its hash attribute be. For ``Manifests``, some metadata
-about the ``Datafiles`` and ``Datasets`` within them, and about the ``Manifest`` itself, is included when calculating
-the hash:
-
-- For a ``Datafile``, the content of its on-disk file is hashed, along with the following metadata:
-
-  - ``name``
-  - ``cluster``
-  - ``sequence``
-  - ``timestamp``
-  - ``tags``
-
-- For a ``Dataset``, the hashes of its ``Datafiles`` are included, along with its ``tags``.
-
-- For a ``Manifest``, the hashes of its ``Datasets`` are included, along with its ``keys``.
+If a strand is ``None``, so will its corresponding hash attribute be. The hash of a datafile is the hash of
+its file, while the hash of a manifest or dataset is the cumulative hash of the files it refers to.
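
To picture the "cumulative hash" in the new wording, here is a minimal sketch of one way to combine per-file hashes, using plain ``hashlib`` with SHA-256 and sorted digests for order-independence. It is an illustration of the idea only, not the SDK's actual hashing code:

    import hashlib

    def file_hash(path):
        """Hash the content of a single file in chunks."""
        hasher = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                hasher.update(chunk)
        return hasher.hexdigest()

    def cumulative_hash(paths):
        """Hash the sorted file hashes so the result doesn't depend on file order."""
        hasher = hashlib.sha256()
        for digest in sorted(file_hash(path) for path in paths):
            hasher.update(digest.encode())
        return hasher.hexdigest()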

docs/source/child_services.rst

Lines changed: 2 additions & 2 deletions
@@ -104,13 +104,13 @@ The children field must also be present in the ``twine.json`` file:
         "key": "wind_speed",
         "purpose": "A service that returns the average wind speed for a given latitude and longitude.",
         "notes": "Some notes.",
-        "filters": "tags:wind_speed"
+        "filters": "labels:wind_speed"
     },
     {
         "key": "elevation",
         "purpose": "A service that returns the elevation for a given latitude and longitude.",
         "notes": "Some notes.",
-        "filters": "tags:elevation"
+        "filters": "labels:elevation"
     }
 ],
 ...

docs/source/cloud_storage.rst

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ in Octue SDK, please join the discussion `in this issue. <https://github.com/oct
 Data container classes
 ----------------------
 All of the data container classes in the SDK have a ``to_cloud`` and a ``from_cloud`` method, which handles their
-upload/download to/from the cloud, including all relevant metadata from the instance (e.g. tags, ID). Data integrity is
+upload/download to/from the cloud, including all relevant metadata from the instance (e.g. labels, ID). Data integrity is
 checked before and after upload and download to ensure any data corruption is avoided.
 
 Datafile
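
For orientation, the ``to_cloud``/``from_cloud`` round trip for a ``Datafile``, pieced together from the ``datafile.rst`` examples elsewhere in this commit. The call signatures are as shown there; the import path is an assumption:

    from octue.resources import Datafile  # assumed import path

    # Upload a local datafile together with its metadata (e.g. labels, ID).
    datafile = Datafile(path="path/to/local/file.dat")
    datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")

    # Download it again; its metadata comes back with it and integrity is checked.
    with Datafile.from_cloud("my-project", "my-bucket", "path/to/data.dat", mode="r") as (datafile, f):
        data = f.read()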

docs/source/cloud_storage_advanced_usage.rst

Lines changed: 3 additions & 3 deletions
@@ -26,14 +26,14 @@ to any of these methods.
     local_path=<path/to/file>,
     bucket_name=<bucket-name>,
     path_in_bucket=<path/to/file/in/bucket>,
-    metadata={"tags": ["blah", "glah", "jah"], "cleaned": True, "id": 3}
+    metadata={"id": 3, "labels": ["blah", "glah", "jah"], "cleaned": True, "colour": "blue"}
 )
 
 storage_client.upload_from_string(
     string='[{"height": 99, "width": 72}, {"height": 12, "width": 103}]',
     bucket_name=<bucket-name>,
     path_in_bucket=<path/to/file/in/bucket>,
-    metadata={"tags": ["dimensions"], "cleaned": True, "id": 96}
+    metadata={"id": 96, "labels": ["dimensions"], "cleaned": True, "colour": "red", "size": "small"}
 )
 
 **Downloading**

@@ -61,7 +61,7 @@ to any of these methods.
     bucket_name=<bucket-name>,
     path_in_bucket=<path/to/file/in/bucket>,
 )
->>> {"tags": ["dimensions"], "cleaned": True, "id": 96}
+>>> {"id": 96, "labels": ["dimensions"], "cleaned": True, "colour": "red", "size": "small"}
 
 
 **Deleting**

docs/source/datafile.rst

Lines changed: 12 additions & 7 deletions
@@ -10,7 +10,8 @@ the following main attributes:
 - ``path`` - the path of this file, which may include folders or subfolders, within the dataset.
 - ``cluster`` - the integer cluster of files, within a dataset, to which this belongs (default 0)
 - ``sequence`` - a sequence number of this file within its cluster (if sequences are appropriate)
-- ``tags`` - a space-separated string or iterable of tags relevant to this file
+- ``tags`` - key-value pairs of metadata relevant to this file
+- ``labels`` - a space-separated string or iterable of labels relevant to this file
 - ``timestamp`` - a posix timestamp associated with the file, in seconds since epoch, typically when it was created but could relate to a relevant time point for the data
 
 
@@ -43,14 +44,15 @@ Example A
     bucket_name = "my-bucket",
     datafile_path = "path/to/data.csv"
 
-    with Datafile.from_cloud(project_name, bucket_name, datafile_path, mode="r") as datafile, f:
+    with Datafile.from_cloud(project_name, bucket_name, datafile_path, mode="r") as (datafile, f):
         data = f.read()
         new_metadata = metadata_calculating_function(data)
 
         datafile.timestamp = new_metadata["timestamp"]
         datafile.cluster = new_metadata["cluster"]
         datafile.sequence = new_metadata["sequence"]
         datafile.tags = new_metadata["tags"]
+        datafile.labels = new_metadata["labels"]
 
 
 Example B
@@ -76,7 +78,8 @@ Example B
     datafile.timestamp = datetime.now()
     datafile.cluster = 0
     datafile.sequence = 3
-    datafile.tags = {"manufacturer:Vestas", "output:1MW"}
+    datafile.tags = {"manufacturer": "Vestas", "output": "1MW"}
+    datafile.labels = {"new"}
 
     datafile.to_cloud()  # Or, datafile.update_cloud_metadata()
 
@@ -122,10 +125,11 @@ For creating new data in a new local file:
 
 
     sequence = 2
-    tags = {"cleaned:True", "type:linear"}
+    tags = {"cleaned": True, "type": "linear"}
+    labels = {"Vestas"}
 
 
-    with Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags, mode="w") as datafile, f:
+    with Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags, labels=labels, mode="w") as (datafile, f):
         f.write("This is some cleaned data.")
 
     datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")
@@ -139,7 +143,8 @@ For existing data in an existing local file:
 
 
     sequence = 2
-    tags = {"cleaned:True", "type:linear"}
+    tags = {"cleaned": True, "type": "linear"}
+    labels = {"Vestas"}
 
-    datafile = Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags)
+    datafile = Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags, labels=labels)
     datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")

docs/source/dataset.rst

Lines changed: 12 additions & 8 deletions
@@ -8,9 +8,10 @@ A ``Dataset`` contains any number of ``Datafiles`` along with the following metadata:
 
 - ``name``
 - ``tags``
+- ``labels``
 
 The files are stored in a ``FilterSet``, meaning they can be easily filtered according to any attribute of the
-:doc:`Datafile <datafile>` instances it contains.
+:doc:`Datafile <datafile>` instances contained.
 
 
 --------------------------------
@@ -23,23 +24,26 @@ You can filter a ``Dataset``'s files as follows:
 
    dataset = Dataset(
        files=[
-            Datafile(timestamp=time.time(), path="path-within-dataset/my_file.csv", tags="one a:2 b:3 all"),
-            Datafile(timestamp=time.time(), path="path-within-dataset/your_file.txt", tags="two a:2 b:3 all"),
-            Datafile(timestamp=time.time(), path="path-within-dataset/another_file.csv", tags="three all"),
+            Datafile(path="path-within-dataset/my_file.csv", labels=["one", "a", "b", "all"]),
+            Datafile(path="path-within-dataset/your_file.txt", labels=["two", "a", "b", "all"]),
+            Datafile(path="path-within-dataset/another_file.csv", labels=["three", "all"]),
        ]
    )
 
-    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv")
+    dataset.files.filter(name__ends_with=".csv")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>
 
-    dataset.files.filter("tags__contains", filter_value="a:2")
+    dataset.files.filter(labels__contains="a")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>
 
-You can also chain filters indefinitely:
+You can also chain filters indefinitely, or specify them all at the same time:
 
 .. code-block:: python
 
-    dataset.files.filter(filter_name="name__ends_with", filter_value=".csv").filter("tags__contains", filter_value="a:2")
+    dataset.files.filter(name__ends_with=".csv").filter(labels__contains="a")
+    >>> <FilterSet({<Datafile('my_file.csv')>})>
+
+    dataset.files.filter(name__ends_with=".csv", labels__contains="a")
    >>> <FilterSet({<Datafile('my_file.csv')>})>
 
 Find out more about ``FilterSets`` :doc:`here <filter_containers>`, including all the possible filters available for each type of object stored on
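
Because ``filter`` returns another filter container, it also composes with the ``order_by`` method documented in ``filter_containers.rst`` below. A small sketch using the dataset above, with the output shape assumed rather than verified against this release:

    # Keep only CSV files carrying the "a" label, then sort them by name.
    dataset.files.filter(name__ends_with=".csv", labels__contains="a").order_by("name")
    >>> <FilterList([<Datafile('my_file.csv')>])>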

docs/source/filter_containers.rst

Lines changed: 95 additions & 23 deletions
@@ -4,43 +4,61 @@
 Filter containers
 =================
 
-A filter container is just a regular python container that has some extra methods for filtering or ordering its
+A filter container is just a regular python container that has some extra methods for filtering and ordering its
 elements. It has the same interface (i.e. attributes and methods) as the primitive python type it inherits from, with
 these extra methods:
 
 - ``filter``
 - ``order_by``
 
-There are two types of filter containers currently implemented:
+There are three types of filter containers currently implemented:
 
 - ``FilterSet``
 - ``FilterList``
+- ``FilterDict``
 
-``FilterSets`` are currently used in:
+``FilterSets`` are currently used in ``Dataset.files`` to store ``Datafiles`` and make them filterable, which is useful
+for dealing with a large number of datasets, while ``FilterList`` is returned when ordering any filter container.
 
-- ``Dataset.files`` to store ``Datafiles``
-- ``TagSet.tags`` to store ``Tags``
-
-You can see filtering in action on the files of a ``Dataset`` :doc:`here <dataset>`.
+You can see an example of filtering of a ``Dataset``'s files :doc:`here <dataset>`.
 
 
 ---------
 Filtering
 ---------
 
-Filters are named as ``"<name_of_attribute_to_check>__<filter_action>"``, and any attribute of a member of the
-``FilterSet`` whose type or interface is supported can be filtered.
+Key points:
+
+* Any attribute of a member of a filter container whose type or interface is supported can be used when filtering
+* Filters are named as ``"<name_of_attribute_to_check>__<filter_action>"``
+* Multiple filters can be specified at once for chained filtering
+* ``<name_of_attribute_to_check>`` can be a single attribute name or a double-underscore-separated string of nested attribute names
+* Nested attribute names work for real attributes as well as dictionary keys (in any combination and to any depth)
 
 .. code-block:: python
 
     filter_set = FilterSet(
-        {Datafile(timestamp=time.time(), path="my_file.csv"), Datafile(timestamp=time.time(), path="your_file.txt"), Datafile(timestamp=time.time(), path="another_file.csv")}
+        {
+            Datafile(path="my_file.csv", cluster=0, tags={"manufacturer": "Vestas"}),
+            Datafile(path="your_file.txt", cluster=1, tags={"manufacturer": "Vergnet"}),
+            Datafile(path="another_file.csv", cluster=2, tags={"manufacturer": "Enercon"})
+        }
     )
 
-    filter_set.filter(filter_name="name__ends_with", filter_value=".csv")
+    # Single filter, non-nested attribute.
+    filter_set.filter(name__ends_with=".csv")
    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>
 
-The following filters are implemented for the following types:
+    # Two filters, non-nested attributes.
+    filter_set.filter(name__ends_with=".csv", cluster__gt=1)
+    >>> <FilterSet({<Datafile('another_file.csv')>})>
+
+    # Single filter, nested attribute.
+    filter_set.filter(tags__manufacturer__starts_with="V")
+    >>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>
+
+
+These filters are currently available for the following types:
 
 - ``bool``:
 
@@ -67,25 +85,60 @@ The following filters are implemented for the following types:
    * ``not_starts_with``
    * ``ends_with``
    * ``not_ends_with``
+    * ``in_range``
+    * ``not_in_range``
 
 - ``NoneType``:
 
    * ``is``
    * ``is_not``
 
-- ``TagSet``:
+- ``LabelSet``:
 
    * ``is``
    * ``is_not``
    * ``equals``
    * ``not_equals``
-    * ``any_tag_contains``
-    * ``not_any_tag_contains``
-    * ``any_tag_starts_with``
-    * ``not_any_tag_starts_with``
-    * ``any_tag_ends_with``
-    * ``not_any_tag_ends_with``
-
+    * ``contains``
+    * ``not_contains``
+    * ``any_label_contains``
+    * ``not_any_label_contains``
+    * ``any_label_starts_with``
+    * ``not_any_label_starts_with``
+    * ``any_label_ends_with``
+    * ``not_any_label_ends_with``
+
+- ``datetime.datetime``:
+    * ``is``
+    * ``is_not``
+    * ``equals``
+    * ``not_equals``
+    * ``lt`` (less than)
+    * ``lte`` (less than or equal)
+    * ``gt`` (greater than)
+    * ``gte`` (greater than or equal)
+    * ``in_range``
+    * ``not_in_range``
+    * ``year_equals``
+    * ``year_in``
+    * ``month_equals``
+    * ``month_in``
+    * ``day_equals``
+    * ``day_in``
+    * ``weekday_equals``
+    * ``weekday_in``
+    * ``iso_weekday_equals``
+    * ``iso_weekday_in``
+    * ``time_equals``
+    * ``time_in``
+    * ``hour_equals``
+    * ``hour_in``
+    * ``minute_equals``
+    * ``minute_in``
+    * ``second_equals``
+    * ``second_in``
+    * ``in_date_range``
+    * ``in_time_range``
 
 
 Additionally, these filters are defined for the following *interfaces* (duck-types). :
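
A sketch of the new ``datetime`` filters in action, assuming ``timestamp`` holds a ``datetime`` (Example B in ``datafile.rst`` above sets it with ``datetime.now()``). The filter names are from the list above; the outputs are illustrative, not verified against this release:

    from datetime import datetime

    filter_set = FilterSet({
        Datafile(path="old.csv", timestamp=datetime(2020, 5, 4)),
        Datafile(path="new.csv", timestamp=datetime(2021, 1, 12)),
    })

    # Filter names follow "<attribute>__<filter_action>", as in the Filtering section.
    filter_set.filter(timestamp__year_equals=2021)
    >>> <FilterSet({<Datafile('new.csv')>})>

    filter_set.filter(timestamp__lt=datetime(2021, 1, 1))
    >>> <FilterSet({<Datafile('old.csv')>})>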
@@ -100,6 +153,8 @@ Additionally, these filters are defined for the following *interfaces* (duck-types). :
    * ``lte``
    * ``gt``
    * ``gte``
+    * ``in_range``
+    * ``not_in_range``
 
 - Iterables:
 
@@ -118,14 +173,31 @@ list of filters.
 --------
 Ordering
 --------
-As sets are inherently orderless, ordering a ``FilterSet`` results in a new ``FilterList``, which has the same extra
-methods and behaviour as a ``FilterSet``, but is based on the ``list`` type instead - meaning it can be ordered and
-indexed etc. A ``FilterSet`` or ``FilterList`` can be ordered by any of the attributes of its members:
+As sets and dictionaries are inherently orderless, ordering any filter container results in a new ``FilterList``, which
+has the same methods and behaviour but is based on ``list`` instead, meaning it can be ordered and indexed etc. A
+filter container can be ordered by any of the attributes of its members:
 
 .. code-block:: python
 
    filter_set.order_by("name")
    >>> <FilterList([<Datafile('another_file.csv')>, <Datafile('my_file.csv')>, <Datafile(path="your_file.txt")>])>
 
+    filter_set.order_by("cluster")
+    >>> <FilterList([<Datafile('my_file.csv')>, <Datafile('your_file.txt')>, <Datafile(path="another_file.csv")>])>
+
 The ordering can also be carried out in reverse (i.e. descending order) by passing ``reverse=True`` as a second argument
 to the ``order_by`` method.
+
+
+--------------
+``FilterDict``
+--------------
+The keys of a ``FilterDict`` can be anything, but each value must be a ``Filterable``. Hence, a ``FilterDict`` is
+filtered and ordered by its values' attributes; when ordering, its items (key-value tuples) are returned in a
+``FilterList``.
+
+-----------------------
+Using for your own data
+-----------------------
+If using filter containers for your own data, all the members must inherit from ``octue.mixins.filterable.Filterable``
+to be filterable and orderable.
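
Tying the last two sections together: a minimal sketch of a user-defined ``Filterable`` stored in a ``FilterDict``. Only the ``octue.mixins.filterable.Filterable`` path is given by the docs above; the ``FilterDict`` import path and the constructor behaviour are assumptions:

    from octue.mixins.filterable import Filterable
    from octue.resources.filter_containers import FilterDict  # assumed import path

    class Reading(Filterable):
        """A minimal user-defined filterable: a single sensor reading."""

        def __init__(self, value, unit):
            self.value = value
            self.unit = unit

    readings = FilterDict({
        "turbine-1": Reading(12.3, "m/s"),
        "turbine-2": Reading(8.1, "m/s"),
        "turbine-3": Reading(401.0, "W"),
    })

    # A FilterDict is filtered by its values' attributes; the keys are kept.
    readings.filter(value__gte=10)

    # Ordering returns the (key, value) items in a FilterList.
    readings.order_by("value")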
