Commit d9df13b
Load pyarrow dataset on TIMDEXDataset init
Why these changes are being introduced:
As TIMDEXDatasetMetadata becomes more integrated, there is
less need to be explicit about how we load the pyarrow dataset.
Formerly, the method .load() needed to be called manually and
supported options like 'current_records' or 'include_parquet_files'.
This also reflected a time when 'TIMDEXDataset.load()' suggested that
the pyarrow dataset was the only thing to "load". With the introduction
of metadata, it is better to be specific that we are loading a pyarrow
dataset, which is only one of many assets associated with a
TIMDEXDataset instance.
How this addresses that need:
Renames .load() to .load_pyarrow_dataset() to be explicit about
what is happening.
We no longer store the pyarrow dataset filesystem or paths on self,
as they are only used briefly during this dataset load. We can get
them anytime via .dataset.
Most importantly, the root 'location' used to init a TIMDEXDataset
instance must now be a single string: the root of the dataset.
Because a list of strings is no longer allowed at that level, we can
trust self.location to always be a string that points to the root of
the TIMDEX dataset.
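The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `SketchDataset`, its `loader` parameter, and `_load_with_pyarrow` are hypothetical names introduced here, and whether the real `load_pyarrow_dataset()` returns the dataset or sets it directly is an assumption. The only real library call is `pyarrow.dataset.dataset(...)`.

```python
from typing import Any, Callable


def _load_with_pyarrow(location: str) -> Any:
    """Hypothetical default loader: build a pyarrow dataset from parquet files."""
    # Imported lazily so this sketch can be imported without pyarrow installed.
    import pyarrow.dataset as ds

    return ds.dataset(location, format="parquet")


class SketchDataset:
    """Hypothetical stand-in for TIMDEXDataset, for illustration only."""

    def __init__(
        self,
        location: str,
        loader: Callable[[str], Any] = _load_with_pyarrow,
    ) -> None:
        # Only a single root string is accepted; a list of paths is rejected.
        if not isinstance(location, str):
            raise TypeError("location must be a string: the root of the dataset")
        self.location = location
        self._loader = loader
        # The pyarrow dataset is loaded during init; no manual .load() call.
        self.dataset = self.load_pyarrow_dataset()

    def load_pyarrow_dataset(self) -> Any:
        """Load (or reload) the pyarrow dataset rooted at self.location."""
        # Filesystem and paths are not stored on self in this sketch.
        return self._loader(self.location)
```

On a real pyarrow `FileSystemDataset`, the file paths and filesystem stay reachable as `.dataset.files` and `.dataset.filesystem`, which is presumably why they no longer need to be kept on self.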
Side effects of this change:
* TIMDEXDataset and TIMDEXDatasetMetadata can only be initialized
with a string, which is the root of the TIMDEX dataset. From there,
both know where their assets can be found.
* You cannot "pre-filter" the pyarrow dataset when loading, which had
confusing overlap with the read methods; the read methods themselves
may change somewhat dramatically now that we have metadata to use.
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-533
1 parent 05383bc commit d9df13b
8 files changed (+346, -598 lines changed)
File tree:
- tests
- timdex_dataset_api