You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
#1622: Fix issues with nested context manager calls
# Summary
## Problem statement
The `Resource` class is also a [Context Manager](https://docs.python.org/3/reference/datamodel.html#context-managers). That is, it implements the
`__enter()__` and `__exit()__` methods to allow the use of
`with Resource(...)` statements.
Prior to this PR, there was no limit on nesting `with` statements on
the same `Resource`, but this caused problems because while the second
`__enter()__` allowed the `Resource` to already be open, the first
`__exit()__` would `close()` the Resource while the higher level context
would expect it to still be open.
This would cause errors like "ValueError: I/O operation on closed file",
or the iterator would appear to start from part way through a file rather
than at the start of the file, and other similar behaviour depending on
the exact locations of the nested functions.
This was made more complex because these `with` statements were often
far removed from each other in the code, hidden behind iterators driven
by generators, etc. They also could have different behaviour depending on
number of rows read, the type of Resource (local file vs inline, etc.),
the different steps in a pipeline, etc. etc.
All this meant that the problem was rare, hard to reduce down to an
obvious reproduction case, and not realistic to expect developers to
understand while developing new functionality.
## Solution
This PR prevents nested contexts being created by throwing an exception
when the second, nested, `with` is attempted. This means that code that
risks these issues can be quickly identified and resolved during development.
The best way to resolve it is to use `Resource.to_copy()` to copy so that
the nested `with` is acting on an independent view of the same Resource,
which is likely what is intended in most cases anyway.
This PR also updates a number of the internal uses of `with` to work
on a copy of the Resource they are passed so that they are independent
of any external code and what it might have done with the Resource prior
to the library methods being called.
## Breaking Change
This is technically a breaking change as any external code that was
developed using nested `with` statements - possibly deliberately, but
more likely unknowingly not falling into the error cases - will have to
be updated to use `to_copy()` or similar.
However, the library functions have all been updated in a way that
doesn't change their signature or their expected behaviour as documented
by the unit tests. All pre-existing unit tests pass with no changes,
and added unit tests for the specific updated behaviour do not require
any unusual constructs. It is still possible that some undocumented
and untested side effect behaviours are different than before and any
code relying on those may also be affected (e.g. `to_petl()` iterators are now independent rather than causing changes in each other)
So it is likely that very few actual impacts will occur in real world
code, and the exception thrown does it's best to explain the issue
and suggest resolutions.
# Tests
- All existing unit tests run and pass unchanged
- New unit tests were added to cover the updated behaviour
- These unit tests were confirmed to fail without the updates
in this PR (where appropriate).
- These unit tests now pass with the updated code.
- The original script that identified the issue in #1622 was run and
now gives the correct result (all rows appropriately converted and
saved to file)
# TODO: shall we guarantee here that it's at the beggining for the file?
258
259
# TODO: maybe it's possible to do type narrowing here?
259
260
def__enter__(self):
260
-
ifself.closed:
261
-
self.open()
261
+
"""
262
+
Enters a context manager for the resource.
263
+
We need to be careful with contexts because they open and close the Resource
264
+
(and thus any underlying files) and we don't want to close a file that is
265
+
being used somewhere higher up the call stack.
266
+
267
+
e.g. if nested contexts were allowed then:
268
+
269
+
with Resource("in.csv") as resource:
270
+
with resource:
271
+
# use resource
272
+
resource.write("out.csv")
273
+
274
+
would result in errors because the second context would close the file
275
+
before the write happened. While the above code is obvious, similar
276
+
things can happen when composing steps in pipelines, calling petl code etc.
277
+
where the various functions may have no knowledge of each other.
278
+
See #1622 for more details.
279
+
280
+
So we only allow a single context to be open at a time, and raise an
281
+
exception if nested context is attempted. For similar reasons, we
282
+
also raise an exception if a context is attempted on an open resource.
283
+
284
+
The above code can be successfully written as:
285
+
286
+
with Resource("in.csv") as resource:
287
+
with resource.to_copy() as resource2:
288
+
use resource2:
289
+
resource.write("out.csv")
290
+
291
+
which keeps resource and resource2 as independent views on the same file.
292
+
293
+
Note that if you absolutely need to use a resource in a manner where you
294
+
don't care if it is "opened" multiple times and closed once then you
295
+
can directly use `open()` and `close()` but you also become responsible
296
+
for ensuring the file is closed at the correct time.
297
+
"""
298
+
ifself.__context_manager_entered:
299
+
note="Resource has previously entered a context manager (`with` statement) and does not support nested contexts. To use in a nested context use `to_copy()` then use the copy in the `with`."
300
+
raiseFrictionlessException(note)
301
+
ifself.closed==False:
302
+
note="Resource is currently open, and cannot be used in a `with` statement (which would reopen the file). To use `with` on an open Resouece, use to_copy() then use the copy in the `with`."
0 commit comments