You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I wanted to use a loop in my pyinfra code. Having some experience with Ansible and knowing that loops and conditionals can be tricky in this context, I searched around in the documentation and codebase.
I found host.loop via a Google result that led to what turned out to be outdated pyinfra 2.x documentation. The page didn't make it immediately obvious that it was for an outdated version but the problem it explained sounds real to me. Perhaps a banner warning about outdated information would help future visitors?
Anyway, so I started using host.loop and noticed it was missing type annotation because my code editor stopped suggestion auto-completion. Thus I've started adding type annotations for host.loop so auto-completion would work in my editor again. While doing that, I noticed a potential issue, but before diving into the details, I have a preliminary question.
Is host.loop still a supported feature?
I noticed that host.loop was documented in the pyinfra 2.x documentation but this section was removed in 3.x. The current documentation doesn't mention loops at all. Is host.loop deprecated or being phased out? If so, the rest of this message might be moot.
If a loop exits early via break, return, or exception, self.loop_position.pop() is never executed. Cleanup only happens when the generator is garbage collected which in CPython usually happens immediately due to reference counting, but this is an implementation detail, not guaranteed behavior.
Note: wrapping the generator body in try/finally doesn't fully solve this either unfortunately. The finally only runs when the generator is closed or collected and Python's for loop does not call .close() on break. This is a known limitation that has a proposed fix, but it was deferred.
Why this might matter
The loop_position feeds into solve_operation_consistency:
A leaked entry means subsequent operations (outside the loop) would still include a stale loop position in their op_order, potentially affecting operation ordering or hashing.
My attempted solution for the cleanup problem
I refactored Host.loop to return a HostLoop class that:
Implements both the iterator and context manager protocols
Tracks its index in loop_position at creation time
Uses index-based updates (loop_position[self._index] = i) instead of loop_position[-1]
Uses a weakref finalizer for best-effort cleanup of abandoned loops
Sets a None sentinel on cleanup instead of popping (to preserve indices)
Sentinel values in op_order: with None sentinels, op_order could contain None values. Alternatively, abandoned loops could retain their last reported position, which would mimic current behavior while still allowing eventual cleanup.
Loop identity: currently a loop's identity is just its index in loop_position. If two separate loops happen to be at the same index and position, they'd contribute identical values to op_order. Would sequential loop IDs help distinguish them?
The Questions left in my head
So is host.loop still supported, or is it deprecated?
If supported: is leaking loop_position entries a real problem that needs fixing?
If it needs fixing: is requiring with host.loop(...): for deterministic cleanup an acceptable API change?
Should abandoned loops contribute to op_order? If so, what value? Should it be None, their last position, or filtered out entirely?
Would sequential loop IDs add value, or is position sufficient?
Happy to share my implementation if it would help the discussion. Looking forward to your thoughts and insights on this matter.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Hey,
I wanted to use a loop in my pyinfra code. Having some experience with Ansible and knowing that loops and conditionals can be tricky in this context, I searched around in the documentation and codebase.
I found
host.loopvia a Google result that led to what turned out to be outdated pyinfra 2.x documentation. The page didn't make it immediately obvious that it was for an outdated version but the problem it explained sounds real to me. Perhaps a banner warning about outdated information would help future visitors?Anyway, so I started using
host.loopand noticed it was missing type annotation because my code editor stopped suggestion auto-completion. Thus I've started adding type annotations forhost.loopso auto-completion would work in my editor again. While doing that, I noticed a potential issue, but before diving into the details, I have a preliminary question.Is
host.loopstill a supported feature?I noticed that
host.loopwas documented in the pyinfra 2.x documentation but this section was removed in 3.x. The current documentation doesn't mention loops at all. Ishost.loopdeprecated or being phased out? If so, the rest of this message might be moot.The problem I found
Looking at the current implementation:
pyinfra/src/pyinfra/api/host.py
Lines 140 to 145 in 5291e90
If a loop exits early via
break,return, or exception,self.loop_position.pop()is never executed. Cleanup only happens when the generator is garbage collected which in CPython usually happens immediately due to reference counting, but this is an implementation detail, not guaranteed behavior.Note: wrapping the generator body in
try/finallydoesn't fully solve this either unfortunately. Thefinallyonly runs when the generator is closed or collected and Python'sforloop does not call.close()onbreak. This is a known limitation that has a proposed fix, but it was deferred.Why this might matter
The
loop_positionfeeds intosolve_operation_consistency:pyinfra/src/pyinfra/api/operation.py
Lines 410 to 441 in 5291e90
A leaked entry means subsequent operations (outside the loop) would still include a stale loop position in their
op_order, potentially affecting operation ordering or hashing.My attempted solution for the cleanup problem
I refactored
Host.loopto return aHostLoopclass that:loop_positionat creation timeloop_position[self._index] = i) instead ofloop_position[-1]Nonesentinel on cleanup instead of popping (to preserve indices)Nonevalues lazilyThis allows both usage patterns:
The edge cases this revealed
Out-of-order cleanup: a parent loop can be abandoned while a child loop continues:
Sentinel values in op_order: with
Nonesentinels,op_ordercould containNonevalues. Alternatively, abandoned loops could retain their last reported position, which would mimic current behavior while still allowing eventual cleanup.Loop identity: currently a loop's identity is just its index in
loop_position. If two separate loops happen to be at the same index and position, they'd contribute identical values toop_order. Would sequential loop IDs help distinguish them?The Questions left in my head
host.loopstill supported, or is it deprecated?loop_positionentries a real problem that needs fixing?with host.loop(...):for deterministic cleanup an acceptable API change?op_order? If so, what value? Should it beNone, their last position, or filtered out entirely?Happy to share my implementation if it would help the discussion. Looking forward to your thoughts and insights on this matter.
Beta Was this translation helpful? Give feedback.
All reactions