-
Notifications
You must be signed in to change notification settings - Fork 878
use PyThreadState_GetUnchecked
to test if attached to the interpreter
#5350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
// Annoyingly, `PyThreadState_GetUnchecked` can return non-null when in the GC, so | ||
// we have to check if we're in GC traversal. | ||
if is_in_gc_traversal() { | ||
return None; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @ZeroIntensity is this behaviour of PyThreadState_GetUnchecked
returning non-null when inside GC traversal known? It's possible for user code to observe this in implementations of tp_traverse
handlers. (Yes, in theory they should not do anything other than visit types in those calls, but we cannot safely guarantee that!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, what was being used previously that returned NULL
inside the GC?
As far as I know, Python has always required that tp_traverse
is called with an attached thread state, because:
- They're allowed to call arbitrary Python code (though doing so is rare).
- More importantly, they access fields protected by the GIL or stop-the-world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, what was being used previously that returned NULL inside the GC?
PyO3 was using an internal TLS counter (ATTACH_COUNT
in the diff here) because we couldn't rely on PyGILState_Check
, and using PyGILState_Ensure
was a performance drag. Looks like PyThreadState_GetUnchecked
is fast enough, and probably would be as fast as our internal TLS counter if we didn't have to do a second TLS check for GC, so it would be great to make that rule that second check unnecessary.
As far as I know, Python has always required that
tp_traverse
is called with an attached thread state, because
This surprises me, because we've had several crashes reported over time when arbitrary code has been run inside tp_traverse
, especially related to Py_DecRef
(which we have to assume arbitrary code will call). e.g. #3165, #4479. As a result we banned attaching to the interpreter while inside a tp_traverse
handler.
More importantly, they access fields protected by the GIL or stop-the-world.
I think that is not incompatible with the thread being "detached", because GC is defined as a stop-the-world event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, you're right on the first part -- tp_traverse
is not allowed to call Python code. On the free-threaded build, tp_traverse
handlers are called during a stop-the-world, so it is definitely not allowed to enter the eval loop. Sorry about the confusion there!
As a result we banned attaching to the interpreter while inside a tp_traverse handler.
We don't have any explicit behavior for disallowing this in CPython. We just follow the usual rules; we either hold the GIL or make a stop-the-world pause so that no other thread can be attached. I think this is just a difference of the "consenting adults" approach in C vs the memory-safety approach in Rust.
I think that is not incompatible with the thread being "detached", because GC is defined as a stop-the-world event?
I haven't looked too deep into it, but it might be safe to detach the thread state around tp_traverse
calls, depending on what state we currently allow them to access. It's just not too useful for us to do so, because it will likely regress performance and also might break some weird-but-valid use cases (for example, calling Py_INCREF
in a tp_traverse
handler, but not Py_DECREF
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I previously mentioned Py_DECREF
as a concern it's worth pointing out that at least on the free-threaded build Py_INCREF
is also a problem: python/cpython#123241
It seems like there's some clarification needed on what is allowed operation during tp_traverse
and the free-threaded build has made the problem worse, potentially. Until it's documented as permitted my instinct is to allow nothing.
I think this is just a difference of the "consenting adults" approach in C vs the memory-safety approach in Rust.
Perhaps, note that in PyO3 a user can still just do unsafe { Python::assume_attached() }
to get "proof" that they're attached to the interpreter, but this would at least stick out as a big red flag in code review to analyze carefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(we're in the process of cleaning up API names, currently it's called assume_gil_acquired
if you happen to search for this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Py_INCREF
should generally be allowed in tp_traverse
. It's a bug that the free-threaded GC doesn't deal with it.
If you're up to it, feel free to submit a PR adding some clarification to tp_traverse
. Otherwise, I can do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for falling off the thread a little - I'm happy to submit a PR, however I might take a little while to get around to it 👍
CodSpeed Performance ReportMerging #5350 will degrade performances by 49.58%Comparing Summary
Benchmarks breakdown
|
This is a bit of an experiment based upon the recommendation in PEP 788 to use
PyThreadState_GetUnchecked
as a correct way to check if the thread is currently attached to the interpreter.The really nice thing about this is that it would solve the long-standing problem #1683, at least on versions where this API is available, because we'd be using the interpreter's actual state instead of our guard in PyO3.