Skip to content

Conversation

davidhewitt
Copy link
Member

This is a bit of an experiment based upon the recommendation in PEP 788 to use PyThreadState_GetUnchecked as a correct way to check if the thread is currently attached to the interpreter.

  • It's avaiable on 3.13 and up, but not yet for the stable ABI
  • Annoyingly, during testing I found that it would return a non-null pointer while inside GC traversal, so I had to guard against this.

The really nice thing about this is that it would solve the long-standing problem #1683, at least on versions where this API is available, because we'd be using the interpreter's actual state instead of our guard in PyO3.

Comment on lines +103 to +107
// Annoyingly, `PyThreadState_GetUnchecked` can return non-null when in the GC, so
// we have to check if we're in GC traversal.
if is_in_gc_traversal() {
return None;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @ZeroIntensity is this behaviour of PyThreadState_GetUnchecked returning non-null when inside GC traversal known? It's possible for user code to observe this in implementations of tp_traverse handlers. (Yes, in theory they should not do anything other than visit types in those calls, but we cannot safely guarantee that!)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, what was being used previously that returned NULL inside the GC?

As far as I know, Python has always required that tp_traverse is called with an attached thread state, because:

  1. They're allowed to call arbitrary Python code (though doing so is rare).
  2. More importantly, they access fields protected by the GIL or stop-the-world.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, what was being used previously that returned NULL inside the GC?

PyO3 was using an internal TLS counter (ATTACH_COUNT in the diff here) because we couldn't rely on PyGILState_Check, and using PyGILState_Ensure was a performance drag. Looks like PyThreadState_GetUnchecked is fast enough, and probably would be as fast as our internal TLS counter if we didn't have to do a second TLS check for GC, so it would be great to make that rule that second check unnecessary.

As far as I know, Python has always required that tp_traverse is called with an attached thread state, because

This surprises me, because we've had several crashes reported over time when arbitrary code has been run inside tp_traverse, especially related to Py_DecRef (which we have to assume arbitrary code will call). e.g. #3165, #4479. As a result we banned attaching to the interpreter while inside a tp_traverse handler.

More importantly, they access fields protected by the GIL or stop-the-world.

I think that is not incompatible with the thread being "detached", because GC is defined as a stop-the-world event?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, you're right on the first part -- tp_traverse is not allowed to call Python code. On the free-threaded build, tp_traverse handlers are called during a stop-the-world, so it is definitely not allowed to enter the eval loop. Sorry about the confusion there!

As a result we banned attaching to the interpreter while inside a tp_traverse handler.

We don't have any explicit behavior for disallowing this in CPython. We just follow the usual rules; we either hold the GIL or make a stop-the-world pause so that no other thread can be attached. I think this is just a difference of the "consenting adults" approach in C vs the memory-safety approach in Rust.

I think that is not incompatible with the thread being "detached", because GC is defined as a stop-the-world event?

I haven't looked too deep into it, but it might be safe to detach the thread state around tp_traverse calls, depending on what state we currently allow them to access. It's just not too useful for us to do so, because it will likely regress performance and also might break some weird-but-valid use cases (for example, calling Py_INCREF in a tp_traverse handler, but not Py_DECREF).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I previously mentioned Py_DECREF as a concern it's worth pointing out that at least on the free-threaded build Py_INCREF is also a problem: python/cpython#123241

It seems like there's some clarification needed on what is allowed operation during tp_traverse and the free-threaded build has made the problem worse, potentially. Until it's documented as permitted my instinct is to allow nothing.

I think this is just a difference of the "consenting adults" approach in C vs the memory-safety approach in Rust.

Perhaps, note that in PyO3 a user can still just do unsafe { Python::assume_attached() } to get "proof" that they're attached to the interpreter, but this would at least stick out as a big red flag in code review to analyze carefully.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we're in the process of cleaning up API names, currently it's called assume_gil_acquired if you happen to search for this)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Py_INCREF should generally be allowed in tp_traverse. It's a bug that the free-threaded GC doesn't deal with it.

If you're up to it, feel free to submit a PR adding some clarification to tp_traverse. Otherwise, I can do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for falling off the thread a little - I'm happy to submit a PR, however I might take a little while to get around to it 👍

Copy link

codspeed-hq bot commented Aug 21, 2025

CodSpeed Performance Report

Merging #5350 will degrade performances by 49.58%

Comparing davidhewitt:check-thread-state (438b4f8) with main (e93393c)

Summary

❌ 3 regressions
✅ 89 untouched

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
extract_str_extract_fail 1.8 µs 2.1 µs -15.8%
drop_many_objects 11.4 µs 22.6 µs -49.58%
drop_many_objects_without_gil 41.8 µs 51.2 µs -18.47%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants