Add support for the free-threaded build #178
fix various errors and warnings seen on clang and gcc
…d-flag Add CT_UNDER_CONSTRUCTION to indicate that a type is being mutated
(Kumar will fix this soon!)
Move CT_CUSTOM_FIELD_POS and CT_WITH_PACKED_CHANGE to ct_flags_mut
Replace ct_is_hidden with asserts
fix headers to avoid duplicated definitions
…dsafe fix thread safety of `_realize_c_struct_or_union`
Co-authored-by: Nathan Goldbaum
OK great, please feel free to push directly to this PR branch if that's convenient for you. I'll also try to reproduce what you were seeing with the test extensions. Having a per-test check in pytest-run-parallel like you describe is probably a good idea. We could add it to the test executor, which is already wrapped in a try/finally. Of course, that doesn't help if the GIL gets re-enabled in a subprocess, which might be what's happening here.
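A per-test check along those lines might look like the minimal sketch below. It is hypothetical, not pytest-run-parallel's actual API: `run_checking_gil` and `gil_enabled` are illustrative names, and the probe relies on `sys._is_gil_enabled()`, which only exists on Python 3.13+ (older interpreters are treated as always having the GIL). Like a try/finally in the test executor, it only sees the current process, so a GIL re-enabled in a subprocess goes unnoticed.

```python
import sys


def gil_enabled() -> bool:
    """Report whether the GIL is currently enabled in this process."""
    # sys._is_gil_enabled() exists on 3.13+; older builds always have a GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    return True if probe is None else probe()


def run_checking_gil(test_fn, *args, gil_probe=gil_enabled, **kwargs):
    """Hypothetical helper: run one test body and fail if it re-enabled the GIL.

    Only this process is checked; a GIL re-enabled in a subprocess is invisible.
    """
    gil_before = gil_probe()
    try:
        return test_fn(*args, **kwargs)
    finally:
        # If the GIL started out disabled and is now enabled, report it.
        # Raising here replaces any exception from test_fn (which is kept
        # as the AssertionError's __context__).
        if not gil_before and gil_probe():
            raise AssertionError(
                f"{getattr(test_fn, '__name__', test_fn)!r} re-enabled the GIL"
            )
```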
This wording is making me think that maybe you're on Windows and are using the Python.org Python installer, which unfortunately has a pretty major issue that won't be fixed: python/cpython#127294. In short, if you install both the free-threaded and GIL-enabled interpreters, they will share a […]

The Windows Python.org installer is the only distribution that has this issue: if you install Python via the new PyManager installer or NuGet, the free-threaded and GIL-enabled interpreters won't share an overlapping installation.
I looked at the extensions created by the tests and it looks like they all set […] I spent a little time drilling down into the extensions generated by […] @colesbury tells me he thinks the most likely result of not having […] I'll wait until I have more info to proceed further.
I'm not sure why @nitzmahone is seeing GIL-enabled warnings. I don't see any locally, and I don't see any in the GitHub CI logs. Could you have some unrelated or stale modules in the same directory that are pulled in by […] The Windows explanation doesn't make sense to me. First, […] @nitzmahone, when you get a chance to look at it more, would you please share the relevant logs?
The issues I've been seeing were all from manual poking around on my Linux dev box (Fedora 42 x86_64) with a bone-stock build from […] Most of the test extensions are being correctly built with […] Now that I'm less pudding-brained, I'll go back over it with fresh eyes and make sure I'm not getting tripped up by random stale non-[…] More to come...
Grr, sorry for the fire drill. I don't know exactly where it was coming from because I just blew it all away, but it must have been some cached intermediate build cruft from previous local/manual test runs where the extension init GIL opt-out wasn't occurring. I'm just validating a tweak to the CIBW test config to force warning-as-error on GIL re-enable for all the […] Thanks all!
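For reference, a Python-level version of that warning-as-error escalation could be sketched like this. It is not the actual CIBW config change: `make_gil_reenable_fatal` is a hypothetical helper, and the message pattern assumes CPython's wording for the RuntimeWarning emitted when importing an extension that does not declare `Py_mod_gil` re-enables the GIL. Passing `-W error::RuntimeWarning` to the interpreter is a coarser alternative, since it escalates every RuntimeWarning.

```python
import warnings

# Assumed prefix of CPython's RuntimeWarning text when an extension without a
# Py_mod_gil declaration forces the GIL back on in a free-threaded build.
GIL_REENABLED_PATTERN = r"The global interpreter lock \(GIL\) has been enabled"


def make_gil_reenable_fatal() -> None:
    """Hypothetical helper: turn only the GIL re-enable warning into an error."""
    # filterwarnings() inserts at the front of the filter list, so this rule
    # wins over broader filters installed earlier.
    warnings.filterwarnings(
        "error", message=GIL_REENABLED_PATTERN, category=RuntimeWarning
    )
```

Installing the filter before the test modules import their extensions means a stale build that forces the GIL back on fails loudly instead of silently serializing the run.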
Hrm, after kicking off a few manual workflow runs with all targets enabled to simulate a release, the Linux and macOS 3.13t parallel runs (skipped for PRs) are both segfaulting in CI in exactly the same place ([…]). I'm able to very reliably repro the segfault locally on Fedora 42's packaged build of 3.13.5t with:

[…]

while various similar permutations seem fine on 3.14.0rc1t. I'm not opposed to proceeding with merge/beta without fixing this, but I would really like to at least understand what's going on there. I need to hang it up for the day pretty soon, and I'm unavailable on Friday morning. If someone else who can repro this locally wants to try to catch it in the act against a debug build, cool; otherwise I'll take a stab at it Friday afternoon.
Argh! So close... I didn't see this when I triggered the full CI back in June: https://github.com/ngoldbaum/cffi-ft/actions/runs/15769464108. I just double-checked, and that run did include the commit that makes sure the GIL doesn't get re-enabled in the tests, so that test run did pass on 3.13t with the GIL disabled. Too bad I didn't think to redo that exercise in the month since then. Apologies for the oversight.

I ran the tests tonight using 3.13.5t on my Mac and I see similar results (TSAN reports and random test failures, although no segfaults).

We made some updates to the PR after I triggered that full CI run. In particular, we switched to a much simpler locking strategy: a single critical section on a single globally allocated dummy PyObject. On Python 3.14, this acts more or less like a "per-library GIL". On Python 3.13, critical sections have different semantics for when they can be suspended, and I suspect that's the source of the differences here. That's just my guess, anyway. I pinged @kumaraditya303 to help track this down. Hopefully he has time to look at this tomorrow and there is something straightforward we can do, but it may take a little time to work out the correct fix.

Just a thought: maybe we can disable pytest-run-parallel CI for the wheel builds on 3.13t while we work out these issues, just to unblock people who want to enable builds downstream. We can note in the release notes that there are known thread-safety issues on 3.13t.
I wouldn't want the first release to go out with known problems; hopefully it will be straightforward to fix this.
Yeah, I'd prefer to have the beta working without caveats; hopefully there's a quick fix once we figure out exactly what's going on. If we can't get an easy fix, well, we can cross that bridge when we come to it. Painfully slow as it is, it'd probably be wise to at least temporarily enable the full test suite for a Windows target or two as well. IIRC I had the whole thing passing on either an early 3.14 alpha or a late 3.13 pre-release (only tested with GIL-ful threading), so I'm not sure what its current operational state is. I'll pick that up Friday afternoon as well.
@kumaraditya303 did some digging, and we're now pretty confident the issue is what I described above: recursive critical sections can be suspended more often in 3.13 than in 3.14. This comes down to this change: python/cpython#128126. While that is billed as a performance improvement, it also has the side effect of making recursive critical sections behave more like recursive locks in 3.14 than they did in 3.13.

We now think the clearest path forward is probably just to not try to support 3.13t in CFFI, at least not at first. If it turns out that 3.13t support is critical for whatever reason, we can add support later, but it'd be a shame not to ship 3.14t support using our current approach when we know that works fine.

Another reason not to support 3.13t is that there are also races in CPython internals that are triggered by the CFFI test suite (see the gist I shared yesterday). Some of them come from PyDict internals (CFFI uses Python dictionaries heavily in its implementation), and as I understand it, the fixes for these bugs probably won't be backported to 3.13. If we do need to support 3.13t, then we probably need to replace the […]

If anyone really does need to use 3.13t, they can; they'll just need to deal with the thread-safety issues. You can work around them by initializing types in the main thread before spawning worker threads. Obviously that's not good and CFFI shouldn't officially support that, but 3.13t is experimental as-is, and anyone using it can be expected to go a little out of their way to get things working.

@nitzmahone @mattip if you're OK with not supporting 3.13t, then I'll go ahead and remove the 3.13t CI and wheel builds from this PR.
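The main-thread workaround boils down to finishing every lazy initialization before any sharing starts, so worker threads only ever read. In the sketch below, `TypeCache` and `warm_up` are hypothetical stand-ins for the types a real program would create through CFFI (for example with `ffi.typeof` or `ffi.new`); the cache's check-then-insert is deliberately unsynchronized to mirror an unsafe lazy path.

```python
import threading


class TypeCache:
    """Hypothetical lazily filled cache that is NOT safe to fill concurrently."""

    def __init__(self, builder):
        self._builder = builder
        self._types = {}
        self.builds = 0  # how many times the builder actually ran

    def get(self, name):
        # Unsynchronized check-then-insert: two threads could both build `name`.
        if name not in self._types:
            self._types[name] = self._builder(name)
            self.builds += 1
        return self._types[name]


def warm_up(cache, names):
    """Hypothetical workaround: build every needed type before starting threads."""
    for name in names:
        cache.get(name)
```

The pattern only holds if workers never ask for a type that was not warmed up; any miss falls back to the racy path.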
As users, we're okay with 3.14-only support.
Yeah, I'm fine with that. It's not ideal, but IIUC the experimental label wasn't retroactively removed from 3.13t; 3.14t is the first release that's been "blessed" for production use. If 3.13t support would complicate things with bespoke/non-portable sync primitives, I'm all for skipping it. Are you thinking that CFFI 2.0+ should explicitly refuse to build against 3.13t, or just that it's documented as "YMMV" and that we won't offer wheels for it?
Awesome! I'll work on this now.
I can do it either way. I think we should be conservative and add a check to the […] To make that concrete: right now, you see errors if you try to install something that depends on CFFI, due to an attempt to use the limited API. Here's the output of […] If I add a […] Someone can patch that error away and the build will succeed, but then they know they're doing unsupported things. Does that sound reasonable?
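A conservative build-time guard of that kind could be sketched as follows. Everything here is hypothetical (the function name, where a build would call it, and the error text); the only real interface it relies on is the `Py_GIL_DISABLED` config variable, which free-threaded CPython builds set to 1. Anyone who patches the check out still gets a build, but knowingly in unsupported territory.

```python
import sys
import sysconfig


def check_free_threaded_support(version_info=sys.version_info, gil_disabled=None):
    """Hypothetical build-time guard: refuse free-threaded Pythons older than 3.14."""
    if gil_disabled is None:
        # Py_GIL_DISABLED is 1 on free-threaded builds and 0/None otherwise.
        gil_disabled = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if gil_disabled and tuple(version_info[:2]) < (3, 14):
        # Hypothetical error text; the point is to fail early and explicitly.
        raise RuntimeError(
            "CFFI does not support the free-threaded build of Python "
            f"{version_info[0]}.{version_info[1]}; use 3.14t or newer."
        )
```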
See the latest commits. I also kicked off a "full" CI run on my fork: https://github.com/ngoldbaum/cffi-ft/actions/runs/16530469003
It looks like everything is passing. I'm still waiting on ppc64le and s390x, but that will take a while since they're emulated. Maybe I should re-enable Windows too?
Yeah, that all looks great; I'm ready to merge if you are.
Let's ship it!
Also, if you need help with any tasks for shipping the beta or final release, feel free to ping me 😀
Is anything else required to merge? (Or did we livelock here? :D)
Nah, just woke up to lots of other things on fire this morning. Merging now...
Woooo! Huge thank you to everyone who made this happen!
Fixes #126.
Adds support for the free-threaded build of Python 3.13 and 3.14 by improving the thread safety of CFFI internals.
Overview
Mutable state that is accessible to more than one thread in the CFFI backend is now mediated via atomic operations, critical sections, or mutexes, depending on the use case:
get_primitive_type: We initialize all primitive types at module initialization instead of lazily. This is pretty inexpensive (~100 µs).
file_struct: now initialized during module initialization.
_get_ct_int: now initialized during module initialization.
init_once_cache: The cache itself is initialized under a critical section, and we use APIs that return strong references instead of borrowed references.
malloc_closure: use a global mutex.
b_complete_struct_or_union: Added a critical section.

[…] pytest-run-parallel and sets up pytest-run-parallel Linux, Mac, and Windows CI, as well as a run using a TSAN-instrumented build running on Python 3.14.

Open questions
The ctypes module in CPython isn't thread-safe in 3.13t. Should we worry about the ctypes backend on 3.13t?

cc @kumaraditya303 @colesbury
@colesbury also wants to do another round of code review on the final version of this PR.
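The init_once_cache item above can be illustrated with a Python analogy: look up or create the entry while holding a lock, and hand the caller the object obtained inside the locked region rather than re-reading shared state afterwards. `InitOnceCache` is hypothetical and not CFFI's API; Python has no borrowed references, so returning the value captured under the lock is the closest equivalent to switching to strong-reference APIs.

```python
import threading


class InitOnceCache:
    """Hypothetical per-key init-once cache, safe to use from many threads."""

    def __init__(self):
        self._lock = threading.Lock()  # one lock for all keys, for simplicity
        self._values = {}

    def get_or_init(self, key, init):
        with self._lock:
            try:
                # Return the object found while the lock is held.
                return self._values[key]
            except KeyError:
                pass
            # Runs at most once per key; callers racing on the same key
            # block on the lock until this finishes.
            value = init()
            self._values[key] = value
            return value
```

A single lock for all keys keeps the sketch simple but serializes unrelated initializations, and `init` must not call back into the same cache because `threading.Lock` is not re-entrant.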