Add support for the free-threaded build #178

Merged: 71 commits, Jul 28, 2025

Conversation

@ngoldbaum (Contributor) commented Jun 19, 2025

Fixes #126.

Adds support for the free-threaded build of Python 3.13 and 3.14 by improving the thread safety of CFFI internals.

Overview

Mutable state that is accessible to more than one thread in the CFFI backend is now mediated via atomic operations, critical sections, or mutexes, depending on the use case (see the sketch after this list):

  • Replaced uses of the PyDict and PyWeakref C APIs that return borrowed references with newer APIs (or shims on older Python versions) that return strong references.
  • get_primitive_type: all primitive types are now initialized at module initialization instead of lazily. This is inexpensive (~100 µs).
  • file_struct: now initialized during module initialization.
  • _get_ct_int: now initialized during module initialization.
  • init_once_cache: the cache itself is initialized under a critical section, and lookups use APIs that return strong references instead of borrowed references.
  • malloc_closure: now protected by a global mutex.
  • Moved mutable type state into uint8 flags synchronized via atomic operations.
  • b_complete_struct_or_union: added a critical section.
  • Made the test suite runnable under pytest-run-parallel and set up pytest-run-parallel CI on Linux, macOS, and Windows, plus a run against a TSAN-instrumented build on Python 3.14.
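As a rough sketch of the flag pattern above (not CFFI's actual code; the names ct_flags_mut and CT_UNDER_CONSTRUCTION appear in the commit log below, while the struct, helpers, and flag value here are illustrative, with C11 atomics standing in for whatever shim the backend actually uses):

```c
#include <stdatomic.h>
#include <stdint.h>

#define CT_UNDER_CONSTRUCTION 0x01   /* hypothetical value: type is still being mutated */

typedef struct {
    _Atomic uint8_t ct_flags_mut;    /* mutable flags shared between threads */
} ctypedescr_sketch;

static void
ct_set_flag(ctypedescr_sketch *ct, uint8_t flag)
{
    /* atomic read-modify-write: no lock needed for a single flag update */
    atomic_fetch_or_explicit(&ct->ct_flags_mut, flag, memory_order_release);
}

static int
ct_has_flag(ctypedescr_sketch *ct, uint8_t flag)
{
    return (atomic_load_explicit(&ct->ct_flags_mut, memory_order_acquire) & flag) != 0;
}
```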

Open questions

  • More docs?
  • The ctypes module in CPython isn't thread-safe in 3.13t. Should we worry about the ctypes backend on 3.13t?
  • More explicit multithreaded tests?

cc @kumaraditya303 @colesbury

@colesbury also wants to do another round of code review on the final version of this PR.

colesbury and others added 30 commits starting March 12, 2025, including:

  • fix various errors and warnings seen on clang and gcc
  • Add CT_UNDER_CONSTRUCTION to indicate that a type is being mutated (Kumar will fix this soon!)
  • Move CT_CUSTOM_FIELD_POS and CT_WITH_PACKED_CHANGE to ct_flags_mut
  • fix headers to avoid duplicated definitions
  • fix thread safety of `_realize_c_struct_or_union`
@ngoldbaum (Contributor Author)

OK great, please feel free to push directly to this PR branch if that's convenient for you. I'll also try to reproduce what you were seeing with the test extensions.

Having a per-test check in pytest-run-parallel like you describe is probably a good idea. We could add it to the test executor, which is already wrapped in a try/finally.

Of course that doesn't help if the GIL gets re-enabled in a subprocess, which might be what's happening here.

@ngoldbaum (Contributor Author)

> Looking at several of the cases from 1), it feels like there needs to be an extension kwarg (or some other knob) to force Py_GIL_DISABLED to be defined (or explicitly undefined) for a given generated extension, rather than relying solely on whatever preprocessor state happens to be floating around when we build an extension (kinda like all the effort that goes into deciding whether or not to set Py_LIMITED_API).

This wording is making me think that maybe you're on Windows and are using the python.org Python installer, which unfortunately has a pretty major issue that won't be fixed: python/cpython#127294.

In short, if you install both the free-threaded and GIL-enabled interpreter, they will share a Python.h and site-packages folder, which can cause issues like you describe above. In particular, on Windows, the build for a free-threaded extension has to manually pass in Py_GIL_DISABLED, since Python.h doesn't set it.

The Windows python.org installer is the only distribution that has this issue - if you install Python via the new PyManager installer or NuGet, the free-threaded and GIL-enabled interpreters won't share an overlapping installation.

@ngoldbaum (Contributor Author)

I looked at the extensions created by the tests and it looks like they all set Py_MOD_GIL_NOT_USED, at least on my Mac testing environment. I'm also unable to reproduce the GIL being re-enabled. It looks like it's not happening on any of the CI runs either, unless I'm missing something.

I spent a little time drilling down into the extensions generated by recompiler.py, but ultimately all the modules created there are modified versions of the "main" _cffi_backend module, which has Py_MOD_GIL_NOT_USED set.
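For reference, this is the mechanism being checked: an extension opts out of having the GIL re-enabled by declaring the Py_mod_gil slot. A minimal sketch (illustrative module name, not CFFI's actual module definition):

```c
#include <Python.h>

/* Minimal sketch: declare that this module does not need the GIL.
 * The slot only exists on free-threaded builds, hence the #ifdef. */
static PyModuleDef_Slot example_slots[] = {
#ifdef Py_GIL_DISABLED
    {Py_mod_gil, Py_MOD_GIL_NOT_USED},
#endif
    {0, NULL},
};

static struct PyModuleDef example_module = {
    PyModuleDef_HEAD_INIT,
    .m_name = "example",
    .m_size = 0,
    .m_slots = example_slots,
};

PyMODINIT_FUNC
PyInit_example(void)
{
    return PyModuleDef_Init(&example_module);
}
```

If a module doesn't declare the slot (or it gets compiled out), importing it on a free-threaded interpreter re-enables the GIL and emits the warning being chased in this thread.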

@colesbury tells me he thinks the most likely result of not having Py_GIL_DISABLED set is a crash, rather than the GIL being re-enabled.

I'll wait until I have more info to proceed further.

@colesbury (Contributor)

I'm not sure why @nitzmahone is seeing GIL enabled warnings. I don't see any locally and I don't see any in the GitHub CI logs. Could you have some unrelated or stale modules in the same directory that are pulled in by pytest?

The Windows explanation doesn't make sense to me. First, setuptools passes the necessary Py_GIL_DISABLED compiler flag to work around the Windows installer issue that @ngoldbaum mentioned. Even if that didn't happen, a missing Py_GIL_DISABLED flag when compiling would lead to crashes on import due to incorrect struct definitions; I don't think you'd see "GIL enabled" warnings.

@nitzmahone - when you get a chance to look at it more, would you please share the relevant logs?

@nitzmahone (Member)

The issues I've been seeing were all from manual poking around on my Linux dev box (Fedora 42 x86_64) with a bone-stock build from pyenv install 3.14.0rc1t.

Most of the test extensions are being correctly built with Py_GIL_DISABLED, which was why I hacked in the more granular per-test pre/post check (which currently has to be run sans worker threads to blame the correct test). I'd just started digging into the why last night when I ran out of time. From my past experience with it, CFFI's test suite is notoriously "leaky", and there are a number of one-off tests sprinkled around that do things ... differently. It's also complicated by so many operational modes- I've fixed some tests in the past that were directly (or indirectly) invoking the wrong Python and/or config, so it's possible that's what's happening here. I was just disabling the individual tests that were re-enabling the GIL last night to come up with the full list, so I haven't dived into the why yet.

Now that I'm less pudding-brain, I'll go back over it with fresh eyes and make sure I'm not getting tripped up by random stale non-t test build artifacts or something else. I might also temporarily switch to an xdist/forked run model like we do for Ansible- getting rid of intra-test interpreter state leakage makes it a lot easier to pinpoint problems.

More to come...

@nitzmahone (Member) commented Jul 24, 2025

Grr, sorry for the fire-drill - I don't know exactly where it was coming from because I just blew it all away, but it must've been some cached intermediate build cruft from previous local/manual test runs where the extension-init GIL opt-out wasn't occurring.

I'm just validating a tweak to the CIBW test config to force warning-as-error on GIL-re-enable for all the t targets- once I'm sure it's behaving correctly, I'll add a commit to this PR, then assuming it's all green we can merge and do a release.

Thanks all!

@nitzmahone (Member) commented Jul 24, 2025

Hrm, after kicking off a few manual workflow runs with all targets enabled to simulate a release, the Linux and macOS 3.13t parallel runs (skipped for PRs) are both segfaulting in CI in exactly the same place (testing/cffi1/test_pkgconfig.py), whereas 3.14t is fine.

I'm able to very reliably repro the segfault locally on Fedora 42's packaged build of 3.13.5t with:

pytest testing/cffi1/test_re_python.py --parallel-threads=3  # 2 breaks sometimes, >=3 breaks quite reliably

where various similar permutations seem fine on 3.14.0rc1t.

I'm not opposed to proceeding with merge/beta without fixing this, but I would really like to at least understand what's going on there. I need to hang it up for the day pretty soon, and I'm unavailable on Friday morning. If someone else that can repro this locally wants to try and catch it in the act against a debug build, cool, otherwise I'll take a stab at it Friday afternoon.

@ngoldbaum (Contributor Author) commented Jul 25, 2025

Argh! So close...

I didn't see this when I triggered the full CI back in June: https://github.com/ngoldbaum/cffi-ft/actions/runs/15769464108. I just double-checked, and that run did include the commit that makes sure the GIL doesn't get re-enabled in the tests, so that test run did pass on 3.13t with the GIL disabled.

Too bad I didn't think to redo that exercise in the month since then. Apologies for the oversight.

I ran the tests tonight using 3.13.5t on my Mac and I see similar results (TSAN reports and random test failures, although no segfaults).

We did some updates to the PR after I triggered that full CI run - in particular, we adjusted the locking strategy to do something a lot simpler: just use a single critical section on a single globally allocated dummy PyObject. On Python 3.14, this acts more or less like a "per-library GIL". On Python 3.13 there are different semantics for when critical sections can be suspended, and I suspect that's the source of the differences here. That's just my guess anyway.
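To make that concrete, here is a sketch of the pattern (the critical-section macros are the public CPython 3.13+ API; the object and function names are illustrative, not CFFI's actual code):

```c
#include <Python.h>

/* One module-level object whose per-object lock acts as a "per-library GIL".
 * Illustrative name; created once during module initialization. */
static PyObject *cffi_lock_object;

static void
mutate_shared_state(void)
{
    /* Every code path that touches the shared state enters a critical
     * section on the same object, so at most one thread mutates it at a
     * time.  Critical sections can be suspended around blocking calls,
     * and the rules for when that happens differ between 3.13 and 3.14,
     * which is the suspected source of the 3.13t failures. */
    Py_BEGIN_CRITICAL_SECTION(cffi_lock_object);
    /* ... read and write shared state here ... */
    Py_END_CRITICAL_SECTION();
}
```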

I pinged @kumaraditya303 to help track this down. Hopefully he has time to look at this tomorrow and there is something straightforward we can do, but it may take a little time to work out the correct fix.

Just a thought - maybe we can disable pytest-run-parallel CI for the wheel builds on 3.13t while we work out these issues, just to unblock people who want to enable builds downstream. We can note in the release notes that there are known thread safety issues on 3.13t.

@mattip (Contributor) commented Jul 25, 2025

I wouldn't want the first release to go out with known problems, hopefully it will be straightforward to fix this.

@nitzmahone (Member) commented Jul 25, 2025

Yeah, I'd prefer to have the beta working without caveat- hopefully there's a quick fix once we figure out exactly what's going on. If we can't get an easy fix, well, we can cross that bridge if we come to it.

Painfully slow as it is, it'd probably be wise to at least temporarily enable the full test suite for a Windows target or two as well - IIRC I had the whole thing passing for either an early 3.14 alpha or a late 3.13 pre-release (only tested with GIL-ful threading), so I'm not sure what its current operational state is. I'll pick that up Friday afternoon as well.

@ngoldbaum (Contributor Author) commented Jul 25, 2025

@kumaraditya303 did some digging and we're now pretty confident the issue is what I described above: recursive critical sections can be suspended more often in 3.13 than they could be in 3.14. This comes down to this change: python/cpython#128126. While that is billed as a performance improvement, it also has a side-effect of making recursive critical sections behave more like recursive locks in 3.14 than they did in 3.13.

We now think the clearest path forward is probably just to not try to support 3.13t in CFFI, at least not at first. If it turns out that 3.13t support is critical for whatever reason, we can add support later, but it'd be a shame to not ship 3.14t support using our current approach when we know that works fine.

Another reason not to support 3.13t is that there are also races in CPython internals that are triggered by the CFFI test suite (see the gist I shared yesterday). Some of them come from PyDict internals -- CFFI uses Python dictionaries heavily in its implementation -- and as I understand it the fixes for these bugs probably won't be backported to 3.13.

If we do need to support 3.13t then we probably need to replace the CFFI_LOCK critical section with a recursive mutex. The problem there is there isn't an obvious, portable choice to use in C. In NumPy we solved a similar issue by using C++ standard library features but we probably can't do that here. The Python C API doesn't expose a recursive mutex.
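To illustrate what that would involve, here is a sketch of a recursive mutex built on the public PyMutex API (added in 3.13) plus C11 atomics; every name is hypothetical, CFFI does not ship this, and MSVC's incomplete stdatomic.h support is exactly the portability problem mentioned above:

```c
#include <Python.h>
#include <stdatomic.h>

typedef struct {
    PyMutex mutex;                /* non-recursive lock from the C API */
    _Atomic unsigned long owner;  /* thread id of the holder, 0 if unheld */
    Py_ssize_t count;             /* recursion depth, touched only by the owner */
} recursive_mutex;

static void
recursive_mutex_lock(recursive_mutex *m)
{
    unsigned long tid = PyThread_get_thread_ident();  /* assumed nonzero */
    if (atomic_load(&m->owner) == tid) {
        m->count++;               /* re-entry from the owning thread */
        return;
    }
    PyMutex_Lock(&m->mutex);
    atomic_store(&m->owner, tid);
    m->count = 1;
}

static void
recursive_mutex_unlock(recursive_mutex *m)
{
    if (--m->count == 0) {
        atomic_store(&m->owner, 0);
        PyMutex_Unlock(&m->mutex);
    }
}
```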

If anyone really does need to use 3.13t, they can, they'll just need to deal with the thread safety issues. You can work around them by initializing types in the main thread before spawning worker threads. Obviously that's not good and CFFI shouldn't officially support that, but 3.13t is experimental as-is and anyone using it can be expected to go a little out of their way to get things working.

@nitzmahone @mattip if you're ok with not supporting 3.13t then I'll go ahead and remove the 3.13t CI and wheel builds from this PR.

@ofek (Contributor) commented Jul 25, 2025

As a user, we are okay with only 3.14 support.

@nitzmahone (Member) commented Jul 25, 2025

Yeah, I'm fine with that. It's not ideal, but IIUC the experimental label wasn't retroactively removed from 3.13t- 3.14t is the first release that's been "blessed" for production use. If 3.13t support would complicate things with bespoke/non-portable sync primitives, I'm all for skipping it.

Are you thinking that CFFI 2.0+ should explicitly refuse to build against 3.13t, or just that it's documented as "YMMV" and that we won't offer wheels for it?

@ngoldbaum (Contributor Author) commented Jul 25, 2025

> If 3.13t support would complicate things with bespoke/non-portable sync primitives, I'm all for skipping it.

Awesome! I'll work on this now.

> Are you thinking that CFFI 2.0+ should explicitly refuse to build against 3.13t, or just that it's documented as "YMMV" and that we won't offer wheels for it?

I can do it either way, but let's be conservative and add a check to setup.py that bails with a RuntimeError saying that 3.13t is unsupported and that free-threaded support starts in 3.14. Since there will only be an sdist available for 3.13t, anyone with a build or runtime dependency on CFFI will continue to see build errors on 3.13t.

To make that concrete, right now you see errors if you try to install something that depends on CFFI due to trying to use the limited API. Here's the output of pip install cryptography today: https://gist.github.com/ngoldbaum/cf271f81ef13802e1575d3770ed0928c

If I add a sys.version_info check to the cffi setup.py and force cryptography to use my local clone of CFFI as a build dependency, I see this output instead, with a much more friendly error message that explains what the problem and solution are: https://gist.github.com/ngoldbaum/232efb3b79ebd4c69a376be1b3ffbddb.

Someone can patch that error away and the build will succeed, but then they know they're doing unsupported things.

Does that sound reasonable?

@ngoldbaum (Contributor Author) commented Jul 25, 2025

See latest commits. I also kicked off a "full" CI run on my fork: https://github.com/ngoldbaum/cffi-ft/actions/runs/16530469003

@ngoldbaum (Contributor Author)

It looks like everything is passing. Still waiting on PPC64le and s390x but that will take a while since they're emulated. Maybe I should re-enable Windows too?

@nitzmahone (Member)

Yeah, that all looks great- I'm ready to merge if you are.

@ngoldbaum (Contributor Author) commented Jul 25, 2025

Let's ship it! :shipit:

@ngoldbaum (Contributor Author)

Also if you need any support in any tasks for shipping the beta or final release, feel free to ping me 😀

@alex (Contributor) commented Jul 28, 2025

Is anything else required to merge? (Or did we livelock here :D)

@nitzmahone (Member)

Nah, just woke up to lots of other things on fire this morning- merging now...

@nitzmahone merged commit 7ed073d into python-cffi:main on Jul 28, 2025; 32 checks passed.
@alex (Contributor) commented Jul 28, 2025

Woooo! Huge thank you to everyone who made this happen!
