Curated results from personal bench_runner benchmarks. The hosts differ quite a bit:
- "tc1" and "tc2" are identical Intel Core i3 6100T systems with 4GB RAM (fairly old, low-powered), running Debian testing. These are dedicated machines, so noise is limited. Because the two machines are identical, any difference between tc1 and tc2 results should be treated as noise.
- "Pi5" is a dedicated Raspberry Pi with 8GB RAM and an SSD for storage, running Raspbian.
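As a rough illustration of using the tc1/tc2 pair as a noise check, the sketch below treats the disagreement between the two identical hosts as a noise band and flags changes smaller than that band. The numbers and function names are hypothetical and not part of bench_runner; this is just one simple way to apply the idea.

```python
# Hypothetical speedups reported for the same build on the two identical hosts.
SPEEDUP_TC1 = 1.04
SPEEDUP_TC2 = 1.02


def noise_band(speedup_a, speedup_b):
    """Relative disagreement between two hosts that should measure the same thing."""
    return abs(speedup_a - speedup_b) / min(speedup_a, speedup_b)


def is_probably_noise(change, band):
    """True if a relative change is no larger than the observed noise band."""
    return abs(change) <= band


band = noise_band(SPEEDUP_TC1, SPEEDUP_TC2)
print(f"noise band: {band:.1%}")
print("1% change is noise:", is_probably_noise(0.01, band))
print("5% change is noise:", is_probably_noise(0.05, band))
```

With these made-up values the band is a little under 2%, so a 1% shift would not be distinguishable from host-to-host variation, while a 5% shift would be.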
The plots are produced by bench_runner, which runs pyperformance benchmarks and tracks the results over time.
bench_runner builds Python with --enable-optimizations and --with-lto=full, but without BOLT or --with-tail-call-interp. The compiler versions are from Debian/Ubuntu packages. (Adding BOLT is on my TODO list.)
Below are longitudinal timing results.
The plots named "Python 3.14.x vs. 3.13.0" compare performance across pyperformance benchmarks by taking the geometric mean of all the benchmarks. A result of 1.05x for a particular compiler means Python 3.14.x built with that compiler performs 5% faster than Python 3.13.0 built with the same compiler.
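To make the aggregation concrete, here is a minimal sketch of a geometric-mean speedup. It assumes each benchmark's speedup is the baseline time divided by the new time (so values above 1.0 mean faster); the timings and benchmark names are invented, and this is not bench_runner's or pyperformance's actual code.

```python
from statistics import geometric_mean

# Hypothetical per-benchmark timings in seconds (not real measurements).
baseline_313 = {"bench_a": 1.00, "bench_b": 2.00, "bench_c": 0.50}
candidate_314 = {"bench_a": 0.80, "bench_b": 2.00, "bench_c": 0.625}


def overall_speedup(baseline, candidate):
    """Geometric mean of per-benchmark speedups (baseline time / candidate time)."""
    ratios = [baseline[name] / candidate[name] for name in baseline]
    return geometric_mean(ratios)


speedup = overall_speedup(baseline_313, candidate_314)
print(f"{speedup:.2f}x")
```

In this example one benchmark is 1.25x faster, one is unchanged, and one is 0.80x, so the geometric mean comes out at 1.00x. That symmetry between ratios above and below 1.0 is the usual reason to prefer the geometric mean over the arithmetic mean for speedups.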
The plots named "Python 3.14.x (NOGIL) vs. 3.13.0" compare the same thing, but with 3.14.x built with --disable-gil. The 3.13.0 it compares against is not built with --disable-gil. A result of 1.00x means that free-threaded Python 3.14.x (built with that compiler) is just as fast as regular Python 3.13.0 (built with the same compiler).
The plots named "Python 3.15.x vs. 3.13.0" and "Python 3.15.x (NOGIL) vs. 3.13.0" are the same, but track "main" (which will become 3.15) from the point where the 3.14 branch was cut. To make long-term changes easy to track, they still compare against 3.13.0.
These plots show the effect of different ways to build Python at the same version. There are two versions of each plot, one for the 3.14 branch and one for 3.15 (main).
The plots named "Effect of free-threading" show the performance difference between free-threaded and gil-ful builds of the same Python version. A result of 0.95x means free-threaded Python is 5% slower than the same version of Python with the GIL.
The plot named "Effect of JIT" shows the performance difference between a build with the experimental JIT and a build without it.
The plots named "Effect of gcc versions" and "clang versions" show the performance difference between the same Python, built the same way (without free-threading), using different compilers. The baseline is GCC 9.4 (GCC 11.3 for Pi5), because that's the compiler version used for Faster CPython's benchmarks. These plots can be a bit noisier since the baseline is more of a moving target (Python compiled with GCC 9 getting faster makes the other compilers appear to get worse). For the most part, all the compiler versions I'm tracking seem pretty stable and perform similarly. The most notable outlier is clang 19, which has a known regression in computed-goto handling that significantly impacts Python.
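The moving-baseline effect is easy to see with a little arithmetic. In the sketch below the timings are invented and the ratio is assumed to be baseline time divided by the other compiler's time; the build with the other compiler does not change at all between the two snapshots.

```python
def relative_speed(baseline_time, other_time):
    """Speed of the other compiler's build relative to the baseline build."""
    return baseline_time / other_time


# Hypothetical timings in seconds. Only the baseline (GCC 9) build gets faster;
# the other compiler's build stays at 0.95s in both snapshots.
before = relative_speed(baseline_time=1.00, other_time=0.95)
after = relative_speed(baseline_time=0.90, other_time=0.95)

print(f"before: {before:.2f}x")
print(f"after:  {after:.2f}x")
```

The other compiler goes from looking faster than the baseline to looking slower, even though its own timing is identical in both snapshots.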
These plots show longitudinal results for 3.14 with a few compiler versions,
with a separate plot per benchmark in the pyperformance
benchmark suite.
They compare the same thing as the longitudinal "Python 3.14.x vs. 3.13.0"
plot above: a result of 1.05x means that benchmark is 5% faster with 3.14.x
than 3.13.0. These individual plots are useful for evaluating the noisiness
of individual benchmarks, as well as for identifying consistent changes
(performance improvements or regressions in Python). Be careful comparing
these plots by eye, as they use different Y axes!