ARM64 NUDUPL VDF path + cross-platform build/CI fixes #298

Open
hoffmang9 wants to merge 4 commits into main from nudupl-arm64-ci

hoffmang9 (Member) commented Feb 5, 2026

Summary

  • Enable a fast, maintainable ARM64 path for the main VDF loop using the C++ NUDUPL implementation (and guard x86-only pipeline/asm codepaths).
  • Improve cross-platform builds (macOS arm64, Linux aarch64, Windows) and CI robustness (apt update retries, Rust fuzz matrix + caching).
  • Document benchmark behavior on non-x86 and add a fast mode for prover_test to keep CI runtime reasonable.

Benchmark (NUDUPL)

On macOS arm64, ./vdf_bench square 5000000 reports ~292K iterations/sec (example run: Time: 17123 ms; speed: 292.0K ips).

Test plan

  • CI
  • Local: cd src && make optimized=1 -f Makefile.vdf-client && ./vdf_bench square 5000000

Note

High Risk
Changes the core VDF squaring loop selection and optimizes inner arithmetic routines with thread-local GMP scratch state, which can impact correctness/performance across architectures if any subtle math or lifecycle assumptions are wrong.

Overview
Adds a non-x86 VDF execution path using the C++ NUDUPL implementation and gates the phased/asm pipeline to ARCH_X86/ARCH_X64 only. This includes a new repeated_square_nudupl loop for ARM/non-x86, plus cross-arch guards in hot math code (vdf.h, vdf_bench.cpp, avx512_integer.h, callback.h) and build system tweaks so non-x86 builds don’t link x86 asm objects.

Optimizes and instruments NUDUPL hot loops by reusing thread-local GMP temporaries in qfb_nudupl and mpz_xgcd_partial, adding optional profiling hooks (chiavdf_profile.h), and improving portability (MSVC clz fallbacks). Test/CI behavior is adjusted with a CHIAVDF_PROVER_TEST_FAST mode for prover_test, expanded macOS arm64 testing, more robust apt installs, and Rust fuzzing changes (per-target matrix + caching + capped prove fuzz iterations to prevent CI OOM).

Written by Cursor Bugbot for commit 41fbbed.

Route ARM builds through the C++ NUDUPL squaring loop, guard x86-only codepaths,
and update build/CI tooling for better macOS arm64, Linux aarch64, Windows, and
workflow reliability. Also document benchmark and fast prover test usage.

uint64 actual_iterations=repeated_square_fast(square_state, f, D, L, num_iterations, batch_size, weso);
actual_iterations = repeated_square_fast(square_state, f, D, L, num_iterations, batch_size, weso);
#endif
Contributor

How about reversing this as well?
#if defined(ARCH_X64)
  // do thing
#else
  // do fallback thing
#endif

Contributor

Like what was done in vdf_bench.cpp.

Member Author

Will address the similar issue above, which was left in for the same dubious reasons.

Select asm objects only for x86_64 builds and invert the squaring path
selection so non-x86 targets use NUDUPL by default.

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

