ARM64 NUDUPL VDF path + cross-platform build/CI fixes #298

Open

hoffmang9 wants to merge 4 commits into main from nudupl-arm64-ci

+558 −81

.github/workflows/rust.yml

            
                  
@@ -20,21 +20,43 @@ permissions:
     jobs:
       fuzz_targets:
-        name: Run fuzzers
+        name: Run fuzzers (${{ matrix.target }})
         runs-on: ubuntu-latest
         env:
           CARGO_PROFILE_RELEASE_LTO: false
+        strategy:
+          fail-fast: false
+          matrix:
+            target:
+              - create_discriminant
+              - prove
+              - verify
+              - verify_n_wesolowski
         steps:
           - uses: actions/checkout@v6
           - uses: dtolnay/rust-toolchain@nightly
+          - name: Cache cargo registry + build artifacts
+            uses: actions/cache@v4
+            with:
+              path: |
+                ~/.cargo/bin
+                ~/.cargo/registry
+                ~/.cargo/git
+                target
+                rust_bindings/fuzz/corpus
+              key: ${{ runner.os }}-rust-fuzz-${{ hashFiles('Cargo.lock') }}
           - name: Install cargo-fuzz
-            run: cargo +nightly install cargo-fuzz
+            run: |
+              if ! command -v cargo-fuzz >/dev/null 2>&1; then
+                cargo +nightly install cargo-fuzz --locked
+              fi
-          - name: Cargo fuzz
+          - name: Cargo fuzz (${{ matrix.target }})
             run: |
               cd rust_bindings
-              cargo fuzz list | xargs -I "%" sh -c "cargo +nightly fuzz run % -- -max_total_time=600 || exit 255"
+              cargo +nightly fuzz run ${{ matrix.target }} -- -max_total_time=600
       lint:
         name: Lint

@@ -110,7 +132,10 @@ jobs:
           - name: Install libclang-dev on Linux
             if: matrix.os.name == 'Ubuntu'
-            run: sudo apt-get install libclang-dev -y
+            run: |
+              # Avoid transient 404s from stale apt indices / mirror lag.
+              sudo apt-get update -y -o Acquire::Retries=3
+              sudo apt-get install libclang-dev -y
           - name: Set up Rust
             uses: dtolnay/rust-toolchain@stable

.github/workflows/test.yaml

@@ -17,7 +17,7 @@ jobs:
         strategy:
           fail-fast: false
           matrix:
-            os: [macos-13-intel, ubuntu-latest]
+            os: [macos-13-intel, macos-13-arm64, ubuntu-latest]
             config: [optimized=1, TSAN=1, ASAN=1]
         steps:
@@ -34,6 +34,8 @@ jobs:
         - name: Build vdf-client on Ubuntu
           if: startsWith(matrix.os, 'ubuntu')
           run: |
+            # Avoid transient 404s from stale apt indices / mirror lag.
+            sudo apt-get update -y -o Acquire::Retries=3
             sudo apt-get install libgmp-dev libboost-python-dev libpython3-dev libboost-system-dev build-essential -y
             cd src
             make ${{ matrix.config }} -f Makefile.vdf-client
@@ -54,7 +56,11 @@ jobs:
             echo "Running 2weso_test"
             ./2weso_test
             echo "Running prover_test"
-            ./prover_test
+            if [[ "${{ matrix.os }}" == ubuntu* ]]; then
+              ./prover_test
+            else
+              CHIAVDF_PROVER_TEST_FAST=1 ./prover_test
+            fi
         - name: Test vdf-client
           if: matrix.config != 'optimized=1'
@@ -73,7 +79,11 @@ jobs:
           run: |
             cd src
             echo "Running prover_test"
-            ./prover_test
+            if [[ "${{ matrix.os }}" == ubuntu* ]]; then
+              ./prover_test
+            else
+              CHIAVDF_PROVER_TEST_FAST=1 ./prover_test
+            fi
         - name: Benchmark vdf-client
           if: matrix.config == 'optimized=1'

README.md

     vdf_client is the core VDF process that completes the Proof of Time submitted
     to it by the Timelord. The repo also includes a benchmarking tool to get a
     sense of the iterations per second of a given CPU called vdf_bench. Try
-    `./vdf_bench square_asm 250000` for an ips estimate.
+    `./vdf_bench square_asm 250000` for an ips estimate on x86/x64 (phased/asm
+    pipeline). On non-x86 architectures, use `./vdf_bench square 250000` (NUDUPL).
     To build vdf_client set the environment variable BUILD_VDF_CLIENT to "Y".
     `export BUILD_VDF_CLIENT=Y`.
@@ Expand All @@
     Those tests will simulate the vdf_client and verify for correctness the produced proofs.
+    Note: `./prover_test` defaults to a long soak/stress run. Set
+    `CHIAVDF_PROVER_TEST_FAST=1` to run a short, CI-friendly correctness check.
+    ## Fuzzing
+    Fuzz targets live under `rust_bindings/fuzz`. The `prove` target includes an
+    iteration cap to avoid out-of-memory conditions in CI. If you want deeper
+    iteration coverage, raise the cap in `rust_bindings/fuzz/fuzz_targets/prove.rs`
+    after validating memory usage and exec/s on your runner.
     ## Contributing and workflow
     Contributions are welcome and more details are available in chia-blockchain's
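
Reviewer note on the `CHIAVDF_PROVER_TEST_FAST` switch described in the README hunk above: the `prover_test` source change is not part of the diff shown here, so what follows is only a minimal C++ sketch of how an environment-variable gate of this kind is commonly written. The helper names `flag_enabled` and `prover_test_fast_mode`, and the rule that an unset, empty, or `0` value means "disabled", are assumptions for illustration and are not taken from the PR.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the PR): interprets a raw environment value.
// Unset, empty, and "0" are treated as disabled; anything else as enabled.
inline bool flag_enabled(const char* value) {
    return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Hypothetical wrapper: reads the variable name documented in the README.
// std::getenv returns nullptr when the variable is not set.
inline bool prover_test_fast_mode() {
    return flag_enabled(std::getenv("CHIAVDF_PROVER_TEST_FAST"));
}
```

Under these assumptions, the CI step `CHIAVDF_PROVER_TEST_FAST=1 ./prover_test` would take the short path, while a bare `./prover_test` would keep the long soak run.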

pyproject.toml

@@ -32,7 +32,10 @@ before-build = "python -m pip install --upgrade pip"
     [tool.cibuildwheel.macos]
     build-verbosity = 0
-    before-all = "brew install gmp boost cmake"
+    before-all = """
+    brew --prefix --installed gmp >/dev/null 2>&1 || brew install gmp
+    brew install boost cmake
+    """
     before-build = "python -m pip install --upgrade pip"
     environment = {MACOSX_DEPLOYMENT_TARGET="13", SYSTEM_VERSION_COMPAT=0, BUILD_VDF_CLIENT="N"}

rust_bindings/fuzz/fuzz_targets/prove.rs

@@ -1,9 +1,32 @@
     #![no_main]
     use chiavdf::prove;
-    use libfuzzer_sys::fuzz_target;
+    use libfuzzer_sys::{fuzz_target, Corpus};
-    fuzz_target!(|data: ([u8; 32], [u8; 100], u16)| {
+    // Fuzzing `prove()` with unbounded `iters` can explode memory usage and runtime.
+    // The cost of the underlying VDF prover is at least linear in `iters`, and in
+    // practice can become superlinear due to internal allocation patterns. We have
+    // observed OOM (exit 137) in CI when `iters` is allowed to reach the full `u16`
+    // range, so we cap it to keep fuzzing stable and high-throughput.
+    //
+    // Why 4096:
+    // - Large enough to exercise multiple loop iterations and proof paths beyond
+    //   "toy" counts, preserving meaningful coverage.
+    // - Small enough to keep inputs fast and avoid pathological allocations across
+    //   typical CI memory limits.
+    // - Selected empirically as a conservative upper bound given prior OOMs; it can
+    //   be raised later if measurements show steady memory and acceptable exec/s.
+    //
+    // If you want deeper iteration coverage, consider a separate stress target or
+    // a time/iteration-budgeted harness rather than unbounded fuzz inputs.
+    const MAX_ITERS: u64 = 4096;
+    fuzz_target!(|data: ([u8; 32], [u8; 100], u16)| -> Corpus {
         let (genesis_challenge, element, iters) = data;
-        prove(&genesis_challenge, &element, 1024, iters as u64);
+        let iters = iters as u64;
+        if iters > MAX_ITERS {
+            return Corpus::Reject;
+        }
+        prove(&genesis_challenge, &element, 1024, iters);
+        Corpus::Keep
     });

setup.py

@@ -3,6 +3,7 @@
     import shutil
     import subprocess
     import sys
+    from pathlib import Path
     from setuptools import Command, Extension, setup
     from setuptools.command.build import build
@@ -134,14 +135,19 @@ def build_extension(self, ext):
     build.sub_commands.append(("build_hook", lambda x: True))  # type: ignore
     install.sub_commands.append(("install_hook", lambda x: True))
+    # Wheel metadata generation on Windows can run with a non-UTF8 default encoding.
+    # Read `README.md` explicitly as UTF-8 so `long_description` is robust across runners.
+    _readme_path = Path(__file__).resolve().parent / "README.md"
+    _long_description = _readme_path.read_text(encoding="utf-8")
     setup(
         name="chiavdf",
         author="Florin Chirica",
         author_email="florin@chia.net",
         description="Chia vdf verification (wraps C++)",
         license="Apache-2.0",
         python_requires=">=3.9",
-        long_description=open("README.md").read(),
+        long_description=_long_description,
         long_description_content_type="text/markdown",
         url="https://github.com/Chia-Network/chiavdf",
         ext_modules=[CMakeExtension("chiavdf", "src")],

src/Makefile.vdf-client

@@ -1,16 +1,32 @@
     UNAME := $(shell uname)
+    ARCH := $(shell uname -m)
     ifneq (,$(findstring clang, $(shell $(CXX) --version)))
     NOPIE = -fno-PIE
     else
     NOPIE = -no-pie
     endif
+    # macOS arm64 ignores -no_pie and warns; omit to avoid deprecation warnings
+    ifeq ($(UNAME),Darwin)
+    ifneq ($(filter $(ARCH),arm64),)
+    NOPIE =
+    endif
+    endif
     LDFLAGS += -flto $(NOPIE) -g
     LDLIBS += -lgmpxx -lgmp -pthread
     CXXFLAGS += -flto -std=c++1z -D VDF_MODE=0 -D FAST_MACHINE=1 -pthread $(NOPIE) -fvisibility=hidden
     ifeq ($(UNAME),Darwin)
     CXXFLAGS += -D CHIAOSX=1
+    # Homebrew (common on macOS) installs boost/gmp to /opt/homebrew or /usr/local
+    ifneq ($(wildcard /opt/homebrew/include/boost/asio.hpp),)
+    CXXFLAGS += -I/opt/homebrew/include
+    LDFLAGS += -L/opt/homebrew/lib
+    endif
+    ifneq ($(wildcard /usr/local/include/boost/asio.hpp),)
+    CXXFLAGS += -I/usr/local/include
+    LDFLAGS += -L/usr/local/lib
+    endif
     endif
     OPT_CFLAGS = -O3 -g
@@ -27,13 +43,20 @@ endif
     .PHONY: all clean
+    # Only x86_64 builds use the x86 asm objects
+    ifeq ($(ARCH),x86_64)
+    ASM_OBJS = asm_compiled.o avx2_asm_compiled.o avx512_asm_compiled.o
+    else
+    ASM_OBJS =
+    endif
     BINS = vdf_client prover_test 1weso_test 2weso_test vdf_bench
     all: $(BINS)
     clean:
     	rm -f *.o hw/*.o $(BINS) compile_asm emu_hw_test hw_test hw_vdf_client emu_hw_vdf_client
-    $(BINS) avx512_test: %: %.o lzcnt.o asm_compiled.o avx2_asm_compiled.o avx512_asm_compiled.o
+    $(BINS) avx512_test: %: %.o lzcnt.o $(ASM_OBJS)
     	$(CXX) $(LDFLAGS) -o $@ $^ $(LDLIBS)
     $(addsuffix .o,$(BINS)) avx512_test.o: CXXFLAGS += $(OPT_CFLAGS)

src/avx512_integer.h

@@ -123,6 +123,7 @@ void mpz_impl_set_mul(
         const mpz<expected_size_a, padded_size_a>& a,
         const mpz<expected_size_b, padded_size_b>& b
     ) {
+    #if defined(ARCH_X86) || defined(ARCH_X64)
         if (enable_avx512_ifma) {
             typename avx512_integer_for_size<expected_size_a>::i a_avx512;
             typename avx512_integer_for_size<expected_size_b>::i b_avx512;
@@ -132,7 +133,9 @@ void mpz_impl_set_mul(
             b_avx512=b;
             out_avx512.set_mul(a_avx512, b_avx512);
             out_avx512.assign(out);
-        } else {
+        } else
+    #endif
+        {
             mpz_mul(out._(), a._(), b._());
         }
     }
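
Reviewer note on the `} else` / `#endif` / `{` arrangement in the hunk above: when the x86 guard is off, the preprocessor removes the whole `if (...) { ... } else`, so the GMP fallback block becomes the unconditional body; when the guard is on, the same block is the `else` branch. The self-contained sketch below reproduces that shape with stand-in names. `SKETCH_HAS_FAST_PATH`, `sketch_enable_fast`, and `sketch_mul` are hypothetical, and the compiler-predefined macros stand in for the project's `ARCH_X86` / `ARCH_X64`, whose definitions are not shown in this diff.

```cpp
#include <cstdint>

// Stand-in for the project's ARCH_X86 / ARCH_X64 guard (assumption).
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define SKETCH_HAS_FAST_PATH 1
#endif

// Runtime switch, analogous in role to enable_avx512_ifma (hypothetical name).
static bool sketch_enable_fast = false;

// Multiplies a and b; reports through used_fast which branch ran.
inline uint64_t sketch_mul(uint64_t a, uint64_t b, bool* used_fast = nullptr) {
#if defined(SKETCH_HAS_FAST_PATH)
    if (sketch_enable_fast) {
        // Architecture-specific path; compiled only on x86/x64.
        if (used_fast) *used_fast = true;
        return a * b;
    } else
#endif
    {
        // Portable path: the `else` branch on x86/x64, the only body elsewhere.
        if (used_fast) *used_fast = false;
        return a * b;
    }
}
```

The braces around the fallback matter: without them, only the first statement after the dangling `else` would be conditional on x86 builds.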

src/callback.h

@@ -2,6 +2,7 @@
     #define CALLBACK_H
     #include "util.h"
+    #include "nudupl_listener.h"
     // Applies to n-weso.
     const int kWindowSize = 20;
@@ -32,13 +33,19 @@ class WesolowskiCallback :public INUDUPLListener {
             switch(type) {
                 case NL_SQUARESTATE:
                 {
+    #if defined(ARCH_X86) || defined(ARCH_X64)
                     //cout << "NL_SQUARESTATE" << endl;
                     uint64 res;
                     square_state_type *square_state=(square_state_type *)data;
                     if(!square_state->assign(mulf->a, mulf->b, mulf->c, res))
                         cout << "square_state->assign failed" << endl;
+    #else
+                    // Phased pipeline is x86/x64-only.
+                    (void)data;
+                    cout << "NL_SQUARESTATE unsupported on this architecture" << endl;
+    #endif
                     break;
                 }
                 case NL_FORM:

src/chiavdf_profile.h

@@ -0,0 +1,40 @@
+    #ifndef CHIAVDF_PROFILE_H
+    #define CHIAVDF_PROFILE_H
+    #include <cstdint>
+    // This header centralizes optional profiling hooks used by `vdf.h` (driver) and
+    // hot-loop primitives like NUDUPL (`nucomp.h`). Everything is no-op unless:
+    // - `VDF_TEST` is enabled (VDF_MODE=1), and
+    // - the caller sets `chiavdf_nudupl_profile_sink` (and optionally enables timing).
+    struct chiavdf_nudupl_profile_stats {
+        // Outer-loop counts (from `repeated_square_nudupl`).
+        uint64_t iters = 0;
+        uint64_t reduce_calls = 0;
+        uint64_t reduce_skipped = 0;
+        uint64_t max_a_limbs = 0;
+        // Outer-loop timing (from `repeated_square_nudupl`).
+        uint64_t nudupl_form_time_ns = 0;
+        uint64_t reduce_time_ns = 0;
+        // Inner-loop breakdown (from `qfb_nudupl`).
+        uint64_t qfb_nudupl_calls = 0;
+        uint64_t b_negative = 0;
+        uint64_t branch_a_lt_L = 0;
+        uint64_t branch_a_ge_L = 0;
+        uint64_t gcdext_time_ns = 0;
+        uint64_t gcdext_s_eq_1 = 0;
+        uint64_t gcdext_s_ne_1 = 0;
+        uint64_t xgcd_partial_time_ns = 0;
+        uint64_t else_branch_time_ns = 0; // time spent in the a>=L branch overall
+    };
+    #if defined(VDF_TEST)
+    inline thread_local chiavdf_nudupl_profile_stats* chiavdf_nudupl_profile_sink = nullptr;
+    inline thread_local bool chiavdf_nudupl_profile_timing_enabled = false;
+    #endif
+    #endif // CHIAVDF_PROFILE_H
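
Reviewer note on the sink design in the new header: the header comment says the hooks are no-ops unless the caller sets `chiavdf_nudupl_profile_sink`. One way a caller could do that safely is a small RAII guard that installs a stats object for the current thread and restores the previous pointer on scope exit. The sketch below is self-contained and uses stand-in names (`sketch_stats`, `sketch_sink`, `scoped_sink`, `sketch_repeated_square`), all of which are hypothetical; it is not the PR's driver code, and the loop body is a placeholder for the real squaring.

```cpp
#include <cstdint>

// Reduced stand-in for chiavdf_nudupl_profile_stats (assumption: two fields
// are enough to show the pattern).
struct sketch_stats {
    uint64_t iters = 0;
    uint64_t reduce_calls = 0;
};

// Stand-in for the thread-local sink pointer; nullptr means profiling is off.
static thread_local sketch_stats* sketch_sink = nullptr;

// RAII guard: installs a sink for this thread, restores the old one on exit.
class scoped_sink {
public:
    explicit scoped_sink(sketch_stats* stats) : prev_(sketch_sink) {
        sketch_sink = stats;
    }
    ~scoped_sink() { sketch_sink = prev_; }
    scoped_sink(const scoped_sink&) = delete;
    scoped_sink& operator=(const scoped_sink&) = delete;

private:
    sketch_stats* prev_;
};

// Placeholder hot loop: counts iterations only when a sink is installed.
inline uint64_t sketch_repeated_square(uint64_t x, uint64_t iters) {
    for (uint64_t i = 0; i < iters; ++i) {
        x = x * x + 1;  // unsigned arithmetic wraps modulo 2^64
        if (sketch_sink) ++sketch_sink->iters;
    }
    return x;
}
```

Because the pointer is thread-local, each worker thread would need its own guard; counters recorded on one thread are not visible through another thread's sink.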
