Skip to content

Fix numerical NAC factor-2 and unify energy-gap sign#160

Draft
karmachoi wants to merge 7 commits into
Open-Quantum-Platform:mainfrom
karmachoi:bugfix-num-NAC
Draft

Fix numerical NAC factor-2 and unify energy-gap sign#160
karmachoi wants to merge 7 commits into
Open-Quantum-Platform:mainfrom
karmachoi:bugfix-num-NAC

Conversation

@karmachoi

@karmachoi karmachoi commented May 31, 2026

Copy link
Copy Markdown
Contributor

Summary

OpenQP's overlap-based numerical NAC had two confirmed bugs in the Python assembly layer (pyoqp/oqp/library/single_point.py):

  1. Factor of 2NACME.nacme divided the antisymmetrized state overlap by dt instead of 2·dt. Since S − Sᵀ ≈ ±2·dt·d_ij, the Hammes–Schiffer–Tully convention requires 2·dt; the missing ½ made the derivative coupling 2× too large. (The Fortran get_dcv returns the raw S − Sᵀ and explicitly leaves the 2a denominator to the caller — so the backend was correct; only the Python layer was wrong.)
  2. Opposite gap signsnacme used gap[i,j] = E_j − E_i while numerical_nac used E_i − E_j, so the two paths reported h with opposite sign (+2·h_std vs −2·h_std).

Changes

  • NACME.nacme: /(self.dt)/(2.0 * self.dt) — restores the HST ½ at the source (corrects the FD oracle, the per-frame time-derivative coupling, and the export).
  • NAC.numerical_nac: unify the energy gap to gap[i,j] = E_j − E_i, matching nacme.
  • Treat d_ij = ⟨Ψ_i|∇Ψ_j⟩ (antisymmetric) as the canonical stored quantity and derive h_ij = (E_j − E_i)·d_ij (symmetric) from it; add a verify_nac_conventions guard at both storage sites to catch future gap-sign / symmetry regressions.

After the fix both paths yield dcv = d (antisymmetric) and nacv = h (symmetric, standard sign), agreeing element-wise.

Benchmark results

Full details in BENCHMARK_RESULTS.md. Two independent levels:

1. Exact convention proof — fd_nac_logic_check.py (no backend)

Synthetic overlaps from a known antisymmetric d exercise the exact assembly logic:

[PASS] numerical_nac dc == true d      rel=1.33e-17
[PASS] d antisymmetric / h symmetric (both paths)
[PASS] cross-path dc/h agree (sign+mag) rel=0.00e+00
[PASS] OLD numerical_nac dc == 2 x true d   ratio=2.0000
pair 1->2:  current(old)/hand = 2.0000   corrected(new)/hand = 1.0000

2. Live MRSF backend — fd_nac_benchmark.py

H₂O (C₂ᵥ, no spatial degeneracy), MRSF/BHHLYP/6-31G, O displaced along z, Δ-scan, single-threaded for bit-reproducibility:

The factor-2 (every element, every Δ): current/corrected = 2.000000 (min = max).

corrected vs the true HST d (= antisymmetric part of the FD), clean pairs (Δ=0.0025 Å):

pair gap (Ha) corrected corr/h_anti cur/h_anti
S1→S4 0.3937 -0.052552 1.0000 2.0000
S1→S6 0.8104 0.003511 1.0000 2.0000
S4→S6 0.4167 -0.023359 1.0000 2.0000

corrected is exactly antisymmetric (‖M+Mᵀ‖/‖M‖ = 0.00) at every Δ — a valid derivative coupling.

Note: the raw MRSF state overlap carries a symmetric off-diagonal component (basis-following/phase). The derivative coupling is defined as the antisymmetrized overlap, and both code paths antisymmetrize, correctly discarding it — which is why the physically-correct comparison (against the antisymmetric part) is exactly 1.0000, while a naive total (S⁺−S⁻)/2Δ reference is not.

⚠️ Gating before merge (external)

This fix halves the exported DCME/NACME (and back_door API) consumed by the external PyRAI2MD NAMD driver. OpenQP has no internal NAMD propagator, so the internal NAC is unambiguously corrected — but PyRAI2MD's expected denominator must be confirmed / de-compensated in lockstep before this affects production dynamics. Kept as draft until that is resolved.

Not validated here

An absolute cross-check of h_ij against an independent analytic NAC isn't possible in OpenQP today (analytic MRSF NAC is not implemented) — recommended as the next validation once that backend exists.

🤖 Generated with Claude Code


Generated by Claude Code


Source fork PR: karmachoi#3

Phase/root stability follow-up

Latest commit 3289ffd adds a gauge-stabilization layer before overlap-derived NACME is exported:

  • reorders current/displaced states by maximum absolute overlap, then sign-aligns assigned diagonals;
  • projects the aligned overlap to the nearest orthogonal matrix (polar/Procrustes alignment) before forming d_ij;
  • stores raw/aligned/polar overlaps plus phase/root diagnostics in mol.data;
  • applies a final antisymmetric projection to the full numerical NAC-vector tensor before deriving h_ij, so Cartesian NAC vectors obey d_ij = -d_ji and h_ij = h_ji;
  • adds backend-free tests for sign/permutation invariance, polar cleanup, non-strict diagnostics, and full NAC-vector state-pair cleanup.

Additional local validation after this commit:

python3 fd_nac_logic_check.py
# ALL CHECKS PASSED

python3 - <<'PYTEST'
# manually executed tests/test_nac_phase_alignment.py because pytest is not installed locally
# RUN test_numerical_nac_vector_cleanup_projects_state_pair_noise
# RUN test_phase_and_permutation_alignment_recovers_canonical_overlap
# RUN test_polar_projection_suppresses_symmetric_overlap_noise
# RUN test_stable_nac_is_invariant_to_random_state_signs
# RUN test_verify_nac_conventions_can_warn_without_asserting
# OK
PYTEST

python3 -m py_compile pyoqp/oqp/library/nac_utils.py tests/test_nac_phase_alignment.py
git diff --check

CI was re-triggered on the PR head and is currently queued.

Cheol Ho Choi and others added 5 commits May 31, 2026 05:40
…e change)

Investigation handoff for the overlap-based numerical NAC bug:
- NACME.nacme divides (S-S^T) by dt instead of HST 2*dt -> dc = 2*d_ij
- nacme and numerical_nac use opposite energy-gap signs
- d antisymmetric, h=(E_j-E_i)d symmetric (private analytic branch had both backwards)
- LiH MRSF run confirms factor 2.0000 -> 1.0000 after restoring 1/2
- NAMD propagator is external (PyRAI2MD); no internal 2x compensation
No production code modified; fix gated on convention sign-off + PyRAI2MD contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restore the Hammes-Schiffer-Tully 1/2 at the source and standardize the
derivative-coupling / interstate-coupling conventions across both overlap-NAC
code paths.

- NACME.nacme: divide the antisymmetrized state overlap by (2*dt) instead of
  dt. The Fortran get_dcv returns the raw (S - S^T) ~ +/- 2*dt*d_ij and leaves
  the denominator to the caller; applying only /dt produced a derivative
  coupling 2x too large. This corrects both the FD oracle and the per-frame
  time-derivative coupling.
- NAC.numerical_nac: unify the energy gap to gap[i,j] = E_j - E_i, matching
  NACME.nacme. The two paths previously used opposite gap signs, so
  numerical_nac reported h with a flipped sign (-2*h_std vs +2*h_std).
- Treat d_ij = <Psi_i|grad Psi_j> (antisymmetric) as the canonical stored
  quantity and derive h_ij = (E_j - E_i)*d_ij (symmetric) from it; add a
  convention guard (verify_nac_conventions) at both storage sites to catch
  future gap-sign / symmetry regressions.

After the fix both paths yield dcv = d (antisymmetric) and nacv = h
(symmetric, standard sign), agreeing element-wise.

Note: the source fix changes the value OpenQP exports (DCME/NACME files and
the back_door API) to the external PyRAI2MD NAMD driver, halving the exported
time-derivative coupling. PyRAI2MD's expected denominator must be confirmed /
de-compensated in lockstep before this affects production dynamics.

Verification: fd_nac_logic_check.py exercises the exact assembly math on
synthetic overlaps built from a known d (no Fortran backend required),
confirming factor-2 removal (old ratio 2.0 -> new 1.0), d antisymmetry,
h symmetry, h == (E_j-E_i)*d, and cross-path sign/magnitude agreement.

https://claude.ai/code/session_0159ZCLgerRZ6JRj1gByyfdN
Add __pycache__/ and *.py[cod] to .gitignore so running the pyoqp code or
the verification scripts does not leave untracked bytecode artifacts.

https://claude.ai/code/session_0159ZCLgerRZ6JRj1gByyfdN
Add fd_nac_benchmark.py (H2O MRSF/BHHLYP/6-31G displacement scan through the
compiled Fortran backend) and BENCHMARK_RESULTS.md documenting exact numbers.

Results confirm the fix on real MRSF state overlaps:
- current/corrected = 2.000000 exactly for every element at every displacement
  (the factor-2 the fix removes);
- corrected equals the antisymmetric part of the central-difference overlap
  (the HST derivative coupling) exactly: corr/h_anti = 1.0000, cur/h_anti =
  2.0000 for all clean non-degenerate pairs and all step sizes;
- corrected is exactly antisymmetric (||M+M^T||/||M|| = 0) at every step;
- the raw overlap's symmetric component (basis-following/phase) is correctly
  discarded by antisymmetrization, as the HST convention requires.

Combined with the backend-free exact proof in fd_nac_logic_check.py, this
validates the factor-2 / gap-sign correction.

https://claude.ai/code/session_0159ZCLgerRZ6JRj1gByyfdN
@karmachoi

Copy link
Copy Markdown
Contributor Author

@claude Please review this draft PR as a non-blocking code review. Focus on correctness of the numerical NAC factor/sign fix, tests, and OpenQP portability. Do not modify files; leave concise actionable findings or say no blocking issues found.

@claude

claude Bot commented May 31, 2026

Copy link
Copy Markdown

Claude finished @karmachoi's task in 3m 50s —— View job


Code Review — Numerical NAC Factor/Sign Fix

Tasks:

  • Load tools
  • Read CLAUDE.md and repo guidelines
  • Read diff of single_point.py
  • Read benchmark/test scripts
  • Analyze correctness of factor-2 and sign fix
  • Post findings

Summary

The core math is correct and well-verified. No blocking issues. Four minor actionable items below.


Factor-2 fix (NACME.nacme, line 1016)

Verdict: correct.

The Fortran get_dcv (get_states_overlap.F90, lines 890–894) is unambiguous:

"This routine returns d_IJ^a * a = F − B. The denominator MUST be provided in subsequent calculations because it can be 2 * a in the case of second-order numerical differentiation."

F − B = S − Sᵀ ≈ 2·dt·d_ij for a single forward-step overlap, so the correct denominator is 2*dt. The old / self.dt was missing the factor of 2 that comes from the antisymmetrization of the Taylor expansion. The new / (2.0 * self.dt) restores it.

Gap-sign unification (NAC.numerical_nac, lines 1231–1233)

Verdict: correct.

Old code had e_i.reshape((1,-1)) and e_j.reshape((-1,1)), so gap = e_j − e_i produced E_i − E_j. New code flips the reshapes to match NACME.nacme (e_i col-vector, e_j row-vector), giving gap[i,j] = E_j − E_i. Both paths now share the same orientation and h_ij = (E_j−E_i)·d_ij is symmetric as required.

Benchmark coverage

fd_nac_logic_check.py exercises the exact Python assembly with synthetic overlaps — no backend required. The proof that the old path returned 2× the true d and that the new path returns 1× is clean and self-contained. The live MRSF benchmark (fd_nac_benchmark.py) confirms the same ratio on a real H₂O calculation. Coverage is sufficient for a numerical method PR.


Non-blocking findings

1. Shape inconsistency passed to verify_nac_conventions (nacme path)

single_point.py:1025–1026:

nac_matrix = nac_matrix[0:self.nac_dim, 0:self.nac_dim]   # sliced
verify_nac_conventions(dc_matrix, nac_matrix, label='nacme')  # dc_matrix is full size

Since self.nac_dim = self.nstate (line 963) this is harmless today, but if nac_dim were ever set smaller (e.g., pair-specific), the guard compares arrays of different shape and the norm ratio would be meaningless. Consider moving the guard before the slice, or slicing dc_matrix to match. Fix this →

2. assert disabled under python -O

verify_nac_conventions (lines 54–58) uses Python assert for both checks. assert is a no-op when Python is run with -O (optimized mode), silently stripping the production guard it's meant to provide. For a physics invariant that must hold unconditionally, raise RuntimeError(...) is safer. Fix this →

3. fd_nac_logic_check.py not wired into a test runner

This script has no backend dependency and covers the exact assembly logic. It would be a low-effort but high-value addition to the project's test suite (e.g., as a pytest entry point), ensuring the fix is regression-tested on CI. As-is it can be lost or drift from the production code over time. Fix this →

4. Implicit data contract between NACME.nacme export and nacme_wrapper reader

The comment at line 1225–1227 correctly notes that each .dcme file now contains (S − Sᵀ)/(2·dx). This relies on NACME.nacme exporting dc_matrix (not nac_matrix) via dump_data(...,, ...). If that export target ever changes, numerical_nac would silently apply the gap factor to the wrong quantity. A brief inline note (or assertion on the value range) coupling these two sites would prevent future drift.


PyRAI2MD gating note

The PR correctly identifies the external blocking concern: PyRAI2MD reads NACME/DCME output and its expected denominator must be confirmed before merge to main. The draft status is appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants