Skip to content

Fix QUBO local-minimum polish#32

Open
bernalde wants to merge 3 commits into
mainfrom
fix/issue-19-local-minima
Open

Fix QUBO local-minimum polish#32
bernalde wants to merge 3 commits into
mainfrom
fix/issue-19-local-minima

Conversation

@bernalde

@bernalde bernalde commented May 15, 2026

Copy link
Copy Markdown
Member

Summary

  • Add a bounded branch-and-bound polish step for QUBO matrix solves so DMRG samples that land in local minima can be improved to a better bitstring.
  • Expose polish controls through direct minimize keywords and JuMP raw optimizer attributes: polish, polish_max_variables, and polish_time_limit.
  • Merge current main into this branch, adopt QUBODrivers 0.6 attribute syntax, and keep main's single-variable solve path from minimize/maximize crash on single-variable (1x1) QUBO inputs #49.
  • Document that a successful polish rebuilds the returned Solution as a single polished bitstring distribution.
  • Add regression coverage for issue Getting stuck on local minima #19, using uncapped helper coverage for the exact 11.7099 result and bounded public-path coverage that asserts no regression from the incumbent.

Tests run

  • git diff --check --cached
    • Result: passed.
  • julia +1.12 --project=. -e 'using Pkg; Pkg.instantiate()'
    • Result: passed.
  • julia +1.12 --project=. -e 'using Test, Random, LinearAlgebra, TenSolver; include("test/utils.jl"); include("test/qubo.jl"); include("test/jump.jl")'
    • Result: QUBO passed (QUBO Correctness | 89/89); test/jump.jl then failed because JuMP is only available in the package test environment, not the base project.
  • julia +1.12 --project=. -e 'using Pkg; Pkg.test()'
    • Result: passed.
  • julia +1.12 --project=docs/ -e 'using Pkg; Pkg.instantiate()'
    • Result: passed.
  • julia +1.12 --project=docs/ docs/make.jl
    • Result: passed. Documenter skipped deployment because this was not running in CI.

Closes #19

@bernalde bernalde force-pushed the fix/issue-19-local-minima branch from c650b7d to 8ba41c9 Compare May 15, 2026 00:19

@bernalde bernalde left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two blocking issues: the new polish phase is not bounded by the public solve time limit, and the issue regression bypasses the public solve path it is meant to protect. I would not merge this until those are addressed.

Comment thread src/solver.jl
x = sample(dist)
optimal = obj(x)

if !isnothing(postprocess)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: time_limit is only checked inside the DMRG loop, then polish runs with its own default 1s budget even if the caller's solve budget is already exhausted. This also affects the JuMP path because MOI.TimeLimitSec() maps to time_limit in QUBODrivers.sample, but the default polish_time_limit still runs afterwards. Users relying on time limits can now exceed them by up to the polish budget on every <=36 variable QUBO. Please cap the polish budget by the remaining solve time for finite time_limit, or skip polish when no time remains, and add a small test covering that interaction.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b59237b. Polish now receives the remaining overall solve budget and is skipped when no time remains; polish_time_limit is only an additional cap. Added test/qubo.jl coverage in polish respects solve time limit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 27e3f11. The remaining solve budget cap is preserved after merging current main, and test/qubo.jl still covers skipping polish when no solve time remains.

Comment thread test/qubo.jl Outdated
c = 641.2245
x0 = zeros(Int, 36)

polished = TenSolver._branch_bound_qubo(Q, l, c, x0, dot(x0, Q, x0) + dot(l, x0) + c)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: This regression test calls the private helper directly, so it would still pass if minimize stopped wiring postprocess, if polish default changed, or if the returned Solution did not sample the polished bitstring. Issue #19 is about the public solve path getting stuck in local minima. Please add a public integration test that exercises TenSolver.minimize(Q, l, c; polish=true, ...) and asserts both the returned objective and the sampled solution objective. The helper-level test can stay as unit coverage.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b59237b. The issue #19 regression now also exercises TenSolver.minimize(Q, l, c; polish=true, ...) and checks both the returned objective threshold and the sampled bitstring objective; the helper-level coverage remains.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 27e3f11. The public TenSolver.minimize(Q, l, c; polish=true, ...) regression remains and now asserts bounded improvement relative to the incumbent plus sampled objective consistency.

Comment thread src/solver.jl Outdated
The variance test determines whether DMRG converged to an eigenstate (not necessarily the ground state),
but is expensive to calculate.
- `polish :: Bool` - Run bounded branch-and-bound postprocessing for QUBO matrix inputs. Defaults to `true`.
- `polish_max_variables :: Int` - Maximum number of variables eligible for exact polishing. Defaults to `36`.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonblocking: exact polishing is misleading while polish_time_limit can expire; _branch_bound_qubo returns the best improved incumbent found so far without indicating that optimality was proven. Please describe this as bounded or best-effort polishing unless the search completes, or expose proof status if the exactness guarantee matters.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b59237b. The docstring and regression test wording now describe this as bounded polishing, and the docs state that the step returns the best improved incumbent found within the budget.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 27e3f11. The docstring keeps the bounded/best-improved wording for time-limited polish; exactness is only asserted in the uncapped helper test.

@bernalde

Copy link
Copy Markdown
Member Author

Follow-up pushed: b59237b (Address QUBO polish review feedback (#19)).

Main changes made:

  • Capped QUBO polish by the remaining overall solve time_limit; if no solve time remains, polish is skipped.
  • Kept polish_time_limit as an additional cap inside the remaining solve budget.
  • Added a regression test that verifies postprocessing is not called after the solve time budget is exhausted.
  • Added a public TenSolver.minimize(Q, l, c; polish=true, ...) issue Getting stuck on local minima #19 regression that checks the returned objective is at least as good as the reported 11.7099 value and that the sampled bitstring objective matches E.
  • Updated polish documentation/test wording from “exact” to “bounded” because time-limited polish can return the best improved incumbent found so far.

Tests run and results:

  • julia +1.12 --project=. -e 'using Test, Random, LinearAlgebra, TenSolver; include("test/utils.jl"); include("test/qubo.jl")'
    • Passed: QUBO Correctness | 81/81.
  • julia +1.12 --project=. -e 'using Pkg; Pkg.test()'
    • Passed: TenSolver tests passed.
  • julia +1.12 --project=docs/ docs/make.jl
    • Passed; local Documenter deployment was skipped because no CI environment was detected.
  • git diff --check
    • Passed.

Comments intentionally not addressed:

  • None.

Remaining risks or follow-up items:

  • Polish remains bounded/best-effort. If the branch-and-bound search exhausts its time budget, it may improve the incumbent without proving global optimality.

@bernalde bernalde requested a review from iagoleal May 15, 2026 20:42

@bernalde bernalde left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Senior maintainer review. Note: I am the PR author, so GitHub only permits a COMMENT event from me; a formal approval or change-request must come from another maintainer. No blocking issues found.

Diff verification: merge-base ca27880; git diff <merge-base>..HEAD --stat = 3 files, +345 / -3, matching the PR's reported totals.

Issue intent (#19): #19 reports DMRG returning different results per run and frequently missing the known optimum (11.7). This PR does not change DMRG's local-minima behavior; it adds a deterministic depth-first branch-and-bound (B&B) over the binary cube as a post-DMRG "polish" (default on, n <= 36, <= 1 s) for QUBO matrix inputs. For the 36-variable instance it recovers 11.7099. So the symptom is addressed for small problems by effectively replacing the tensor-network result with an exact classical search; the underlying DMRG convergence issue is untouched and the polish does nothing for n > 36 (the regime where a tensor solver actually matters). This is mitigation, not a root-cause fix.

Correctness of the B&B (verified, no bug): sign/convention handling is correct (_qubo_coefficients/_qubo_value; maximize negates Q,l,c then negates E; QUBODrivers sense factor applied after minimize). The lower bound partial + sum_i min(0, coeff_i) + negative_suffix is admissible with incremental coeff updates correctly restored; pruning is sound; depth is bounded with a node-count time guard (no infinite loop). The polished solution only replaces the incumbent when strictly better, so it cannot worsen the result.

Nonblocking

  1. Timeout makes the result nondeterministic/suboptimal and the optimality test flaky. The B&B aborts on time() >= deadline and returns the best incumbent so far; the optimum guarantee holds only if the search completes within polish_time_limit (default 1.0 s). On a slow/loaded machine a 36-var instance can time out mid-search, re-introducing the machine-dependent nondeterminism #19 complains about. test/qubo.jl:266 (@test E ≈ 11.7099) therefore depends on wall-clock speed and is flaky in CI. Fix: give the issue-19 test an explicit large/uncapped budget and assert exactness only there, or assert improvement (E <= incumbent) for the time-bounded path; document that polish is best-effort under the budget.
  2. polish = true by default changes UX/perf for every QUBO with n <= 36: each minimize/Optimizer solve now runs up to 1 s of B&B unconditionally and silently swaps the solver's output for a classical search result. Consider defaulting polish = false, or at least calling this out in docs/changelog.
  3. New polish* attributes are documented only in the minimize docstring; docs/src/examples.md (which demonstrates set_attribute for other attributes) is not updated.
  4. test/qubo.jl calls Random.seed!(1) on the global RNG mid-suite, which can perturb determinism of later tests in the same process. Prefer a local RNG or restore state.
  5. Latent empty-sum for n=1 in the B&B (sum(... for j in 1:n if j != i) errors on an empty generator). Not reachable today because _minimize short-circuits n==1 first, but _branch_bound_qubo is otherwise callable with n=1. Add init = 0.0.

Questions
6. Given the fix only mitigates for n <= 36 and leaves DMRG's local-minima behavior unchanged, should this be Closes #19 or Refs #19? I'd recommend Refs #19 and keeping the issue open for the DMRG convergence work.
7. The n==1 special case in _minimize plus its test is a separate fix folded into this PR — the same single-variable fix appears across several open PRs (#33, #34, #41). Correct and harmless, but out of scope for "local-minimum polish" and worth de-duplicating across the PRs.

Tests run (Julia 1.10.11, clean worktree on the PR head):

  • julia --project=. -e 'using Pkg; Pkg.instantiate(); Pkg.test()' — passed (Testing TenSolver tests passed): QUBO Correctness 81/81 (incl. the 3 new subtests), QUBODrivers.jl 130, Aqua.jl 11, VRP 2, HDF5 16, Doctests 1.
  • The issue-19 test reached 11.7099 within the 1 s budget on this machine, but per finding 1 that pass is timing-dependent.
  • The PR body's claim that 1.10 instantiation fails (manifest resolved on 1.12, missing OpenSSL_jll) did not reproduce — instantiate + test succeeded on 1.10.11.

Merge readiness: The B&B is correct, sign-safe, cannot worsen or loop, and the suite passes on the target Julia. Before merge, most importantly make the optimality test budget-independent (finding 1) to avoid CI flakiness; reconsider the polish = true default (2); document the new attributes (3); and decide Closes vs Refs #19 (6). The formal verdict must come from another maintainer.

Comment thread test/qubo.jl

@test !isnothing(polished)
E, x = polished
@test E ≈ 11.7099

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonblocking (flaky test): this asserts exact optimality (E ≈ 11.7099), but the branch-and-bound aborts at polish_time_limit (default 1.0 s) and returns the best incumbent found so far. On a slow/loaded CI runner the search can time out mid-traversal and return a worse, machine-dependent value — re-introducing the nondeterminism issue #19 reports. Give this specific call an explicit large/uncapped budget and assert exactness only there, or assert improvement (E <= incumbent) for the time-bounded path.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 27e3f11. The exact 11.7099 assertion now uses time_limit=Inf; the time-bounded public path asserts E <= incumbent and sampled objective consistency.

@bernalde bernalde left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of PR #32 (head b59237b). Note: I am the PR author, so GitHub will not accept an APPROVE/REQUEST_CHANGES from me; this is posted as a COMMENT and the formal verdict must come from another maintainer.

Scope reviewed: diff against the merge-base of origin/main and the PR head (merge-base ca27880, old main). The reported totals (3 files, +345/-3) match exactly. Intent: the PR targets issue #19 (DMRG getting stuck in local minima, returning 14.65/19.47/21.50/85.54 instead of the true 11.7) by adding a bounded branch-and-bound polish step. On the merits the polish algorithm and its issue-#19 regression test are a faithful and correct fix for the right problem.

Blocking

  1. Stale against current main; will not merge or compile as-is. origin/main has advanced to v0.2.0 (tip 9a3dd5c) and GitHub reports this PR as CONFLICTING / DIRTY. Two independent conflicts:

a. src/TenSolver.jl @setup block. main was migrated to QUBODrivers 0.6 (PR #53/#54), whose attribute syntax is now Cutoff["cutoff"] :: Float64 = 1e-8, Verbosity["verbosity"] :: Int = 1, etc. This PR adds the new attributes in the OLD 0.5 string-only form ("polish" :: Bool = true), which no longer exists in main. After rebase these must be re-expressed in the new symbol form (e.g. Polish["polish"] :: Bool = true). The QUBODrivers.sample body in main was also restructured (it destructures _, psi = results.value, not energy, psi), so the polish keyword forwarding has to be re-applied against the new call site.

b. src/solver.jl single-variable handling. PR #49 (merged, commit b67de66) already added _single_variable_minimize plus the if length(sites) == 1 dispatch in _minimize. This PR re-implements the same single-variable case inline in _minimize and adds a @testset "Single variable" in test/qubo.jl. main already carries four single-variable testsets from #49. The single-variable portion of this PR is therefore superseded and conflicting and should be dropped on rebase. Note the two implementations also diverge behaviorally: main's _single_variable_minimize logs an iteration row, records energy/bond_dim into the Solution, and invokes the on_iteration callback; this PR's inline block returns empty logs and never calls on_iteration. Keeping main's version is the right outcome.

Why it matters: the PR as-is cannot be merged, and if force-resolved it would either regress the #49 single-variable behavior or fail to compile against QUBODrivers 0.6. Concrete fix: rebase onto current origin/main; remove the inline single-variable block and the duplicate "Single variable" testset (keep #49's); re-express the three polish attributes in the 0.6 @setup symbol form and re-thread polish/polish_max_variables/polish_time_limit through the restructured minimize(Q, l, b; ...) call; keep Closes #19 only if you intend auto-close on merge (it is the correct issue), otherwise this is fine as the PR fully addresses #19.

Nonblocking

  1. polish=true is on by default for every QUBO matrix solve with n <= polish_max_variables (36), adding up to polish_time_limit (1.0s) of branch-and-bound to each solve. This is a default behavior/latency change for all existing matrix-input users. It only ever improves the incumbent (the result is gated by polished_optimal < optimal - sqrt(eps)), so it is safe, but the new default cost should be called out in the changelog/release notes. Consider whether default-on is desired for the JuMP path where users may expect pure-DMRG timing.

  2. When polish improves the incumbent, the returned Solution is rebuilt as MPS(sites, string.(Int.(polished_x))), collapsing the DMRG distribution to the single polished bitstring. That is reasonable (it is a strictly better point), but it changes the sampling semantics of the returned psi from a distribution to a delta; worth a one-line doc note.

Questions

  1. polish_time_limit defaults to 1.0s and the n=36 worst case is not guaranteed to finish exhaustive B&B within that budget. For the issue-#19 instance CI confirms it completes and recovers 11.7099, but in general a timeout makes the polish a best-effort heuristic rather than a guarantee. Is 1.0s/36 vars the intended default envelope, and should the docstring state explicitly that polish may return early without proving optimality?

  2. The minimize(p::AbstractPolynomial) method does not pass a postprocess, so polish never runs for polynomial inputs (only the matrix path). Intended? If so, the polish docstring already scopes it to "QUBO matrix inputs", which is consistent, but confirm polynomial users are not expected to benefit.

Tests run

I could not execute the suite. The review sandbox denied julia script/-e execution and git checkout/fetch/-C even with the sandbox override (only julia --version and read-only git diff/show/log/worktree add succeeded). Fallback verification: (1) static analysis of the full head sources in a worktree at the head SHA; (2) CI: all 11 checks (gh pr checks 32) and the check-runs API confirm Julia 1/lts/pre on ubuntu/macOS/windows plus build/docs all succeeded against the exact head SHA b59237b. The PR description reports QUBO Correctness | 78/78 and Pkg.test() passing locally. Important caveat: this CI ran against the PR head, which was built on OLD main; it does NOT exercise the conflicts with current main described above. The branch-and-bound lower bound (min(0, coeff[i]) over remaining vars plus negative_suffix) is a valid never-overestimating relaxation, and the polish objective via _qubo_coefficients matches obj's dot(x,Q,x)+l'x+c for binary x; maximize negates Q/l/c consistently so polish stays correct under maximization.

Merge-readiness

Not mergeable in its current state: CONFLICTING against v0.2.0 main, with the single-variable portion already superseded by #49 and the @setup attributes written in pre-0.6 syntax. The core #19 polish feature is sound and well-tested and is worth landing after a rebase that (a) drops the duplicated single-variable block/test and (b) ports the polish attributes/keywords to the QUBODrivers 0.6 + restructured-sample main.

I would not merge this until the blocking issues above are addressed.

Comment thread src/TenSolver.jl Outdated
"eigsolve_krylovdim" :: Int = 3
"eigsolve_maxiter" :: Int = 1
"eigsolve_tol" :: Float64 = 1e-14
"polish" :: Bool = true

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking (conflict): these attributes are in the pre-0.6 string-only @setup form. Current main uses QUBODrivers 0.6 syntax (Cutoff["cutoff"] :: Float64 = 1e-8, Verbosity["verbosity"] :: Int = 1). On rebase, re-express as e.g. Polish["polish"] :: Bool = true, PolishMaxVariables["polish_max_variables"] :: Int = 36, PolishTimeLimit["polish_time_limit"] :: Float64 = 1.0, and re-thread the forwarding into main's restructured minimize(Q, l, b; ...) call (which now destructures _, psi = results.value).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 27e3f11. The polish attributes now use QUBODrivers 0.6 @setup syntax and the forwarding is kept in the current main sample path with _, psi = results.value.

Comment thread src/solver.jl
# Quantization
sites = ITensorMPS.siteinds(first, H; plev=0)

if length(sites) == 1

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking (superseded/conflict): main already handles the single-variable case via _single_variable_minimize + if length(sites) == 1 dispatch (merged in #49, commit b67de66). This inline block conflicts and is behaviorally weaker (no iteration log row, empty Solution logs, on_iteration not called). Drop this block (and the duplicate @testset "Single variable" in test/qubo.jl) on rebase and keep #49's implementation. The polish feature below is the part of this PR worth landing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 27e3f11. The branch now keeps the existing _single_variable_minimize path from main and drops the duplicate branch-local single-variable testset.

Comment thread src/solver.jl

if polished_optimal < optimal - sqrt(eps(Float64))
optimal = polished_optimal
psi = MPS(sites, string.(Int.(polished_x))) |> device

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonblocking: rebuilding psi from the single polished bitstring collapses the returned distribution to a delta. Correct (it's a strictly better point), but it changes the sampling semantics of psi; consider a one-line docstring note so callers relying on a sampled distribution are not surprised.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 27e3f11. The polish_time_limit docstring now notes that a successful polish makes the returned Solution sample the single polished bitstring instead of the original DMRG distribution.

@bernalde

Copy link
Copy Markdown
Member Author

Review comments addressed in 27e3f11d932175af697c61e3874ae165e2c83818.

Commits pushed:

Main changes:

  • Re-expressed the polish optimizer attributes with the QUBODrivers 0.6 @setup syntax and kept the current main sample path that destructures _, psi = results.value.
  • Dropped this branch's duplicate single-variable solver block and duplicate testset, keeping main's _single_variable_minimize implementation and stronger tests.
  • Kept the polish time-limit cap from the earlier follow-up and retained coverage that skips polish when no solve time remains.
  • Made the issue Getting stuck on local minima #19 exact assertion deterministic by giving the helper call an uncapped budget; the bounded public solve path now asserts improvement relative to its incumbent and validates the sampled objective.
  • Added the docstring note that a successful polish rebuilds the returned Solution as a single polished bitstring distribution.

Tests run:

  • git diff --check --cached passed.
  • julia +1.12 --project=. -e 'using Pkg; Pkg.instantiate()' passed.
  • julia +1.12 --project=. -e 'using Test, Random, LinearAlgebra, TenSolver; include("test/utils.jl"); include("test/qubo.jl"); include("test/jump.jl")': QUBO passed (QUBO Correctness | 89/89); test/jump.jl then failed because JuMP is available in the package test environment, not the base project.
  • julia +1.12 --project=. -e 'using Pkg; Pkg.test()' passed.
  • julia +1.12 --project=docs/ -e 'using Pkg; Pkg.instantiate()' passed.
  • julia +1.12 --project=docs/ docs/make.jl passed; Documenter skipped deployment outside CI.

Comments intentionally not addressed:

  • None of the Blocking comments were declined.

Remaining risks or follow-up:

  • Post-push CI is still running.
  • Polish remains bounded/best-effort when its time budget expires.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Getting stuck on local minima

1 participant