Add opt-in checkpointing to run_MC and run_GA #37

Open
jarinfrench wants to merge 5 commits into IdahoLabResearch:main from jarinfrench:feature/checkpointing

@jarinfrench
Collaborator

@jarinfrench jarinfrench commented Apr 23, 2026

Summary

  • Introduces GBOpt/Checkpoint.py with CheckpointStore (null-object pattern — disabled() absorbs all calls so minimizer loops need no None guards), CandidateCheckpoint (per-candidate result cache for GA iterations, with .iter{N}.json sidecars), and shared constants CHECKPOINT_SCHEMA_VERSION and ENERGY_PENALTY. Replaces the earlier GenerationCheckpoint.py.
  • Adds checkpoint_file, checkpoint_format ("json" | "pickle", default "json"), and checkpoint_interval (default 1) keyword-only parameters to MonteCarloMinimizer.run_MC and GeneticAlgorithmMinimizer.run_GA. Omitting checkpoint_file preserves existing behavior exactly (no files written).
  • If checkpoint_file exists on entry, the run resumes from the saved state. The checkpoint is kept on normal completion so runs can be extended by calling the method again with the same file.
  • Adds a min_steps parameter to run_MC so that the run cannot be declared converged before a user-specified minimum number of steps is reached.
  • Fixes a bug in run_GA where unique_id was assigned before the checkpoint was loaded, making the resume branch unreachable.
  • Checkpoints are written atomically (written to a .tmp file, then moved into place with shutil.move). JSON checkpoints recursively convert numpy types and Path objects for safe serialization; RNG state serializes cleanly as a plain dict in both formats. A standardized envelope schema (schema_version, minimizer, progress_unit, progress_index, best_energy, rng_state, run_params, state) is shared by both minimizers.
  • Adds GBMinimizerError, GBMinimizerTypeError, and GBMinimizerValueError exception classes following the module exception hierarchy used elsewhere in GBOpt.
  • Adds tests/test_checkpoint.py (21 tests covering CheckpointStore and CandidateCheckpoint), tests/test_mcminimizer.py (9 tests), and updates tests/test_gaminimizer.py. Removes tests/test_generation_checkpoint.py.
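
The atomic-write and JSON-conversion behavior described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in GBOpt/Checkpoint.py: the helper names to_jsonable and write_checkpoint_atomic are hypothetical, the schema version value is a placeholder, the envelope values are made up, and numpy values are detected by duck-typing on .tolist() so the sketch runs without numpy installed.

```python
import json
import shutil
import tempfile
from pathlib import Path

# Placeholder value; the real CHECKPOINT_SCHEMA_VERSION lives in GBOpt/Checkpoint.py.
SCHEMA_VERSION = 1


def to_jsonable(obj):
    """Recursively convert Path objects and numpy-like values to plain JSON types."""
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    # numpy arrays and numpy scalars both expose .tolist(), which returns
    # built-in lists and scalars (assumption: nothing else stored in the
    # checkpoint defines a conflicting .tolist()).
    if hasattr(obj, "tolist"):
        return to_jsonable(obj.tolist())
    return obj


def write_checkpoint_atomic(path, envelope):
    """Write to '<path>.tmp' first, then move it over the final name."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w") as fh:
        json.dump(to_jsonable(envelope), fh)
    # When the move is a same-filesystem rename on POSIX, a reader sees either
    # the previous checkpoint or the complete new one, never a partial file.
    shutil.move(str(tmp), str(path))


# Envelope using the field names listed in the summary; the values are invented.
envelope = {
    "schema_version": SCHEMA_VERSION,
    "minimizer": "MonteCarloMinimizer",
    "progress_unit": "step",
    "progress_index": 3,
    "best_energy": -1.25,
    "rng_state": {"bit_generator": "PCG64", "state": {"state": 1, "inc": 3}},
    "run_params": {"max_steps": 10, "structure_file": Path("gb.lmp")},
    "state": {"energies": (-1.0, -1.25)},
}

with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "cp.json"
    write_checkpoint_atomic(target, envelope)
    loaded = json.loads(target.read_text())
    tmp_left_behind = target.with_name("cp.json.tmp").exists()
```

Writing to a sibling temporary file keeps the move on the same filesystem, which is the case in which the final rename step is atomic.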

Test Plan

  • pytest tests/test_checkpoint.py tests/test_mcminimizer.py tests/test_gaminimizer.py -v — all new and updated tests pass
  • pytest -m 'not known_bug and not slow' — no regressions in existing suite
  • Manual smoke test: run MonteCarloMinimizer.run_MC(max_steps=10, checkpoint_file="cp.json"), interrupt after a few steps, confirm cp.json exists with correct progress_index and nested state, re-run with same checkpoint_file and confirm the run resumes from where it stopped
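
The resume behavior exercised by the smoke test, together with the null-object idea from the summary, can be illustrated with a toy loop. Everything below is hypothetical (ToyStore, run_toy, the payload layout) and uses the standard-library random module rather than GBOpt's own RNG handling. It only shows why a disabled store lets the loop skip None checks, and how a saved progress_index plus RNG state lets a run continue from where it stopped.

```python
import json
import random
import tempfile
from pathlib import Path


class ToyStore:
    """Hypothetical stand-in for a checkpoint store with a null-object mode."""

    def __init__(self, path=None):
        self.path = Path(path) if path is not None else None

    @classmethod
    def disabled(cls):
        # Null object: same interface, but load() finds nothing and save() is a no-op.
        return cls(None)

    def load(self):
        if self.path is None or not self.path.exists():
            return None
        return json.loads(self.path.read_text())

    def save(self, payload):
        if self.path is not None:
            self.path.write_text(json.dumps(payload))


def run_toy(max_steps, store, seed=0, stop_after=None):
    """Random walk in 'energy'; resumes from the store if a checkpoint exists."""
    rng = random.Random(seed)
    step, energy, best = 0, 0.0, 0.0
    saved = store.load()
    if saved is not None:
        step = saved["progress_index"]
        energy = saved["energy"]
        best = saved["best_energy"]
        # JSON turns the state tuples into lists, so rebuild the tuples.
        version, internal, gauss_next = saved["rng_state"]
        rng.setstate((version, tuple(internal), gauss_next))
    while step < max_steps:
        if stop_after is not None and step >= stop_after:
            break  # simulate an interruption
        energy += rng.uniform(-1.0, 1.0)
        best = min(best, energy)
        step += 1
        # No `if store is not None` guard: a disabled store ignores the call.
        store.save({
            "progress_index": step,
            "energy": energy,
            "best_energy": best,
            "rng_state": rng.getstate(),
        })
    return step, best


with tempfile.TemporaryDirectory() as d:
    cp = Path(d) / "cp.json"
    partial = run_toy(10, ToyStore(cp), stop_after=4)  # interrupted run
    resumed = run_toy(10, ToyStore(cp))                # continues from step 4
    baseline = run_toy(10, ToyStore.disabled())        # writes no files
```

Because the RNG state is restored along with the progress index, the interrupted-then-resumed run in this toy reproduces the uninterrupted baseline.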

@jarinfrench
Collaborator Author

This depends on #35

@jarinfrench jarinfrench marked this pull request as draft April 23, 2026 20:16
@jarinfrench
Collaborator Author

jarinfrench commented May 19, 2026

As-is, this PR is ready to be merged, but I would like to generalize it by adding a reusable Checkpoint class.

@jarinfrench jarinfrench force-pushed the feature/checkpointing branch 2 times, most recently from 2cbd362 to 16f991f Compare May 27, 2026 17:18
@jarinfrench jarinfrench marked this pull request as ready for review May 27, 2026 21:49
@jarinfrench
Collaborator Author

I'll work on getting things checked in actual runs, but this should be ready for review now.

@jarinfrench
Collaborator Author

> I'll work on getting things checked in actual runs, but this should be ready for review now.

Manual testing shows that this works.

…inimizer

Adds test_mcminimizer.py (9 tests) covering: no-checkpoint baseline,
completion cleanup, JSON and pickle checkpoint content, resume from
both formats, corrupted checkpoint error, invalid format error, and
checkpoint_interval behavior.

Adds TestGeneticAlgorithmMinimizerCheckpointing (7 tests) to
test_gaminimizer.py covering the same scenarios for run_GA, using a
mock _save_checkpoint to simulate mid-run crashes.

Closes IdahoLabResearch#36
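
One way to simulate a mid-run crash of the kind described in the commit message is to patch the save method so that it raises after a chosen number of calls. The sketch below uses unittest.mock on a toy class. ToyMinimizer, its run method, and crash_after are hypothetical stand-ins and not the GBOpt test code; only the name _save_checkpoint is taken from the commit message.

```python
from unittest import mock


class ToyMinimizer:
    """Hypothetical minimizer that records saved progress in memory."""

    def __init__(self):
        self.saved = []

    def _save_checkpoint(self, generation):
        self.saved.append(generation)

    def run(self, generations):
        for gen in range(1, generations + 1):
            self._save_checkpoint(gen)
        return generations


def crash_after(real_save, n_calls):
    """Wrap real_save so that every call after the first n_calls raises."""
    calls = {"count": 0}

    def wrapper(generation):
        if calls["count"] >= n_calls:
            raise RuntimeError("simulated crash")
        calls["count"] += 1
        return real_save(generation)

    return wrapper


minimizer = ToyMinimizer()
real_save = minimizer._save_checkpoint  # bound method captured before patching
crashed = False
with mock.patch.object(minimizer, "_save_checkpoint",
                       side_effect=crash_after(real_save, 2)):
    try:
        minimizer.run(5)
    except RuntimeError:
        crashed = True
```

After the simulated crash, a test in this style would construct a fresh minimizer pointing at the same checkpoint and assert that it continues from the last saved generation.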
@jarinfrench jarinfrench force-pushed the feature/checkpointing branch from 5917fef to 05b6819 Compare May 28, 2026 20:59
@jarinfrench
Collaborator Author

Commit 0e2a195 was a clunky first pass at the checkpointing idea; the more robust implementation is in f24bb21.

@jarinfrench jarinfrench force-pushed the feature/checkpointing branch from 05b6819 to 05042c2 Compare May 28, 2026 21:33