
Conversation

@nsiccha commented May 13, 2025

Fixes #248 for me, I believe; I haven't actually run or added any tests yet.

Also, maybe this shouldn't be just merged into main? How does this go, @sethaxen?

codecov bot commented May 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.19%. Comparing base (20dd77e) to head (6971c21).

❗ There is a different number of reports uploaded between BASE (20dd77e) and HEAD (6971c21).

HEAD has 148 fewer uploads than BASE: 154 reports for BASE (20dd77e) vs. 6 for HEAD (6971c21).
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #249      +/-   ##
==========================================
- Coverage   82.08%   76.19%   -5.89%     
==========================================
  Files          13       13              
  Lines         586      584       -2     
==========================================
- Hits          481      445      -36     
- Misses        105      139      +34     


@nsiccha (Author) commented May 13, 2025

I'm guessing this makes a bunch of tests fail which rely on the previous behavior?

@sethaxen (Member) left a comment


Thanks for the PR! I made some notes in #248 (comment). I think a full fix should also deepcopy the optimizer in multipathfinder.

Basically, replace

    iter_optimizers = fill(optimizer, nruns)

with

    iter_optimizers = (deepcopy(optimizer) for _ in 1:nruns)
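To illustrate the difference (a quick sketch, not code from the PR): fill reuses one object, while the generator yields an independent copy per run:

    using Optim  # provides the LBFGS optimizer type

    optimizer = Optim.LBFGS()

    # `fill` reuses the SAME object for every entry:
    opts = fill(optimizer, 3)
    @assert opts[1] === opts[2]  # aliased: mutable line-search state is shared

    # the generator yields an independent deep copy per run:
    opts2 = collect(deepcopy(optimizer) for _ in 1:3)
    @assert opts2[1] !== opts2[2]  # no shared mutable state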

Can you also add a brief test that fails on main but would pass with this PR?

Comment on lines +27 to +28

    const DEFAULT_LINE_SEARCH_CONSTRUCTOR = LineSearches.HagerZhang
    const DEFAULT_LINE_SEARCH_INIT_CONSTRUCTOR = LineSearches.InitialHagerZhang

@sethaxen (Member) commented

Let's keep the constant names and just make them the constructors instead of the objects.

Suggested change:

    - const DEFAULT_LINE_SEARCH_CONSTRUCTOR = LineSearches.HagerZhang
    - const DEFAULT_LINE_SEARCH_INIT_CONSTRUCTOR = LineSearches.InitialHagerZhang
    + const DEFAULT_LINE_SEARCH = LineSearches.HagerZhang
    + const DEFAULT_LINE_SEARCH_INIT = LineSearches.InitialHagerZhang

Comment on lines +34 to +35

    linesearch=DEFAULT_LINE_SEARCH_CONSTRUCTOR(),
    alphaguess=DEFAULT_LINE_SEARCH_INIT_CONSTRUCTOR(),

@sethaxen (Member) suggested change:

    - linesearch=DEFAULT_LINE_SEARCH_CONSTRUCTOR(),
    - alphaguess=DEFAULT_LINE_SEARCH_INIT_CONSTRUCTOR(),
    + linesearch=DEFAULT_LINE_SEARCH(),
    + alphaguess=DEFAULT_LINE_SEARCH_INIT(),

@sethaxen (Member) commented

> Also, maybe this shouldn't be just merged into main? How does this go, @sethaxen?

Pathfinder follows a continuous deployment model, so yes, we'll merge this directly into main and immediately register a release. It's a non-breaking bug-fix PR, so just add a patch version bump.

@sethaxen (Member) commented

> I'm guessing this makes a bunch of tests fail which rely on the previous behavior?

Seems to be only 2 tests:

    @test result.optimizer ===
        Pathfinder.default_optimizer(Pathfinder.DEFAULT_HISTORY_LENGTH)

    @test result.optimizer ===
        Pathfinder.default_optimizer(Pathfinder.DEFAULT_HISTORY_LENGTH)

Not certain whether an equality check would work here, but if not, just checking that the optimizer is an LBFGS with the same m, linesearch, and linesearch-init types would be fine.
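I.e., something like this sketch (assuming Optim.jl's LBFGS field names m, linesearch!, and alphaguess!, and the `result` from the surrounding test):

    using Optim, LineSearches, Test
    import Pathfinder

    opt = result.optimizer
    @test opt isa Optim.LBFGS
    @test opt.m == Pathfinder.DEFAULT_HISTORY_LENGTH
    @test opt.linesearch! isa LineSearches.HagerZhang         # same linesearch type
    @test opt.alphaguess! isa LineSearches.InitialHagerZhang  # same init type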

@sethaxen (Member) commented

The docs build failure is unrelated; I'll fix it in a separate PR.

@nsiccha (Author) commented May 13, 2025

Ah, okay! Will do. The new test would probably have to check that constructing a default_optimizer before and after a pathfinder run results in identical-in-value but different-in-memory objects, if that makes sense. Will probably add this later today 👍
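Roughly like this, perhaps (a sketch only; it assumes Optim.jl's LBFGS field names m and linesearch!, and ℓ is a placeholder log-density problem):

    using Pathfinder, Test

    opt_pre = Pathfinder.default_optimizer(Pathfinder.DEFAULT_HISTORY_LENGTH)
    result = pathfinder(ℓ)  # ℓ: placeholder log-density problem
    opt_post = Pathfinder.default_optimizer(Pathfinder.DEFAULT_HISTORY_LENGTH)

    @test opt_pre.m == opt_post.m                                # identical in value
    @test typeof(opt_pre.linesearch!) === typeof(opt_post.linesearch!)
    @test opt_pre.linesearch! !== opt_post.linesearch!           # distinct in memory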

@sethaxen (Member) commented

> The new test would probably have to check that constructing a default_optimizer before and after a pathfinder run results in identical-in-value but different-in-memory objects, if that makes sense.

I think that makes sense for testing that this particular approach we're using now works, but it would be even better to have a test that is independent of our default_optimizer and instead directly tests our invariants (things that we should be able to guarantee are true). Here our invariants would be:

  1. if you didn't pass a stateful optimizer/log-density function and if your RNG is thread-safe, then pathfinder should be thread-safe.
  2. multipathfinder with a thread-safe RNG and a multithreading executor should be thread-safe.

The only way I can think of to test thread-safety is via reproducibility.

I'm thinking 2 tests:

  1. Call pathfinder multiple times in a multi-threaded loop with identically seeded thread-safe RNGs for a nontrivial model (maybe the banana), without specifying the optimizer. Verify that the results (e.g. trace, trace gradient, log-density, and draws) are numerically identical (with ==); see the sketch after this list.
  2. Call multipathfinder with a user-constructed LBFGS (so the runs share state), a thread-safe RNG, and a PreferParallel executor. Reseed identically and re-run. The results should be identical.
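A rough sketch of test 1 (ℓ_banana is a placeholder for the banana log-density; the RNG type and counts are illustrative):

    using Pathfinder, Random, Test
    using Base.Threads: @threads

    nrepeats = 8
    results = Vector{Any}(undef, nrepeats)
    @threads for i in 1:nrepeats
        rng = Random.Xoshiro(42)                # identically seeded per task
        results[i] = pathfinder(ℓ_banana; rng)  # no optimizer passed: default used
    end
    for r in results[2:end]
        @test r.draws == results[1].draws       # numerically identical draws
    end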

We do have reproducibility tests here:

    Random.seed!(rng, seed)
    result2 = multipathfinder(
        ℓ, ndraws; nruns, ndraws_elbo, ndraws_per_run, rng, executor
    )
    @test result2.fit_distribution == result.fit_distribution
    @test result2.draws == result.draws
    @test result2.draw_component_ids == result.draw_component_ids
    Random.seed!(rng, seed)
    result3 = multipathfinder(
        ℓ, ndraws; nruns, ndraws_elbo, ndraws_per_run, rng, executor
    )
    for (c1, c2) in
        zip(result.fit_distribution.components, result3.fit_distribution.components)
        @test c1 ≈ c2 atol = 1e-6
    end

My guess is that these are currently passing because the log-density is so trivial that very little time is spent in the linesearch, so the runs don't interfere with each other often.

> Will probably add this later today 👍

I really appreciate it! Let me know if you'd like help with any of this.

@nsiccha (Author) commented May 13, 2025

> My guess is that these are currently passing because the log-density is so trivial that very little time is spent in the linesearch, so the runs don't interfere with each other often.

Right, in my example in the GitHub issue there was also no problem with the parallel standard-normal run.

@sethaxen (Member) commented

@nsiccha anything I can do to help out with this PR?

@nsiccha (Author) commented Jul 8, 2025

@sethaxen, right, so sorry, I've been a lot busier than expected. I'll try to do it now :)

@nsiccha (Author) commented Jul 8, 2025

Ah, I've been trying to construct a test that fails for multipathfinder, but was unable to. The reason is that you already catch the multipathfinder + stateful-optimizer case; see

    # also support optimizers that store state
    zip(_init, Iterators.map(deepcopy, iter_optimizers))

That is probably why few, if any, other people have run into this issue...

I'll still finish the changes, but will only add a new test for a parallel pathfinder call without an explicitly passed optimizer. Is that alright, @sethaxen?

@sethaxen (Member) commented

> Ah, I've been trying to construct a test that fails for multipathfinder, but was unable to. The reason is that you already catch the multipathfinder + stateful-optimizer case; see

Ah, yes, it seems we do catch that case! Okay good, this bug wasn't as severe as it initially looked.

> I'll still finish the changes, but will only add a new test for a parallel pathfinder call without an explicitly passed optimizer. Is that alright, @sethaxen?

Yes, that sounds like the right approach. Thanks!

@nsiccha (Author) commented Jul 14, 2025

> this bug wasn't as severe as it initially looked.

Yeah, indeed. In retrospect, if it had affected everyone, I guess it would have been discovered earlier.

BTW, the reason I was even using the parallel (single) pathfinder approach was that I wanted to initialize several chains for the same posterior in parallel, but I did not want all the eventual initialization points to come from a single approximation, which AFAICT inevitably happens in high dimensions with importance resampling. I wanted to do something slightly more clever, but for that I would have needed to match each draw to the approximation that generated it, and AFAICT that wasn't possible with multipathfinder. I guess in the end I could have used the fit_distributions from the PathfinderResult; I'm unsure why I didn't. Or actually, IIRC the multipathfinder method was for some reason much slower than my parallel pathfinder implementation?

I'm unsure, I eventually just stuck with the simple parallel pathfinder approach, which worked well for me (except for the race condition).

@sethaxen (Member) commented

> BTW, the reason I was even using the parallel (single) pathfinder approach was that I wanted to initialize several chains for the same posterior in parallel, but I did not want all the eventual initialization points to come from a single approximation, which AFAICT inevitably happens in high dimensions with importance resampling. I wanted to do something slightly more clever, but for that I would have needed to match each draw to the approximation that generated it, and AFAICT that wasn't possible with multipathfinder.

The multi-pathfinder result object stores the individual single-path results, each of which stores the draws that run generated, so you can always access those or, as you said, use the fit distribution for each path. You can also disable importance resampling with importance=false; it still resamples with replacement but does not use importance weights to do so. The draw_component_ids field then stores, for each draw in draws, the index of the single-path run that generated that specific draw.
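For example (a sketch; ℓ and the keyword values are placeholders, and it assumes draws stores one draw per column):

    using Pathfinder

    result = multipathfinder(ℓ, 100; nruns=8, importance=false)
    ids = result.draw_component_ids       # run index that generated each draw
    # all draws produced by the 3rd single-path run:
    draws_from_run_3 = result.draws[:, ids .== 3]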

@nsiccha (Author) commented Jul 24, 2025

Makes sense! I was in the end also affected by julia-vscode/julia-vscode#3853, but wasn't aware of it at the time.

I also wrapped the (single) pathfinder calls in another loop, retrying pathfinder until NUTS initialization worked for the returned (initialization) draw, which mainly means until the gradient evaluation did not error.
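Roughly like this sketch (pathfinder_with_valid_init and max_tries are made-up names; it assumes ℓ supports LogDensityProblems.logdensity_and_gradient):

    using Pathfinder, LogDensityProblems

    # Hypothetical helper: re-run single-path pathfinder until the returned
    # draw has a finite log-density and gradient, so NUTS can initialize there.
    function pathfinder_with_valid_init(ℓ; max_tries=10, kwargs...)
        for _ in 1:max_tries
            result = pathfinder(ℓ; ndraws=1, kwargs...)
            x = result.draws[:, 1]
            lp, grad = LogDensityProblems.logdensity_and_gradient(ℓ, x)
            isfinite(lp) && all(isfinite, grad) && return result
        end
        error("no valid initialization draw after $max_tries tries")
    end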

Maybe this is actually something that Pathfinder should (optionally) check: that the log density (gradient) can be evaluated for the returned draws?

@sethaxen (Member) commented

> Maybe this is actually something that Pathfinder should (optionally) check: that the log density (gradient) can be evaluated for the returned draws?

Oh, that's interesting. Can you open an issue for this feature?


Development

Successfully merging this pull request may close these issues.

Non reproducible behaviour of pathfinder if run in parallel (even if every task gets passed its own RNG)
