-
Notifications
You must be signed in to change notification settings - Fork 9
make default_optimizer thread safe #249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #249 +/- ##
==========================================
- Coverage 82.08% 76.19% -5.89%
==========================================
Files 13 13
Lines 586 584 -2
==========================================
- Hits 481 445 -36
- Misses 105 139 +34 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I'm guessing this makes a bunch of tests fail which rely on the previous behavior? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I made some notes in #248 (comment). I think a full fix should also deepcopy the optimizer in multipathfinder.
Basically, replace
Pathfinder.jl/src/multipath.jl
Line 176 in 20dd77e
| iter_optimizers = fill(optimizer, nruns) |
with
iter_optimizers = (deepcopy(optimizer) for _ in 1:nruns)
Can you also add a brief test that fails on main but would pass with this PR?
| const DEFAULT_LINE_SEARCH_CONSTRUCTOR = LineSearches.HagerZhang | ||
| const DEFAULT_LINE_SEARCH_INIT_CONSTRUCTOR = LineSearches.InitialHagerZhang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the constant name and just make it the constructor instead of the object.
| const DEFAULT_LINE_SEARCH_CONSTRUCTOR = LineSearches.HagerZhang | |
| const DEFAULT_LINE_SEARCH_INIT_CONSTRUCTOR = LineSearches.InitialHagerZhang | |
| const DEFAULT_LINE_SEARCH = LineSearches.HagerZhang | |
| const DEFAULT_LINE_SEARCH_INIT = LineSearches.InitialHagerZhang |
| linesearch=DEFAULT_LINE_SEARCH_CONSTRUCTOR(), | ||
| alphaguess=DEFAULT_LINE_SEARCH_INIT_CONSTRUCTOR(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| linesearch=DEFAULT_LINE_SEARCH_CONSTRUCTOR(), | |
| alphaguess=DEFAULT_LINE_SEARCH_INIT_CONSTRUCTOR(), | |
| linesearch=DEFAULT_LINE_SEARCH(), | |
| alphaguess=DEFAULT_LINE_SEARCH_INIT(), |
Pathfinder follows a continuous deployment model, so yes we'll merge this directly into main and immediately register a release. It's a non-breaking bug fix PR so just add a patch version bump. |
Seems to only be 2 tests: Pathfinder.jl/test/singlepath.jl Lines 34 to 35 in 20dd77e
Pathfinder.jl/test/multipath.jl Lines 38 to 39 in 20dd77e
Not certain if equality check would work here, but if not, just checking the optimizer is an LBFGS with the same |
|
Docs build failure is unrelated, and I'll fix in a separate PR |
|
Ah, okay! Will do. The new test would probably have to check that constructing a |
I think that makes sense for testing that this particular approach we're using now works, but it would be even better to have a test that was independent of our
The only way I can think of to test thread-safeness is with reproducibility. I'm thinking 2 tests:
We do have reproducibility tests here: Pathfinder.jl/test/multipath.jl Lines 67 to 82 in 20dd77e
My guess is that these are currently passing because the log-density is so trivial that very little time is spent in the linesearch, so the runs don't interfere with each other often.
I really appreciate it! Let me know if you'd like help with any of this. |
Right, there was also no issue in my example in the github issue for the parallel standard normal run. |
|
@nsiccha anything I can do to help out with this PR? |
|
@sethaxen, right, so sorry, I've been a lot busier than expected. I'll try to do it now :) |
|
Ah, I've been trying to construct a test that fails for Pathfinder.jl/src/multipath.jl Lines 181 to 182 in f4ca90d
Which is probably why few if any other people have run into this issue... I'll still finish the changes, but will only add a new test for a parallel |
Ah, yes, it seems we do catch that case! Okay good, this bug wasn't as severe as it initially looked.
Yes, that sounds like the right approach. Thanks! |
Yeah, indeed. In retrospect, if it had affected everyone, I guess it would have been discovered earlier. BTW, the reason why I was even using the parallel (single) I'm unsure, I eventually just stuck with the simple parallel |
The multi-pathfinder result object stores the individual single-path results, each of which stores the draws the chain generated, so you can always access those or, as you said, use the fit distribution for each path. You can also disable importance resampling with |
|
Makes sense! I was in the end also affected by julia-vscode/julia-vscode#3853, but wasn't aware at the time. And also wrapped the (single) pathfinder calls in another loop, retrying pathfinder until NUTS initialization worked for the returned (initialization) draw, which mainly means until the gradient evaluation does not error. Maybe this is actually something that Pathfinder should (optionally) check? That the log density (gradient) can be evaluated for the returned draws? |
Oh that's interesting, can you open an issue for this feature? |
Fixes #248 for me - I believe, I haven't actually run nor added any tests yet.
Also, maybe this shouldn't be just merged into main? How does this go, @sethaxen?