Add enzyme to benchmark tests #1039
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##           main    #1039      +/-   ##
==========================================
- Coverage   82.34%   82.24%   -0.11%
==========================================
  Files          38       38
  Lines        3949     3949
==========================================
- Hits         3252     3248       -4
- Misses        697      701       +4
```
Pull Request Test Coverage Report for Build 17931429932

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
```julia
:reversediff => ADTypes.AutoReverseDiff(; compile=false),
:reversediff_compiled => ADTypes.AutoReverseDiff(; compile=true),
:mooncake => ADTypes.AutoMooncake(; config=nothing),
:enzyme => ADTypes.AutoEnzyme(; config=nothing),
```
This doesn't work :(
Doesn't work how? Probably we need to use the ADTypes in the same way as in the AD tests (currently it's a constructor error).
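The constructor error is consistent with `ADTypes.AutoEnzyme` not accepting a `config` keyword; that keyword belongs to `AutoMooncake`, while `AutoEnzyme` takes `mode` and `function_annotation`. A minimal sketch of a construction matching that signature (the specific mode and annotation choices here are assumptions, not necessarily what the AD tests use):

```julia
using ADTypes
using Enzyme

# AutoMooncake takes `config`; AutoEnzyme instead takes `mode` and
# `function_annotation`. Passing `config` to AutoEnzyme throws a MethodError.
adtype = ADTypes.AutoEnzyme(;
    mode=Enzyme.set_runtime_activity(Enzyme.Reverse),  # reverse mode with runtime activity
    function_annotation=Enzyme.Const,                  # annotate the function as non-differentiated
)
```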
```julia
    ("Smorgasbord", smorgasbord_instance, :typed, :enzyme, true),
    ("Loop univariate 1k", loop_univariate1k, :typed, :mooncake, true),
    ("Loop univariate 1k", loop_univariate1k, :typed, :enzyme, true),
    ("Multivariate 1k", multivariate1k, :typed, :mooncake, true),
    ("Multivariate 1k", multivariate1k, :typed, :enzyme, true),
    ("Loop univariate 10k", loop_univariate10k, :typed, :mooncake, true),
    ("Loop univariate 10k", loop_univariate10k, :typed, :enzyme, true),
    ("Multivariate 10k", multivariate10k, :typed, :mooncake, true),
    ("Multivariate 10k", multivariate10k, :typed, :enzyme, true),
    ("Dynamic", Models.dynamic(), :typed, :mooncake, true),
    ("Dynamic", Models.dynamic(), :typed, :enzyme, true),
    ("Submodel", Models.parent(randn(rng)), :typed, :mooncake, true),
    ("Submodel", Models.parent(randn(rng)), :typed, :enzyme, true),
    ("LDA", lda_instance, :typed, :reversediff, true),
    ("LDA", lda_instance, :typed, :enzyme, true),
]
```
In the interests of not making this take too long, could we restrict the Enzyme ones to Smorgasbord, Dynamic, Submodel, and LDA?
It might also depend on whether you want to test both forward- and reverse-mode.
The point of these simple benchmarks is to provide a ballpark indicator of whether a PR did something very bad for performance. Since these benchmarks capture general performance issues, such as allocation and type stability, we only need to run them on a few AD backends to avoid excessive CI time.
My thought was to put some Enzyme integration tests here to make it easier for @penelopeysm and myself to identify whether a PPL change caused an error that prevents differentiation (I believe a majority of recent issues stem from DynamicPPL changes). This seemed like a happy middle ground for not running all tests; alternatively, @yebai, if you prefer, we can also add Enzyme to the other AD tests (which frankly would also be good for users).
@wsmoses DynamicPPL has already been tested against Enzyme in an isolated CI; see, e.g., here. Speaking from experience, Enzyme is still fragile and breaks on innocent Julia code. This is consistent with a few other places where we test against Enzyme, such as Bijectors and AdvancedVI. To address these issues, the best approach is to have comprehensive unit tests inside Enzyme against Julia syntax, so one has an empirical guarantee of Julia syntax coverage. Then, for cases that Enzyme doesn't plan to support, one can provide clear docs so users can work around them. For these reasons, I'd suggest against adding Enzyme to the DynamicPPL benchmarks before Enzyme gets comprehensive unit tests on Julia syntax.
I see, though the fact that it's red and seemingly ignored at the moment seems like a bad sign. In particular, looking at the history, #1005 does seem to be the change that made it start failing, which makes sense given all the accumulator-derived minimizations @penelopeysm and I have been looking into as of late (hopefully they're now all resolved as of this morning).

And that's quite strange, @yebai. Do you have an example offhand where you can point to a regression on those repos from Enzyme? In particular, we run the Bijectors tests you added on every commit of Enzyme, and they haven't ever failed since you added them. Of course they're fixed to whatever Bijectors version you chose (unless updated), which is why I think it's more likely something in that code (or a dependency) changed.

That said, as I mentioned in the other thread, happy to have whatever additional integration tests you all think are useful!
I think, as a happy middle ground, let's add it to Smorgasbord (that model is meant, at least in theory, to exercise as many DynamicPPL features as we can squeeze in, so if there are severe regressions with Enzyme we would pick them up ... if somebody looks). I'm going to close this PR and reopen one myself so that the workflow can run.
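Under that middle ground, the benchmark table would drop most of the Enzyme rows from the original diff and keep only the Smorgasbord one. A sketch of the resulting entry, reusing the tuple shape and names already in the diff (whether any other Enzyme rows survive in the follow-up PR is an assumption here):

```julia
# Keep Enzyme only on the Smorgasbord model, which exercises the widest
# range of DynamicPPL features, so severe Enzyme regressions still show up
# without inflating CI time across every benchmark model.
("Smorgasbord", smorgasbord_instance, :typed, :enzyme, true),
```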