Fix lambdify cache-miss on evaluate() (closes #194) + replace flaky timing tests#193
Open
lmoresi wants to merge 2 commits into
Open
Fix lambdify cache-miss on evaluate() (closes #194) + replace flaky timing tests#193lmoresi wants to merge 2 commits into
lmoresi wants to merge 2 commits into
Conversation
test_lambdify_caching and test_rbf_false_not_slow asserted one-off wall-clock timings (time2 <= time1*2, elapsed < 0.01s). These are inherently flaky on shared CI runners and test_lambdify_caching has been failing CI on unrelated PRs (0.0107s vs 0.0049s runner jitter). Replaced with deterministic, timing-free behaviour tests: - test_lambdify_cache_hit: exercises get_cached_lambdified directly -- identical request returns the SAME function object and adds no entry (true cache hit); a distinct expression is cached separately. This is the cache mechanism's actual contract. - test_rbf_modes_consistent: rbf=True and rbf=False must agree for a pure-sympy expression (meaningful, timing-free; replaces the "rbf=False not slow" wall-clock check). Note: writing the cache-hit test surfaced that the high-level uw.function.evaluate() path does NOT hit the lambdify cache on repeated identical calls -- _expr_hash(srepr) differs every call, so the cache grows by one entry per call and never returns a hit. The old loose timing tolerance was masking this. Not fixed here (evaluate/lambdify is a performance-critical hot path needing separate benchmarking); see PR description. Full module: 19 passed. Underworld development team with AI support from Claude Code
Contributor
There was a problem hiding this comment.
Pull request overview
This PR replaces flaky wall-clock assertions in lambdify optimization tests with deterministic behavioral checks for cache reuse and evaluation consistency.
Changes:
- Replaces timing-based lambdify cache test with a direct
get_cached_lambdifiedcache-contract test. - Replaces
rbf=Falsetiming assertion with a consistency check betweenrbf=Trueandrbf=False.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
_expr_hash used sympy.srepr(expr), which embeds the volatile global dummy_index of any sympy.Dummy. The evaluate() coordinate-substitution path mints a fresh Dummy per call, so an otherwise-identical expression hashed differently every call and the lambdify cache never matched -- _lambdify_cache grew one entry per call (1,2,3,4,...) and sympy.lambdify recompiled every time on a hot path. Fix: canonicalise Dummy -> name-stable Symbol in _expr_hash before srepr. This changes only the cache *key*; the real sympy.lambdify() call still uses the original expr/symbols, so numerics are unchanged. The cache key separately carries the symbol-name tuple, so name-keying is safe and deterministic. Verified: repeated identical evaluate() now holds cache size flat ([1,1,1,...] vs [1,2,3,...] before). test_0720 module 19 passed; test_0501_integrals 9 passed / 3 pre-existing xfail (unrelated CellWiseIntegral #172/#174). test_evaluate_cache_stable_across_calls added as the #194 regression guard (aggregate cache behaviour + result-consistency, no wall-clock). Underworld development team with AI support from Claude Code
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two coupled changes in the lambdify-evaluate caching area:
Fixes uw.function.evaluate() never hits the lambdify cache (fresh sympy.Dummy per call → +1 cache entry every call) #194 —
uw.function.evaluate()never hit the lambdify cache._expr_hashusedsympy.srepr(expr), which embeds the volatile globaldummy_indexof anysympy.Dummy. The evaluate() coordinate-substitution path mints a freshDummyper call, so an otherwise-identical expression hashed differently every call →_lambdify_cachegrew one entry per call (1,2,3,4,…) andsympy.lambdifyrecompiled every time on a hot path.Fix: canonicalise
Dummy → Symbol(d.name)in_expr_hashbeforesrepr. This changes only the cache key; the realsympy.lambdify()call still uses the originalexpr/symbols, so numerics are unchanged. The cache key separately carries the symbol-name tuple, so name-keying is safe and deterministic.Replaces the flaky wall-clock tests in
TestPerformanceExpectations(the original motivation).test_lambdify_caching'sassert time2 <= time1*2was failing CI on unrelated PRs (e.g. Refined-submesh (coarse/fine companion) investigation #192) from pure runner jitter, and its loose tolerance was masking bug uw.function.evaluate() never hits the lambdify cache (fresh sympy.Dummy per call → +1 cache entry every call) #194.Tests
test_lambdify_cache_hit— exercisesget_cached_lambdifieddirectly: identical request returns the same object, no new entry; distinct expression cached separately. Deterministic, timing-free.test_evaluate_cache_stable_across_calls— uw.function.evaluate() never hits the lambdify cache (fresh sympy.Dummy per call → +1 cache entry every call) #194 regression guard: repeated identicalevaluate()holds_lambdify_cachesize flat ([1,1,1,…], was[1,2,3,…]) and results identical across calls. Aggregate cache behaviour, no wall-clock.Validation
evaluate()calls:[1,1,1,1,1,1,1](before fix:[1,2,3,4,5,6,7]).tests/test_0720_lambdify_optimization_paths.py— 19 passed, 1 pre-existing unrelated xpass.tests/test_0501_integrals.pysmoke — 9 passed, 3 pre-existing xfail (unrelated CellWiseIntegral Perf: clone DM in Integral.evaluate() (3× speedup; closes #171 narrative) #172/Revert PR #172 (integrals regression) + xfail CellWiseIntegral parity tests #174). No numerical regression from the cache-key change.Test plan
TestPerformanceExpectations— deterministic, no timingtest_0720module — 19 passedtest_0501_integralssafety smoke — 9 passedCloses #194.
Underworld development team with AI support from Claude Code