Fix golden cache codepath bugs and raise BuildKit GC limit#1788
Merged
lukemarsden merged 1 commit into main on Mar 2, 2026
Root cause of cache misses: BuildKit doesn't refresh LastUsedAt on cache hits, so golden build entries look "stale" to GC and get evicted when later session builds temporarily push the cache over the default 93 GiB limit. Raise to 300 GiB so GC never runs.

Golden cache codepath fixes:
- Remove dead GoldenBuildRunning/SetGoldenBuildRunning file lock (written but never read; golden_build_service uses in-memory map instead)
- Add per-project RWMutex to prevent race between PromoteSessionToGolden and SetupGoldenCopy (promotion could rename dir mid-copy)
- Add golden-version.json with generation number, timestamp, session ID so containers can identify which golden cache they're running from
- Check .golden-build-result removal errors instead of silently ignoring (failed removal causes premature promotion)
- Short-circuit parallelCopyDir on first error instead of continuing to launch doomed cp jobs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
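The per-project locking described in the commit message could look roughly like the sketch below. It is an illustration only: the `projectLocks` type and the `withCopy`/`withPromotion` helpers are hypothetical names, not taken from the PR, and the real code presumably wraps `SetupGoldenCopy` and `PromoteSessionToGolden` directly.

```go
package main

import (
	"fmt"
	"sync"
)

// projectLocks hands out one RWMutex per project ID.
// Hypothetical helper: the PR's actual type and field names are not shown.
type projectLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.RWMutex
}

func newProjectLocks() *projectLocks {
	return &projectLocks{locks: make(map[string]*sync.RWMutex)}
}

// get returns the lock for projectID, creating it on first use.
func (p *projectLocks) get(projectID string) *sync.RWMutex {
	p.mu.Lock()
	defer p.mu.Unlock()
	l, ok := p.locks[projectID]
	if !ok {
		l = &sync.RWMutex{}
		p.locks[projectID] = l
	}
	return l
}

// withCopy runs fn under the read lock, so several copies may overlap.
func (p *projectLocks) withCopy(projectID string, fn func() error) error {
	l := p.get(projectID)
	l.RLock()
	defer l.RUnlock()
	return fn()
}

// withPromotion runs fn under the write lock: it waits for in-flight
// copies to finish and blocks new ones until fn returns.
func (p *projectLocks) withPromotion(projectID string, fn func() error) error {
	l := p.get(projectID)
	l.Lock()
	defer l.Unlock()
	return fn()
}

func main() {
	locks := newProjectLocks()
	_ = locks.withCopy("proj-1", func() error {
		fmt.Println("copying from golden dir")
		return nil
	})
	_ = locks.withPromotion("proj-1", func() error {
		fmt.Println("renaming session dir to golden")
		return nil
	})
}
```

Keying the lock by project keeps a promotion in one project from stalling copies in another, which a single global mutex would do.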
Summary
- BuildKit doesn't refresh `LastUsedAt` on cache hits, so golden build entries look "stale" to GC and get evicted when session builds temporarily exceed the default 93 GiB limit. Raised to 300 GiB so GC never triggers.
- `PromoteSessionToGolden` could rename the golden directory while `SetupGoldenCopy` was reading from it. Added per-project `RWMutex` (read lock for copies, write lock for promotion).
- `golden-version.json` written on each promotion with generation number, timestamp, and session ID. Logged during copy so containers can identify which golden they're running from.
- Removed the `GoldenBuildRunning`/`SetGoldenBuildRunning` file-based lock, which was written but never read (golden build service uses its own in-memory map).
- `.golden-build-result` removal now checks and logs errors instead of ignoring them (failed removal causes premature promotion).
- `parallelCopyDir` stops launching new `cp` jobs once one fails.

Test plan
- `golden-version.json` is written
- `buildkitd.toml` with 300 GiB limit on API restart
- `go build ./api/pkg/hydra/ ./api/pkg/services/ ./api/pkg/external-agent/ ./api/pkg/server/` passes

🤖 Generated with Claude Code