perf(provider): optimize RocksDB history pruning with changeset-based approach #20674
meetrick wants to merge 8 commits into paradigmxyz:main
Conversation
@Rjected shouldn't this work with static file changesets too?
Yes, I think you would just need to use static file changeset iterators for this |
… approach
Use MDBX changesets to identify affected (address, storage_key) pairs instead of iterating the entire StoragesHistory/AccountsHistory tables. This significantly improves pruning performance for mainnet-scale databases.
Closes paradigmxyz#20417
Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Verify no excess entries remain after optimized pruning via `last()`. Falls back to full scan if entries are missed due to incomplete changesets. Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Add tests verifying the defensive check catches RocksDB entries missed by changeset-based pruning when MDBX data is incomplete.
- test_storages_history_defensive_check_catches_missed_entries
- test_accounts_history_defensive_check_catches_missed_entries
Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Force-pushed from 1394843 to 443efc6
I took a closer look at the codebase and here's what I found.
In this PR, the optimized path relies on MDBX changesets, with a fallback to a full table scan. Could you clarify what you meant by "static file changeset iterators" in this context?
Thanks!
Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Force-pushed from 3722adc to 95524ed
@Rjected @gakonst I see that #18882 has been merged. I took a look at the code, and it seems that no additional refactoring is needed for this PR.
Please let me know if I'm missing anything, and I'd appreciate your thoughts on this.
The iter_from method was incorrectly accessing `.db` field directly on RocksDBProviderInner, which is an enum and doesn't expose db as a public field. This was introduced during merge conflict resolution. Use the existing iterator_cf() helper method that properly handles both ReadWrite and ReadOnly variants. Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Fixes several issues in changeset-based pruning:
1. Use HashSet instead of Vec for O(1) deduplication (was O(n²))
2. Fix defensive check: the `last()` approach only checks the lexicographically last entry, missing entries for earlier addresses. Now always does a full verification scan after the optimized path.
3. Remove verbose AI-generated comments that just restate the code
4. Simplify doc comments to be concise
Net reduction: 60 lines of code (-100/+40)
Addresses review feedback from PR paradigmxyz#20674
Hey @meetrick, for this PR and RocksDB we have decided to take it in-house. Sorry about that.
@yongkangc Understood, thanks for the update.
Summary
Optimizes `StoragesHistory` and `AccountsHistory` pruning in RocksDB by using
MDBX changesets instead of iterating the entire history tables.
Problem
During consistency checks (e.g. after crash recovery), RocksDB history pruning
currently scans the entire `StoragesHistory`/`AccountsHistory` tables to
delete entries above a given block.
On mainnet-scale databases, this results in unnecessary full table scans and
slow startup times.
Solution
Use MDBX `StorageChangeSets` and `AccountChangeSets` to identify only the
(address, key) pairs that changed in the excess block range, and prune the
corresponding history entries using `iter_from` seeks.
If changeset data is unavailable or incomplete, the logic safely falls back
to a full table scan to preserve correctness.
Key Changes
- Add `iter_from` to `RocksDBProvider` for efficient seek-based iteration

Testing
- `test_storages_history_defensive_check_catches_missed_entries`
- `test_accounts_history_defensive_check_catches_missed_entries`