perf(provider): optimize RocksDB history pruning with changeset-based approach #20674

Closed
meetrick wants to merge 8 commits into paradigmxyz:main from meetrick:reth/storage-optimize_RocksDB

Conversation

@meetrick
Contributor

Summary

Optimizes StoragesHistory and AccountsHistory pruning in RocksDB by using
MDBX changesets instead of iterating the entire history tables.

Problem

During consistency checks (e.g. after crash recovery), RocksDB history pruning
currently scans the entire StoragesHistory / AccountsHistory tables to
delete entries above a given block.

On mainnet-scale databases, this results in unnecessary full table scans and
slow startup times.

Solution

Use MDBX StorageChangeSets and AccountChangeSets to identify only the
(address, key) pairs that changed in the excess block range, and prune
corresponding history entries using iter_from seeks.

If changeset data is unavailable or incomplete, the logic safely falls back
to a full table scan to preserve correctness.
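
The flow described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the PR's implementation: the history table is modeled as an ordered set of `(address, block)` keys, the changesets as a map from block number to changed addresses, and `prune_history_above`, `HistoryTable`, and `ChangeSets` are hypothetical names. The real tables use sharded keys and richer values.

```rust
use std::collections::{BTreeMap, BTreeSet, HashSet};

// Toy stand-ins (assumptions): the real tables use sharded keys and richer values.
type Address = [u8; 20];
type BlockNumber = u64;
/// History table modeled as an ordered set of (address, block) keys.
type HistoryTable = BTreeSet<(Address, BlockNumber)>;
/// Changesets modeled as block -> addresses changed in that block.
type ChangeSets = BTreeMap<BlockNumber, Vec<Address>>;

/// Deletes every history entry above `tip` and returns how many were removed.
/// `changesets` is `None` when changeset data is unavailable.
fn prune_history_above(
    history: &mut HistoryTable,
    changesets: Option<&ChangeSets>,
    tip: BlockNumber,
) -> usize {
    let before = history.len();
    let first_excess = match tip.checked_add(1) {
        Some(block) => block,
        None => return 0, // nothing can be above u64::MAX
    };

    if let Some(changesets) = changesets {
        // Distinct addresses touched in the excess block range.
        let affected: HashSet<Address> = changesets
            .range(first_excess..)
            .flat_map(|(_, addrs)| addrs.iter().copied())
            .collect();

        for addr in affected {
            // Seek straight to this address's first excess key instead of scanning.
            let stale: Vec<(Address, BlockNumber)> = history
                .range((addr, first_excess)..=(addr, BlockNumber::MAX))
                .copied()
                .collect();
            for key in stale {
                history.remove(&key);
            }
        }
    }

    // Fallback: with no changesets, or if they were incomplete, a full scan
    // removes whatever is still above `tip`.
    if history.iter().any(|&(_, block)| block > tip) {
        history.retain(|&(_, block)| block <= tip);
    }

    before - history.len()
}
```

In this model the optimized path touches only the keys of addresses that appear in the excess changesets. The final check runs over the whole set, which keeps the toy result correct when the changesets are missing or incomplete.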

Key Changes

  • Add iter_from to RocksDBProvider for efficient seek-based iteration
  • Prune history entries based on MDBX changesets instead of full scans
  • Retain a full-scan fallback for safety
  • Add a defensive check to catch missed entries in edge cases

Testing

  • All existing RocksDB invariant tests pass
  • Added tests to verify the defensive fallback when changesets are incomplete:
    • test_storages_history_defensive_check_catches_missed_entries
    • test_accounts_history_defensive_check_catches_missed_entries

@gakonst
Member

gakonst commented Dec 30, 2025

@Rjected shouldn't this work w static file changesets too?

@Rjected
Member

Rjected commented Dec 30, 2025

> @Rjected shouldn't this work w static file changesets too?

Yes, I think you would just need to use static file changeset iterators for this

… approach

Use MDBX changesets to identify affected (address, storage_key) pairs
instead of iterating the entire StoragesHistory/AccountsHistory tables.

This significantly improves pruning performance for mainnet-scale databases.

Closes paradigmxyz#20417

Signed-off-by: Hwangjae Lee <meetrick@gmail.com>

Verify no excess entries remain after optimized pruning via `last()`.
Falls back to full scan if entries are missed due to incomplete changesets.

Signed-off-by: Hwangjae Lee <meetrick@gmail.com>

Add tests verifying the defensive check catches RocksDB entries
missed by changeset-based pruning when MDBX data is incomplete.
- test_storages_history_defensive_check_catches_missed_entries
- test_accounts_history_defensive_check_catches_missed_entries

Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
@meetrick meetrick force-pushed the reth/storage-optimize_RocksDB branch from 1394843 to 443efc6 Compare December 31, 2025 01:15
@meetrick
Contributor Author

@Rjected @gakonst

I took a closer look at the codebase and here's what I found.

  1. Current static files only contain Headers, Transactions, Receipts, and TransactionSenders
  2. AccountChangeSets / StorageChangeSets exist only as MDBX tables, not as static files
  3. AccountsTrieChangeSets / StoragesTrieChangeSets represent Merkle trie node state, which is a different kind of data

In this PR, the optimized path relies on MDBX changesets, with a fallback to a full table scan.

Could you clarify what you meant by "static file changeset iterators" in this context?

  • Is there a plan to introduce a new static file segment for changesets?
  • Or is there an existing mechanism I might be overlooking?

Thanks!

Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
@meetrick meetrick force-pushed the reth/storage-optimize_RocksDB branch from 3722adc to 95524ed Compare December 31, 2025 02:22
@Rjected
Member

Rjected commented Dec 31, 2025

@meetrick I'm referring to PR #18882, which we will merge soon and roll out with the RocksDB storage code. Since it's not merged yet, I don't expect this PR to change, but once it is merged I would expect the direct DB calls to be replaced with methods that use providers.

@meetrick
Contributor Author

@Rjected Got it, thanks for the clarification.
I’ll keep this PR scoped as-is for now and am happy to follow up with a provider-based refactor after #18882 is merged if needed.

@meetrick
Contributor Author

meetrick commented Jan 8, 2026

@Rjected @gakonst I see that #18882 has been merged. I took a look at the code, and it seems that no additional refactoring is needed for this PR.

prune_accounts_history_above() already uses provider.changed_accounts_with_range() via the AccountExtReader trait. With the recent changes, this now internally goes through EitherReader::new_account_changesets(), so when account_changesets_in_static_files is enabled, it should automatically query from static files.

Please let me know if I’m missing anything, and I’d appreciate your thoughts on this.
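
To make the dispatch described in this comment concrete, here is a rough sketch of a reader that picks its backend from a flag. It is an assumption-laden model rather than reth's actual `EitherReader`: `ChangesetSource`, `changed_accounts_in_range`, and the `in_static_files` parameter are hypothetical names, and both backends are modeled as in-memory maps.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::RangeInclusive;

type Address = [u8; 20];
type BlockNumber = u64;
/// Changesets modeled as block -> addresses changed in that block (assumption).
type ChangeSets = BTreeMap<BlockNumber, Vec<Address>>;

/// Hypothetical two-backend reader; not reth's real type.
enum ChangesetSource<'a> {
    Database(&'a ChangeSets),
    StaticFile(&'a ChangeSets),
}

impl<'a> ChangesetSource<'a> {
    /// Picks the backend from a configuration flag.
    fn new(in_static_files: bool, db: &'a ChangeSets, static_files: &'a ChangeSets) -> Self {
        if in_static_files {
            ChangesetSource::StaticFile(static_files)
        } else {
            ChangesetSource::Database(db)
        }
    }

    /// Returns the distinct accounts changed in `range`, whichever backend is active.
    /// `range` must not be reversed (start <= end).
    fn changed_accounts_in_range(&self, range: RangeInclusive<BlockNumber>) -> BTreeSet<Address> {
        let source = match self {
            ChangesetSource::Database(cs) | ChangesetSource::StaticFile(cs) => *cs,
        };
        source
            .range(range)
            .flat_map(|(_, addrs)| addrs.iter().copied())
            .collect()
    }
}
```

The point of the sketch is that a caller asking for changed accounts never names a backend, so flipping the flag changes where the data comes from without touching the pruning code.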

@joshieDo joshieDo added the A-rocksdb Related to rocksdb integration label Jan 8, 2026
The iter_from method was incorrectly accessing `.db` field directly on
RocksDBProviderInner, which is an enum and doesn't expose db as a public
field. This was introduced during merge conflict resolution. Use the
existing iterator_cf() helper method that properly handles both ReadWrite
and ReadOnly variants.

Signed-off-by: Hwangjae Lee <meetrick@gmail.com>
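
The shape of this fix can be illustrated with a small model. It assumes a provider enum with `ReadWrite` and `ReadOnly` variants, as the commit message describes. Everything else (`ProviderInner`, `table`, and the in-memory `Table` type) is a hypothetical stand-in and does not reproduce the real `iterator_cf()` signature.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a column family (assumption).
type Table = BTreeMap<Vec<u8>, Vec<u8>>;

/// Hypothetical provider enum: there is no single `.db` field to reach into,
/// because each variant owns its own handle.
enum ProviderInner {
    ReadWrite { db: Table },
    ReadOnly { db: Table },
}

impl ProviderInner {
    /// The one place that matches on the variant; other methods go through it.
    fn table(&self) -> &Table {
        match self {
            ProviderInner::ReadWrite { db } | ProviderInner::ReadOnly { db } => db,
        }
    }

    /// Seek-based iteration: yields entries whose key is >= `start`, in key order.
    fn iter_from<'a>(
        &'a self,
        start: &[u8],
    ) -> impl Iterator<Item = (&'a Vec<u8>, &'a Vec<u8>)> + 'a {
        self.table().range(start.to_vec()..)
    }
}
```

Routing `iter_from` through a helper means it behaves the same for both variants and keeps compiling if the enum's layout changes.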
gakonst added a commit to tempoxyz/reth-1 that referenced this pull request Jan 23, 2026
Fixes several issues in changeset-based pruning:

1. Use HashSet instead of Vec for O(1) deduplication (was O(n²))
2. Fix defensive check - the last() approach only checks lexicographically
   last entry, missing entries for earlier addresses. Now always does
   full verification scan after optimized path.
3. Remove verbose AI-generated comments that just restate the code
4. Simplify doc comments to be concise

Net reduction: 60 lines of code (-100/+40)

Addresses review feedback from PR paradigmxyz#20674
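
Item 2 above is easy to reproduce in a toy model. The sketch below assumes history keys are ordered as `(address, block)` pairs and uses hypothetical function names. It shows that checking only the lexicographically last key can report a clean table while a stale entry for an earlier address is still present.

```rust
use std::collections::BTreeSet;

type Address = [u8; 20];
type BlockNumber = u64;
/// History table modeled as an ordered set of (address, block) keys (assumption).
type HistoryTable = BTreeSet<(Address, BlockNumber)>;

/// `last()`-style check: inspects only the lexicographically last key.
fn last_entry_is_clean(history: &HistoryTable, tip: BlockNumber) -> bool {
    history
        .iter()
        .next_back()
        .map_or(true, |&(_, block)| block <= tip)
}

/// Full verification: inspects every key.
fn all_entries_are_clean(history: &HistoryTable, tip: BlockNumber) -> bool {
    history.iter().all(|&(_, block)| block <= tip)
}
```

With a stale entry `([1; 20], 15)`, a valid entry `([2; 20], 7)`, and a tip of 10, the last key belongs to the higher address, so `last_entry_is_clean` returns `true` while `all_entries_are_clean` returns `false`.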
@gakonst gakonst requested a review from yongkangc January 23, 2026 11:54
@paradigmxyz paradigmxyz deleted a comment from gakonst Jan 23, 2026
@yongkangc
Member

Hey @meetrick, we have decided to take this PR and the RocksDB work in-house. Sorry about that.

@yongkangc yongkangc closed this Jan 23, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in Reth Tracker Jan 23, 2026
@meetrick
Contributor Author

meetrick commented Jan 23, 2026

@yongkangc Understood, thanks for the update.
Please let me know if there’s any future direction where external input would be useful.


Labels

A-rocksdb Related to rocksdb integration

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Use changeset-based pruning for StoragesHistory and AccountHistory in RocksDB

5 participants