LedgerDB: prune on garbage collection instead of on every change #1513
Looks good.
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2.hs
Sync benchmarks are looking good (mainnet, first 1e6 slots/blocks). The LMDB benchmark is of course a bit degenerate as Byron doesn't have tables, but it still serves as a regression test. Note that d1b6215 is crucial; otherwise, there is a significant (2x) regression in max heap size.
Thanks for the great reviews; I hope I addressed your comments. Interesting changes:
It is not necessary to perform the garbage collection of the LedgerDB and of the map of invalid blocks in the same STM transaction. In the past, this was important, but it no longer is; see #1507.
Primarily, this is an optimization: pruning the LedgerDB on garbage collection instead of while adding new blocks reduces the maximum memory usage (most relevant with the in-memory backend); see the added commit and the benchmark in the pull request. Previously, LedgerDB garbage collection happened as part of VolatileDB garbage collection, which was intentionally rate-limited. This change also resolves the current (somewhat weird) behavior that we do not copy any blocks to the ImmutableDB while we are taking a snapshot (which can take >2 minutes), and consequently also do not garbage-collect the VolatileDB during that time. Finally, it synergizes with the planned feature of adding a random delay when taking snapshots.
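The shift described above can be sketched with a toy model (the names and types here are illustrative only; the real LedgerDB API differs): previously, adopting a block pruned eagerly down to the last k+1 states, whereas now adopting never prunes and the rate-limited garbage collection prunes by slot number.

```haskell
import Data.Word (Word64)

type SlotNo = Word64

-- Toy model of the LedgerDB: (slot, ledger state) pairs, oldest first.
type LedgerStates st = [(SlotNo, st)]

-- Old behaviour (sketch): prune eagerly on every adopted block,
-- keeping only the most recent k+1 states.
addBlockAndPrune :: Int -> (SlotNo, st) -> LedgerStates st -> LedgerStates st
addBlockAndPrune k st sts =
    let sts' = sts ++ [st]
    in  drop (length sts' - (k + 1)) sts'

-- New behaviour (sketch): adding a block never prunes ...
addBlock :: (SlotNo, st) -> LedgerStates st -> LedgerStates st
addBlock st sts = sts ++ [st]

-- ... instead, garbage collection (rate-limited, driven by the
-- immutable tip) later drops all states before the given slot.
garbageCollect :: SlotNo -> LedgerStates st -> LedgerStates st
garbageCollect slot = dropWhile ((< slot) . fst)
```

Between two garbage collections the LedgerDB may briefly hold more than k+1 states, which is why the benchmark above checks that peak memory usage does not regress.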
Also, account for the fact that the DbChangelog might have been pruned between opening and committing the forker.
regarding the previous few commits
It was already superseded in the most important places by `LedgerDbPruneBeforeSlot`. Its remaining use cases are non-essential:

- Replay on startup. In this case, we never roll back, so not maintaining k states is actually an optimization here. We can also remove the now-redundant `InitDB.pruneDb` function.
- Internal functions used for db-analyser. Here, we can just as well use `LedgerDbPruneAll` (which is used by `pruneToImmTipOnly`), as we never need to roll back.
- Testing. In particular, we remove some DbChangelog tests that previously ensured that at most @k@ states are kept. This is no longer true; that property is instead enforced by the LedgerDB built on top of the DbChangelog. A follow-up commit in this PR enriches the LedgerDB state machine test to make sure that the public API functions behave appropriately, ensuring that we don't lose test coverage (and also testing V2, which previously didn't have any such tests).
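A minimal sketch of the two remaining pruning requests (the constructor names follow the commit message above; the actual type and its call sites may differ):

```haskell
import Data.Word (Word64)

type SlotNo = Word64

-- Hypothetical sketch of the pruning request type.
data LedgerDbPrune
  = LedgerDbPruneAll               -- keep only the tip state,
                                   -- as used by pruneToImmTipOnly
  | LedgerDbPruneBeforeSlot SlotNo -- drop all states before this slot

-- Interpret a request over a toy list of (slot, state) pairs.
applyPrune :: LedgerDbPrune -> [(SlotNo, st)] -> [(SlotNo, st)]
applyPrune LedgerDbPruneAll            sts = drop (length sts - 1) sts
applyPrune (LedgerDbPruneBeforeSlot s) sts = dropWhile ((< s) . fst) sts
```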
Make sure that we correctly fail when trying to roll back too far.
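The failure mode can be illustrated with a small sketch (the `rollbackN` helper is hypothetical; the real code operates on the DbChangelog): rolling back more states than the volatile part holds must fail instead of silently truncating.

```haskell
-- Toy model: the list holds the anchor (immutable tip) followed by
-- the volatile states, newest last.
rollbackN :: Int -> [st] -> Maybe [st]
rollbackN n sts
  | n >= length sts = Nothing  -- would roll back past the anchor: fail
  | otherwise       = Just (take (length sts - n) sts)
```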
I took another look at the PR; I therefore approve it again.
This is in preparation for #1424
This PR is intended to be reviewed commit-by-commit.
Currently, we prune the LedgerDB (i.e. remove all but the last `k+1` states) every time we adopt a longer chain. This means that we cannot rely on the fact that other threads (like the `copyAndSnapshot` ChainDB background thread) actually observe all immutable ledger states, just as described in the caveats of our `Watcher` abstraction. However, a predictable ledger snapshotting rule (#1424) requires this property; otherwise, when the node is under high load and/or we are adopting multiple blocks in quick succession, the node might not be able to create a snapshot for its desired block.

This PR changes this: now, when adopting new blocks, the LedgerDB is not immediately pruned. Instead, a new dedicated background thread for ledger maintenance tasks (flushing/snapshotting/garbage collection) in the ChainDB will periodically (on every new immutable block) wake up and (in particular) garbage-collect the LedgerDB based on a slot number.

This also makes the semantics more consistent with the existing garbage collection of previously-applied blocks in the LedgerDB, and with how the ChainDB works, where we also don't immediately delete blocks from the VolatileDB once they are buried beneath `k+1` blocks.

See #1513 (comment) for benchmarks demonstrating that the peak memory usage does not increase while syncing (where we now briefly might hold more than `k+1` ledger states in memory).
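The new control flow can be sketched with STM (a simplified model with hypothetical names, not the actual ChainDB code): chain selection only extends the LedgerDB, while a dedicated background thread blocks on the immutable-tip `TVar` and prunes whenever that tip advances.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_)
import Data.Word (Word64)

type SlotNo = Word64

-- Run a tiny scenario: start with states for slots 1..10, advance the
-- immutable tip to slot 8, and return the slots still held afterwards.
demo :: IO [SlotNo]
demo = do
  dbVar     <- newTVarIO [(s, ()) | s <- [1 .. 10 :: SlotNo]]
  immTipVar <- newTVarIO 0
  doneVar   <- newTVarIO False

  -- Background ledger-maintenance thread: block until the immutable
  -- tip advances, then garbage-collect all states before it.
  _ <- forkIO $
    let go lastSeen = do
          tip <- atomically $ do
            tip <- readTVar immTipVar
            check (tip > lastSeen)  -- sleep until the tip advances
            modifyTVar' dbVar (dropWhile ((< tip) . fst))
            pure tip
          if tip >= 8
            then atomically (writeTVar doneVar True)
            else go tip
    in go 0

  -- Chain selection advancing the immutable tip; it never prunes.
  forM_ [1 .. 8 :: SlotNo] $ \s -> atomically (writeTVar immTipVar s)

  atomically (readTVar doneVar >>= check)  -- wait for the GC thread
  map fst <$> readTVarIO dbVar
```

Because pruning is deferred to this thread, the LedgerDB may briefly hold more than `k+1` states between two wake-ups, which is what the benchmarks above confirm is harmless for peak memory usage.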