chore[storage]: Limit the total size of RocksDB WAL files #1518
Motivation
We have observed long-running fullnodes where the RocksDB WAL files (the .log files) grow indefinitely, usually until they fill most of the available disk space. In one case, we found 13 GB of .log files.
Acceptance Criteria
- Set `max_total_wal_size` to 3 GB, which will cause RocksDB to flush all in-memory memtable data to .sst files and clean up the .log files when they reach this size in total
- `storage.rocksdb.flush=`: This makes RocksDB flush the memory (and consequently clean up the .log files). It's a way to trigger the process manually.
- `storage.rocksdb.wal_stats`: This shows how much data each column family is holding in WAL files. It's useful for checking, for instance, whether the other command worked for all column families.

Testing
To test this, you will need to:
- Run `poetry lock && poetry update rocksdb`
- Change the max_total_wal_size to 3 MB, so that we can observe RocksDB behavior with it:
- Run `mkdir data-testnet-india`
- Run this command to start the node and sync with testnet-india:
- Stop the fullnode and undo the change to `max_total_wal_size` so it goes back to 3 GB
- Start the fullnode again
- Run `nc -U /tmp/sysctl.sock` and send the command `storage.rocksdb.flush=`. Check the .log files before and after running it; they should decrease in size and get rotated. You may want to wait a little for them to grow before doing so; they should grow quickly during the sync with the network.

Checklist
master, confirm this code is production-ready and can be included in future releases as soon as it gets merged
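
For reviewers who want to script the before-and-after check of the .log files from the testing steps, here is a minimal sketch. The helper names `total_wal_size` and `format_mb` are hypothetical and not part of this PR, and the sketch assumes the WAL files are the files ending in `.log` somewhere under the data directory (searched recursively, since the exact layout is not stated here).

```python
from pathlib import Path


def total_wal_size(data_dir: str) -> int:
    """Return the combined size in bytes of all *.log files under data_dir.

    Assumption: RocksDB WAL files are the files with a .log suffix, and they
    may live in a subdirectory, so the search is recursive.
    """
    return sum(
        path.stat().st_size
        for path in Path(data_dir).rglob("*.log")
        if path.is_file()
    )


def format_mb(size_bytes: int) -> str:
    """Format a byte count as mebibytes with one decimal place."""
    return f"{size_bytes / (1024 * 1024):.1f} MB"
```

Calling `format_mb(total_wal_size("data-testnet-india"))` once before and once after sending `storage.rocksdb.flush=` should show the total dropping if the flush cleaned up the WAL files.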