Conversation

@luislhl (Contributor) commented Dec 16, 2025

Motivation

We have been observing cases of long-running fullnodes where the size of the RocksDB WAL files (the .log files) grows indefinitely, usually until it fills most of the available disk space. In one of these cases, we found 13 GB of .log files.

Acceptance Criteria

  • We should set max_total_wal_size to 3 GB. When the .log files reach this total size, RocksDB will flush all in-memory memtable data to .sst files and clean up the .log files
  • We should have new sysctl commands:
    • storage.rocksdb.flush=: This will make RocksDB flush the memtables (and consequently clean up the .log files). It's a way to trigger the process manually.
    • storage.rocksdb.wal_stats: This will show how much data each column family is still holding in WAL files. It's useful, for instance, to check whether the flush command worked for all column families (see the sketch after this list).
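
For reference, here is a minimal sketch of how these pieces fit together. The max_total_wal_size option matches the diff under Testing; DB.flush() and the property used for wal_stats are assumptions about what the hathornetwork python-rocksdb fork exposes, so treat this as illustrative rather than the exact implementation:

import rocksdb  # hathornetwork/python-rocksdb fork (see the dependency diff under Testing)

# Cap the total size of the WAL (.log) files. When the cap is reached, RocksDB
# flushes memtables to .sst files and deletes the obsolete WAL segments.
options = rocksdb.Options(
    create_if_missing=True,
    max_total_wal_size=3 * 1024 * 1024 * 1024,  # 3 GB
)
db = rocksdb.DB('data_v2.db', options)

# storage.rocksdb.flush= triggers the same cleanup on demand.
# DB.flush() is an assumed method of the fork, not part of stock python-rocksdb.
db.flush()

# storage.rocksdb.wal_stats reports how much unflushed data is still held in
# memtables (i.e. data only persisted in the WAL); one possible source is the
# RocksDB property below, but the exact mechanism is up to the PR.
print(db.get_property(b'rocksdb.cur-size-all-mem-tables'))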

Testing

To test this, you will need to:

  1. Apply this git diff locally, since we depend on changes in the python-rocksdb bindings:
diff --git a/pyproject.toml b/pyproject.toml
index 9787888b..0f0930c2 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -72,7 +72,7 @@ service_identity = "~21.1.0"
 pexpect = "~4.8.0"
 sortedcontainers = "~2.4.0"
 structlog = "~22.3.0"
-rocksdb = {git = "https://github.com/hathornetwork/python-rocksdb.git"}
+rocksdb = {git = "https://github.com/hathornetwork/python-rocksdb.git", branch = "chore/max_total_wal_size_option"}
 aiohttp = "~3.10.3"
 idna = "~3.4"
 setproctitle = "^1.3.3"
  2. Run `poetry lock && poetry update rocksdb`

  3. Change max_total_wal_size to 3 MB, so that we can observe RocksDB's behavior with it:

diff --git a/hathor/storage/rocksdb_storage.py b/hathor/storage/rocksdb_storage.py
index 640dbb06..4ddb19b1 100644
--- a/hathor/storage/rocksdb_storage.py
+++ b/hathor/storage/rocksdb_storage.py
@@ -50,7 +50,7 @@ class RocksDBStorage:
             # This limits the total size of WAL files (the .log files) in RocksDB.
             # When reached, a flush is triggered by RocksDB to free up space.
             # This was added because we had cases where these files would accumulate and use too much disk space.
-            max_total_wal_size=3 * 1024 * 1024 * 1024,  # 3GB
+            max_total_wal_size=3 * 1024 * 1024,  # 3MB
         )
 
         cf_names: list[bytes]
  4. Run mkdir data-testnet-india

  5. Run this command to start the node and sync with testnet-india:

./hathor-cli run_node \
  --testnet \
  --data ./data-testnet-india \
  --listen tcp:40405 \
  --status 8081 \
  --wallet-index \
  --nc-indexes \
  --sysctl "unix:/tmp/sysctl.sock"
  6. Observe how the .log file(s) in the data folder never grow bigger than 3 MB and get rotated constantly (you can see this from the number in their names):
# Run this multiple times
ls -lah data-testnet-india/data_v2.db/*.log
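
If you prefer to watch the rotation continuously instead of re-running ls, a small illustrative Python loop does the same thing (the path is the one used in the steps above):

import glob
import os
import time

# Print the WAL files and their sizes every few seconds; the file names change
# as RocksDB rotates the WAL, and none of them should grow past ~3 MB.
while True:
    for path in sorted(glob.glob('data-testnet-india/data_v2.db/*.log')):
        print(f'{path}: {os.path.getsize(path) / 1024:.1f} KiB')
    print('---')
    time.sleep(5)
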
  7. Stop the fullnode and undo the change made to max_total_wal_size, so it goes back to 3 GB

  8. Start the fullnode again

  9. Run nc -U /tmp/sysctl.sock and send the command storage.rocksdb.flush=. Check the .log files before and after running it: they should decrease in size and get rotated. You may want to wait a little for them to grow bigger before doing so; they should grow quickly during the sync with the network.
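
If nc is not available, the same command can be sent to the sysctl unix socket with a few lines of Python; the newline terminator and the single read of the response are assumptions about the sysctl line protocol:

import socket

# Connect to the fullnode's sysctl socket and request a manual RocksDB flush.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
    sock.connect('/tmp/sysctl.sock')
    sock.sendall(b'storage.rocksdb.flush=\n')
    print(sock.recv(4096).decode(errors='replace'))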

Checklist

  • If you are requesting a merge into master, confirm this code is production-ready and can be included in future releases as soon as it gets merged
