
Conversation

@Scut-Corgis (Contributor) commented Nov 16, 2025

Problem

When executing FLUSHALL ASYNC on a writable replica that has
a large number of keys with expirations written directly to it, the main
thread blocks for an extended period while synchronously releasing the
replicaKeysWithExpire dictionary.

Root Cause

FLUSHALL ASYNC is designed to free the core data structures lazily, but
the release of replicaKeysWithExpire (a dictionary tracking keys that have
an expire set on a writable replica) still happens synchronously on the
main thread. This synchronous release becomes a bottleneck at large key
volumes because it is never offloaded to the lazyfree background thread.

This PR addresses the issue by moving the release of replicaKeysWithExpire
to the lazyfree background thread, aligning it with the asynchronous design
of FLUSHALL ASYNC and eliminating main thread blocking.

User scenarios

In day-to-day operations, people often need to perform primary-replica
switches. One goal is to avoid noticeable impact on the business, such as
key loss or reduced availability (e.g., write failures).

Here is the process: first, temporarily switch traffic to writable replicas.
Then wait for the primary's pending replication data to be fully synced
(so the primary and replicas are in sync) before finishing the switch.
The flush is not strictly required in this case, but it is an optimization
that can be done.

codecov bot commented Nov 19, 2025

Codecov Report

❌ Patch coverage is 17.64706% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.43%. Comparing base (86db609) to head (2a63009).
⚠️ Report is 12 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lazyfree.c | 0.00% | 11 Missing ⚠️ |
| src/expire.c | 25.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2849      +/-   ##
============================================
+ Coverage     72.26%   72.43%   +0.17%     
============================================
  Files           128      128              
  Lines         70370    70428      +58     
============================================
+ Hits          50851    51017     +166     
+ Misses        19519    19411     -108     
| Files with missing lines | Coverage Δ |
|---|---|
| src/db.c | 93.15% <100.00%> (ø) |
| src/expire.c | 97.29% <25.00%> (-0.53%) ⬇️ |
| src/lazyfree.c | 85.41% <0.00%> (-7.07%) ⬇️ |

... and 23 files with indirect coverage changes


@Scut-Corgis Scut-Corgis force-pushed the lazyfree-replicaKeysWithExpire branch from a428436 to 7fa9355 on November 19, 2025 12:06
@enjoy-binbin (Member) commented Nov 24, 2025

> We generally discourage the use of writable replicas. What user scenarios would you use it in?

The async-path code is pretty simple, so I guess there is no harm in using it. @zuiderkwast WDYT?

@Scut-Corgis (Contributor, Author):

> We generally discourage the use of writable replicas. What user scenarios would you use it in?

In our daily operations, we often need to do master-slave switches. Our goal is to avoid noticeable impact on the business—like key loss or reduced availability (e.g., write failures).
Here’s our process: First, we temporarily switch traffic to writable replicas. Then we wait for the master’s pending replication data to be fully synced (so master and replicas are in sync), before finishing the full master-slave switch.

Co-authored-by: Binbin <[email protected]>
Signed-off-by: jiegang0219 <[email protected]>
@zuiderkwast (Contributor):

> We generally discourage the use of writable replicas. What user scenarios would you use it in?
>
> In our daily operations, we often need to do master-slave switches. Our goal is to avoid noticeable impact on the business—like key loss or reduced availability (e.g., write failures). Here’s our process: First, we temporarily switch traffic to writable replicas. Then we wait for the master’s pending replication data to be fully synced (so master and replicas are in sync), before finishing the full master-slave switch.

Yeah, this method is described here: https://valkey.io/topics/admin/#upgrading-or-restarting-a-valkey-instance-without-downtime. Is this where you found the recommendation?

I think there is no need to use writeable replicas. We should instead recommend the FAILOVER command, which does a coordinated failover between the primary and the replica. Is there any benefit in using writeable replicas during this switch-over or is the documentation simply outdated?

@Scut-Corgis (Contributor, Author):

> Yeah, this method is described here: https://valkey.io/topics/admin/#upgrading-or-restarting-a-valkey-instance-without-downtime. Is this where you found the recommendation?
>
> I think there is no need to use writeable replicas. We should instead recommend the FAILOVER command, which does a coordinated failover between the primary and the replica. Is there any benefit in using writeable replicas during this switch-over or is the documentation simply outdated?

The method in the linked document aligns with what I described — we use writeable replicas for primary-replica switching.

The FAILOVER command has two modes: either it makes the primary lose some written data, or it disables writes for a period to align their offsets before switching. Both modes are actually business-perceptible for us. @zuiderkwast

@zuiderkwast zuiderkwast (Contributor) left a comment

> The FAILOVER command has two modes: either it makes the primary lose some written data, or it disables writes for a period to align their offsets before switching. Both modes are actually business-perceptible for us.

Interesting. So if you (and other users) really need writable replicas, then we can't deprecate them. The implementation is very simple so I want to accept it. @enjoy-binbin Do you agree?

The reason we are skeptical of writable replicas in general is that they can cause data inconsistency. If some data is written directly to the replica (such as SET k v) and some data is replicated from the primary (such as HSET k f v), replication can fail when the key is of a different type than the replicated command expects (for example, replication of HSET would fail if the key is not a hash). It is a theoretical problem. In practice, it's easy to avoid, but it's good to be aware of it.

@enjoy-binbin enjoy-binbin changed the title Implement Lazyfree for replicaKeysWithExpire to Avoid Main Thread Blocking in FLUSHALL ASYNC Add support for asynchronous release to replicaKeysWithExpire on writable replica Nov 25, 2025
@enjoy-binbin enjoy-binbin merged commit dd2827a into valkey-io:unstable Nov 25, 2025
55 checks passed
@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label Nov 25, 2025