
standardize kvstore scan cursor #3588

Open
JimB123 wants to merge 1 commit into valkey-io:unstable from JimB123:kvstore-cursor


@JimB123 (Member) commented Apr 29, 2026

The cursor in kvstore is composed of a hashtable cursor plus the index of the hashtable within the kvstore. 48 bits are reserved for the hashtable cursor, and UP TO 16 bits are used for the hashtable index. The hashtable cursor is shifted left by the number of bits actually needed for the hashtable index in a given kvstore (0-16).

This update always shifts by a constant 16 bits, which eliminates the variability in format. It also simplifies the cursor access functions, as they no longer need to reference the specific kvstore in order to encode/decode the cursor. The cursor encode/decode functions now take fewer parameters and no longer use in/out parameters.

Updated kvstoreScan to make it clear when a kvstore cursor is being used vs. a hashtable cursor, rather than modifying a single variable in place.

@rainsupreme (Contributor) left a comment

Looks good to me!

While this is a worthy simplification, my understanding is that it is also part of the forkless work: your plan is to use the kvstore scan order to traverse items, and that work will need to determine whether incoming changes fall before or after the cursor position.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
@codecov (Bot) commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.62%. Comparing base (8091c6c) to head (2322d19).
⚠️ Report is 5 commits behind head on unstable.

Additional details and impacted files
```diff
@@             Coverage Diff              @@
##           unstable    #3588      +/-   ##
============================================
+ Coverage     76.42%   76.62%   +0.20%     
============================================
  Files           159      160       +1     
  Lines         80113    80458     +345     
============================================
+ Hits          61225    61650     +425     
+ Misses        18888    18808      -80     
```
| Files with missing lines | Coverage Δ |
|---|---|
| src/kvstore.c | 96.61% <100.00%> (-0.03%) ⬇️ |

... and 24 files with indirect coverage changes


@zuiderkwast (Contributor) left a comment

This changes the cursor values, so it affects cursor compatibility between versions.

We need to bump the fingerprint in the CLUSTERSCAN cursor introduced in #2934.

It breaks SCANs that are performed over a period involving a failover in a mixed-versions cluster such as during a rolling upgrade, even if the nodes are configured with a fixed hash seed as introduced in #2608.

Given these problems, I'm not sure it's a good idea to accept this change.

@murphyjacob4 (Contributor) left a comment

It's more readable to me now

@zuiderkwast (Contributor)

We need cross-version tests for the cross-node SCAN features mentioned above.

@murphyjacob4 (Contributor)

> It breaks SCANs that are performed over a period involving a failover in a mixed-versions cluster such as during a rolling upgrade, even if the nodes are configured with a fixed hash seed as introduced in #2608.

Good point on CLUSTERSCAN

I specifically remember in the CLUSTERSCAN design we wanted to encode the "memory layout version" to prevent the need for the cursor to maintain cross version stability. I think we should be okay making the cursor reset across versions, it wouldn't be breaking, just less efficient. Otherwise, it is very restrictive. We should be okay re-fingerprinting on each minor version, if needed.

@murphyjacob4 (Contributor)

> We need cross-version tests for the cross-node SCAN features mentioned above.

Yeah, we just need a test to catch when the cursor or memory layout changes. The test can then be updated to pass whenever we intentionally make such changes.

@madolson (Member)

> I specifically remember in the CLUSTERSCAN design we wanted to encode the "memory layout version" to prevent the need for the cursor to maintain cross version stability. I think we should be okay making the cursor reset across versions, it wouldn't be breaking, just less efficient. Otherwise, it is very restrictive. We should be okay re-fingerprinting on each minor version, if needed.

We wanted to encode it so that we could change it as needed; I don't think readability is a reason to change it. I agree with Viktor that I would rather not take this. I also agree on the cross-version SCAN test. We already have cross-version testing, and we can add SCAN to that.

@roshkhatri (Member) left a comment

Just a suggestion, not different from what's already been said:

Add to CLUSTERSCAN fingerprint

As introduced in #2934, the fingerprint in clusterscanFingerprint() only hashes the seed. We can incorporate a cursor layout version so that a mismatch is detected and the scan can restart cleanly.

In src/cluster.c:

```c
#define CURSOR_LAYOUT_VERSION 1
static const char *clusterscanFingerprint(void) {
    /* ... */
    uint64_t hash = wangHash64(seed[0] ^ seed[1] ^ CURSOR_LAYOUT_VERSION);
    /* ... */
}
```

This is along the lines of what @zuiderkwast and @murphyjacob4 mentioned.

WDYT?

Comment on src/kvstore.c, lines 414 to 419:
```diff
 unsigned long long kvstoreScan(kvstore *kvs,
-                               unsigned long long cursor,
+                               unsigned long long kvs_cursor,
                                int onlydidx,
                                kvstoreScanFunction scan_cb,
                                kvstoreScanShouldSkipHashtable *skip_cb,
                                void *privdata) {
```
Member

We need to update these in kvstore.h

@murphyjacob4 (Contributor)

> We wanted to encode so that we could change it as needed, I don't think readability is reason to change it.

Yeah this was one thing I was worried about with the CLUSTERSCAN feature - I don't like us adding additional contracts (whether implicit, best effort, or otherwise) to the stability of the cursor over versions. I would almost prefer that we always break it over minor versions just so nobody takes a dependency on this and then gets broken when a bump does happen.

Also CLUSTERSCAN hasn't GA'd yet - so we could always merge this into 9.1 before GA and it should be fine?

@zuiderkwast (Contributor)

> Yeah this was one thing I was worried about with the CLUSTERSCAN feature - I don't like us adding additional contracts (whether implicit, best effort, or otherwise) to the stability of the cursor over versions. I would almost prefer that we always break it over minor versions just so nobody takes a dependency on this and then gets broken when a bump does happen.

I agree the contracts should be explicit.

> Also CLUSTERSCAN hasn't GA'd yet - so we could always merge this into 9.1 before GA and it should be fine?

Good point that CLUSTERSCAN isn't GA yet. Neither is hash-seed (#2608). We do have a strategy for CLUSTERSCAN with the fingerprint, but what can we do for hash-seed in the future?

We discussed at some point in the past that we have two spare bits in the cursor (we only need 14 bits for the cluster slot, not 16) and that we could use these as a version of some kind, but that would be internal, and clients shouldn't be aware of the cursor representation. Perhaps we should just state explicitly in the hash-seed documentation in valkey.conf that the order is not stable across versions. Currently that documentation doesn't cover versions:

```
# Use a fixed hash seed for hashtable instead of a random one.
# Setting this option makes commands like SCAN return keys in a consistent
# order across restarts and failovers. The seed can be any string up to 256 characters.
# The value is immutable and must be provided only at server startup.
#
# hash-seed example-seed-val
```

Another thing I remember we discussed when we introduced kvstore is that we can avoid returning very large numbers for the cursor. IIRC, that's why we didn't shift it in standalone mode and also why we put the slot in the lower bits instead of in the highest 16 bits. A small hash table will have a relatively small cursor.


7 participants