
Conversation

zhijun42 (Contributor) commented Oct 26, 2025

By default, servers in a cluster don't have a human-readable nodename unless one is explicitly set by the user. This can be annoying when reading the server logs during local development, since we would need to manually look up the unique node ID (40 characters long) to figure out its port number. Most logging statements in the cluster have the form serverLog(LL_NOTICE, "Something.. %.40s (%s) something..", node->name, node->human_nodename);, so this always results in an empty () like the following:

2832:M 26 Oct 2025 18:00:14.033 * Node dd4f43d9e4a1a2875a514733c08d4870c21db682 () is now a replica of node 1b5d2a4865c04aaa1ccf06db66f22a043411ded7 () in shard d3fc557d12ab3c5a8850c7b0c9afb252b473cad1
2832:M 26 Oct 2025 18:00:14.033 * Clear FAIL state for node dd4f43d9e4a1a2875a514733c08d4870c21db682 (): replica is reachable again.
2832:M 26 Oct 2025 19:21:07.191 * NODE 524978a9924b5841c4c93f83581d202cfbce1443 () possibly failing.
2832:M 26 Oct 2025 19:21:07.838 * FAIL message received from 747e0f97a58cd5a7ac4c2130417fc47a807802ed () about 524978a9924b5841c4c93f83581d202cfbce1443 ()
2832:M 26 Oct 2025 19:22:40.434 * Clear FAIL state for node 524978a9924b5841c4c93f83581d202cfbce1443 (): replica is reachable again.
2832:M 26 Oct 2025 19:46:43.539 * FAIL message received from 747e0f97a58cd5a7ac4c2130417fc47a807802ed () about 1b5d2a4865c04aaa1ccf06db66f22a043411ded7 ()
2832:M 26 Oct 2025 19:46:43.539 # Cluster state changed: fail

Assigning a default human-readable nodename solves the annoyance:

7614:S 26 Oct 2025 19:52:21.642 * NODE 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001) possibly failing.
7614:S 26 Oct 2025 19:52:22.550 * Marking node 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001) as failing (quorum reached).
7614:S 26 Oct 2025 19:54:15.915 * NODE b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002) possibly failing.
7614:S 26 Oct 2025 19:54:16.117 * Marking node b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002) as failing (quorum reached).
7614:S 26 Oct 2025 19:54:16.117 # Cluster state changed: fail
7614:S 26 Oct 2025 19:54:16.117 # Cluster is currently down: At least one hash slot is not served by any available node. Please check the 'cluster-require-full-coverage' configuration.
7614:S 26 Oct 2025 19:54:16.574 * Reconfiguring node 524978a9924b5841c4c93f83581d202cfbce1443 (127.0.0.1:7005) as primary for shard 8488c1c467066206db60b136e9a38e1f583eb1f8

For now I use the node's IP and port combination as the nodename. If users have a personal preference for the format, they can always update the nodename via the CONFIG SET cluster-announce-human-nodename MYNAME command or by editing the config file.

If users prefer that nodenames remain empty, they can run CONFIG SET cluster-announce-human-nodename "" to reset it.
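For example, in a hypothetical valkey-cli session (the node address and name are placeholders; CONFIG SET replies with OK on success):

127.0.0.1:7001> CONFIG SET cluster-announce-human-nodename mynode-a
OK
127.0.0.1:7001> CONFIG SET cluster-announce-human-nodename ""
OK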

Signed-off-by: Zhijun <[email protected]>
@zhijun42 zhijun42 marked this pull request as draft October 26, 2025 14:14
codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 90.47619% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.40%. Comparing base (e19ceb7) to head (59b9805).

Files with missing lines | Patch % | Lines
src/cluster_legacy.c | 90.47% | 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2777      +/-   ##
============================================
- Coverage     72.45%   72.40%   -0.05%     
============================================
  Files           128      128              
  Lines         70414    70427      +13     
============================================
- Hits          51019    50995      -24     
- Misses        19395    19432      +37     
Files with missing lines | Coverage Δ
src/config.c | 78.44% <ø> (ø)
src/cluster_legacy.c | 87.36% <90.47%> (-0.04%) ⬇️

... and 14 files with indirect coverage changes


@zhijun42 zhijun42 marked this pull request as ready for review October 27, 2025 05:40
hpatro (Collaborator) left a comment

I generally use the first/last few characters of the node ID while searching, and I've never seen it conflict with another node's ID.

@zhijun42 Wouldn't you be able to set this externally on the engine by yourself?

hpatro (Collaborator) commented Oct 28, 2025

On this note, Docker's approach of generating random names for resources has always been interesting; they're easier to spot and remember.

https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go
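For illustration, a minimal C sketch of the same idea (hypothetical; the word lists here are a tiny sample of the adjectives and surnames in the linked names-generator.go):

#include <stdio.h>
#include <stdlib.h>

/* Pick a random "adjective_surname" pair, Docker-style. */
static const char *adjectives[] = {"admiring", "clever", "eager", "festive"};
static const char *surnames[] = {"curie", "hopper", "lovelace", "turing"};

static void randomNodename(char *buf, size_t len) {
    size_t na = sizeof(adjectives) / sizeof(adjectives[0]);
    size_t ns = sizeof(surnames) / sizeof(surnames[0]);
    snprintf(buf, len, "%s_%s", adjectives[rand() % na], surnames[rand() % ns]);
}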

zhijun42 (Contributor, Author) commented

@hpatro Thanks for reviewing!

I generally use the first/last few characters of the node ID while searching, and I've never seen it conflict with another node's ID.

Yeah, the first/last few characters of a node ID are unique enough. I assume your use case is something like this: you have a node that did something interesting and/or that you're interested in, so you search through the logs using that node ID to see how it interacted with others.

My use case is a bit different, though. When developing locally (no Docker/K8s), I have a mini cluster running where each node runs in its own terminal tab. I'd like to fully observe all behaviors of the cluster, so inside each terminal tab I'm reading through that node's log stream and would like to instantly recognize the peers it's communicating with. A 40-character hex node ID is cognitively heavy; 127.0.0.1:7002 is instant.

Note that I'm not searching for any particular node; I'm simply reading the entire log stream. Having logs like these would be very useful for me:

* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 747e0f97a58cd5a7ac4c2130417fc47a807802ed (127.0.0.1:7004)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 524978a9924b5841c4c93f83581d202cfbce1443 (127.0.0.1:7005)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001)

Wouldn't you be able to set this externally on the engine by yourself?

Yeah, we can always manually assign names to nodes, but I don't want to bother doing this every time I start up a cluster. The default IP + port combination proposed here is quite recognizable for me.

On this note, Docker's approach of generating random names for resources has always been interesting; they're easier to spot and remember.

TIL! I browsed through the list and found a lot of great names from history. We could probably do the same thing here in Valkey. My approach of using IP + port is functionally the same as using scientists' names: both just help developers associate a server node with a string they can immediately recognize. Since I can easily recognize IP + port, that is what I propose here.

The thing is, most Valkey cluster logs print both the node ID and the human nodename, but the nodename is always empty unless explicitly set, which is not a great user experience for me.

hpatro (Collaborator) commented Nov 19, 2025

Fair. However, I don't see a mechanism for a user to disable printing the human nodename, i.e. one can't set it to empty. Rather, setting it to empty would lead to printing ip:port. Would this be considered a breaking change?

@zuiderkwast / @enjoy-binbin thoughts?

zhijun42 (Contributor, Author) commented

Fair. However, I don't see a mechanism for a user to disable printing the human nodename, i.e. one can't set it to empty. Rather, setting it to empty would lead to printing ip:port.

Good catch! I just pushed some changes to allow resetting the human nodename by running CONFIG SET cluster-announce-human-nodename "". I also added a test for this.

Would this be considered a breaking change?

The human nodename is a relatively new feature, introduced in June 2023, and it's mostly used in logging. The only usage outside logging that I'm aware of is in the nodes.conf file, like this:

f130ef4be2a5a3bdad469bba0f82e8f27e5411f6 127.0.0.1:7004@17004,,tls-port=0,nodename=127.0.0.1:7004,shard-id=b5551cc30b41cbdbc78628e77d0d63e10543a1e7 slave 9a4b1f71ed90fd482db902949169e846e600ce30 0 1763554378000 4 connected
vars currentEpoch 5 lastVoteEpoch 5

but the human nodename won't show up there if it's an empty string. So overall I think we're pretty safe here.

zuiderkwast (Contributor) commented

Fair. However, I don't see a mechanism for a user to disable printing the human nodename, i.e. one can't set it to empty. Rather, setting it to empty would lead to printing ip:port. Would this be considered a breaking change?

@zuiderkwast / @enjoy-binbin thoughts?

I don't think changes to the log messages are considered breaking changes. I think it's safe to change what's being logged.

However, if we store "127.0.0.1:7008" as a node's human nodename internally, it also affects the output of commands like CLUSTER SHARDS and CLUSTER NODES, where it would appear as the human nodename, which it isn't really. Instead, I think we should just change what's being logged.

I would like to see this in the log as a fallback, instead of just an empty (), as you would for any user that's not using the human nodename feature.

Can't we just change the logging, so wherever we log like ...

serverLog(LL_NOTICE, "Something.. %.40s (%s) something..", node->name, node->human_nodename);

... we change it to something like ...

serverLog(LL_NOTICE, "Something.. %.40s (%s) something..", node->name, humanReadableName(node));

... which would expand to either the configured human nodename or ip:port.
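A minimal sketch of what that helper could look like (hypothetical; the ip and tcp_port field names are assumptions and may not match the actual clusterNode struct):

/* Return the configured human nodename, or fall back to "ip:port".
 * Uses a static buffer, which is fine for single-threaded logging. */
static const char *humanReadableName(clusterNode *node) {
    static char buf[64]; /* large enough for "ip:port", including IPv6 */
    if (node->human_nodename && node->human_nodename[0] != '\0')
        return node->human_nodename;
    snprintf(buf, sizeof(buf), "%s:%d", node->ip, node->tcp_port);
    return buf;
}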

zhijun42 (Contributor, Author) commented

@zuiderkwast

it also affects the output of commands like CLUSTER SHARDS and CLUSTER NODES

Actually it won't. CLUSTER SHARDS shows the node's hostname but not the human nodename. CLUSTER NODES doesn't show the human nodename either, according to the function clusterGenNodeDescription.

Can't we just change the logging, which would expand to either the configured human nodename or ip:port.

This is a good idea, but if we do so we can't support the feature "users may not like ip:port and want to reset the human nodename to an empty string" (although I'm not sure this feature is actually desirable). From node A's point of view, node B's human nodename is an empty string, and node A can't tell whether that's because the user isn't using the human nodename feature or has explicitly set the nodename to an empty string.

In my current implementation, node A knows node B's human nodename is "127.0.0.1:7008", and if node A later sees this name become an empty string, it knows the user explicitly set it that way.

zuiderkwast (Contributor) commented Nov 19, 2025

Actually it won't. CLUSTER SHARDS shows the node's hostname but not the human nodename.

Oh, right. But it's not unlikely that we'll add human nodename to CLUSTER SHARDS in the future. We added shard-id to CLUSTER SHARDS just recently in #2568.

Can't we just change the logging, which would expand to either the configured human nodename or ip:port.

This is a good idea, but if we do so we can't support the feature "users may not like ip:port and want to reset the human nodename to an empty string" (although I'm not sure this feature is actually desirable).

To me, it is not desirable. If we make this a conditional in the logging, then we're not saying that we automatically assign a human nodename; we just log something useful instead of the human nodename when it's missing.

zhijun42 (Contributor, Author) commented

@zuiderkwast Sounds good! I just added the helper function humanNodename and removed all the changes previously made for the "users reset the human nodename to empty" feature.

