peer+lnd: add new CLI option to control if we D/C on slow pongs #9801
Conversation
Force-pushed from eeb7897 to 8c23404.
tACK, this PR improves connection reliability with my peers. Some slight weirdness that I figured would be good to document:

- Pong size did not match expected size:
- Back-to-back repeated Pongs (remote is CLN):
I think the flag could be useful!
I propose adding a Connection-Failure-Threshold (N) which only disconnects after several failed ping/pongs in a row, rather than just adding a connection flag. wdyt?
I had also considered making it a sort of peer-level healthcheck, to inherit that threshold logic, but instead went in this direction as I started to second-guess the design rationale for disconnecting in the first place. I think if we add a threshold flag, then we'd also want a flag to tune the timeout value. When I started to run this on my node (even w/ the super prio queue), I noticed some nodes that were just persistently slow in replying. Ultimately, slow nodes do affect payment latency end-to-end. There's a credible design direction here where we start to factor this in at the first-hop level, but then also have the link sample the ping RTT of a peer and decide if the link is even eligible to send based on that.
the last commit does not change existing behaviour afaict
Force-pushed from 3c75628 to f0fcffa.
The latest version inverts the original PR: the default behavior stays, but users now have an option to turn off disconnecting on slow pongs.
Force-pushed from 1f498d3 to d6d25a9.
Pushed up a fix for the sample config and also made the behavior of `return` vs. `continue` in the ping manager more clear.
With that, LGTM 🎉
just one comment - otherwise lgtm!
In this commit, we add a new CLI option to control if we D/C on slow pongs or not. Due to the existence of head-of-the-line blocking at various levels of abstraction (app buffer, slow processing, TCP kernel buffers, etc), if there's a flurry of gossip messages (eg: 1K channel updates), then even with a reasonable processing latency, a peer may still not read our ping in time. To give users another option, we add a flag that allows users to disable this behavior. The default remains.
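For illustration, the new option could be set from the config file along these lines. This is a hedged sketch of an `lnd.conf` entry: the flag spelling is an assumption derived from the `NoDisconnectOnPongFailure` field referenced later in this review and may differ from the merged spelling:

```ini
[Application Options]
; Assumed spelling (from the NoDisconnectOnPongFailure field): when
; true, a peer that is slow to answer pings is only logged about,
; not disconnected. The default (false) keeps the disconnect behavior.
no-disconnect-on-pong-failure=true
```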
LGTM
```go
// If NoDisconnectOnPongFailure is true, we don't
// disconnect. Otherwise (if it's false, the default),
// we disconnect.
```
Nit: this comment can be removed; it just describes the code.
```go
}

// getLastRTT safely retrieves the last known RTT, returning 0 if none exists.
func (m *PingManager) getLastRTT() time.Duration {
```
Nit: use `fn.Option` here as well, similar to `pendingPingWait` below?
```go
		close(pingSent)
	})
},
OnPongFailure: func(err error,
```
Nit: maybe this should also be protected by a `sync.Once`, since it is closing a channel?
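The concern here is that closing an already-closed channel panics in Go, so a callback that may fire more than once needs a `sync.Once` guard. A minimal sketch, assuming a hypothetical `safeSignal` wrapper (not the PR's actual types):

```go
package main

import (
	"fmt"
	"sync"
)

// safeSignal wraps a signal channel so multiple callers can mark
// completion without panicking on a double close.
type safeSignal struct {
	once sync.Once
	ch   chan struct{}
}

func newSafeSignal() *safeSignal {
	return &safeSignal{ch: make(chan struct{})}
}

// Close is safe to call any number of times; only the first call
// actually closes the underlying channel.
func (s *safeSignal) Close() {
	s.once.Do(func() { close(s.ch) })
}

func main() {
	sig := newSafeSignal()
	// Simulate the failure callback firing twice; without the
	// sync.Once guard, the second close would panic.
	sig.Close()
	sig.Close()
	_, open := <-sig.ch
	fmt.Println(open) // false: the channel is closed
}
```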
In this commit, we add a new CLI option to control if we D/C on slow pongs or not. Due to the existence of head-of-the-line blocking at various levels of abstraction (app buffer, slow processing, TCP kernel buffers, etc), if there's a flurry of gossip messages (eg: 1K channel updates), then even with a reasonable processing latency, a peer may still not read our ping in time.
To combat this, we change the default behavior to just log slow pongs, and add a new CLI option to re-enable the old disconnect behavior.
Along the way, we also add some more enhanced logging, so we can tell when the last successful ping was, and also the deadline reached.