Simplify Quiescence and prepare for splice integration #4007
Conversation
👋 Thanks for assigning @wpaulino as a reviewer!
Codecov Report: ❌ Patch coverage is […]

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #4007      +/-   ##
==========================================
- Coverage   88.82%   88.64%   -0.18%     
==========================================
  Files         175      174       -1     
  Lines      127767   127719      -48     
  Branches   127767   127719      -48     
==========================================
- Hits       113486   113221     -265     
- Misses      11712    12021     +309     
+ Partials     2569     2477      -92     
```
I was envisioning we'd have a […]

Supporting quiescence throughout disconnects makes sense, but we may want to have a fixed number of retries or a timer so that we don't go on forever?
This still has the persistence issue, though. The user could track that they no longer want a splice in a channel and refuse to sign, which always works (and maybe that is simply what we should do, documenting that users should rely on it), but we can't provide a built-in "cancel" function that is actually guaranteed to cancel. I think maybe this is fine, though: we don't really need to provide a way to cancel upgrading a channel type (just let it happen when the peer finally comes online?), so the other use of quiescence is fine, I guess?
You mean because the other end may be rejecting the splice for some reason (why would they do that?)?
Even then, I don't think it's possible to cancel because the counterparty may have already provided its […]
Could be that the peer is not around long enough to finish the negotiation before disconnecting again. Or maybe they require confirmed inputs (we don't support this at the moment), and we keep providing an unconfirmed one.
Hmmmmmm. Maybe we do that then? Not supporting splice init during disconnection seems like a pretty terrible API, but not supporting cancel also wouldn't be an option. This is more complicated (storing the user's signatures for later use, I guess, though we already have to support not sending CS until the monitor completes), but it seems not crazy bad for a much better API. It also doesn't have to happen to enable splicing and get started testing, just to ship.
I imagine in this case we want to keep retrying until it succeeds or the user cancels.
Should the peer not send a […]?
In any case, note that the quiescence action is […]
Basically LGTM
🔔 1st Reminder Hey @jkczyz! This PR has been waiting for your review.
Force-pushed from 76210a8 to b15949b
LGTM after squash
Force-pushed from b15949b to 4a47f19
Squashed without further changes.
In the case where we prepare to initiate quiescence but cannot yet send our `stfu` because we're waiting on some channel operations to settle, and our peer ultimately sends their `stfu` before we can, we would detect this case and, if we were able, send an `stfu` anyway, which would allow us to send "something fundamental" first. While this is a nifty optimization, it's a bit overkill: the chance that both we and our peer decide to attempt something fundamental at the same time is pretty low, and, worse, it required additional state tracking. We simply remove this optimization here, simplifying the quiescence state machine a good bit.
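As a rough illustration of what the simplification means in practice (all names below are made up for this sketch, not LDK's actual internals): once the optimization is gone, a counterparty `stfu` that arrives while we were still waiting to send ours simply makes us the responder.

```rust
/// Illustrative only -- not LDK's real types. Which side acts first once the
/// channel is quiescent.
enum QuiescenceRole {
	Initiator,
	Responder,
}

/// Hypothetical handler for a counterparty `stfu` arriving on a channel where
/// we also wanted to become quiescent but hadn't sent our own `stfu` yet.
fn handle_counterparty_stfu(we_still_want_to_send_stfu: bool) -> QuiescenceRole {
	if we_still_want_to_send_stfu {
		// Pre-simplification: if our pending operations had settled, we could
		// still send our own `stfu` and claim the initiator role so that our
		// "something fundamental" went first. Tracking whether that race was
		// possible is the extra state this commit removes.
	}
	// Post-simplification: their `stfu` won, so they are the initiator; we
	// echo `stfu` once our pending operations settle and let them go first.
	QuiescenceRole::Responder
}
```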
When we initiate quiescence, it should always be because we're trying to accomplish something (in the short term, only splicing). In order to actually do that thing, we need to store the instructions for it somewhere the splicing logic knows to look once we reach quiescence. Here we add a simple enum which will eventually store such actions.
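For a rough picture of the shape this takes (only the `quiescent_action` field name appears in the diff below; the enum, variant, and struct names here are assumptions for illustration):

```rust
/// Illustrative sketch only -- variant and type names are assumptions.
/// Records what we intend to do once the channel becomes quiescent, so the
/// splicing logic knows where to look after `stfu` has been exchanged.
enum QuiescentAction {
	/// Kick off a splice negotiation with the parameters the user provided.
	Splice(SpliceInstructions),
}

/// Placeholder for whatever a splice needs (inputs, contribution, feerate,
/// ...); purely for illustration.
struct SpliceInstructions;

/// Stand-in for the channel struct holding the queued action.
struct ChannelSketch {
	/// Set when the user requests an action that requires quiescence; taken
	/// and acted upon once both sides have exchanged `stfu`.
	quiescent_action: Option<QuiescentAction>,
}
```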
There are a number of things in LDK where we've been lazy and not allowed the user to initiate an action while a peer is disconnected. While it may be accurate in the sense that the action cannot be started while the peer is disconnected, it is terrible dev UX: these actions can fail through no fault of the developer, and the only way for them to address it is to just try again. Here we fix this dev-UX shortcoming for splicing, keeping any queued post-quiescent actions around when a peer disconnects and retrying the action (and quiescence generally) when the peer reconnects.
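A minimal, self-contained sketch of that retry flow (the disconnect side mirrors the diff hunk quoted further down; everything except the `quiescent_action` and `set_awaiting_quiescence` names is invented for illustration):

```rust
/// Mock channel for illustration; not LDK's actual structure.
struct ChannelRetrySketch {
	quiescent_action: Option<()>,
	awaiting_quiescence: bool,
}

impl ChannelRetrySketch {
	fn set_awaiting_quiescence(&mut self) {
		self.awaiting_quiescence = true;
	}

	/// On disconnect: the queued action is kept rather than dropped, and the
	/// awaiting-quiescence flag is re-armed so quiescence is re-negotiated.
	fn peer_disconnected(&mut self) {
		if self.quiescent_action.is_some() {
			self.set_awaiting_quiescence();
		}
	}

	/// On reconnect (after channel reestablishment completes): a still-set
	/// flag means we send `stfu` again and retry the queued action.
	fn should_send_stfu_after_reconnect(&self) -> bool {
		self.awaiting_quiescence
	}
}
```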
Rebased to address a conflict.
Force-pushed from 4a47f19 to c7e4887
```rust
if self.quiescent_action.is_some() {
	// If we're trying to get quiescent to do something, try again when we
	// reconnect to the peer.
	channel_state.set_awaiting_quiescence();
}
```
Duplicate logic for handling `quiescent_action` during disconnection

There appears to be duplicate logic for handling the `quiescent_action` state during disconnection in two places.

Lines 8209-8212:

```rust
if self.quiescent_action.is_some() {
	// If we were trying to get quiescent, try again after reconnection.
	self.context.channel_state.set_awaiting_quiescence();
}
```

This code block (lines 12864-12867):

```rust
if self.quiescent_action.is_some() {
	// If we're trying to get quiescent to do something, try again when we
	// reconnect to the peer.
	channel_state.set_awaiting_quiescence();
}
```

This duplication increases maintenance burden and risk of inconsistencies. Consider extracting this logic into a helper method that can be called from both locations to ensure consistent behavior.
Suggested change:

```diff
 if self.quiescent_action.is_some() {
 	// If we're trying to get quiescent to do something, try again when we
 	// reconnect to the peer.
-	channel_state.set_awaiting_quiescence();
+	self.handle_quiescent_action_on_disconnect(channel_state);
 }
```
Spotted by Diamond
It's one line of code... I'm not convinced this is worth it.
🔔 2nd Reminder Hey @jkczyz! This PR has been waiting for your review.
Merging with just my approval as the changes here are pretty straightforward and well covered by tests.
Mostly small tweaks to our quiescence logic, but ultimately integrates splicing with quiescence.

IMO it's important that we allow quiescence-init while disconnected, but the only awkward part of the API is that we can't cancel a splice once it's started (except by FC'ing the channel). I started writing a cancel API but then realized it's not possible, because it's only actually cancelled once the `ChannelManager` is persisted, which may be a while. The other option we could consider is dropping the pending splice on restart and giving the user an API to see whether a splice happened or not (I guess it's via the `ChannelReady` event?) and telling them to start again. WDYT @wpaulino and @jkczyz?
Updated to not touch splicing yet, just simplify the quiescence logic and prep for splice-integration.