Ensure partial MPP claims continue to block channels on restart #3928

Merged

Conversation

TheBlueMatt
Collaborator

In 9cc6e08, we started using the RAAMonitorUpdateBlockingAction logic to block RAAs which may remove a preimage from one ChannelMonitor if it isn't durably stored in another that is a part of the same MPP claim.

At the time, when we claimed a payment, if we saw that the HTLC was already claimed in the channel, we'd simply drop the new RAA blocker. This can happen on reload when replaying MPP claims.

However, just because an HTLC is no longer present in the ChannelManager's Channel doesn't mean that the ChannelMonitorUpdate which stored the preimage actually made it durably into the ChannelMonitor on disk.

We could begin an MPP payment, have one channel get the preimage durably into its ChannelMonitor, then step forward another update with the peer. Then, we could reload, causing the MPP claims to be replayed across all channels, leading to the RAA blocker(s) being dropped and all channels being unlocked. Finally, if the first channel managed to step forward a further update with its peer before the (now-replayed) ChannelMonitorUpdates for all MPP parts make it to disk, we could theoretically lose the preimage.

This is, of course, a somewhat comically unlikely scenario, but I had an old note to expand the test and it turned up the issue, so we might as well fix it.
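
To make the shape of the fix concrete, here is a minimal, self-contained sketch (not LDK's actual types: `PeerState`, `RaaBlocker`, and `handle_duplicate_claim` are simplified stand-ins, though `in_flight_monitor_updates` and `monitor_update_blocked_actions` mirror the real per-peer fields touched by the diff discussed below). The idea is that on a duplicate claim we only drop the RAA blocker when no `ChannelMonitorUpdate`s remain in flight for the channel; otherwise the blocker is pushed onto the blocked-actions list so the channel stays locked until the preimage is durably persisted.

```rust
use std::collections::HashMap;

// Simplified stand-in for RAAMonitorUpdateBlockingAction.
struct RaaBlocker;

// Simplified stand-in for the per-peer state held by the ChannelManager.
#[derive(Default)]
struct PeerState {
    // Channel ID -> ChannelMonitorUpdates not yet durably persisted.
    in_flight_monitor_updates: HashMap<u64, Vec<&'static str>>,
    // Channel ID -> actions (including RAA blockers) held back until the
    // in-flight updates complete.
    monitor_update_blocked_actions: HashMap<u64, Vec<RaaBlocker>>,
}

/// Sketch: when a claim turns out to be a duplicate (e.g. an MPP claim replayed
/// on startup), keep the RAA blocker queued if monitor updates are still in
/// flight for the channel, rather than dropping it.
fn handle_duplicate_claim(peer_state: &mut PeerState, chan_id: u64, raa_blocker: RaaBlocker) {
    let updates_in_flight = peer_state
        .in_flight_monitor_updates
        .get(&chan_id)
        .map(|mons| !mons.is_empty())
        .unwrap_or(false);
    if updates_in_flight {
        // The preimage may not be on disk yet; defer the blocker instead of
        // dropping it so the channel remains blocked.
        peer_state
            .monitor_update_blocked_actions
            .entry(chan_id)
            .or_default()
            .push(raa_blocker);
    }
    // Otherwise the update already made it durably to the ChannelMonitor and
    // the blocker can be dropped as before.
}

fn main() {
    let mut peer_state = PeerState::default();
    peer_state
        .in_flight_monitor_updates
        .insert(42, vec!["pending ChannelMonitorUpdate"]);
    handle_duplicate_claim(&mut peer_state, 42, RaaBlocker);
    assert_eq!(peer_state.monitor_update_blocked_actions[&42].len(), 1);
}
```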

@TheBlueMatt TheBlueMatt added this to the 0.2 milestone Jul 15, 2025
@ldk-reviews-bot

ldk-reviews-bot commented Jul 15, 2025

👋 Thanks for assigning @wpaulino as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-reviews-bot ldk-reviews-bot requested a review from jkczyz July 15, 2025 00:47
@TheBlueMatt
Collaborator Author

Rebased.

@TheBlueMatt TheBlueMatt force-pushed the 2024-09-future-async-tests branch from 490d6e4 to 6c34c56 Compare July 17, 2025 15:39
@wpaulino wpaulino self-requested a review July 17, 2025 15:50

wpaulino previously approved these changes Jul 28, 2025
@wpaulino
Contributor

Needs a rebase

@TheBlueMatt
Collaborator Author

Rebased.


codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 93.10345% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.00%. Comparing base (61e5819) to head (583a9a3).
⚠️ Report is 14 commits behind head on main.

Files with missing lines                         Patch %   Lines
lightning/src/ln/chanmon_update_fail_tests.rs    93.05%    3 Missing and 2 partials ⚠️
lightning/src/ln/channelmanager.rs               93.33%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3928      +/-   ##
==========================================
+ Coverage   88.94%   89.00%   +0.05%     
==========================================
  Files         174      174              
  Lines      124201   124299      +98     
  Branches   124201   124299      +98     
==========================================
+ Hits       110472   110632     +160     
+ Misses      11251    11187      -64     
- Partials     2478     2480       +2     
Flag      Coverage Δ
fuzzing   22.22% <0.00%> (-0.41%) ⬇️
tests     88.83% <93.10%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.


@jkczyz jkczyz requested review from valentinewallace and removed request for jkczyz August 4, 2025 16:15
Comment on lines +8299 to +8302
// If there are monitor updates in flight, we may be in the case
// described above, replaying a claim on startup which needs an RAA
// blocker to remain blocked. Thus, in such a case we simply push the
// post-update action to the blocked list and move on.
Contributor


Based on an existing test, it seems like we can also hit this case when !during_init, but the test failure is test_glacial_peer_cant_hang so probably rare enough to not need to document.

Collaborator Author


Hmm, if I put an assert!(during_init) in the Some(raa_blocker) branch I don't see it get hit at all? I think it also makes sense - RAA blockers are only used when claiming payments, not forwarded HTLCs, and hitting a DuplicateClaim for a payment at runtime seems weird - we shouldn't be double-claiming something because we deduplicate on the claimable_payments set.

Contributor

@valentinewallace valentinewallace Aug 5, 2025


I'm talking about adding an assert here:

diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index b60b0419a..8b7f89119 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -8302,6 +8302,7 @@ where
                                                        // post-update action to the blocked list and move on.
                                                        let in_flight_mons = peer_state.in_flight_monitor_updates.get(&chan_id);
                                                        if in_flight_mons.map(|(_, mons)| !mons.is_empty()).unwrap_or(false) {
+                                                               assert!(during_init);
                                                                peer_state
                                                                        .monitor_update_blocked_actions
                                                                        .entry(chan_id)

So the case where we have (1) a duplicate claim, (2) in-flight monitor updates present and (3) a post-update action whilst !during_init.

Collaborator Author


I went ahead and tweaked the comment, but it shouldn't matter too much

$ git diff-tree -U1 c15c1ecbd6 55eea49cd0
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index b60b0419a2..229d594d06 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -8302,2 +8302,4 @@ where
 							// post-update action to the blocked list and move on.
+							// In any case, we should air on the side of caution and not process
+							// the post-update action no matter the situation.
 							let in_flight_mons = peer_state.in_flight_monitor_updates.get(&chan_id);

@TheBlueMatt TheBlueMatt force-pushed the 2024-09-future-async-tests branch from 12c37b2 to c15c1ec Compare August 5, 2025 11:52
@TheBlueMatt
Collaborator Author

Tweaked the commit message.

wpaulino previously approved these changes Aug 5, 2025
In 9cc6e08, we started using the
`RAAMonitorUpdateBlockingAction` logic to block RAAs which may
remove a preimage from one `ChannelMonitor` if it isn't durably
stored in another that is a part of the same MPP claim.

Then, in 254b78f, when we claimed
a payment, if we saw that the HTLC was already claimed in the
channel, we'd simply drop the new RAA blocker. This can happen on
reload when replaying MPP claims.

However, just because an HTLC is no longer present in
`ChannelManager`'s `Channel`, doesn't mean that the
`ChannelMonitorUpdate` which stored the preimage actually made it
durably into the `ChannelMonitor` on disk.

We could begin an MPP payment, have one channel get the preimage
durably into its `ChannelMonitor`, then step forward another update
with the peer. Then, we could reload, causing the MPP claims to be
replayed across all channels, leading to the RAA blocker(s) being
dropped and all channels being unlocked. Finally, if the first
channel managed to step forward a further update with its peer
before the (now-replayed) `ChannelMonitorUpdate`s for all MPP parts
make it to disk we could theoretically lose the preimage.

This is, of course, a somewhat comically unlikely scenario, but I
had an old note to expand the test and it turned up the issue, so
we might as well fix it.
@TheBlueMatt
Collaborator Author

I believe it's safe to backport this as-is.

@TheBlueMatt TheBlueMatt merged commit 869866f into lightningdevkit:main Aug 5, 2025
24 of 25 checks passed