Reconstruct ChannelManager forwarded HTLCs maps from Channels #4227
base: main
Conversation
👋 Thanks for assigning @joostjager as a reviewer!
Codecov Report ❌ Patch coverage is …

@@            Coverage Diff             @@
##              main    #4227     +/-   ##
==========================================
- Coverage    89.33%   89.33%   -0.01%
==========================================
  Files          180      180
  Lines       139042   139248     +206
  Branches    139042   139248     +206
==========================================
+ Hits        124219   124403     +184
- Misses       12196    12218      +22
  Partials      2627     2627
Nvm, our test coverage for reload of these maps is just pretty incomplete.
0b4eb68 to ce2ccac (force-pushed)
Updated with new testing and a few tweaks: diff. Will rebase next.
ce2ccac to 1e86521 (force-pushed)
joostjager left a comment:
Did an initial review pass. Even though the change is pretty contained, I still found it difficult to fully follow what's happening.
Overall I think comments are very much on the light side in LDK, and the code areas touched in this PR are no exception in my opinion. Maybe, now that you've invested the time to build understanding, you can liberally sprinkle comments on your changes and nearby code?
  payment_hash: PaymentHash(Sha256::hash(&[42; 32]).to_byte_array()),
  cltv_expiry: 300000000,
- state: InboundHTLCState::Committed,
+ state: InboundHTLCState::Committed { update_add_htlc_opt: None },
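For context, a minimal sketch of the shape of this change, with simplified stand-ins for LDK's real `msgs::UpdateAddHTLC` and `InboundHTLCState` (the names match the diff, but the definitions are illustrative, not the actual ones):

```rust
// Hypothetical, simplified stand-ins; not LDK's actual definitions.
struct UpdateAddHTLC {
    htlc_id: u64,
    amount_msat: u64,
    // The real message also carries the payment hash, CLTV expiry, onion, etc.
}

enum InboundHTLCState {
    // Previously a unit variant; it now optionally retains the original
    // `update_add_htlc` message so the `ChannelManager` can rebuild its
    // forwarding maps from `Channel` data on restart.
    Committed { update_add_htlc_opt: Option<UpdateAddHTLC> },
}
```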
Is it ok to never populate update_add_htlc_opt in the tests in this commit, if that isn't supposed to happen in prod?
Yeah, because that field is only used by the manager on restart. Hopefully it's clearer now that docs are on the field.
I thought maybe the copying of the update_add message into InboundHTLCState needs to be covered in this commit. It isn't currently. But coverage is probably obtained later on.
  args_b_c.send_announcement_sigs = (true, true);
  reconnect_nodes(args_b_c);

  // Forward the HTLC and ensure we can claim it post-reload.
What would happen again without the changes in this PR? I assume the htlc wouldn't be forwarded here, but would it be failed back instead?
You mean if we did the 3-line fix discussed offline? I don't think so, because the manager will only fail HTLCs that it knows about. I think we just wouldn't handle the HTLC and the channel would FC.
Yes indeed, if we just discard the forward htlcs. I thought there was still some inbound htlc timer somewhere, but I guess not then?
I wonder how fixing the FC by failing back the htlc and then completely forgetting about forwards on restart would compare to the current approach.
Hmm... seems like a significant UX downgrade for routing nodes?
But is it significant? Restart is probably rare on a routing node, and then also doing it right when a forward is being processed is even more rare?
Similar to the other /target directories we ignore, where a bunch of files are generated during testing.
1e86521 to 26c6dc7 (force-pushed)
I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there, so pushed up the latest state.
We had discussed whether to split out […]
26c6dc7 to ad79155 (force-pushed)
Probably not in this PR since it's already decently sized and has a clear scope, but I will look into it for follow-up!
ad79155 to 6b919ce (force-pushed)
We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure. As part of this, we plan to store at least parts of Channels in ChannelMonitors, and that Channel data will be used in rebuilding the manager. Once we store update_adds in Channels, we can use them on restart when reconstructing ChannelManager maps such as forward_htlcs and pending_intercepted_htlcs. Upcoming commits will start doing this reconstruction.
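To illustrate the mechanism this commit message describes, here is a hedged sketch of the read-time reconstruction, using hypothetical names and simplified types rather than the PR's actual code:

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for LDK's `msgs::UpdateAddHTLC`.
struct UpdateAddHTLC {
    htlc_id: u64,
    // The real message also carries amounts, the payment hash, the onion, etc.
}

// Key the update_adds retained by each channel by the inbound channel's SCID,
// mirroring the layout of `ChannelManager::decode_update_add_htlcs`. On
// restart, decoding these messages' onions repopulates `forward_htlcs` and
// `pending_intercepted_htlcs` through the normal forwarding path.
fn rebuild_decode_update_add_htlcs(
    per_channel: Vec<(u64, Vec<UpdateAddHTLC>)>, // (inbound SCID, held update_adds)
) -> HashMap<u64, Vec<UpdateAddHTLC>> {
    let mut map: HashMap<u64, Vec<UpdateAddHTLC>> = HashMap::new();
    for (scid, update_adds) in per_channel {
        if !update_adds.is_empty() {
            map.entry(scid).or_default().extend(update_adds);
        }
    }
    map
}
```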
We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure. As part of rebuilding ChannelManager forward HTLCs maps, we will also add a fix that will regenerate HTLCIntercepted events for HTLC intercepts that are present but have no corresponding event in the queue. That fix will use this new method.
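A hedged sketch of the event-regeneration fix described above, with hypothetical, simplified stand-ins for `pending_intercepted_htlcs` and `Event::HTLCIntercepted` (not the actual LDK types):

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct InterceptId([u8; 32]);

struct PendingInterceptedHTLC {} // forward info, amounts, etc. elided

enum Event {
    HTLCIntercepted { intercept_id: InterceptId },
}

// After reconstructing `pending_intercepted_htlcs`, re-queue an
// `HTLCIntercepted` event for any intercept with no corresponding event
// already pending, so the user is re-notified after restart.
fn regenerate_intercept_events(
    pending_intercepts: &HashMap<InterceptId, PendingInterceptedHTLC>,
    pending_events: &mut Vec<Event>,
) {
    for (intercept_id, _htlc) in pending_intercepts.iter() {
        let already_queued = pending_events.iter().any(|ev| match ev {
            Event::HTLCIntercepted { intercept_id: id } => id == intercept_id,
        });
        if !already_queued {
            pending_events.push(Event::HTLCIntercepted { intercept_id: *intercept_id });
        }
    }
}
```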
6b919ce to 0b83e80 (force-pushed)
SGTM!

Btw, feel free to ping me whenever you think this is ready for a secondary reviewer!
I like the new commit structure. Left a few remarks and questions, some probably due to a few pieces of understanding missing in my mental model.
  // received on an old pre-0.1 version of LDK. In this case, we can't set
  // `update_add_htlc_opt` here and need to rely on the HTLC being present in
  // the legacy `ChannelManager` maps instead for it to be handled properly, if
  // the HTLC is still around the next time we restart.
But if the HTLC isn't around, I don't think this is making the situation worse?
Can you clarify what you mean, or suggest a phrasing?
  // Reconstruct `ChannelManager::decode_update_add_htlcs` from the serialized
  // `Channel`, as part of removing the requirement to regularly persist the
  // `ChannelManager`.
  decode_update_add_htlcs.insert(
Can't this be done in the next loop over the monitors where the update_add_htlcs are deduped? This loop seems to be about payments.
Hmm, IMO the first loop is about repopulating HTLC maps from monitors, and the second loop is about pruning maps from monitors (hence having two loops so we don't prune before the maps are completely filled out), so this approach seems to fit.
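To make the fill-then-prune split concrete, a hypothetical sketch (illustrative types and names, not the PR's actual code) of why pruning waits for a second pass:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified monitor stand-in; illustration only.
struct Monitor {
    scid: u64,
    held_update_adds: Vec<u64>, // HTLC ids, heavily simplified
    resolved_htlc_ids: HashSet<u64>,
}

// Pass one only *fills* the reconstructed map; pass two only *prunes* entries
// the monitors show as already resolved. Pruning inside the first pass could
// drop an HTLC before every monitor has contributed its data.
fn rebuild_and_prune(monitors: &[Monitor]) -> HashMap<u64, Vec<u64>> {
    let mut map: HashMap<u64, Vec<u64>> = HashMap::new();
    // Pass 1: repopulate the HTLC maps from the monitors.
    for mon in monitors {
        map.entry(mon.scid).or_default().extend(mon.held_update_adds.iter().copied());
    }
    // Pass 2: prune HTLCs that some monitor shows as claimed or failed.
    for mon in monitors {
        for htlcs in map.values_mut() {
            htlcs.retain(|id| !mon.resolved_htlc_ids.contains(id));
        }
    }
    map
}
```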
  forward_htlcs: Mutex::new(forward_htlcs_legacy),
- decode_update_add_htlcs: Mutex::new(decode_update_add_htlcs_legacy),
+ decode_update_add_htlcs: Mutex::new(decode_update_add_htlcs),
I looked at the final commit of this PR, and there it seems that decode_update_add_htlcs_legacy isn't used at all anymore. Shouldn't it still be input in case we don't have data from monitor(s)? Maybe I am overlooking something.
Good catch, I fixed this and added a test.
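The fix itself isn't quoted in the thread; as a hedged guess at its shape (hypothetical names and simplified types), falling back to the legacy map for channels that produced no reconstructed entries might look like:

```rust
use std::collections::HashMap;

// Merge the legacy serialized map into the map reconstructed from `Channel`
// data, only for channels that produced no reconstructed entries (e.g. HTLCs
// received before the `update_add` message was retained in `Channel`s).
fn merge_with_legacy(
    mut reconstructed: HashMap<u64, Vec<u64>>, // SCID -> HTLC ids, simplified
    legacy: HashMap<u64, Vec<u64>>,
) -> HashMap<u64, Vec<u64>> {
    for (scid, htlcs) in legacy {
        // Only fall back to the legacy entry when reconstruction found nothing.
        reconstructed.entry(scid).or_insert(htlcs);
    }
    reconstructed
}
```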
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer?
We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure. We'll use this new util when reconstructing the ChannelManager::decode_update_add_htlcs map from Channel data in upcoming commits. While the Channel data is not included in the monitors yet, it will be in future work.
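The util isn't shown in this excerpt; as an assumption about its role (hypothetical names and simplified types, not the PR's actual API), a `Channel`-side helper that drains the retained update_adds might look like:

```rust
// Hypothetical, simplified stand-ins; not LDK's actual types or method names.
struct UpdateAddHTLC { htlc_id: u64 }

enum InboundHTLCState {
    Committed { update_add_htlc_opt: Option<UpdateAddHTLC> },
    Other,
}

struct InboundHTLC { state: InboundHTLCState }

struct Channel { pending_inbound_htlcs: Vec<InboundHTLC> }

impl Channel {
    // Drain the `update_add` messages stashed on committed inbound HTLCs so
    // the `ChannelManager` read path can rebuild `decode_update_add_htlcs`.
    fn take_committed_update_adds(&mut self) -> Vec<UpdateAddHTLC> {
        self.pending_inbound_htlcs
            .iter_mut()
            .filter_map(|htlc| match &mut htlc.state {
                InboundHTLCState::Committed { update_add_htlc_opt } => update_add_htlc_opt.take(),
                _ => None,
            })
            .collect()
    }
}
```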
We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure. Soon we'll be reconstructing these now-legacy maps from Channel data (that will also be included in ChannelMonitors in future work), so rename them as part of moving towards not needing to persist them in ChannelManager.
Makes an upcoming commit cleaner
We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure. Here we start this process by rebuilding ChannelManager::decode_update_add_htlcs from the Channels, which will soon be included in the ChannelMonitors as part of a different series of PRs. The newly built map is not yet used but will be in the next commit.
We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure. Here we start this process by rebuilding ChannelManager::decode_update_add_htlcs, forward_htlcs, and pending_intercepted_htlcs from Channel data, which will soon be included in the ChannelMonitors as part of a different series of PRs. We also fix the reload_node test util to use the node's pre-reload config after restart. The previous behavior was a bit surprising and led to one of this commit's tests failing.
We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure. In the previous commit we started this process by rebuilding ChannelManager::decode_update_add_htlcs, forward_htlcs, and pending_intercepted_htlcs from the Channel data, which will soon be included in the ChannelMonitors as part of a different series of PRs. Here we test that HTLC forwards that were originally received on 0.2 can still be successfully forwarded using the new reload + legacy handling code that will be merged for 0.3.
0b83e80 to dfef41a (force-pushed)
I think the semver CI failure is spurious.
We have an overarching goal of (mostly) getting rid of `ChannelManager` persistence and rebuilding the `ChannelManager`'s state from existing `ChannelMonitor`s, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure.

Here we start this process by rebuilding `ChannelManager::decode_update_add_htlcs`, `forward_htlcs`, and `pending_intercepted_htlcs` from the `Channel`s, which will soon be included in the `ChannelMonitor`s as part of #4218. `Channel`s will not contain committed `update_add_htlcs` […] `decode_update_add_htlcs` map and ignoring the legacy maps. This may indicate missing test coverage, since in theory we need to re-forward these HTLCs sometimes so they go back in the `forward_htlcs` map for processing.