Skip to content

Fix initial commitment_signed for splicing #4014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jkczyz
Copy link
Contributor

@jkczyz jkczyz commented Aug 15, 2025

When sending or handling the initial commitment_signed message for splicing, the previous commitment point and number must be used. Track this and fix the current behavior which was using the next commitment point and number by mistake. Also, refactor HolderCommitmentPoint as the use of "current" is confusing when prefixing holder_commitment_point fields with next_ and previous_.

Found when reviewing #3886 (comment).

The only difference between the two variants is the next point, which
can be stored using an Option for simplicity. The naming of the
Available variant is also confusing as it refers to the next commitment
point. But HolderCommitmentPoint is typically used to represent the next
point, which is actually stored in the current field. Drop the "current"
nomenclature to avoid confusion.
@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Aug 15, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@jkczyz jkczyz self-assigned this Aug 15, 2025
@jkczyz jkczyz requested review from TheBlueMatt and wpaulino August 15, 2025 00:49
Copy link

codecov bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 85.95041% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.84%. Comparing base (3b939c0) to head (13c81a1).
⚠️ Report is 51 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/channel.rs 85.95% 11 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4014      +/-   ##
==========================================
- Coverage   88.91%   88.84%   -0.07%     
==========================================
  Files         174      175       +1     
  Lines      125066   127775    +2709     
  Branches   125066   127775    +2709     
==========================================
+ Hits       111197   113523    +2326     
- Misses      11358    11684     +326     
- Partials     2511     2568      +57     
Flag Coverage Δ
fuzzing 21.75% <39.83%> (-0.34%) ⬇️
tests 88.67% <85.95%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jkczyz jkczyz force-pushed the 2025-08-splice-initial-commitment-signed branch 2 times, most recently from cb521b4 to e2ed042 Compare August 15, 2025 15:30
@jkczyz jkczyz requested a review from wpaulino August 15, 2025 15:32
@@ -14005,6 +14025,7 @@ where
is_holder_quiescence_initiator: None,
},
interactive_tx_signing_session,
previous_holder_commitment_point,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems really weird to store a { transaction_number, point, next_point } in FundedChannel when two of the three fields are redundant with next_holder_commitment_point. Why can't we just store the one extra key in HolderCommitmentPoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are places in the code that take a HolderCommitmentPoint, so it wouldn't be clear which point is needed. Currently, next_point is simply used as a means to determine if we can advance. It's never actually accessed outside serialization.

Also, the code wouldn't be very readable as next_holder_commitment_point.previous_point() to get the current point. And if we keep the field named holder_commitment_point, it's "current" is the "next", and it's previous is the "current". The primary motivation if this approach is to avoid confusion.

That said, we can get away with only persisting the new point, even there is some in-memory duplication. We could probably clear the "next" point form current, too, since that shouldn't be needed after advancing the point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are places in the code that take a HolderCommitmentPoint, so it wouldn't be clear which point is needed.

Its not really clear which point is needed already? We need two different points depending on if we're validating a commitment_signed for a splice or during normal operation. I don't really see how passing a u64 and a PublicKey makes those places any less readable.

Also, the code wouldn't be very readable as next_holder_commitment_point.previous_point() to get the current point. And if we keep the field named holder_commitment_point, it's "current" is the "next", and it's previous is the "current". The primary motivation if this approach is to avoid confusion.

Seems like one more rename of the method would do it? next and current on a holder_commitment_point would be correct, and as you note we don't even access the after-next, so it would read fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not really clear which point is needed already? We need two different points depending on if we're validating a commitment_signed for a splice or during normal operation. I don't really see how passing a u64 and a PublicKey makes those places any less readable.

I suppose it's more about mistakenly passing a point and number that don't correspond to each other.

Seems like one more rename of the method would do it? next and current on a holder_commitment_point would be correct, and as you note we don't even access the after-next, so it would read fine?

What's the field of third point named? next_next_point?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's more about mistakenly passing a point and number that don't correspond to each other.

Hmm, then maybe its most readable to pass a HolderCommitmentPoint with a enum { Splicing, Normal } as a separate argument?

What's the field of third point named? next_next_point?

Sure? Given its not used very many places that seems fine.

Copy link
Contributor Author

@jkczyz jkczyz Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's more about mistakenly passing a point and number that don't correspond to each other.

Hmm, then maybe its most readable to pass a HolderCommitmentPoint with a enum { Splicing, Normal } as a separate argument?

Looks like it's only one place, which immediately takes each part to pass to another method. So probably ok just to pass the parts.

Sure? Given its not used very many places that seems fine.

Went with pending_next_point instead.

One other thing about using current_point and next_point naming is transaction_number is not for current (which is an Option anyway), so I'll need to rename it to next_transaction_number and have both current_transaction_number and next_transaction_number methods, presumably.

So we're pushing Option into the type. I think I'm fine with this as we keep the advance functionality encapsulated in the type. However, now UnfundedChannelContext::initial_holder_commitment_point is used as initial_holder_commitment_point.next_point(), which is a little confusing. Maybe we drop initial_ from that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, now UnfundedChannelContext::initial_holder_commitment_point is used as initial_holder_commitment_point.next_point(), which is a little confusing. Maybe we drop initial_ from that?

sure, makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's more about mistakenly passing a point and number that don't correspond to each other.

Hmm, then maybe its most readable to pass a HolderCommitmentPoint with a enum { Splicing, Normal } as a separate argument?

Looks like it's only one place, which immediately takes each part to pass to another method. So probably ok just to pass the parts.

Was actually a few places, but I think it's fine as is. We need the current one only for the initial commitment signed for a splice, so checking the FundingScope wouldn't work either.

next_holder_commitment_point.transaction_number() + 1;
let previous_point = holder_signer
.get_per_commitment_point(previous_holder_commitment_transaction_number, &secp_ctx)
.expect(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't always rely on this on startup, we need to persist the new point. The logic we had relied on was "you have to do a sync-signer load once, then you can switch to async signing", but now we require you to have a sync signer always on startup, which isn't acceptable.

I'm also kinda not a fan of requiring a sync signer again once after we told people they can do async signing after loading once with a previous version. Given this is only used in splicing is it that much work to make the new key Optional and just fail splices until its filled in (which should happen any time we step the state machine forward)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that should be simple enough to do. Though, we'd advertise to our peer that channel supports splicing but may fail on the first attempt.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine. "If you're using async signing then splicing may fail until you do a channel update" seems like a rare enough thing that we can just document it and live with it. Not many of our users do async signing anyway.

@jkczyz jkczyz force-pushed the 2025-08-splice-initial-commitment-signed branch from e2ed042 to 96a7870 Compare August 15, 2025 19:56
@jkczyz jkczyz requested a review from TheBlueMatt August 15, 2025 21:49
Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit uncertain about the design for the access to the previous point - #4014 (comment)

@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

jkczyz added 6 commits August 18, 2025 15:54
HolderCommitmentPoint::next_point represents the point to use after
advancing the point. However, HolderCommitmentPoint::point actually
represents the next point we'd expect to receive. This will be renamed
in the next commit, so to avoid clashing the existing next_point field
needs to be renamed.
HolderCommitmentPoint::point represents the next point we expect to
receive. Rename it to next_point to reflect that and similarly rename
transaction_number. The next commit will add the current point, which is
needed in splicing.
When splicing a channel, sending and receiving the initial
commitment_signed message should use the current commitment point
rather then the next one. Store this in HolderCommitmentPoint whenever
it is advanced.
When splicing a channel, the initial commitment_signed received should
use the same commitment number and point previously received prior to
splicing the channel. Account for this by checking the commitment_signed
against that one instead, which is now stored separately in
FundedChannel.
When splicing a channel, the initial commitment_signed should use the
same commitment number and point previously sent. Account for this by
adjusting these to use the previous commitment number and point, since
the next expected one is stored.
@jkczyz jkczyz force-pushed the 2025-08-splice-initial-commitment-signed branch from 96a7870 to 13c81a1 Compare August 18, 2025 21:15
@jkczyz jkczyz requested a review from TheBlueMatt August 18, 2025 21:17
if self.holder_commitment_point.current_point().is_none() {
return Err(APIError::APIMisuseError {
err: format!(
"Channel {} cannot be spliced, commitment point needs to be advanced once",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe phrase this so it's helpful to the user? Something like "a payment needs to be sent/received first"

@@ -1271,6 +1273,14 @@ impl HolderCommitmentPoint {
self.pending_next_point.is_some()
}

pub fn current_transaction_number(&self) -> u64 {
self.next_transaction_number + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird this could give you an invalid commitment number if the HolderCommitmentPoint hasn't been advanced past INITIAL_COMMITMENT_NUMBER but it seems pre-existing anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's preexisting and I think the only time we see that when get_cur_holder_commitment_transaction_number is called from ChannelManager::read.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few things to fix, but I'm gonna go ahead and land this to get the bulk of the code landed so there isn't rebase and cause it already has an ACK.

@@ -10707,6 +10729,13 @@ where

// TODO(splicing): Add check that we are the quiescence acceptor

if self.holder_commitment_point.current_point().is_none() {
return Err(ChannelError::Warn(format!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need to WarnAndDisconnect to actually fail the splice?

match (holder_commitment_point_next_opt, holder_commitment_point_pending_next_opt) {
(Some(next_point), pending_next_point) => HolderCommitmentPoint {
next_transaction_number: holder_commitment_next_transaction_number,
current_point: None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this use the value read from disk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah.. I think this was missed when resolving a merge conflict after rebasing to run rustfmt. :(

);
HolderCommitmentPoint {
next_transaction_number: holder_commitment_next_transaction_number,
current_point: holder_commitment_point_current_opt,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to ask the signer for this too. If the signer is sync, we should support splicing from the first load, only in the async case should we hit the "sorry, need to advance once first" error.

let commitment_point = self
.holder_commitment_point
.current_point()
.expect("current should be set after receiving the initial commitment_signed");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong for splicing cause we could have restarted, no? I mean we do check on the init so its probably safe to unwrap, but might as well just return an error rather than panicing, and even if we dont the message is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the point about restarts. The message was more phrased for new channels where once the initial commitment_signed is received, we will have advanced the commitment point and thus current_point will be Some. Otherwise, right, we check when a splice is initiated that current_point is set.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my point is that on restart with an async signer this won't be filled in and the panic message would be wrong. Its okay cause its checked during splice_init handling, but the "comment" here explaining why the "unwrap" is safe is wrong.

@TheBlueMatt
Copy link
Collaborator

Also, I don't think it breaks non-splicing, just doesn't quite fix splicing :).

@TheBlueMatt TheBlueMatt merged commit 72fd628 into lightningdevkit:main Aug 19, 2025
25 checks passed
@jkczyz jkczyz mentioned this pull request Aug 20, 2025
@jkczyz
Copy link
Contributor Author

jkczyz commented Aug 20, 2025

There's a few things to fix, but I'm gonna go ahead and land this to get the bulk of the code landed so there isn't rebase and cause it already has an ACK.

Follow-ups in #4023.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants