@t-bast t-bast commented Sep 9, 2025

When disconnecting in the middle of the signing steps of an interactive-tx transaction, we must retransmit signatures on reconnection to complete the interactive-tx protocol.

Nodes first exchange commitment_signed, followed by tx_signatures once they have both sent and received commitment_signed.

We previously always retransmitted commitment_signed, even when our peer had already received it. We now include an explicit bitfield that lets nodes request commitment_signed if they haven't received it.

Note that this is a breaking change, and we are thus changing the TLV type to make it easier to detect. In practice it should be fine, since only eclair and cln have shipped support for dual funding, and those two implementations are already incompatible on reconnection because eclair implements #1214 but cln doesn't. This edge case only creates an issue when nodes disconnect after exchanging tx_complete but before receiving signatures, which should happen very infrequently.

Replaces #1214.

@niftynei
Collaborator

> We previously always retransmitted commitment_signed, even when our peer had already received it. We now include an explicit bitfield that lets nodes request commitment_signed if they haven't received it.

Can you remind me again why we can't just retransmit the commitment_signed message? It seems much simpler to implement and requires less logic in general, but maybe I'm missing an edge case here where having a duplicate message is problematic.

1. type: 1 (`next_funding`)
2. data:
* [`sha256`:`next_funding_txid`]
* [`byte`:`retransmit_flags`]
Contributor

In #1160 the diff currently shows:

    1. type: 0 (`next_funding`)
    2. data:
        * [`sha256`:`next_funding_txid`]
    1. type: 5 (`my_current_funding_locked`)
    2. data:
        * [`sha256`:`my_current_funding_locked_txid`]
        * [`byte`:`retransmit_flags`]

While this PR shows:

    1. type: 1 (`next_funding`)
    2. data:
        * [`sha256`:`next_funding_txid`]
        * [`byte`:`retransmit_flags`]

So the final goal is to have:

    1. type: 1 (`next_funding`)
    2. data:
        * [`sha256`:`next_funding_txid`]
        * [`byte`:`retransmit_flags`]
    1. type: 5 (`my_current_funding_locked`)
    2. data:
        * [`sha256`:`my_current_funding_locked_txid`]
        * [`byte`:`retransmit_flags`]
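
To make the `next_funding` layout above concrete, here is a minimal sketch of how such a record could be serialized. It assumes the usual TLV framing (a BigSize type, a BigSize length, then the value), where values below 0xfd fit in a single byte. The function name `encode_next_funding` and the constants are hypothetical and do not come from any implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for illustration only. */
#define NEXT_FUNDING_TLV_TYPE 1
#define NEXT_FUNDING_TLV_LEN 33 /* 32-byte txid + 1-byte retransmit_flags */

/* Writes a `next_funding` TLV record into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t encode_next_funding(uint8_t *out, size_t out_len,
				  const uint8_t txid[32],
				  uint8_t retransmit_flags)
{
	const size_t total = 2 + NEXT_FUNDING_TLV_LEN;

	if (out_len < total)
		return 0;

	/* Type and length are both below 0xfd, so each BigSize is one byte. */
	out[0] = NEXT_FUNDING_TLV_TYPE;
	out[1] = NEXT_FUNDING_TLV_LEN;
	memcpy(out + 2, txid, 32);
	out[2 + 32] = retransmit_flags;
	return total;
}
```

Under these assumptions the record is always 35 bytes long: one type byte, one length byte, the txid, and the flags byte.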

Collaborator Author

Yes that's exactly it! I'll rebase #1160 once this PR and #1236 have been merged.

Contributor

Are we still decrementing the sent next_commitment_number when splice needs commitments sent?

Contributor

For context, this is what we currently do when sending channel_reestablish:

			/* Eclair wants us to decrement commitment number to
			 * indicate that we would like them to re-send
			 * commitment signatures */
			if (!inflight->last_tx)
				send_next_commitment_number--;

When receiving channel_reestablish, we use this as the condition for when we resend the commitment_signed:

next_commitment_number == peer->next_index[REMOTE] - 1

When we send (and need more info to build splice tx), do we:

  1. Continue decrementing commitment number AND set retransmit_flags bit 0.
  2. Stop decrementing commitment number and ONLY set retransmit_flags bit 0.

When receiving, do we:

  1. Resend commitment_signed when noticing a decremented commitment number OR retransmit_flags bit 0 being set
  2. Resend commitment_signed ONLY when retransmit_flags bit 0 is set
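
The two receiving options can be written as predicates, which makes the difference easier to see. This is only a sketch: the function names are hypothetical, bit 0 of `retransmit_flags` is taken to mean "please re-send commitment_signed" as described above, and the legacy check mirrors the `next_commitment_number == peer->next_index[REMOTE] - 1` condition with plain integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of retransmit_flags requests commitment_signed. */
#define RETRANSMIT_COMMIT_SIG 0x01

/* Legacy signal: the peer sent a commitment number one below our index. */
static bool legacy_requests_commit_sig(uint64_t next_commitment_number,
				       uint64_t next_remote_index)
{
	return next_commitment_number == next_remote_index - 1;
}

/* Explicit signal: the peer set bit 0 of retransmit_flags. */
static bool flag_requests_commit_sig(uint8_t retransmit_flags)
{
	return (retransmit_flags & RETRANSMIT_COMMIT_SIG) != 0;
}

/* Receiving option 1: honor either signal. */
static bool resend_option_1(uint64_t next_commitment_number,
			    uint64_t next_remote_index,
			    uint8_t retransmit_flags)
{
	return legacy_requests_commit_sig(next_commitment_number,
					  next_remote_index)
		|| flag_requests_commit_sig(retransmit_flags);
}

/* Receiving option 2: honor only the explicit flag. */
static bool resend_option_2(uint8_t retransmit_flags)
{
	return flag_requests_commit_sig(retransmit_flags);
}
```

Option 2 needs no knowledge of commitment numbers at all, which is the main simplification at stake in the question.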

Contributor

Also it appears we're dropping your_last_funding_locked_txid entirely -- correct?

Contributor

We were using your_last_funding_locked_txid to determine if we needed to resend splice_locked. It looks like this new spec does not allow re-sending of splice_locked at all. Is that correct?

Collaborator Author

@t-bast t-bast Oct 30, 2025

> Are we still decrementing the sent next_commitment_number when splice needs commitments sent?

No we're not, that has been deprecated in favor of this PR! So we're back to the simpler case where next_commitment_number really is the next commitment number :)

> Also it appears we're dropping your_last_funding_locked_txid entirely -- correct?

Yes, this has been removed entirely, as it's now unnecessary. You don't need to re-send splice_locked at all, it is "contained" in my_current_funding_locked, which is simpler: you always send in my_current_funding_locked the latest thing you've locked.

I think your comment here is thus obsolete, the logic becomes much simpler now that we don't need to decrement the commitment number? You simply re-send commitment_signed if your peer requests it in the retransmit_flags you receive. And you simply ask for retransmission in the retransmit_flags you send if you haven't received the remote commit_sig yet for this interactive-tx.
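
The rule described above could be sketched as follows. This is a sketch under stated assumptions, not code from eclair or cln: the `signing_session` struct and both function names are hypothetical, and bit 0 of `retransmit_flags` is assumed to be the commitment_signed request bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of retransmit_flags requests commitment_signed. */
#define RETRANSMIT_COMMIT_SIG 0x01

/* Hypothetical state kept for the pending interactive-tx. */
struct signing_session {
	/* Have we received the remote commit_sig for this interactive-tx? */
	bool received_remote_commit_sig;
};

/* Sending side: ask for retransmission only if the remote commit_sig
 * has not been received yet. */
static uint8_t retransmit_flags_to_send(const struct signing_session *s)
{
	return s->received_remote_commit_sig ? 0 : RETRANSMIT_COMMIT_SIG;
}

/* Receiving side: re-send commitment_signed only if the peer asked for
 * it; next_commitment_number plays no part in the decision. */
static bool must_resend_commit_sig(uint8_t received_flags)
{
	return (received_flags & RETRANSMIT_COMMIT_SIG) != 0;
}
```

Masking with the bit rather than comparing the whole byte leaves the remaining bits free for other uses without changing this check.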

@t-bast
Collaborator Author

t-bast commented Oct 24, 2025

> Can you remind me again why we can't just retransmit the commitment_signed message? It seems much simpler to implement and requires less logic in general, but maybe I'm missing an edge case here where having a duplicate message is problematic.

Because it's hacky: why retransmit a message that doesn't need retransmission? Also, it's trivial without taproot, because the sender can simply store their commit_sig and re-send it on reconnection, but with taproot it's much more annoying:

  • nonces need to be shared beforehand (your peer must send you nonces for this specific commitment)
  • you cannot simply retransmit: you have to re-sign because the nonces have changed

If we can avoid an unnecessary musig2 round, I think it's worth it, especially since it makes the protocol conceptually cleaner.

@niftynei
Collaborator

> why retransmit a message that doesn't need retransmission?

Well, without a signal you don't know whether it needs retransmission. You can leave the signal out and retransmit the message and the protocol will work. It's simpler and requires fewer if statements/state checks to take the same action on every reconnection.

> avoid an unnecessary musig2 round

It's only "unnecessary" in some fraction of cases; in others it will be necessary. You can easily remove an if switch by taking the same action on every reconnection, which makes the protocol simpler to write and verify.

ddustin added a commit to ddustin/lightning that referenced this pull request Oct 31, 2025
Updating splice related reestablish code to
lightning/bolts#1289
and
lightning/bolts#1160

Changelog-Changed: Breaking change -- if you have splicing enabled on a channel both nodes must upgrade in unison due to updating `channel_reestablish` to the new splice specifications
ddustin added a commit to ddustin/lightning that referenced this pull request Nov 5, 2025
ddustin added a commit to ddustin/lightning that referenced this pull request Nov 10, 2025
rustyrussell pushed a commit to ElementsProject/lightning that referenced this pull request Nov 13, 2025
Updating splice related reestablish code to
lightning/bolts#1289
and
lightning/bolts#1160

Changelog-Changed: Breaking change -- if you have splicing enabled on a channel both nodes must upgrade in unison due to updating `channel_reestablish` to the new splice specifications