Channel Splicing (feature 62/63) #1160
Can I suggest we do this as an extension BOLT rather than layering it in with the existing BOLT2 text? It makes it easier to implement when all of the requirements deltas are in a single document than when it is inlined into the original spec. Otherwise, the PR/branch-diff itself is the only way to see the diff and that can get very messy during the review process as people's commentary comes in. While there are other ways to get at this diff without the commentary, it would make the UX of getting at this diff rather straightforward. Given that the change is gated behind a feature bit anyway it also makes it easier for a new implementation to bootstrap itself without the splice feature by just reading the main BOLTs as is. At some point in the future when splicing support becomes standard across the network we can consolidate the extension BOLT into the main BOLTs if people still prefer.
Why not, if others also feel that it would be better as an extension bolt. I prefer it directly in Bolt 2, because of the following reasons:
But if I'm the only one thinking this is better, I'll move it to a separate document! One thing to note is that we already have two implementations ( |
One thing I've been thinking about is with large splices across many nodes, if some node fails to send signatures (likely because two nodes in the cluster demand to sign last) then the splice will hang one I believe we need two things to address this:
Currently CLN fails the channel in this case, as taking signatures and not responding is rather rude, but this is bad because it could lead to clusters of splice channels being closed. The unfortunate side effect of this is we have to be comfortable sending out signatures with no recourse for not getting any back. I believe long term the solution is to maintain a signature-sending reputation for each peer and eventually blacklist peers from doing splices and / or fail your channels with that peer. A reputation system may be beyond the needs of the spec but what to do with hanging
This is already covered at the quiescence level: quiescence will timeout if the splice doesn't complete (e.g. because we haven't received
I don't think this is necessary, and I think we should really require people to send
It seems like we've discussed this many times already: this simply cannot happen because ordering based on contributed amount fixes this? Can you detail a concrete scenario where |
- Either side has added an output other than the channel funding output
  and the balance for that side is less than the channel reserve that
  matches the new channel capacity.
What does it mean to have a channel reserve "match the new channel capacity"? AFAICT the `channel_reserve` is specified in satoshis, and reading the negotiation process of this proposal doesn't seem to indicate that there is any change happening to that parameter during negotiation.
> AFAICT the `channel_reserve` is specified in satoshis
Not with dual-funding, where the channel reserve is 1% of the channel capacity. That's why this is potentially changing "automatically" when splicing on top of a dual-funded channel if we want to keep using 1%.
But you're right to highlight this: the channel reserve behavior is very loosely specified for now, and there were a lot of previous discussions with @morehouse regarding what we should do when splicing. Another edge case that we must better specify is what happens when splicing on top of a non-dual-funded channel, where the channel reserve was indeed a static value instead of a proportional one!
The channel reserve behavior is IMO the only missing piece of this specification, that we should discuss, thanks for bringing it up!
Could be a good thing to discuss in Tokyo!
Also worth stepping back and double checking the reserve requirement makes sense in its current form generally 👀.
What do you think of the following behavior for handling channel reserves:
- Whenever a splice happens, the channel is automatically enrolled into the 1% reserve policy, even if it wasn't initially a dual-funded channel (unless 0-reserve is used of course, see Add `option_zero_reserve` (FEATURE 64/65) #1140)
- Splice-out is not allowed if you end up below your pre-splice reserve (your peer will reject that splice with `tx_abort`)
- Otherwise, it's ok if one side ends up below the channel reserve after a splice: this is the same behavior as when a new channel is created. If we get into that state, the peer that is below the channel reserve:
  - is not allowed to send outgoing HTLCs
  - is allowed to receive incoming HTLCs
  - if it is paying the commit fees, it is allowed to dip further into its channel reserve to receive HTLCs (because of the added weight of the HTLC output), because we must be able to move liquidity to their side to get them above their reserve
- When there are multiple unconfirmed splices, we use the highest channel reserve of all pending splices (ie requirements must be satisfied for all pending splice transactions)
As discussed during yesterday's meeting, there are subtle edge cases due to concurrent updates: this is inherent to the current commitment protocol, but will eventually become much simpler with #867
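The reserve rules proposed in this comment can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, rounding the 1% reserve down with integer division is an assumption, and the fee-payer exception for incoming HTLCs and the concurrent-update edge cases are not modeled.

```python
from typing import Iterable


def reserve_sat(capacity_sat: int, zero_reserve: bool = False) -> int:
    """1% reserve policy; rounding down is an assumption of this sketch."""
    return 0 if zero_reserve else capacity_sat // 100


def required_reserve_sat(pending_capacities_sat: Iterable[int],
                         zero_reserve: bool = False) -> int:
    """With several unconfirmed splices, use the highest reserve among them.

    The iterable must contain at least one capacity.
    """
    return max(reserve_sat(c, zero_reserve) for c in pending_capacities_sat)


def splice_out_allowed(balance_after_sat: int, pre_splice_reserve_sat: int) -> bool:
    """A splice-out leaving the peer below its pre-splice reserve is rejected
    (the remote peer would answer with tx_abort)."""
    return balance_after_sat >= pre_splice_reserve_sat


def may_send_htlc(balance_after_htlc_sat: int,
                  pending_capacities_sat: Iterable[int],
                  zero_reserve: bool = False) -> bool:
    """A peer may not send an outgoing HTLC that leaves it below the reserve
    required by any of the pending splice transactions."""
    return balance_after_htlc_sat >= required_reserve_sat(
        pending_capacities_sat, zero_reserve)
```

For example, with pending splices of 1,000,000 sat and 1,500,000 sat, `required_reserve_sat([1_000_000, 1_500_000])` evaluates to 15,000 sat, so the stricter of the two reserves applies until one of the splices is locked.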
related: ACINQ/eclair#2899 (comment), tries to specify the concurrent edge cases and also the requirement when we would already (without splicing) allow the peer paying the fees being dipped below its reserve.
That all seems reasonable to me. The one part where we could get into trouble is:
> if it is paying the commit fees, it is allowed to dip further into its channel reserve to receive HTLCs (because of the added weight of the HTLC output), because we must be able to move liquidity to their side to get them above their reserve
This allows the reserve to be violated, potentially all the way down to 0. In that situation, there is ~zero incentive to broadcast the latest commitment on force close.
That said, I know the implementation details are hairy to do things completely safely. And we can also look forward to zero-fee commitments with TRUC and ephemeral anchors, which would obsolete the "dip-into-reserve to pay fees" exception entirely.
> This allows the reserve to be violated, potentially all the way down to 0. In that situation, there is ~zero incentive to broadcast the latest commitment on force close.
Since we only allow this to happen when the node paying the fee receives HTLCs, the other node sending that HTLC can limit the exposure by controlling how many HTLCs they send in a batch (or keep pending the commit tx) when we're in this state.
There are unfortunately cases where even a single HTLC would make the node paying the fee have no output (small channels with high feerate), but when that happens you really don't have any other option, the channel is otherwise unusable, so your only other option is to force-close anyway which isn't great...
> And we can also look forward to zero-fee commitments with TRUC and ephemeral anchors, which would obsolete the "dip-into-reserve to pay fees" exception entirely.
Exactly, this is coming together (look at this beautiful 0-fee commitment transaction: https://mempool.space/testnet4/tx/85f2256c8d6d61498c074d53912d1f0ef907ee508bb06f5701f3826432ba53b8) which will finally get rid of this kind of mess: I'm fine with using an imperfect but simple work-around in the meantime!
I wonder whether this requirement (allowing HTLCs that dip the opener into its reserve) would be used solely for the splicing case, or whether we should make it an overall requirement. If the latter, there is a problem with backwards compatibility, because older nodes (speaking for LND nodes) will force close if the opener dips below its reserve. So maybe it makes sense to only activate it for splicing use cases, so that we don't run into the backwards compatibility issues?
Good idea!
We add more test vectors to describe how reconnection should be handled after a splice, before receiving `splice_locked`, when there are pending updates.
@t-bast @ddustin not sure if this was also an issue for your implementations: LDK does not track the full funding transaction of an inbound channel (only the funding outpoint), so it cannot provide the |
And would prefer to not start storing the full funding transaction for every inbound channel :)
That is exactly why we've introduced the I don't think anyone stores the whole funding transaction, everyone just tracks the outpoint and as you mention, we know segwit is used so we're good. On top of that, if we had to transmit the funding transaction in the |
Huh... not sure how we missed that. Leaving some clarifying comments.
It is always helpful to reference the `funding_txid` that is spent by a `commit_sig`, even when there are no pending splices. It's also easier for implementations to always include it.
We add a `message_type` TLV to `start_batch` that must be used when the batch contains only messages of the same type, which is how it is used for splicing (where we send a batch of `commitment_signed` messages).
As suggested by @jkczyz, we clarify requirements around:
- the `shared_funding_txid` field
- `start_batch` maximum size and RBF attempts
- `channel_reestablish` ordering with `splice_locked`
What is the expected behavior for unrecognized
|
I think that generally, we should just ignore |
Ah yeah that makes total sense 👍
As proposed by @ddustin, we explicitly narrow the requirements for `start_batch` to match our only use-case for it (splicing). We can change that in the future if we use this message for other features.
Some of those requirements shouldn't be gated on announcing the channel, and we clarify that we retransmit once per connection.
- if it receives `channel_ready` for that transaction after exchanging `channel_reestablish`:
  - MUST retransmit `channel_ready` in response, if not already sent since reconnecting.
- if it receives `splice_locked` for that transaction after exchanging `channel_reestablish`:
  - MUST retransmit `splice_locked` in response, if not already sent since reconnecting.
As written these requirements are dependent on receiving other messages. This seems more complicated than it needs to be. Instead, can't the retransmission requirements be entirely within the `channel_reestablish`'s last "A receiving node" section? There's already a requirement there to retransmit `splice_locked`, so the one here seems redundant. We'd just need to add a requirement there for retransmitting `channel_ready`.
Am I missing something?
I'm not sure it would be clearer, I like keeping all of those requirements under the if `option_splice` was negotiated... overall I think that `channel_reestablish` requirements deserve a refactoring, but nobody was interested in it (see #1049 and #1051) so I gave up 🤷♂️
Can you try refactoring like what you suggest? If it's better I'll include that.
Given the spec already states that redundant `channel_ready` messages must be ignored, any reason why we don't just do the same with `splice_locked` and always re-send `channel_ready` / `splice_locked` upon connection? Seems that would simplify the requirements quite a bit.
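The "retransmit in response, once per connection" requirement quoted at the top of this thread could be tracked roughly like this. It is a minimal sketch: the `RetransmitTracker` class and its method names are hypothetical, and messages are identified by plain strings rather than real wire types.

```python
class RetransmitTracker:
    """Remembers which lock-in messages were sent on the current connection."""

    LOCK_IN_MESSAGES = ("channel_ready", "splice_locked")

    def __init__(self) -> None:
        # Set of (message name, funding txid) pairs sent since reconnecting.
        self._sent = set()

    def on_reconnect(self) -> None:
        """Called once channel_reestablish has been exchanged on a new connection."""
        self._sent.clear()

    def mark_sent(self, message: str, txid: str) -> None:
        """Record a message we sent on our own initiative."""
        self._sent.add((message, txid))

    def should_retransmit(self, received: str, txid: str) -> bool:
        """Return True if we must echo `received` back for this transaction.

        The message is recorded as sent when True is returned, so a second
        copy from the peer does not trigger another retransmission.
        """
        if received not in self.LOCK_IN_MESSAGES:
            return False
        key = (received, txid)
        if key in self._sent:
            return False
        self._sent.add(key)
        return True
```

The simpler alternative raised in this thread (always re-sending both messages on reconnection and ignoring redundant copies) would remove the need for this bookkeeping.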
The sending node:
  - MUST NOT send `splice_init` if the channel is not quiescent.
  - MUST NOT send `splice_init` if it is not the quiescence initiator.
  - MUST NOT send `splice_init` before sending and receiving `channel_ready`.
This seems redundant given that the channel must be quiescent - we could move this requirement under `stfu` if we want to be more explicit.
No, it isn't redundant, this is important to separate RBF-ing the initial funding transaction (using `tx_init_rbf`) from splicing. You can only splice after exchanging `channel_ready`: before that, you must RBF the funding transaction (which lets you achieve the same thing as splicing, since it lets you change your funding contribution).
You can send `stfu` before `channel_ready` (this is necessary to RBF the funding transaction), but you cannot send `splice_init` before `channel_ready`.
cool gotya, didn't realize there's another `channel_ready` here, but given it's used in the new funding tx then yeah it makes sense.
- MUST NOT send `splice_init` while another splice is being negotiated.
- MUST NOT send `splice_init` if another splice has been negotiated but
  `splice_locked` has not been sent and received.
- MUST NOT send `splice_init` if it has previously sent `shutdown`.
Same here - basically I view quiescence as a layer of abstraction so we don't need to consider these cases in the dependent protocols.
This one may be trickier, but I think the current requirements shouldn't be changed: you clearly cannot splice after sending `shutdown`, but you may want to quiesce after sending `shutdown` (you may still have HTLCs that need to be settled at that point, which you may want to pause if you need to quiesce for some reason).
Is there a compelling reason to disallow quiescing after `shutdown`? We cannot know at that point if future scenarios will need this, and it doesn't hurt to allow it?
Eventually close-to-splice may want to do things during shutdown as well 👀
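The `splice_init` sender requirements quoted in the two diff excerpts above can be read as a single precondition check. The sketch below is illustrative only: `ChannelState` and its field names are hypothetical and do not come from any implementation. It also shows why the `channel_ready` check is kept separate from quiescence, since a channel can be quiescent before `channel_ready` when RBF-ing the initial funding transaction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChannelState:
    quiescent: bool = True
    is_quiescence_initiator: bool = True
    channel_ready_sent: bool = True
    channel_ready_received: bool = True
    splice_being_negotiated: bool = False
    # A splice was negotiated, but splice_locked was not yet sent and received.
    splice_awaiting_locked: bool = False
    shutdown_sent: bool = False


def splice_init_violation(s: ChannelState) -> Optional[str]:
    """Return the first violated requirement, or None if splice_init may be sent."""
    if not s.quiescent:
        return "channel is not quiescent"
    if not s.is_quiescence_initiator:
        return "not the quiescence initiator"
    if not (s.channel_ready_sent and s.channel_ready_received):
        return "channel_ready not yet sent and received"
    if s.splice_being_negotiated:
        return "another splice is being negotiated"
    if s.splice_awaiting_locked:
        return "previous splice not yet locked"
    if s.shutdown_sent:
        return "shutdown already sent"
    return None
```

A quiescent channel that has not exchanged `channel_ready` fails the third check, which matches the point made in the thread that such a channel must use `tx_init_rbf` rather than `splice_init`.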
the amount that will be added to its current channel balance.
- If it requires the receiving node to only use confirmed inputs:
  - MUST set `require_confirmed_inputs`.
- SHOULD use a different `funding_pubkey` than the one used for the
Probably needs to be a `MUST`, at least when we are upgrading the channel to use a taproot output, to avoid using the same key for both ECDSA and Schnorr signatures, as suggested in BIP340.
That's a good point, thanks for highlighting this! Eclair always uses a different `funding_pubkey`, but IIRC `cln` didn't use public key rotation here. Let's discuss this with them during the next spec meeting.
Yeah CLN doesn't rotate funding keys but it supports its peer rotating them.
@@ -0,0 +1,992 @@
# Splicing Tests
🙏
Splicing allows spending the current funding transaction to replace it with a new one that changes the capacity of the channel, allowing both peers to add or remove funds to/from their channel balance.
Splicing takes place while a channel is quiescent, to ensure that both peers have the same view of the current commitments.
We don't want channels to be unusable while waiting for transactions to confirm, so channel operation returns to normal once the splice transaction has been signed and we're waiting for it to confirm. The channel can then be used for payments, as long as those payments are valid for every pending splice transaction. Splice transactions can be RBF-ed to speed up confirmation.
Once one of the pending splice transactions confirms and reaches acceptable depth, peers exchange `splice_locked` to discard the other pending splice transactions and the previous funding transaction. The confirmed splice transaction becomes the channel funding transaction. Nodes then advertise this spliced channel to the network, so that nodes keep routing payments through it without any downtime.
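The lifecycle described above can be illustrated with a small model. This is a sketch under simplifying assumptions: the `Funding` and `Channel` types are hypothetical, only the local balance is tracked, and fees, reserves, and HTLC state are ignored. It shows a payment being checked against the current funding transaction and every pending splice, and `splice_locked` promoting one candidate while discarding the rest.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Funding:
    txid: str
    local_balance_sat: int  # our balance if this transaction is the funding tx


@dataclass
class Channel:
    active: Funding
    pending: List[Funding] = field(default_factory=list)

    def can_send(self, amount_sat: int) -> bool:
        """A payment must be valid for the current funding transaction and
        for every pending splice transaction (including RBF attempts)."""
        candidates = [self.active, *self.pending]
        return all(f.local_balance_sat >= amount_sat for f in candidates)

    def on_splice_locked(self, txid: str) -> None:
        """Both peers exchanged splice_locked for `txid`: it becomes the
        funding transaction and all other candidates are discarded."""
        locked = next((f for f in self.pending if f.txid == txid), None)
        if locked is None:
            raise ValueError(f"unknown pending splice: {txid}")
        self.active = locked
        self.pending = []
```

With a splice-out pending, the smaller post-splice balance limits what can be sent until the splice is locked, even though the current funding transaction would allow more.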
This PR replaces #863 which contains a lot of legacy mechanisms for early versions of splicing, which didn't work in some edge cases (detailed in the test vectors provided in this PR). It can be very helpful to read the protocol flows described in the test vectors: they give a better intuition of how splicing works, and how it deals with message concurrency and disconnections.
This PR requires the quiescence feature (#869) to start negotiating a splice.
Credits to @rustyrussell and @ddustin will be added in the commit messages once we're ready to merge this PR.