fix(transport-webrtc): handle terminal peer connection states and sig…#3406
Open
paschal533 wants to merge 5 commits intolibp2p:mainfrom
Open
fix(transport-webrtc): handle terminal peer connection states and sig…#3406paschal533 wants to merge 5 commits intolibp2p:mainfrom
paschal533 wants to merge 5 commits intolibp2p:mainfrom
Conversation
…naling stream errors
…n stream muxer close test
|
minor nit but looks like pre-existing TODO: were stripped? |
Contributor
Author
Thanks for catching that... those slipped in during reformatting. I fogot to restored them in the latest commit. I will do that right away. Thank you sm |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix: WebRTC terminal peer connection states, signaling race condition, and stream muxer test reliability
Problems fixed
When a WebRTC private-to-private connection fails to establish (WiFi disabled, NAT traversal fails, ICE times out),
the dialing side could remain in a "connected" state while the listening side correctly detected the failure. This
left a leaked, half-open connection the connection manager never cleaned up.
The root cause:
RTCPeerConnectionMultiaddrConnectiononly checkedconnectionStatechanges viaonconnectionstatechangewhich does not fire for state transitions that already occurred before the handler wasregistered. Added an explicit check of the initial
connectionStateat construction time.Closes #2646
UnexpectedEOFErrorinreadCandidatesUntilConnectedA race condition caused the WebRTC "simple" transport compliance test to fail intermittently: the listener closes its
signaling stream just before the initiator's
connectionstatechange: connectedevent fires. The initiator thenreceives an EOF on the signaling stream and throws an
UnexpectedEOFError, discarding a perfectly good peerconnection.
Fixed by moving the
connectedPromiseoutside the try block inreadCandidatesUntilConnected. If the signaling streamthrows EOF and the peer connection isn't yet 'connected', we now await the
connectedPromise, if the PC connectsanyway (just after the EOF), the error is swallowed; if it fails, the error is re-thrown.
The
onconnectionstatechangehandler treated 'disconnected' the same as 'failed' and 'closed', callingonTransportClosed()andpeerConnection.close()immediately. WebRTC's 'disconnected' state is transient and the ICEagent may recover to 'connected' without intervention
In Firefox, this caused a spurious DataChannel close during large data transfers, which raced against
sendCloseWrite'spEvent(channel, 'close')listener registration. If the channel closed before the listener wasregistered, the FIN_ACK was missed, the
finAckTimeoutfired, and all branches of Promise.any rejected with anAggregateError.Fixed by only treating 'failed' and 'closed' as terminal states.
The test raced a 70ms timeout against pipes using
await delay(10)per iteration which is leaving only a 20ms window afterdialer.abort()at t=50ms for the abort to propagate. On loaded CI runners, NodeJS timer imprecision routinely exceedsthis margin.
Increased the timeout to 2000ms. On success, the test exits well under 100ms; the 2000ms only applies when abort
genuinely fails to propagate.
Test plan