Stale check command for async connections #547
base: master
Conversation
Force-pushed from 4f0eea7 to e67636b
Do you have the corresponding client changes? I tried integrating this into … I think there might be a race condition here involving command execution and connection closure. I remember a few years ago there was an issue where enabling inactive connection validation would cause the client to hang by submitting a …
httpcore5/src/main/java/org/apache/hc/core5/http/nio/command/StaleCheckCommand.java
        callback.completed(false);
    }
    final ByteBuffer buffer = ByteBuffer.allocate(0);
    final int bytesRead = ioSession.channel().read(buffer);
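For context, here is a minimal, self-contained sketch of the closure-detection idea behind this snippet. It uses an in-process `Pipe` rather than a real socket, and a 1-byte buffer rather than the zero-capacity one above so that end-of-stream is actually observable; the class name is illustrative, not part of HttpCore:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class CloseDetection {
    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.sink().close();                     // simulate the peer closing the connection
        ByteBuffer buf = ByteBuffer.allocate(1); // 1-byte buffer so EOF can be observed
        int n = pipe.source().read(buf);         // -1 once end-of-stream is reached
        System.out.println(n);                   // prints -1
    }
}
```

A read returning -1 is the standard NIO signal that the peer has closed its end; a read returning 0 on a non-blocking channel only says no data is currently available, which is why a zero-length probe read is a much weaker staleness signal.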
I think this read is actually redundant. A `Command` is modeled as a write event, and reads are processed before writes. So if the connection is closed, we'll already know by the time we get here, provided that we include the changes from #543.
The end result is very similar to the "synchronization barrier" between the event loop and the connection pool that I was talking about in that PR. The internal race condition basically goes away as long as connection reuse is completed through the event loop, which then has a chance to update all the relevant bookkeeping with respect to whatever IO events are pending.
I made the following changes locally:
When I do all of this, the results are dramatic:
When I disable inactive connection validation, I get the same results I've been getting:
We're definitely on the right track. If I'm right about the IO in …
I think I was mistaken about this. The actual issue might have been the no-op implementation of …
@rschmitt The results look encouraging.
@rschmitt I will fix the problem with …
e67636b
to
f70f0fd
Compare
Copying a comment from #543:
I think here, "the protocol requirements with regards of the connection persistence" refers to my test server in … This means that what is happening in my reproducer is actually pretty representative of what happens on an HTTP/2 connection, in the sense that recognizing the closure of the connection requires the client to read beyond the last byte of the current response. Instead of sending a … We may also want to turn …
@rschmitt They are, and for a good reason: a multiplexed connection cannot permit a single message exchange to shut down a connection potentially shared by many concurrent exchanges.
I cannot agree with that. GOAWAY(NO_ERROR) signals the initiation of connection shutdown. A well-behaved endpoint should not just write out GOAWAY(NO_ERROR) and immediately drop the connection. It can, however, half-close it on its end and stop processing incoming requests, which can be safely retried over another connection. Please do add …
@rschmitt I looked at various ways of improving the logic in the stale connection command but could not figure out anything useful on top of what we already have. I do not think the check can be made 100% reliable, and this is probably as good as it gets for now. Do you want me to commit the change-set, keep it open, or drop it?
And a well-behaved client should not attempt to reuse a connection on which it has already received a … Overall, I see this as a more modern approach. The HTTP 1.x connection headers are legacy stuff from the '90s, and we shouldn't rely on them exclusively to prevent stale/closed connection reuse. I actually think that …
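The reuse rule being debated here can be sketched as a small piece of connection-state bookkeeping. The class and method names below are illustrative only, not the actual HttpCore H2 internals: once GOAWAY is received, no new exchanges may be started on the connection, while streams at or below the advertised last-stream-ID may still run to completion.

```java
public class H2ConnectionState {
    private int lastStreamId = Integer.MAX_VALUE; // no GOAWAY seen yet
    private boolean goAwayReceived;

    // Called when a GOAWAY frame arrives from the peer.
    public void onGoAway(int lastStreamId) {
        this.goAwayReceived = true;
        this.lastStreamId = lastStreamId;
    }

    // The pool must not lease this connection for a new exchange after GOAWAY...
    public boolean canStartNewStream() {
        return !goAwayReceived;
    }

    // ...but streams the peer has promised to finish may still complete.
    public boolean mayComplete(int streamId) {
        return streamId <= lastStreamId;
    }
}
```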
I'm happy with the version of this change I tested locally. The client-server dyad is a distributed system, so connection closure race conditions are always a possibility, but I think the changes I described earlier fix the purely internal race condition within the client. I'll submit a branch with those changes, and any further simplifications.
Force-pushed from cdb3ae3 to 8464d89
@rschmitt This should be the case already. Connections should go into the graceful shutdown state immediately upon receipt of GOAWAY. I will double-check, though.
Agreed.
I have incorporated your changes into the change-set. In fact, the change-set is now in your name, as you have contributed the most to it.
@arturobernalg Good catch. Corrected.
@arturobernalg @rschmitt Please do one more pass.
Force-pushed from 8464d89 to 0ba75a8
@ok2c LGTM
It'll take me a few hours to add HTTP/2 support to my tester. I'll follow up with that later, as well as with the conversion of …
It looks like I was right about HTTP/2. The race condition statistics are exactly the same as for HTTP/1.1:
Additionally, there are two problems:
Remember, the contract for …
Force-pushed from 0ba75a8 to 1bf3859
Force-pushed from bb50a17 to e1b344b
The `StaleCheckCommand`, like all `Command`s, is modeled as a write operation, and `InternalDataChannel::onIOEvent` processes reads before writes. Therefore, by the time the stale check command is processed, the client's view of the connection is already up-to-date; any server-initiated connection closure (FIN, RST, GOAWAY) has already been read and processed.
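A minimal sketch of that ordering argument (illustrative only, not the actual `InternalDataChannel` code): when a selection key reports both read and write readiness in the same event mask, servicing reads first means a pending server-initiated close is consumed before any write-side command runs.

```java
import java.util.ArrayList;
import java.util.List;

public class EventOrdering {
    // Mirror the java.nio.channels.SelectionKey constants for a self-contained demo.
    public static final int OP_READ = 1, OP_WRITE = 4;

    // Dispatch one ready-ops mask, recording the order in which events are handled.
    public static List<String> dispatch(int readyOps) {
        List<String> handled = new ArrayList<>();
        if ((readyOps & OP_READ) != 0) {
            handled.add("read");   // a FIN/RST/GOAWAY from the peer is processed here first
        }
        if ((readyOps & OP_WRITE) != 0) {
            handled.add("write");  // commands (such as a stale check) run here, after reads
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(dispatch(OP_READ | OP_WRITE)); // prints [read, write]
    }
}
```

Because the write-side handler always runs second within a single event-loop iteration, any staleness already visible at the socket has been recorded by the time the command executes, which is what makes the extra read in the command redundant.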
Force-pushed from e1b344b to 0e22b40
@rschmitt This change-set introduces a relatively cheap 'stale' connection check command that works with both HTTP/1.1 and H2 protocols and can be used instead of a more expensive Ping command.
Please take a look.