Refactor HTTP connection handling and some handlers to stream responses #10516
Conversation
First of all, chapeau for your great work and the effort you put into setting up the unit tests. I didn't review all the tests in detail yet, but they look very comprehensive and reusable for future tests as well. For the time being, I have a few comments regarding the non-test code, so here we go :)!
LGTM!
PS: I've updated the PR description to include before and after memory usage graphs that demonstrate the difference this change makes. So please have a look at them :)!
So far, these are my comments for everything except the tests. I'll have a look at them next.
/* Preferably, we would return an ostream object here instead. However
 * there seems to be a bug in boost::beast where if the ostream, or rather its
 * streambuf object is moved into the return value, the chunked encoding gets
 * mangled, leading to the client disconnecting. */
Is this a known issue? If so, please add a reference, otherwise, is this something that would be worth reporting?
Yes, definitely. I mentioned in the PR description that once I have some spare time I'll see if I can put together a minimal reproducer and report the issue.
And now my comments for the test fixtures. The actual tests are a task for another day 😅
For the record: apart from the CPU usage, which has to be looked into, I'm happy with this PR now. Hopefully that just needs some tweaking of thresholds, or maybe some profiling that hints at something that can easily be tweaked.
I'm not sure if the connection speed is much of a factor here. Just checked: for my test requests, the response size is about 30 MiB, so stretched over 7 seconds, we aren't talking about massive transfer rates here.
Ok, so from my preliminary testing, I can only reproduce a difference of this magnitude without optimization. Maybe you can repeat the test with optimization on, on your end, and see if it's still a problem. I'll do a few more tests tomorrow and find the ideal value for the flush threshold (and maybe prepare some charts to show the impact of this value).
I have also tested this as described in #10516 (comment) and got almost the same results!

Request GET /v1/objects/hosts (from [::1]:38850, user: root, agent: curl/8.14.1, status: OK) took total 5076ms.
Request GET /v1/objects/hosts (from [::1]:38818, user: root, agent: curl/8.14.1, status: OK) took total 5113ms.
Request GET /v1/objects/hosts (from [::1]:38840, user: root, agent: curl/8.14.1, status: OK) took total 5185ms.

And profiling it with dtrace on my Mac shows the following: [flame graph screenshots]
Sorry, didn’t see your comment yesterday while I was submitting mine! As you can see in the screenshot I posted yesterday, doing I/O ops too often isn't the problem, but filling the actual buffer is.

diff --git a/lib/remote/httpmessage.cpp b/lib/remote/httpmessage.cpp
index 19011e432..53855ff9a 100644
--- a/lib/remote/httpmessage.cpp
+++ b/lib/remote/httpmessage.cpp
@@ -25,9 +25,6 @@ public:
explicit HttpResponseJsonWriter(HttpResponse& msg) : m_Message{msg}
{
m_Message.body().Start();
-#if BOOST_VERSION >= 107000
- m_Message.body().Buffer().reserve(m_MinPendingBufferSize);
-#endif /* BOOST_VERSION */
}
~HttpResponseJsonWriter() override { m_Message.body().Finish(); }
@@ -36,9 +33,7 @@ public:
void write_characters(const char* s, std::size_t length) override
{
- auto buf = m_Message.body().Buffer().prepare(length);
- boost::asio::buffer_copy(buf, boost::asio::const_buffer{s, length});
- m_Message.body().Buffer().commit(length);
+ m_Message.body() << std::string_view{s, length};
}
void MayFlush(boost::asio::yield_context& yield) override
@@ -111,7 +106,6 @@ HttpResponse::HttpResponse(Shared<AsioTlsStream>::Ptr stream, HttpServerConnecti
void HttpResponse::Clear()
{
ASSERT(!m_SerializationStarted);
- boost::beast::http::response<body_type>::operator=({});
}
void HttpResponse::Flush(boost::asio::yield_context yc)
@@ -175,7 +169,7 @@ void HttpResponse::SendFile(const String& path, const boost::asio::yield_context
while (remaining) {
auto maxTransfer = std::min(remaining, maxChunkSize);
- auto buf = *body().Buffer().prepare(maxTransfer).begin();
+ auto buf = body().Buffer().prepare(maxTransfer);
fp.read(static_cast<char*>(buf.data()), buf.size());
body().Buffer().commit(buf.size());
diff --git a/lib/remote/httpmessage.hpp b/lib/remote/httpmessage.hpp
index 10d00fd49..43d9f65b5 100644
--- a/lib/remote/httpmessage.hpp
+++ b/lib/remote/httpmessage.hpp
@@ -10,6 +10,7 @@
#include "remote/url.hpp"
#include <boost/beast/http.hpp>
#include <boost/version.hpp>
+#include <boost/asio/streambuf.hpp>
namespace icinga {
@@ -20,11 +21,8 @@ namespace icinga {
* which uses a multi_buffer, with the ability to continue serialization when
* new data arrives of the @c boost::beast::http::buffer_body.
*
- * @tparam DynamicBuffer A buffer conforming to the boost::beast interface of the same name
- *
* @ingroup remote
*/
-template<class DynamicBuffer>
struct SerializableBody
{
class writer;
@@ -33,7 +31,7 @@ struct SerializableBody
{
public:
template<typename T>
- value_type& operator<<(T&& right)
+ std::ostream& operator<<(T&& right)
{
/* Preferably, we would return an ostream object here instead. However
* there seems to be a bug in boost::beast where if the ostream, or rather its
@@ -54,8 +52,8 @@ struct SerializableBody
* responses are handled via a reader instance, this shouldn't be too much of a
* problem.
*/
- boost::beast::ostream(m_Buffer) << std::forward<T>(right);
- return *this;
+ m_OStream << std::forward<T>(right);
+ return m_OStream;
}
[[nodiscard]] std::size_t Size() const { return m_Buffer.size(); }
@@ -63,7 +61,7 @@ struct SerializableBody
void Finish() { m_More = false; }
bool Finished() { return !m_More; }
void Start() { m_More = true; }
- DynamicBuffer& Buffer() { return m_Buffer; }
+ boost::asio::streambuf& Buffer() { return m_Buffer; }
friend class writer;
@@ -72,7 +70,8 @@ struct SerializableBody
* for simple messages and can still be written with http::async_write().
*/
bool m_More = false;
- DynamicBuffer m_Buffer;
+ boost::asio::streambuf m_Buffer;
+ std::ostream m_OStream{&m_Buffer};
};
static std::uint64_t size(const value_type& body) { return body.Size(); }
@@ -95,7 +94,7 @@ struct SerializableBody
class writer
{
public:
- using const_buffers_type = typename DynamicBuffer::const_buffers_type;
+ using const_buffers_type = boost::asio::streambuf::const_buffers_type;
#if BOOST_VERSION > 106600
template<bool isRequest, class Fields>
@@ -200,7 +199,7 @@ private:
*
* @ingroup remote
*/
-class HttpResponse : public boost::beast::http::response<SerializableBody<boost::beast::multi_buffer>>
+class HttpResponse : public boost::beast::http::response<SerializableBody>
{
public:
explicit HttpResponse(Shared<AsioTlsStream>::Ptr stream, HttpServerConnection::Ptr server = nullptr);
diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt
index 5498a6d83..b69bdac8a 100644
--- a/test/CMakeLists.txt
+++ b/test/CMakeLists.txt
@@ -278,7 +278,6 @@ add_boost_test(base
remote_certs_fixture/cleanup_certs
remote_httpmessage/request_parse
remote_httpmessage/request_params
- remote_httpmessage/response_clear
remote_httpmessage/response_flush_nothrow
remote_httpmessage/response_flush_throw
remote_httpmessage/response_write_empty
@@ -298,7 +297,6 @@ add_boost_test(base
remote_httpserverconnection/reuse_connection
remote_httpserverconnection/wg_abort
remote_httpserverconnection/client_shutdown
- remote_httpserverconnection/handler_throw_error
remote_httpserverconnection/handler_throw_streaming
remote_httpserverconnection/liveness_disconnect
remote_configpackageutility/ValidateName
@@ -313,7 +311,6 @@ if(BUILD_TESTING)
set_tests_properties(
base-remote_httpmessage/request_parse
base-remote_httpmessage/request_params
- base-remote_httpmessage/response_clear
base-remote_httpmessage/response_flush_nothrow
base-remote_httpmessage/response_flush_throw
base-remote_httpmessage/response_write_empty
@@ -333,7 +330,6 @@ if(BUILD_TESTING)
base-remote_httpserverconnection/reuse_connection
base-remote_httpserverconnection/wg_abort
base-remote_httpserverconnection/client_shutdown
- base-remote_httpserverconnection/handler_throw_error
base-remote_httpserverconnection/handler_throw_streaming
base-remote_httpserverconnection/liveness_disconnect
	PROPERTIES FIXTURES_REQUIRED ssl_certs)
That isn't relevant here, since everything is going through the output adapter and not the ostream operator. My guess (since from your numbers I'm assuming you're still running the test unoptimized) would be that some low-level functionality of asio's streambuf is compiled into the dynamic libraries (with optimization), and that's why it's faster. This all comes down to whether it's a priority for us that our binary not have performance regressions in an unoptimized debug build. I'll finish a few more tests and then I'll get back to you with my own results.
I didn't use a debug binary for my previous tests! I just built an image with #10505 and ran it as described in #10516 (comment). That's the exact executable that will also be used by end users if they use our image.
Are you sure about this? Beast depends on Asio's functionality, but never the other way around. Asio's streambuf implements the standard `std::streambuf` interface.
My bad. I must have misremembered. I thought this was part of beast. Seems it uses a plain `std::streambuf` underneath.
Then why do you get these absurdly high response times? Mine are more along the lines of 610ms per response for 1f92ec6 (on my potato notebook, mind you) and my PR starting at ~690ms with a 4k flush threshold (and decreasing significantly at higher thresholds).
@jschmidt-icinga You're right, compiler optimizations (or the lack thereof) are to blame here. With optimizations enabled, the results look entirely different:

Base (1f92ec6)
PR (bf04a55)

So with that, it's just a 0.7% increase in response time and a 5% increase in CPU usage for an 88% decrease in memory usage (again, take the exact numbers with a grain of salt, they are from a single run; repeating it shows similar numbers though). That sounds more than acceptable for that saving in memory.
So that in itself wouldn't be a problem for me. Though I haven't looked into what @yhabteab found in detail; should we have a closer look at this nonetheless?
I've got the following results with an optimized container build:
At 1M, which I initially tested just to make sure a really big number doesn't get any additional gains, I did an additional test. I also want to note that the values for this PR were incredibly consistent compared to 1f92ec6 (likely because of fewer memory allocations), which fluctuated a lot more, so I averaged over 6 runs for it instead of 3 for the others. As you can see, the response time actually gets better than 1f92ec6, while user time breaks even at about 64k/128k. Given that below 1M the max memory usage doesn't seem to be impacted much, I'd actually set this threshold accordingly.
Sounds fine to me. (Afterwards I'll spin it up for a test again, but I wouldn't expect any big surprises there.)
Fun fact: I'm reading the documentation of
@yhabteab I've looked at the new docker Containerfile and tested with the same values of
These parameters are no longer needed since they were only used by EventsHandler which was refactored in an earlier commit.
This is needed so it's possible to manually add an ApiListener object for the purpose of unit-testing.
But I do 🙈! The new Containerfile from #10505 sets the |
I find this funny because in my last comment, I first wrote
I think I found the reason why. Look for
I doubt that's the culprit. They are all inside
> Afterwards I'll spin it up for a test again, but I wouldn't expect any big surprises there
Just did that, unsurprisingly it still worked fine, so for me, this PR is ready 🚀
Note: that doesn't mean that the code added by this PR has to be immutable, there may still be room for improvement, but I think the current state is fine and any further changes would be easier to grasp in follow-up PRs rather than inside a 2k+ lines and 200+ comments PR.
Fixes #10409.
Fixes #10142, since many of the additional parameters `HttpHandler`s needed before are no longer necessary.

Description
This PR mostly does four things:
- Simplify the `HttpHandler` signature by containing additional parameters and interfaces in classes extending the `boost::beast` request and response classes.
- Allow `HttpHandler`s to stream responses via chunked encoding.
- Refactor `ObjectQueryHandler` and `EventsHandler` to make use of these streaming capabilities.
- Add unit tests for `HttpServerConnection` and the new `HttpResponse` and `HttpRequest` classes.

I will explain the implementation and the considerations I've made in detail below.
HttpRequest and HttpResponse

The rationale for adding these classes is that they serve as an abstraction between `HttpServerConnection`, which no longer has to offer an interface to the handlers directly (i.e. via a `server` parameter passed down to the handlers and `StartStreaming()` being called by the handlers), and the `HttpHandler`s, which no longer need to concern themselves with connection handling.

For the response class I've implemented a custom body type `SerializableBody` with a `writer` that can be used by the serializer to write the added content to the connection, adding chunked encoding and tracking progress as needed. A previous iteration of this PR also implemented the `BodyReader`, but this added unnecessary complexity and was removed.

This wasn't easily achievable with any of the regular beast body types, because we needed functionality of both the `buffer_body`, specifically the ability to interrupt serialization when the buffer content has fully been written to the connection and pick up again on the next call to `Flush()` once more data is available, and the `dynamic_body`, which has the ability to allocate additional memory as needed.

Streaming and non-streaming responses
To start a streaming response, the handler first has to call `response.StartStreaming()`, which enables chunked encoding and allows the response header to be sent on the next `response.Flush(yc)` without setting the `content_length`.

In case of a non-streaming response, the handler can use the `HttpUtility::SendJson(Body|Error)` convenience functions, same as before, or write the body as normal and then just return. The handler no longer has to set up `content_length`; this is now done automatically in the first call to `response.Flush(yc)`, which by default happens once after the handler returns to `ProcessRequest()` in `httpserverconnection.cpp`.
Due to the new custom body, it is no longer possible to use string operators on the body (it would have been possible to implement, but more work); however, an output stream operator is added to make the interface as convenient or more so. There is, however, a bug in beast that makes this slightly less efficient than it could be (but still more efficient than string `+=`); see the code comment quoted in the review above. I intend to write a minimal reproducer and report this bug to the beast GitHub repo.

Refactoring ObjectQueryHandler and EventsHandler
Events were streamed before, but without using chunked encoding and with the handler needing to directly set up the server connection via `StartStreaming()`. Now proper HTTP/1.1 compliant chunked encoding is used, and no special treatment by the server connection is necessary.

In `ObjectQueryHandler` we now make use of the new ability to pass a generator function to the `JsonEncoder`, added in #10414. The for-loop iterating over the config objects becomes a generator function that feeds the serialized objects into the JsonEncoder one by one, flushing the response occasionally.

The same could easily be applied to other handlers like `ModifyObjectHandler` and `DeleteObjectHandler`, but since these don't have as large a memory footprint, the necessity of this can be determined and done in a future PR.

HttpServerConnection::DetectClientShutdown

This new coroutine was added to
`HttpServerConnection` to detect shutdowns initiated by the HTTP client while we're streaming a response. Before this PR, we'd detect shutdowns either when writes failed or when the next request was being read. This had the disadvantage that during long-running responses there were situations where it was impossible for the client to complete a graceful SSL shutdown within a reasonable time frame. Even with the new streaming capabilities added by this PR, waiting until writes failed would lead to an "application data after close notify" error on the client's side and an exception being logged on the server's side.

In my opinion, adding a general approach as part of `HttpServerConnection`'s connection handling made the most sense, and this coroutine was the simplest way to do it. Possibly there would have been ways to let handlers enable or disable this functionality, but that would have been more code for basically no benefit, as this coroutine doesn't do any harm in any of the other use cases.

The coroutine basically initiates an
async_fill()
whenever reading a request is done, for as long as the connection isn't in the process of (or done) disconnecting:
(icinga2/lib/remote/httpserverconnection.cpp, lines 532 to 533 at d5cedf1)
Initially I intended to use `boost::asio`'s `async_wait` function with `wait_type::wait_read` to avoid even as much as an `async_fill` in the case a client didn't initiate a shutdown. However, likely due to differences in the respective libcs, this would not work reliably on the Windows and Alpine builds. So instead I chose the following approach:

(icinga2/lib/remote/httpserverconnection.cpp, lines 535 to 541 at d5cedf1)
This tries to fill the stream's buffer with as little as a single byte from the connection, which either yields until one or more bytes are readable, or returns an error code when the connection is closed for whatever reason. In any case where an error code is returned, we will want to attempt a graceful shutdown, but mostly this will be responding to a shutdown the client has initiated.
In the case no error is returned and the connection is being reused for another request, the first few bytes of the request will be read into the stream's buffer and an `AsioDualEvent` flag will be set to allow the regular continuation of `HttpServerConnection::ProcessMessage()` to read the next request. This "locking" is needed to avoid two `async_fill`s (the one here and the one in `http::async_read`) being in progress at the same time, which causes exceptions/errors under certain race conditions.

Unit Testing
The largest part of the new lines in this PR comes from the addition of unit testing for `HttpServerConnection` and the new HTTP message classes. To get this working I had to add a number of testing fixtures for setting up certificates, TLS connections, and log message pattern matching. These were all added in a generally useful way, so it's possible to reuse them for other medium-complexity unit tests later on. After this PR is merged, I intend to use these fixtures to add testing for `ApiListener`, `JsonRpcConnection`, and some individual `HttpHandler`s.

Generating the certificates is relatively time consuming and takes over 1.5s on my system, so we wouldn't want to do it once per test. This is why I've added cmake-test fixtures (not to be confused with boost-test fixtures), which are implemented as "test cases" themselves: they set up the certificates in a temporary directory and define a dependency for all tests that use them. CTest will then schedule the setup fixture test to run before, and the cleanup fixture test to run after, all dependent tests. This looks a bit verbose inside `CMakeLists.txt`, which stems from our `add_boost_test()` cmake function being old and limited legacy code. Maybe this can be revisited in the future to make it more elegant, for example with automated discovery of test targets similar to `doctest`'s cmake integration. Anyway, after the initial setup of the certificates, each test case takes about 80-100ms on my system, with the exception of `remote_httpserverconnection/liveness_disconnect`, which has to wait 10s for the disconnect to be triggered in `HttpServerConnection::CheckLiveness()`.
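The ctest fixture mechanism described above boils down to the following pattern. The `certs_setup`/`certs_cleanup` test names here are illustrative; the `ssl_certs` fixture name and the dependent test names appear in the actual diff:

```cmake
# One "test" that creates the certificates, one that removes them again.
set_tests_properties(certs_setup   PROPERTIES FIXTURES_SETUP   ssl_certs)
set_tests_properties(certs_cleanup PROPERTIES FIXTURES_CLEANUP ssl_certs)

# Every test that declares the fixture as required is scheduled by ctest
# after certs_setup and before certs_cleanup.
set_tests_properties(
  base-remote_httpserverconnection/reuse_connection
  base-remote_httpserverconnection/client_shutdown
  PROPERTIES FIXTURES_REQUIRED ssl_certs)
```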
For `HttpServerConnection` I've focused on testing connection handling with a simple test handler, so it doesn't rely on or test any specific behavior of the regular handlers. I've verified that all tests (including `client_shutdown` using `HttpServerConnection::StartStreaming` in the handler) also run on the master branch, with some modifications to the `UnitTestHandler` and some of the log pattern matching commented out. I've decided against making them a separate PR, because then almost every commit in this PR would have to incrementally adapt that handler to the changes to still make it compile.

For `HttpResponse` and `HttpRequest`, parsing and serializing the messages is tested, in streaming and non-streaming conditions, and also their use together with some of the auxiliary functions in `HttpUtility`.

If anyone can think of test cases I missed, I'll be happy to add them.
Tests (yhabteab)

On a Debian VM of the following size:

Icinga 2 is running as a container within that VM using a locally built image from this PR. So there is no other process involved in the resulting graphs, since I've generated them using `docker stats` with a timeframe of 30m, and Icinga 2 is the only container running.

I've used this script from outside of the VM (from my local machine) to stress test it:

icinga2 daemon -C

Before: It couldn't even run for 1 straight minute before the OOM killer was triggered:

After:

