perf fix for channel over MetaTls #636

Open
wants to merge 2 commits into
base: main
Contributor

@moonli moonli commented Jul 24, 2025

Summary:
Channel throughput over MetaTls is surprisingly low, much lower than over regular TCP without TLS.

Profiling shows that most of the time is spent in memset on the receive buffer: https://fburl.com/strobelight/bxxdyloh
Hot stack: P1879123816

Root cause:

The TLS record size is 16KB, so every poll gets 16KB of data, but we receive the record into a buffer
sized for the total message, which was 70MB in my test. Worse, the TLS lib zeroes out the whole 70MB
on every poll, e.g. P1879474802. As a result, the larger the message, the worse the perf.
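
To put rough numbers on the root cause, here is a back-of-the-envelope sketch. It assumes binary units (70MB read as 70 MiB, 16KB as 16 KiB), and the function names are made up for illustration. The first model follows the description above (every poll zeroes the whole buffer); the second is an alternative assumption where only the unfilled remainder is zeroed on each poll.

```rust
/// Number of polls needed to receive `msg_len` bytes at `record` bytes per poll.
fn polls_needed(msg_len: u64, record: u64) -> u64 {
    (msg_len + record - 1) / record
}

/// Model from the description above: every poll zeroes the whole buffer.
fn zeroed_whole_buffer(msg_len: u64, record: u64) -> u64 {
    polls_needed(msg_len, record) * msg_len
}

/// Alternative model (an assumption, not measured): each poll zeroes only
/// the part of the buffer that has not been filled yet.
fn zeroed_unfilled_only(msg_len: u64, record: u64) -> u64 {
    (0..polls_needed(msg_len, record))
        .map(|i| msg_len - i * record)
        .sum()
}
```

With `msg_len = 70 * 1024 * 1024` and `record = 16 * 1024`, this gives 4480 polls and 328,833,433,600 bytes zeroed (about 306 GiB) under the first model, or roughly half of that under the second. Either way the cost grows quadratically with message size, which matches "the larger the message, the worse the perf".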

This diff proposes a mitigation for the problem, but it makes changes to rustls.

  • In monarch, we can work around this problem by implementing our own framing, making sure the size of
    each frame is fairly small.
  • But this is also a real problem in the Rust TLS lib: its API doesn't warn users about this perf trap. It
    should either fix the perf issue or explicitly ask users to always use a small buffer (which I don't
    think is desirable behavior), so I still think it should be fixed in Rust TLS.
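
For reference, the framing workaround from the first bullet could look roughly like the sketch below. This is not monarch's actual wire format: `MAX_FRAME`, `write_framed`, `read_framed`, and the length-prefix layout are all hypothetical, and plain `std::io::Read`/`Write` stand in for the TLS stream. The point is only that the receiver never hands the stream a buffer larger than one small frame, so any per-read zeroing is bounded by the frame size.

```rust
use std::io::{self, Read, Write};

/// Hypothetical frame payload cap; 16 KiB mirrors the TLS record size.
const MAX_FRAME: usize = 16 * 1024;

/// Writes a u64 total length, then the message as frames of at most
/// MAX_FRAME bytes, each prefixed with its u32 length (big-endian).
fn write_framed<W: Write>(w: &mut W, msg: &[u8]) -> io::Result<()> {
    w.write_all(&(msg.len() as u64).to_be_bytes())?;
    for frame in msg.chunks(MAX_FRAME) {
        w.write_all(&(frame.len() as u32).to_be_bytes())?;
        w.write_all(frame)?;
    }
    Ok(())
}

/// Reads a message written by `write_framed`. Every read from the stream
/// targets a buffer of at most MAX_FRAME bytes.
fn read_framed<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len8 = [0u8; 8];
    r.read_exact(&mut len8)?;
    let total = u64::from_be_bytes(len8) as usize;

    // A real implementation would cap `total` before trusting it.
    let mut msg = Vec::with_capacity(total);
    let mut frame = [0u8; MAX_FRAME];
    while msg.len() < total {
        let mut len4 = [0u8; 4];
        r.read_exact(&mut len4)?;
        let n = u32::from_be_bytes(len4) as usize;
        if n == 0 || n > MAX_FRAME || n > total - msg.len() {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "bad frame length"));
        }
        r.read_exact(&mut frame[..n])?;
        msg.extend_from_slice(&frame[..n]);
    }
    Ok(msg)
}
```

The trade-off is an extra copy per frame and 4 bytes of overhead per 16 KiB, in exchange for never exposing the full message-sized buffer to the TLS read path.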

Differential Revision: D78875272

pzhan9 and others added 2 commits July 24, 2025 10:52
Differential Revision: D78831200
@facebook-github-bot added the CLA Signed label Jul 24, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D78875272

Labels
CLA Signed, fb-exported