-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve][broker] Part-1 of PIP-434: Expose Netty channel configuration WRITE_BUFFER_WATER_MARK to pulsar conf and pause receive requests when channel is unwritable #24423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulsar Broker already contains multiple solutions for backpressure.
The channel writability is controlled with https://netty.io/4.1/api/io/netty/channel/WriteBufferWaterMark.html settings which we don't currently expose, since Pulsar hasn't been using writability for backpressure (excluding the Proxy).
For consumers, there's already a backpressure solution for outbound entries with managedLedgerMaxReadsInFlightSizeInMB
.
In this case, based on the test case, this is specifically about returning topic pattern listing responses.
The problem with this PR is that the limit is per connection to the broker. That's not very effective in limiting the broker memory usage. In Netty, the standard way to handle this is to change autoread to false when writability changes to false. However, that could cause performance issues for Pulsar broker.
Instead, a more effective solution would be to have a rate limiter or semaphore in place that specifically targets the topic pattern listing responses and watchers. When this is handled asynchronously, the solution could add delay to responding to requests when limits are reached.
Besides GET_TOPICS_OF_NAMESPACE/GET_TOPICS_OF_NAMESPACE_RESPONSE, there's also WATCH_TOPIC_LIST with WATCH_TOPIC_LIST_SUCCESS and WATCH_TOPIC_UPDATE responses. That's why it makes more sense to design a proper backpressure fore topic listings.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #24423 +/- ##
============================================
- Coverage 74.27% 74.17% -0.10%
+ Complexity 33264 33254 -10
============================================
Files 1902 1905 +3
Lines 148463 148591 +128
Branches 17213 17228 +15
============================================
- Hits 110267 110221 -46
- Misses 29404 29577 +173
- Partials 8792 8793 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Changed the implementation to setting channel auto-read to |
Added a motivation section to the Motivation
|
For the scenario that I described here, for different namespaces, which have different quantities of topics, we can hardly set an appropriate value for the rate-limiter. So the current implementation is better. |
The root cause is similar with apache/bookkeeper#4556. I think maybe we can consider close the channel if it has too many pending buffer to write. |
Agree with lari, I think the current impl can not solve the root cause. |
I don't think that there's a need to close the channel. For Pulsar brokers, the memory is already more or less bounded with |
Regarding the TopicListWatcher, it will consume a significant amount of heap memory if the topic names aren't deduplicated String instances: pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/TopicListService.java Line 47 in 3bae1d1
Guava's Interner class is a good way to deduplicate String instances. Example from Gradle source code: https://github.com/gradle/gradle/blob/master/platforms/core-runtime/base-services/src/main/java/org/gradle/api/internal/cache/StringInterner.java . The usage of java.lang.String.intern could be problematic since the GC of the instances isn't well defined.
|
Yes, but maybe we should call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be fine with this solution if it would be based on
ChannelOption.WRITE_BUFFER_WATER_MARK
settings and handle autoread
toggling with ServerCnxThrottleTracker
's
incrementThrottleCount
/decrementThrottleCount
in the
channelWritabilityChanged
callback method. There should be a separate
setting to enable it, since configuring WriteBufferWaterMark
high and
low values are separate from the throttling based on channel
writability. This solution would be useful with suitable setting, but it wouldn't be the correct way to fix the actual problem of broker OOM.
More details in email response, https://lists.apache.org/thread/2nknhw2rbspjbb4y2ocsgqh15yj7lzyh (previous one: https://lists.apache.org/thread/2qf12h5nh1m0n6y4t7szxz061dljw90f).
pulsar-common/src/main/java/org/apache/pulsar/common/protocol/PulsarDecoder.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments, I hope these are useful.
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Show resolved
Hide resolved
pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
Outdated
Show resolved
Hide resolved
pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
Outdated
Show resolved
Hide resolved
pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
Outdated
Show resolved
Hide resolved
pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Show resolved
Hide resolved
pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
Outdated
Show resolved
Hide resolved
@poorbarcode checkstyle:
A quick way to run checkstyle before pushing is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good work and thanks for the patience @poorbarcode
pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
Outdated
Show resolved
Hide resolved
There's a flaky test, #24827. @poorbarcode Please check. |
…on WRITE_BUFFER_WATER_MARK to pulsar conf and pause receive requests when channel is unwritable (apache#24423)
Motivation
See the PIP-434
How it works
After setting
pulsarChannelPauseReceivingRequestsIfUnwritable
totrue
, the channel state will be changed as follows.channel.writable
tofalse
when there is too much data that is waiting to be sent out(the size of the data cached inChannelOutboundBuffer
is larger than{pulsarChannelWriteBufferHighWaterMark}
)channelWritabilityChanged
autoRead
of theNetty channel
.channel.writable
totrue
once the size of the data that is waiting to be sent out is less than{pulsarChannelWriteBufferLowWaterMark}
channel.autoRead
totrue
).With the settings
pulsarChannelPauseReceivingRequestsIfUnwritable=false
(default settings), the behaviour is exactly the same as the previous.Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x