Skip to content

Conversation

@oneby-wang
Copy link
Contributor

@oneby-wang oneby-wang commented Oct 29, 2025

Fixes #24879

Motivation

Partitioned-topic auto deletion call chain:

  1. gc process(invoked on broker0)
  2. delete partitioned-topic admin api(invoked on broker0)
  3. delete topic-partition-0 admin api(invoked on broker0)

If brokerClient's connectionsPerBroker is 1, then broker0 wait itself to release the connection, the result is timeout. So we check topic-partition-0 existence first to avoid connection pool deadlock.

See also: #24879 (comment)

Modifications

When calling delete partitioned-topic admin api, first check topic partitions existence to avoid calling admin api by broker itself again, which would help partitioned-topic auto deletion in gc process.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as (please describe tests).

InactiveTopicDeleteTest#testWhenSubPartitionNotDelete() test method.

@Test
public void testWhenSubPartitionNotDelete() throws Exception {
conf.setBrokerDeleteInactiveTopicsMode(InactiveTopicDeleteMode.delete_when_no_subscriptions);
conf.setBrokerDeleteInactivePartitionedTopicMetadataEnabled(true);
conf.setBrokerDeleteInactiveTopicsFrequencySeconds(1);
super.baseSetup();
final String topic = "persistent://prop/ns-abc/testDeleteWhenNoSubscriptions";
final TopicName topicName = TopicName.get(topic);
admin.topics().createPartitionedTopic(topic, 5);
pulsarClient.newProducer().topic(topic).create().close();
pulsarClient.newConsumer().topic(topic).subscriptionName("sub").subscribe().close();
Thread.sleep(2000);
// Topic should not be deleted
Assert.assertTrue(admin.topics().getPartitionedTopicList("prop/ns-abc").contains(topic));
admin.topics().deleteSubscription(topic, "sub");
Awaitility.await().untilAsserted(() -> {
// Now the topic should be deleted
Assert.assertFalse(admin.topics().getPartitionedTopicList("prop/ns-abc").contains(topic));
});
}

But the default connectionsPerBroker is 16, which can not reproduce the problem.

public PulsarAdminBuilderImpl() {
this.conf = new ClientConfigurationData();
this.conf.setConnectionsPerBroker(16);
}

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: oneby-wang#8

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 29, 2025
@oneby-wang oneby-wang changed the title [fix][admin] Fix partitioned topic auto deletion http request timeout [fix][broker] Fix partitioned topic auto deletion http request timeout Oct 31, 2025
@Technoboy-
Copy link
Contributor

Could you please add a test for this pr ?

@oneby-wang
Copy link
Contributor Author

oneby-wang commented Dec 3, 2025

Hi, @Technoboy-, thanks for reminding me. This PR is covered by existing test, see InactiveTopicDeleteTest#testWhenSubPartitionNotDelete() test method.

But I configured the admin connectionsPerBroker param of PulsarService incorrectly before, now it is correct, see commit: edaff51.

Without this PR's fix, InactiveTopicDeleteTest#testWhenSubPartitionNotDelete() test method will fail due to assert timeout, because admin client connection pool(just 1 connection) is exhausted due to connection pool deadlock, and test failure messages like this:

org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a org.apache.pulsar.broker.service.InactiveTopicDeleteTest expected [false] but found [true] within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)

This PR fixes the issue, so it can pass now.

@oneby-wang
Copy link
Contributor Author

oneby-wang commented Dec 3, 2025

Beyond this PR's issue , one more issue exists in #24879.

Race condition when deleting partitioned topic, see #24879 (comment). I haven't figured out the side effects of the race condition yet, and haven't found a good solution to solve the race condition. It seems like a distributed lock or cas operation is needed, because multi brokers may fire the GC process at the same time, but this solution needs more discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Auto topic delete of partitioned topics has a MetadataCacheImpl race condition

2 participants