
Conversation

poorbarcode
Contributor

Motivation

We encountered an issue where the ledger of a topic was lost after restarting the ZooKeeper nodes.

(screenshot)

After checking the brokers' logs, we found that two brokers had loaded the same topic.

(screenshot)

And we found the following logs printed by broker-0:

2025-06-24T14:57:27,492+0000 [metadata-store-10-1] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - Received MetadataStore session event: SessionLost
2025-06-24T14:58:20,273+0000 [pulsar-load-manager-1-1] WARN  org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl - Found this broker:logging-broker-0:8080 has not registered yet. Trying to register it
2025-06-24T14:58:35,067+0000 [CompletableFutureDelayScheduler] WARN  org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl - logging-broker-0:8080 failed to wait for owner for serviceUnit:xxx/xxx/0xca6d5a6c_0xd448fc50; Trying to return the current owner:Optional[logging-broker-0:8080]
java.util.concurrent.TimeoutException: null
	at java.base/java.util.concurrent.CompletableFuture$Timeout.run(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
	at java.base/java.lang.Thread.run(Unknown Source) [?:?]

Root cause

  • broker-0 lost its ZK session
  • The Pulsar cluster switched the topic owner from broker-0 to broker-2
  • broker-0 missed the bundle-ownership-change events because its session was lost
  • Issue: it still held the stale bundle-owner value, which indicated that broker-0 was the topic's owner.

Modifications

  • Add a new event, itemOutdatedListeners, fired when a broker loses its ZK session
    • The broker unloads all bundles for which it received the event
    • The cached, now-invalid data of MetadataStoreTableViewImpl is cleared
  • Shut down the broker if it cannot recover the bundle states after the ZK session is re-established.
  • If users do not implement the new event listener itemOutdatedListeners, the behavior stays the same as before, which guarantees backward compatibility.
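To make the intent of the new listener concrete, here is a minimal sketch of the session-lost handling described above. The class and method names (`SessionLostHandler`, `onSessionLost`) are hypothetical, not Pulsar's actual API; the sketch only illustrates the idea of unloading locally owned bundles and dropping the stale cache:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: the wiring to the real metadata-store session
// events and the real unload path is omitted.
public class SessionLostHandler {
    // Cached bundle -> owner view, analogous to the table view's shadow map.
    private final Map<String, String> ownerCache = new HashMap<>();
    private final String localBroker;

    public SessionLostHandler(String localBroker) {
        this.localBroker = localBroker;
    }

    public void recordOwner(String bundle, String owner) {
        ownerCache.put(bundle, owner);
    }

    // On SessionLost: collect the bundles this broker believed it owned (so
    // they can be unloaded), then drop the whole cache, since every entry may
    // now be outdated.
    public Set<String> onSessionLost() {
        Set<String> toUnload = new HashSet<>();
        for (Map.Entry<String, String> e : ownerCache.entrySet()) {
            if (localBroker.equals(e.getValue())) {
                toUnload.add(e.getKey());
            }
        }
        ownerCache.clear();
        return toUnload;
    }

    public boolean hasCachedOwner(String bundle) {
        return ownerCache.containsKey(bundle);
    }
}
```

If no listener is registered, nothing reacts to the event, which is how the sketch maps onto the compatibility guarantee in the last bullet.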

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode added this to the 4.1.0 milestone Jul 2, 2025
@poorbarcode self-assigned this Jul 2, 2025
@poorbarcode added the type/bug, ready-to-test, and release/4.0.6 labels Jul 2, 2025
@github-actions added the doc-not-needed label Jul 2, 2025
@codelipenghui
Contributor

@poorbarcode Do you know why Ledger 1049720 gets deleted? Or is it expected to be deleted?

@poorbarcode
Contributor Author

@codelipenghui

@poorbarcode Do you know why Ledger 1049720 gets deleted? Or is it expected to be deleted?

I am not sure of the exact details, I guess there is a possibility:

Since two brokers each assumed they were the owner, both overwrote the cursor's metadata, leaving us with a wrong cursor markDelete position.

@BewareMyPower BewareMyPower changed the title [fix][broker]Data lost due to conflit loaded up a topic for two brokers, when enabled ServiceUnitStateMetadataStoreTableViewImpl [fix][broker]Data lost due to conflict loaded up a topic for two brokers, when enabled ServiceUnitStateMetadataStoreTableViewImpl Jul 7, 2025
@codecov-commenter

codecov-commenter commented Jul 7, 2025

Codecov Report

❌ Patch coverage is 74.50980% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.22%. Comparing base (bbc6224) to head (195ad83).
⚠️ Report is 1250 commits behind head on master.

Files with missing lines Patch % Lines
...ata/tableview/impl/MetadataStoreTableViewImpl.java 69.33% 15 Missing and 8 partials ⚠️
...xtensions/channel/ServiceUnitStateChannelImpl.java 72.72% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24478      +/-   ##
============================================
+ Coverage     73.57%   74.22%   +0.65%     
- Complexity    32624    32819     +195     
============================================
  Files          1877     1868       -9     
  Lines        139502   145823    +6321     
  Branches      15299    16717    +1418     
============================================
+ Hits         102638   108244    +5606     
- Misses        28908    28991      +83     
- Partials       7956     8588     +632     
Flag Coverage Δ
inttests 26.70% <0.98%> (+2.11%) ⬆️
systests 23.30% <0.00%> (-1.02%) ⬇️
unittests 73.72% <74.50%> (+0.88%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...el/ServiceUnitStateMetadataStoreTableViewImpl.java 67.74% <100.00%> (ø)
...ensions/channel/ServiceUnitStateTableViewBase.java 94.87% <100.00%> (ø)
...ensions/channel/ServiceUnitStateTableViewImpl.java 73.49% <100.00%> (ø)
...sions/channel/ServiceUnitStateTableViewSyncer.java 69.84% <ø> (ø)
...xtensions/channel/ServiceUnitStateChannelImpl.java 86.09% <72.72%> (+0.78%) ⬆️
...ata/tableview/impl/MetadataStoreTableViewImpl.java 74.87% <69.33%> (ø)

... and 1085 files with indirect coverage changes


Contributor

@Technoboy- left a comment


LGTM

@codelipenghui codelipenghui merged commit ce102da into apache:master Jul 9, 2025
53 checks passed
@heesung-sohn
Contributor

Hi guys!

I think this issue may have revealed a bigger problem.

In fact, by design, I think Pulsar brokers can see temporarily inconsistent ownership views because of network delays or ZK session issues.

In this case, brokers could see conflicting ownership views, but by the "Pulsar fencing logic", only one broker should serve (write to) the topic, and the other brokers should close the topic/ledger upon the fenced-topic / metadata-conflict exception.

Can we confirm if pulsar fencing logic works fine?

@heesung-sohn
Contributor

Regarding this warn

2025-06-24T14:58:35,067+0000 [CompletableFutureDelayScheduler] WARN org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl - logging-broker-0:8080 failed to wait for owner for serviceUnit:xxx/xxx/0xca6d5a6c_0xd448fc50; Trying to return the current owner:Optional[logging-broker-0:8080]

I think we can consider failing new lookups/ownership assignments in this situation, when the broker has lost its metadata connection.

Returning the current candidate "owner" from the local info was added to improve availability, but it appears we should optimize for consistency instead.
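A minimal sketch of that trade-off, with entirely hypothetical names (`OwnerLookup` is not a Pulsar class): the consistency-first variant fails the lookup while the metadata connection is down, instead of returning the possibly stale cached candidate:

```java
import java.util.Optional;

// Illustrative sketch only: compares the availability-first behavior
// described above with a consistency-first alternative.
public class OwnerLookup {
    private boolean metadataConnected = true;
    private Optional<String> cachedOwner = Optional.empty();

    public void setConnected(boolean connected) {
        metadataConnected = connected;
    }

    public void setCachedOwner(String owner) {
        cachedOwner = Optional.of(owner);
    }

    // Old behavior: always return the cached candidate owner, favoring
    // availability even while disconnected.
    public Optional<String> lookupOwnerAvailabilityFirst() {
        return cachedOwner;
    }

    // Alternative behavior: fail the lookup while disconnected, because the
    // cached value may be stale; callers retry after the session recovers.
    public Optional<String> lookupOwnerConsistencyFirst() {
        if (!metadataConnected) {
            throw new IllegalStateException(
                "metadata store disconnected; owner unknown");
        }
        return cachedOwner;
    }
}
```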

@heesung-sohn
Contributor

Now I think this metadata store / metadata cache / metadata table view sync logic makes the code too complicated and fragile. We could consider removing this shadow hashmap (the hashmap in the table view) and directly using the metadata cache for the metadata store table view. (This requires handling CompletableFuture more gracefully.)

poorbarcode added a commit that referenced this pull request Jul 15, 2025
…ers, when enabled ServiceUnitStateMetadataStoreTableViewImpl (#24478)

(cherry picked from commit ce102da)
@poorbarcode
Contributor Author

Thanks @heesung-sn

In this case, brokers could see conflicting ownership views, but by the "Pulsar fencing logic", only one broker should serve (write to) the topic, and the other brokers should close the topic/ledger upon the fenced-topic / metadata-conflict exception.
Can we confirm if pulsar fencing logic works fine?

It does not work fine, because the first owner lost notifications due to its lost session.

I think we can consider failing new lookups/ownership assignments in this situation, when the broker has lost its metadata connection.
Returning the current candidate "owner" from the local info was added to improve availability, but it appears we should optimize for consistency instead.

Good pointer, the current PR has fixed it. See also https://github.com/apache/pulsar/pull/24478/files#diff-191ed1df4f5804c8fd6cdaf909cca01d071a110c3e701bdc98352f99e7a92e8eR469

Now I think this metadata store / metadata cache / metadata table view sync logic makes the code too complicated and fragile. We could consider removing this shadow hashmap (the hashmap in the table view) and directly using the metadata cache for the metadata store table view. (This requires handling CompletableFuture more gracefully.)

The map has another useful case: we need to unload owned topics once the broker's state is not healthy, and the map tracks which bundles are owned by the current broker.

@heesung-sohn
Contributor

It does not work fine, because the first owner lost notifications due to its lost session.

Interesting. I thought the fencing logic was guarded by the BookKeeper quorum, regardless of ZK.

codelipenghui pushed a commit to codelipenghui/incubator-pulsar that referenced this pull request Jul 15, 2025
…ers, when enabled ServiceUnitStateMetadataStoreTableViewImpl (apache#24478)
@heesung-sohn
Contributor

The map has another useful case: we need to unload owned topics once the broker's state is not healthy,

In fact, the original design was that when the metadata store is unstable, Pulsar should behave as-is on a best-effort basis (minimal impact on existing clients). This "unload all topics when the ZK connection is lost" seems to be a new behavior.

and the map tracks which bundles are owned by the current broker.

Ok. One brute-force way to sync the cache and the table view in this case is to re-init the table view (refresh it again when the connection is re-established). I think you already added this logic in this PR.
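A rough sketch of that brute-force resync, under the assumption that a fresh snapshot of the store can be read on reconnect (the class name and method signature are hypothetical):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: replace the local view with a fresh snapshot
// from the metadata store, and report any bundle this broker thought it
// owned but no longer does, so it can be unloaded locally.
public class TableViewResync {
    public static Set<String> resync(Map<String, String> localView,
                                     Map<String, String> freshSnapshot,
                                     String localBroker) {
        Set<String> lostOwnership = new HashSet<>();
        for (Map.Entry<String, String> e : localView.entrySet()) {
            String bundle = e.getKey();
            if (localBroker.equals(e.getValue())
                    && !localBroker.equals(freshSnapshot.get(bundle))) {
                lostOwnership.add(bundle); // must be unloaded locally
            }
        }
        // Drop every stale entry and adopt the snapshot wholesale.
        localView.clear();
        localView.putAll(freshSnapshot);
        return lostOwnership;
    }
}
```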

@heesung-sohn
Contributor

@poorbarcode I raised this PR to print warn log when they are out of sync, #24513

priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 22, 2025
…ers, when enabled ServiceUnitStateMetadataStoreTableViewImpl (apache#24478)

(cherry picked from commit ce102da)
(cherry picked from commit 88666bd)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
…ers, when enabled ServiceUnitStateMetadataStoreTableViewImpl (apache#24478)

(cherry picked from commit ce102da)
(cherry picked from commit 88666bd)
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
…ers, when enabled ServiceUnitStateMetadataStoreTableViewImpl (apache#24478)
Labels
cherry-picked/branch-4.0, doc-not-needed, ready-to-test, release/4.0.6, type/bug