Samrat002
Purpose

Linked issue: close #1677

Fix the flaky test RemoteLogDownloaderTest#testPrefetchNum

Brief change log

Root Cause Analysis

The RemoteLogDownloader acquired a semaphore permit before polling for a download task. If the queue was empty, it held that permit while idle, so the available permit count dipped transiently even though no download was in progress. This race condition made testPrefetchNum fail intermittently.
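
To make the dip concrete, here is a minimal sketch of the acquire-then-poll ordering described above. It is not the actual RemoteLogDownloader code: the class `AcquireFirstSketch`, the field names, the `Integer` request type, and the non-blocking `poll()` are all assumptions chosen to keep the example small.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for the old ordering; not the real RemoteLogDownloader. */
class AcquireFirstSketch {
    final Semaphore prefetchPermits;
    // Requests are modeled as plain integers (smaller value = higher priority).
    final PriorityBlockingQueue<Integer> requests = new PriorityBlockingQueue<>();

    AcquireFirstSketch(int permits) {
        this.prefetchPermits = new Semaphore(permits);
    }

    /**
     * Old ordering: reserve a permit first, then look for work.
     * Returns the permit count an observer would see while the
     * downloader is holding the permit and looking for a request.
     */
    int fetchOnce() {
        prefetchPermits.acquireUninterruptibly();
        int visibleWhileIdle = prefetchPermits.availablePermits();
        Integer request = requests.poll();
        if (request == null) {
            // Nothing to download: give the permit back. Until this line
            // runs, the count is one lower than it should be.
            prefetchPermits.release();
        }
        return visibleWhileIdle;
    }
}
```

With 4 permits and an empty queue, a concurrent observer can see 3 available permits during `fetchOnce()` even though nothing is being downloaded, which is the kind of transient state a test asserting on the permit count could hit.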

Approach

I've flipped the order in RemoteLogDownloader.fetchOnce(): it now polls for a request first and only then tries to acquire a permit.

If no permit is available, the request is re-queued, ensuring work isn't lost and priority is maintained. This change prevents the downloader from reserving capacity when it has no work to do.
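
The poll-first ordering can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `PollFirstSketch`, its fields, and the `Integer` request type are hypothetical, and the real method may block with a timeout rather than use a non-blocking `poll()`.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for the new ordering; not the real RemoteLogDownloader. */
class PollFirstSketch {
    final Semaphore prefetchPermits;
    // Requests are modeled as plain integers (smaller value = higher priority).
    final PriorityBlockingQueue<Integer> requests = new PriorityBlockingQueue<>();

    PollFirstSketch(int permits) {
        this.prefetchPermits = new Semaphore(permits);
    }

    /**
     * New ordering: look for work first, reserve capacity second.
     * Returns the request that was started, or null if nothing ran.
     */
    Integer fetchOnce() {
        Integer request = requests.poll();
        if (request == null) {
            // Idle: the semaphore is never touched, so no transient dip.
            return null;
        }
        if (!prefetchPermits.tryAcquire()) {
            // No capacity: put the request back. The priority queue
            // re-sorts it, so ordering among pending requests is kept.
            requests.add(request);
            return null;
        }
        // The caller is expected to release the permit once the download ends.
        return request;
    }
}
```

In this sketch an empty queue leaves the permit count unchanged, and a request that cannot get a permit goes back into the priority queue instead of being dropped.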

Tests

Ensure RemoteLogDownloaderTest#testDownloadLogInParallelAndInPriority passes when run multiple times.

API and Format

N/A

Documentation

N/A

@Samrat002
Author

@wuchong PTAL when you have time.

Successfully merging this pull request may close these issues.

RemoteLogDownloaderTest#testPrefetchNum is unstable