fix(es): add delay in TestPasswordFromFile to prevent flakiness#8150
Br1an67 wants to merge 1 commit into jaegertracing:main
The ES client has background health check goroutines that may continue running briefly after Close() is called. When the password changes and a new client is created, the old client is closed but its health check may still make requests with the old password.

This adds a small delay after the client changes to allow the old client's background goroutines to fully shut down before the test proceeds to verify the new password.

Fixes jaegertracing#4743

Signed-off-by: root <root@C20251020184286.local>
Pull request overview
Addresses flakiness in TestPasswordFromFile caused by background Elasticsearch client goroutines continuing briefly after Close() during password rotation, leading to occasional requests with the old credentials.
Changes:
- Adds a short delay after detecting the client swap in runPasswordFromFileTest to reduce race-related flakiness.
// Give the old client time to fully shut down its background goroutines
// (e.g., health checks) after being closed in onClientPasswordChange.
// This prevents flakiness where the old client might still make requests
// with the old password after the client has been swapped.
time.Sleep(100 * time.Millisecond)
Using a fixed time.Sleep here makes the test slower and can still be flaky on slow/loaded CI (the healthcheck goroutine may take longer than 100ms to stop). Since this test only validates auth headers on write requests, consider disabling background health checks in the test configuration (e.g., cfg.DisableHealthCheck = true) or replacing the sleep with a deterministic wait condition tied to observed requests (e.g., wait until no further requests with the old password are seen for a short window).
// (e.g., health checks) after being closed in onClientPasswordChange.
// This prevents flakiness where the old client might still make requests
// with the old password after the client has been swapped.
time.Sleep(100 * time.Millisecond)
Timeouts do not fix flakiness; the solution needs to be deterministic.
b4cae47 to 5393205
Fixes #4743
Summary
The TestPasswordFromFile test was flaky because the Elasticsearch client has background health check goroutines that may continue running briefly after Close() is called. When the password changes and a new client is created, the old client is closed but its health check goroutine might still make requests with the old password for a brief moment after the client swap.
Changes
Testing
Root Cause
The olivere/elastic client library's Stop() method signals background goroutines to stop but does not wait for them to complete. This can cause a race condition where:
- Close() is called

The delay gives the old client's background operations time to fully terminate before the test proceeds.