backport PR 43277 #43538

Open
fergian94 wants to merge 1 commit into envoyproxy:release/v1.34 from fergian94:release/v1.34

@fergian94

Backport of the original #43277. The text below is copied from there:

Fixes #43116

Commit Message: healthcheck: defer health check until cluster finishes warming

Additional Description:
Active health checks started before the cluster finished initializing, which caused failures when the cluster required SDS secrets for upstream connections. Because the transport socket configuration was not yet ready, health checks failed intermittently at startup and whenever the cluster was recreated to update its SDS reference, for example during certificate rotation.

See #43116 for full problem description.

This change defers the start of health checks until the cluster has finished warming, ensuring that all necessary configurations (including SDS secrets) are fetched before checks begin.
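The deferral described above can be pictured with a minimal, self-contained sketch. This is not Envoy's actual code: `SketchHealthChecker`, `SketchCluster`, `onSecretReady`, and the pending-secret counter are hypothetical names chosen for illustration, and the real change may be structured quite differently. The sketch only shows the ordering guarantee, namely that the health checker is started from the "warming finished" path rather than at cluster creation.

```cpp
#include <cassert>

// Hypothetical stand-in for an active health checker (not Envoy's class).
class SketchHealthChecker {
public:
  void start() { started_ = true; }
  bool started() const { return started_; }

private:
  bool started_{false};
};

// Hypothetical stand-in for a cluster that must warm (for example, wait for
// SDS secrets) before upstream connections can be configured.
class SketchCluster {
public:
  SketchCluster(SketchHealthChecker& health_checker, int pending_secrets)
      : health_checker_(health_checker), pending_secrets_(pending_secrets) {}

  // Called when the cluster is added or recreated. Health checking is NOT
  // started here unless there is nothing left to wait for.
  void initialize() {
    warming_ = true;
    maybeFinishWarming();
  }

  // Called each time one outstanding secret becomes available.
  void onSecretReady() {
    if (pending_secrets_ > 0) {
      --pending_secrets_;
    }
    maybeFinishWarming();
  }

  bool warming() const { return warming_; }

private:
  // Starts health checks only once warming is complete.
  void maybeFinishWarming() {
    if (warming_ && pending_secrets_ == 0) {
      warming_ = false;
      health_checker_.start();
    }
  }

  SketchHealthChecker& health_checker_;
  int pending_secrets_;
  bool warming_{false};
};
```

In this sketch, a cluster created with one pending secret leaves the health checker idle after `initialize()` and starts it only when `onSecretReady()` clears the warming state; a cluster with no pending secrets starts checking immediately.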

Risk Level: Low

Testing:
Added a test to verify that health checks are deferred until the cluster warming state is cleared.

Docs Changes: N/A

Release Notes:
health_check: Fixed a race condition where active health checks could start before required upstream TLS SDS secrets were fetched, causing intermittent health check failures.

Platform Specific Features: N/A

@repokitteh-read-only

Hi @fergian94, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #43538 was opened by fergian94.

@fergian94 fergian94 requested a deployment to external-contributors February 18, 2026 16:31 — with GitHub Actions Waiting
@repokitteh-read-only

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #43538 was opened by fergian94.

@phlax (Member) left a comment

Thanks for working on this, @fergian94. There are a couple of issues with the backport.

RUNTIME_GUARD(envoy_restart_features_fix_dispatcher_approximate_now);
RUNTIME_GUARD(envoy_restart_features_skip_backing_cluster_check_for_sds);
RUNTIME_GUARD(envoy_restart_features_use_eds_cache_for_ads);
RUNTIME_GUARD(envoy_restart_features_validate_http3_pseudo_headers);

I think this is a bad merge/cherry-pick.


// TODO(yavlasov): Enabling by default will be hugely disruptive to existing traffic.
// Replace with a config option (default off) post CVE release.
FALSE_RUNTIME_GUARD(envoy_reloadable_features_reject_early_connect_data);

This also looks like a bad merge/cherry-pick.
