mogliang
What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #12399

/area clustercache

@k8s-ci-robot k8s-ci-robot added the area/clustercache Issues or PRs related to the clustercachetracker label Jun 25, 2025

linux-foundation-easycla bot commented Jun 25, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jun 25, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @mogliang!

It looks like this is your first PR to kubernetes-sigs/cluster-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @mogliang. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 25, 2025
@mogliang mogliang force-pushed the dev/mogliang/fixcache branch from 4142631 to 49bbe5d Compare June 27, 2025 02:56
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 27, 2025
@mogliang
Author

added unit test

@sivchari
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 28, 2025
@@ -511,6 +512,16 @@ func (cc *clusterCache) Reconcile(ctx context.Context, req reconcile.Request) (r
requeueAfterDurations = append(requeueAfterDurations, accessor.config.HealthProbe.Interval)
Member

@sbueringer sbueringer Jul 8, 2025

You are hitting tooManyConsecutiveFailures in your scenario, right?

Would it also be enough for your use case to make HealthProbe.Timeout/Interval/FailureThreshold configurable?

Author

Not really.
We have a proxy between the management cluster and the target clusters.
After the kubeconfig is updated (the proxy address changed), the existing connection (the clustercache probe) still works, so the ClusterCache doesn't refetch the kubeconfig and keeps the old one cached, but new connections (e.g. the etcd client) fail.

Member

So if the health check would open a new connection it would detect it, right?

Author

That's correct. Another idea is to disconnect after a given period (e.g. 5m) to force a connection refresh. Would that be better?

Member

@sbueringer sbueringer Jul 28, 2025

I would just try to extend the health check so that it does two requests: one over the existing connection and one over a new connection.

Member

@sbueringer sbueringer Aug 25, 2025

client-go will reuse the underlying transport (except create spdy transport).

Do you know where that happens? Somewhere in rest.HTTPClientFor or rest.TransportFor or transport.New?

Author

@mogliang mogliang Aug 26, 2025

In transport.New.

If the rest config has .Transport set, it will just use it; if not, it will get one from the cache.

// New returns an http.RoundTripper that will provide the authentication
// or transport level security defined by the provided Config.
func New(config *Config) (http.RoundTripper, error) {
	// Set transport level security
	if config.Transport != nil && (config.HasCA() || config.HasCertAuth() || config.HasCertCallback() || config.TLS.Insecure) {
		return nil, fmt.Errorf("using a custom transport with TLS certificate options or the insecure flag is not allowed")
	}

	if !isValidHolders(config) {
		return nil, fmt.Errorf("misconfigured holder for dialer or cert callback")
	}

	var (
		rt  http.RoundTripper
		err error
	)

	if config.Transport != nil {
		rt = config.Transport
	} else {
		rt, err = tlsCache.get(config)
		if err != nil {
			return nil, err
		}
	}

Member

@sbueringer sbueringer Aug 26, 2025

Wondering if there is a good way to pass in a transport, without duplicating too much code

Author

Let me check it.

Author

@sbueringer
https://github.com/kubernetes-sigs/cluster-api/compare/release-1.9...mogliang:cluster-api:dev/qliang/v1.9.5-fixetcd2?expand=1

Copied the implementation from tlsCache.get.

Note that establishing a TCP connection takes ~1sec in our case, while a normal ping may take ~100sec, so this does add some reconcile time.

@mogliang mogliang requested a review from sbueringer July 28, 2025 01:49
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2025
@mogliang mogliang force-pushed the dev/mogliang/fixcache branch from 2a67216 to 36e1ea5 Compare August 19, 2025 06:01
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 19, 2025
@mogliang mogliang force-pushed the dev/mogliang/fixcache branch from 730e1a5 to 754d34a Compare August 20, 2025 05:52
@mogliang
Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2025
Labels
area/clustercache Issues or PRs related to the clustercachetracker cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClusterCache doesn't pick latest kubeconfig secret proactively
4 participants