@eshitachandwani
Member

@eshitachandwani eshitachandwani commented Dec 17, 2025

Part of the A74 changes. This PR changes the cdsbalancer to use the XDSConfig for resources and removes the cluster watcher from the cds balancer.
It also removes the EDS/DNS watchers from the balancers and removes the cluster resolver policy completely.
It also moves the e2e tests from the clusterresolver package to the cdsbalancer package.

The PR also exports the pick first LB config so that it can be used in tests. This will be reverted once #8733 is merged and round robin can be configured as the child policy for DNS clusters.

RELEASE NOTES:

  • xds:
    • Ambient errors for cluster resources will now only be logged in the dependency manager and will not be propagated to LB policies.
    • When a re-resolution is requested, all LOGICAL_DNS clusters will be re-resolved, as opposed to just one.
    • When a listener or route resource error is received, in-flight RPCs will now fail.

@eshitachandwani eshitachandwani added this to the 1.79 Release milestone Dec 17, 2025
@codecov

codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 76.56250% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.22%. Comparing base (4046676) to head (e720945).
⚠️ Report is 14 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/xds/balancer/cdsbalancer/cdsbalancer.go | 65.13% | 36 Missing and 17 partials ⚠️ |
| internal/xds/xdsdepmgr/xds_dependency_manager.go | 90.00% | 4 Missing and 1 partial ⚠️ |
| internal/xds/balancer/cdsbalancer/configbuilder.go | 93.54% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8777      +/-   ##
==========================================
- Coverage   83.42%   83.22%   -0.20%     
==========================================
  Files         418      411       -7     
  Lines       32897    32529     -368     
==========================================
- Hits        27443    27072     -371     
- Misses       4069     4074       +5     
+ Partials     1385     1383       -2     
| Files with missing lines | Coverage Δ |
|---|---|
| balancer/pickfirst/pickfirst.go | 89.62% <100.00%> (+0.50%) ⬆️ |
| ...ds/balancer/cdsbalancer/configbuilder_childname.go | 100.00% <ø> (ø) |
| internal/xds/clusterspecifier/rls/rls.go | 62.50% <100.00%> (ø) |
| internal/xds/resolver/serviceconfig.go | 86.04% <100.00%> (-2.02%) ⬇️ |
| internal/xds/resolver/xds_resolver.go | 86.95% <100.00%> (-1.81%) ⬇️ |
| internal/xds/balancer/cdsbalancer/configbuilder.go | 93.15% <93.54%> (ø) |
| internal/xds/xdsdepmgr/xds_dependency_manager.go | 88.29% <90.00%> (+7.42%) ⬆️ |
| internal/xds/balancer/cdsbalancer/cdsbalancer.go | 73.60% <65.13%> (-11.73%) ⬇️ |

... and 33 files with indirect coverage changes

@easwars
Contributor

easwars commented Dec 18, 2025

The test is still failing.

Comment on lines +882 to +884
// DependencyManagerKey is the type used as the key to store DependencyManager
// in the Attributes field of resolver.states.
type DependencyManagerKey struct{}
Contributor


The key type should be private. Other packages must use the public getters and setters.
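
A minimal sketch of the pattern suggested above: the key type is unexported and other packages go through an exported setter and getter. The `attrs` type is a hypothetical stand-in for an immutable attribute bag, and the names `SetDependencyManager` and `DependencyManagerFromAttrs` are assumptions for illustration, not names taken from this PR.

```go
package main

import "fmt"

// attrs is a hypothetical, minimal stand-in for an immutable attribute bag.
type attrs struct {
	m map[any]any
}

// withValue returns a copy of a with key set to value. It is safe on a nil receiver.
func (a *attrs) withValue(key, value any) *attrs {
	n := &attrs{m: make(map[any]any)}
	if a != nil {
		for k, v := range a.m {
			n.m[k] = v
		}
	}
	n.m[key] = value
	return n
}

// value returns the value stored under key, or nil if there is none.
func (a *attrs) value(key any) any {
	if a == nil {
		return nil
	}
	return a.m[key]
}

// DependencyManager is a placeholder type for this sketch.
type DependencyManager struct{ name string }

// dependencyManagerKey is unexported, so code outside the package cannot
// construct the key and has to use the exported helpers below.
type dependencyManagerKey struct{}

// SetDependencyManager returns a copy of a that carries dm.
func SetDependencyManager(a *attrs, dm *DependencyManager) *attrs {
	return a.withValue(dependencyManagerKey{}, dm)
}

// DependencyManagerFromAttrs returns the stored manager, or nil if absent.
func DependencyManagerFromAttrs(a *attrs) *DependencyManager {
	dm, _ := a.value(dependencyManagerKey{}).(*DependencyManager)
	return dm
}

func main() {
	a := SetDependencyManager(nil, &DependencyManager{name: "dm-1"})
	fmt.Println(DependencyManagerFromAttrs(a).name)
	fmt.Println(DependencyManagerFromAttrs(nil) == nil)
}
```

With this shape the key can be renamed or replaced later without breaking callers, because only the two helpers are part of the package surface.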

Comment on lines +959 to +962
// GetRefCount returns the reference count for a particular cluster.
func (c *ClusterRef) GetRefCount() int32 {
	return atomic.LoadInt32(&c.refCount)
}
Contributor


Methods that simply return the value of an atomic are prone to race conditions and difficult to use correctly. A caller might read GetRefCount and take action, but the value could change before that action completes. I'm reviewing the surrounding code to see if this race actually manifests here.
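
To illustrate the concern, here is a sketch of one common alternative: fold the check and the update into a single atomic operation, so the caller never acts on a stale read. The `refCounter` type and its methods are hypothetical and are not the `ClusterRef` API from this PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCounter is a hypothetical reference counter used only for this sketch.
type refCounter struct{ n int32 }

// tryAcquire increments the count only if it is still positive. The
// compare-and-swap loop makes the check and the increment one atomic step,
// unlike "read the count, then decide" in the caller.
func (r *refCounter) tryAcquire() bool {
	for {
		cur := atomic.LoadInt32(&r.n)
		if cur <= 0 {
			return false
		}
		if atomic.CompareAndSwapInt32(&r.n, cur, cur+1) {
			return true
		}
	}
}

// release decrements the count and reports whether this call dropped it to
// zero, so exactly one caller is told to clean up.
func (r *refCounter) release() bool {
	return atomic.AddInt32(&r.n, -1) == 0
}

func main() {
	r := &refCounter{n: 1}
	fmt.Println(r.tryAcquire())
	fmt.Println(r.release())
	fmt.Println(r.release())
	fmt.Println(r.tryAcquire())
}
```

Whether this shape fits depends on how the count is used in the surrounding code; if the count is only ever read and modified under a mutex, a plain field guarded by that mutex may be simpler.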


  func (pickfirstBuilder) ParseConfig(js json.RawMessage) (serviceconfig.LoadBalancingConfig, error) {
- 	var cfg pfConfig
+ 	var cfg PfConfig
Contributor


We should not need to export the config here. In the test, we can type-assert the pickfirst builder to a config parser and have it return the config struct for a particular JSON. Example:

grpclbConfigParser = balancer.Get("grpclb").(balancer.ConfigParser)

Exporting the config would allow external users to refer to the symbol and removing it would be a breaking change.
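
A self-contained sketch of that approach follows. To keep it runnable without the gRPC module, `Builder`, `ConfigParser`, `LoadBalancingConfig`, `registry`, and `get` are local stand-ins (assumptions) for the corresponding pieces of the balancer registry, and `parsePickFirstConfig` is a hypothetical test helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LoadBalancingConfig, Builder and ConfigParser are local stand-ins for the
// interfaces a real balancer registry would provide.
type LoadBalancingConfig interface{}

type Builder interface{ Name() string }

type ConfigParser interface {
	ParseConfig(js json.RawMessage) (LoadBalancingConfig, error)
}

// pfConfig stays unexported; callers only ever see it as a LoadBalancingConfig.
type pfConfig struct {
	ShuffleAddressList bool `json:"shuffleAddressList"`
}

type pickfirstBuilder struct{}

func (pickfirstBuilder) Name() string { return "pick_first" }

func (pickfirstBuilder) ParseConfig(js json.RawMessage) (LoadBalancingConfig, error) {
	var cfg pfConfig
	if err := json.Unmarshal(js, &cfg); err != nil {
		return nil, fmt.Errorf("pickfirst: unable to unmarshal config %s: %v", string(js), err)
	}
	return cfg, nil
}

// registry and get are a hypothetical miniature of a builder registry.
var registry = map[string]Builder{"pick_first": pickfirstBuilder{}}

func get(name string) Builder { return registry[name] }

// parsePickFirstConfig obtains the config through the parser interface, so a
// test outside the pickfirst package never has to name the config type.
func parsePickFirstConfig(js string) (LoadBalancingConfig, error) {
	parser, ok := get("pick_first").(ConfigParser)
	if !ok {
		return nil, fmt.Errorf("pick_first builder does not implement ConfigParser")
	}
	return parser.ParseConfig(json.RawMessage(js))
}

func main() {
	cfg, err := parsePickFirstConfig(`{"shuffleAddressList": true}`)
	fmt.Println(cfg, err)
}
```

The returned value can be passed wherever a child policy config is expected, which is what the test needs, while the concrete struct stays private to its package.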

m.dnsSerializerCancel()
for name, dnsResolver := range m.dnsResolvers {
	dnsResolver.stop()
	if dnsResolver.extras.dnsR != nil {
Contributor


When can dnsResolver.extras be nil? If it is nil, why is it being stored in the map?

// We cannot wait for the dns serializer to finish here, as the callbacks
// try to grab the dependency manager lock, which is already held here.
m.dnsSerializerCancel()
for name, dnsResolver := range m.dnsResolvers {
Contributor


The variable naming seems confusing here. The field named dnsResolver seems to be the resource state of a logical DNS cluster. Should the field name be updated?

@arjan-bal
Contributor

Just realized that this PR has been split as requested. I'm continuing the review on the smaller PRs. Please address these comments in the smaller PRs and then close this one.

@arjan-bal arjan-bal removed their assignment Dec 29, 2025