Description
What feature you would like to be added?
Add two command-line flags to the controller start command to allow configuring the Kubernetes client rate limiter:
- --kube-api-qps (float32, default: 5): maximum QPS to the API server from the controller client
- --kube-api-burst (int, default: 10): maximum burst size for the rate limiter
The implementation is a minimal, non-breaking change to cmd/operator/controller/start.go:
- Declare two package-level variables:
    kubeAPIQPS float32
    kubeAPIBurst int
- Register the flags in NewStartCommand():
    command.Flags().Float32Var(&kubeAPIQPS, "kube-api-qps", 5, "Maximum QPS to the API server from the controller client.")
    command.Flags().IntVar(&kubeAPIBurst, "kube-api-burst", 10, "Maximum burst for throttle from the controller client.")
- Apply to the rest.Config in start() before building the manager:
    cfg, err := ctrl.GetConfig()
    cfg.WarningHandler = rest.NoWarnings{}
    cfg.QPS = kubeAPIQPS // add
    cfg.Burst = kubeAPIBurst // add
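The three steps above can be sketched end to end as follows. This is a minimal illustration, not the proposed patch: it uses Go's standard flag package instead of the cobra flag set in start.go so that it builds without external modules, and restConfig, rateLimitOptions, parseRateLimitFlags and applyTo are hypothetical names standing in for rest.Config and the real wiring.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// restConfig is a hypothetical stand-in for the QPS and Burst fields of
// client-go's rest.Config, so the sketch builds without external modules.
type restConfig struct {
	QPS   float32
	Burst int
}

// rateLimitOptions holds the parsed flag values.
type rateLimitOptions struct {
	qps   float64 // the standard flag package has no Float32Var; narrowed on apply
	burst int
}

// parseRateLimitFlags registers the two proposed flags with the defaults
// given in this issue (QPS=5, burst=10) and parses args.
func parseRateLimitFlags(args []string) (rateLimitOptions, error) {
	var opts rateLimitOptions
	fs := flag.NewFlagSet("start", flag.ContinueOnError)
	fs.Float64Var(&opts.qps, "kube-api-qps", 5, "Maximum QPS to the API server from the controller client.")
	fs.IntVar(&opts.burst, "kube-api-burst", 10, "Maximum burst for throttle from the controller client.")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	return opts, nil
}

// applyTo copies the parsed values onto the client configuration,
// mirroring the cfg.QPS / cfg.Burst assignments in the proposal.
func (o rateLimitOptions) applyTo(cfg *restConfig) {
	cfg.QPS = float32(o.qps)
	cfg.Burst = o.burst
}

func main() {
	opts, err := parseRateLimitFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	cfg := &restConfig{}
	opts.applyTo(cfg)
	fmt.Printf("QPS=%v Burst=%d\n", cfg.QPS, cfg.Burst)
}
```

Running the sketch with --kube-api-qps=50 --kube-api-burst=100 overrides both values; running it with no arguments keeps the defaults, which is the property that keeps existing deployments unaffected.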
Why is this needed?
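In large-scale production environments with hundreds of concurrent SparkApplications, the controller frequently hits the default client-go rate limiter (QPS=5, burst=10). This causes API calls, particularly status updates after spark-submit, to return context canceled errors. In the log below, the leader-election lease renewal also waits on the rate limiter until its deadline expires, and the manager exits with "leader election lost":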
E0303 12:28:42.451089 9 leaderelection.go:429] Failed to update lock optimistically: client rate limiter Wait returned an error: context deadline exceeded, falling back to slow path
E0303 12:28:42.451167 9 leaderelection.go:436] error retrieving resource lock spark-xx/spark-operator-xx-controller-lock: client rate limiter Wait returned an error: context deadline exceeded
I0303 12:28:42.451186 9 leaderelection.go:297] failed to renew lease spark-xx/spark-operator-xx-controller-lock: context deadline exceeded
2026-03-03T12:28:42.451Z ERROR controller/start.go:341 Failed to start manager {"error": "leader election lost"}
github.com/kubeflow/spark-operator/v2/cmd/operator/controller.start
    /workspace/cmd/operator/controller/start.go:341
github.com/kubeflow/spark-operator/v2/cmd/operator/controller.NewStartCommand.func2
    /workspace/cmd/operator/controller/start.go:151
github.com/spf13/cobra.(*Command).execute
    /go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1019
github.com/spf13/cobra.(*Command).ExecuteC
    /go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1148
github.com/spf13/cobra.(*Command).Execute
    /go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1071
main.main
    /workspace/cmd/operator/main.go:45
runtime.main
    /usr/local/go/src/runtime/proc.go:283
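The backlog behind the errors above can be estimated with simple token-bucket arithmetic. The sketch below is an idealized model, not client-go's implementation: it assumes the bucket starts full and that all requests are issued at the same instant. The figure of 300 pending calls and the alternative setting of QPS=50, burst=100 are hypothetical, chosen only to illustrate the scale.

```go
package main

import (
	"fmt"
	"time"
)

// queueDelay estimates how long the n-th request (1-indexed) of a batch
// issued at the same instant waits in a token bucket that starts full,
// holds `burst` tokens, and refills at `qps` tokens per second.
// Non-positive qps is not modelled and returns 0.
func queueDelay(n int, qps float64, burst int) time.Duration {
	if n <= burst || qps <= 0 {
		return 0 // served immediately from the initial burst allowance
	}
	seconds := float64(n-burst) / qps
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	const pending = 300 // hypothetical number of API calls queued at once

	fmt.Println(queueDelay(pending, 5, 10))   // defaults: prints 58s
	fmt.Println(queueDelay(pending, 50, 100)) // hypothetical tuning: prints 4s
}
```

Under these assumptions the last call waits about (300 - 10) / 5 = 58 seconds with the defaults. Any request that shares the same client and queues behind such a backlog, including a lease renewal, can exceed its context deadline before it is sent.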
Describe the solution you would like
Expose --kube-api-qps and --kube-api-burst flags on the controller start command, consistent with the pattern used by kube-controller-manager and many other Kubernetes controllers.
The change is confined to cmd/operator/controller/start.go — no changes to internal controller logic are required. Default values (QPS=5, burst=10) match the current hardcoded behavior, so existing deployments are unaffected.
For Helm-based deployments, the flags can optionally be exposed via values.yaml:
controller:
  kubeAPIQPS: 5
  kubeAPIBurst: 10
Describe alternatives you have considered
No response
Additional context
- Spark operator version: v2.x (kubeflow/spark-operator)
- Kubernetes version: 1.31
Love this feature?
Give it a 👍 We prioritize the features with most 👍