
feat: support configurable kube-api-qps and kube-api-burst for controller #2864

@houyuting


What feature you would like to be added?

Add two command-line flags to the controller start command to allow configuring the Kubernetes client rate limiter:

  • --kube-api-qps (float32, default: 5) — maximum QPS to the API server from the controller client
  • --kube-api-burst (int, default: 10) — maximum burst size for the rate limiter
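To make the two knobs concrete: when QPS is positive, client-go throttles requests with a token-bucket limiter. The bucket holds up to burst tokens and refills at QPS tokens per second. The sketch below is a simplified model, not client-go code. It assumes the bucket starts full and that all requests arrive at the same instant and are served in order. It then computes how long each request waits before it is admitted. admissionDelays is a hypothetical helper.

```go
package main

import "fmt"

// admissionDelays models n requests that all arrive at t=0 against a token
// bucket that starts full with `burst` tokens and refills at `qps` tokens/s.
// It returns, for each request in arrival order, its wait in seconds.
// Simplified illustration only; this is not client-go's implementation.
func admissionDelays(n int, qps float64, burst int) []float64 {
	delays := make([]float64, n)
	for k := 1; k <= n; k++ {
		if k <= burst {
			delays[k-1] = 0 // served from the initial burst allowance
			continue
		}
		// The (burst+j)-th request needs j refilled tokens, i.e. j/qps seconds.
		delays[k-1] = float64(k-burst) / qps
	}
	return delays
}

func main() {
	// Proposed defaults (QPS=5, burst=10) with 100 simultaneous requests.
	d := admissionDelays(100, 5, 10)
	fmt.Printf("defaults: request 11 waits %.1fs, request 100 waits %.1fs\n", d[10], d[99])

	// Illustrative raised limits (QPS=50, burst=100) with the same 100 requests.
	d = admissionDelays(100, 50, 100)
	fmt.Printf("raised:   request 100 waits %.1fs\n", d[99])
}
```

Under this model, with the defaults the 100th request waits (100 - 10) / 5 = 18 seconds. That is long enough to run past a typical per-request or lease-renewal deadline. With the illustrative QPS=50 and burst=100, none of the 100 requests wait.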

The implementation is a minimal, non-breaking change to cmd/operator/controller/start.go:

  1. Declare two package-level variables:
     var (
         kubeAPIQPS   float32 // value of --kube-api-qps
         kubeAPIBurst int     // value of --kube-api-burst
     )
  2. Register the flags in NewStartCommand():
     command.Flags().Float32Var(&kubeAPIQPS, "kube-api-qps", 5, "Maximum QPS to the API server from the controller client.")
     command.Flags().IntVar(&kubeAPIBurst, "kube-api-burst", 10, "Maximum burst for throttle from the controller client.")
  3. Apply the values to the rest.Config in start() before building the manager:
     cfg, err := ctrl.GetConfig()
     // (error handling for err omitted in this excerpt)
     cfg.WarningHandler = rest.NoWarnings{}
     cfg.QPS = kubeAPIQPS     // new: client-side rate limit, requests per second
     cfg.Burst = kubeAPIBurst // new: token-bucket burst size
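The steps above can be checked in isolation with a sketch that uses only the standard library. This sketch rests on the following assumptions:

* Go's flag package stands in for cobra/pflag. It has no Float32Var, so QPS is parsed as float64 and then converted.
* restConfig is a hypothetical stand-in for rest.Config, reduced to its QPS (float32) and Burst (int) fields.
* parseFlags and applyRateLimits are illustrative names, not existing spark-operator functions.

```go
package main

import (
	"flag"
	"fmt"
)

// restConfig is a hypothetical stand-in for k8s.io/client-go/rest.Config,
// keeping only the two rate-limiter fields this proposal touches.
type restConfig struct {
	QPS   float32
	Burst int
}

// controllerFlags holds the proposed flag values.
type controllerFlags struct {
	kubeAPIQPS   float64 // stdlib flag has no Float32Var; converted in applyRateLimits
	kubeAPIBurst int
}

// parseFlags registers the two proposed flags with the proposed defaults.
func parseFlags(args []string) (controllerFlags, error) {
	var f controllerFlags
	fs := flag.NewFlagSet("start", flag.ContinueOnError)
	fs.Float64Var(&f.kubeAPIQPS, "kube-api-qps", 5, "Maximum QPS to the API server from the controller client.")
	fs.IntVar(&f.kubeAPIBurst, "kube-api-burst", 10, "Maximum burst for throttle from the controller client.")
	if err := fs.Parse(args); err != nil {
		return f, err
	}
	return f, nil
}

// applyRateLimits copies the flag values onto the client config, mirroring
// the two "new" lines in step 3.
func applyRateLimits(cfg *restConfig, f controllerFlags) {
	cfg.QPS = float32(f.kubeAPIQPS)
	cfg.Burst = f.kubeAPIBurst
}

func main() {
	f, err := parseFlags([]string{"--kube-api-qps=50", "--kube-api-burst=100"})
	if err != nil {
		panic(err)
	}
	cfg := &restConfig{}
	applyRateLimits(cfg, f)
	fmt.Printf("QPS=%v Burst=%d\n", cfg.QPS, cfg.Burst)
}
```

When no flags are passed, parseFlags returns QPS=5 and burst=10. These are the same values the issue describes as today's behavior.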

Why is this needed?

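In large-scale production environments with hundreds of concurrent SparkApplications, the controller frequently hits the default client-go rate limiter (QPS=5, burst=10). API calls then queue in the client-side limiter and fail with context errors. Status updates after spark-submit, in particular, return context canceled errors. In the log below, the leader-election lease renewal also times out while waiting on the rate limiter (context deadline exceeded). As a result, the controller loses leader election and the manager exits: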

E0303 12:28:42.451089       9 leaderelection.go:429] Failed to update lock optimistically: client rate limiter Wait returned an error: context deadline exceeded, falling back to slow path
E0303 12:28:42.451167       9 leaderelection.go:436] error retrieving resource lock spark-xx/spark-operator-xx-controller-lock: client rate limiter Wait returned an error: context deadline exceeded
I0303 12:28:42.451186       9 leaderelection.go:297] failed to renew lease spark-xx/spark-operator-xx-controller-lock: context deadline exceeded
2026-03-03T12:28:42.451Z	ERROR	controller/start.go:341	Failed to start manager	{"error": "leader election lost"}
github.com/kubeflow/spark-operator/v2/cmd/operator/controller.start
	/workspace/cmd/operator/controller/start.go:341
github.com/kubeflow/spark-operator/v2/cmd/operator/controller.NewStartCommand.func2
	/workspace/cmd/operator/controller/start.go:151
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1019
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1148
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/cobra@v1.10.1/command.go:1071
main.main
	/workspace/cmd/operator/main.go:45
runtime.main
	/usr/local/go/src/runtime/proc.go:283

Describe the solution you would like

Expose --kube-api-qps and --kube-api-burst flags on the controller start command, consistent with the pattern used by kube-controller-manager and many other Kubernetes controllers.

The change is confined to cmd/operator/controller/start.go — no changes to internal controller logic are required. Default values (QPS=5, burst=10) match the current hardcoded behavior, so existing deployments are unaffected.

For Helm-based deployments, the flags could also be exposed through the chart's values.yaml, for example:

controller:
  kubeAPIQPS: 5
  kubeAPIBurst: 10

Describe alternatives you have considered

No response

Additional context

  • Spark operator version: v2.x (kubeflow/spark-operator)
  • Kubernetes version: 1.31

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.
