
[Bugfix] Implement upgrade-aware controller ordering for FE/BEs/CNs #707

Open
jmjm15x wants to merge 4 commits into StarRocks:main from jmjm15x:bugfix/fe-be-upgrade-sequence

Conversation

jmjm15x commented Oct 9, 2025

Description

Add upgrade sequencing control to the StarRocks Kubernetes Operator to ensure proper component ordering during both initial deployments and upgrades. Previously, the operator always used FE-first ordering, which is correct for initial deployments but incorrect for upgrades. According to StarRocks guidelines:

  • Initial Deployment: FE → BEs/CNs (FE must be leader before workers join)
  • Cluster Upgrades: BEs/CNs → FE (data nodes upgraded before metadata nodes)

From official documentation:

Upgrade procedure
By design, BEs and CNs are backward compatible with the FEs. Therefore, you need to upgrade BEs and CNs first and then FEs to allow your cluster to run properly while being upgraded. Upgrading them in an inverted order may lead to incompatibility between FEs and BEs/CNs, and thereby cause the service to crash.

Solution

Implemented a comprehensive upgrade detection and sequencing mechanism with robust component readiness validation to prevent premature progression between components.

Key Changes

1. Upgrade Detection (isUpgrade())

Detects upgrade scenarios by checking whether StatefulSets already exist with pending changes; compares Generation against ObservedGeneration to detect spec changes (a minimal sketch is shown below).

Why this approach?

  • Simple and reliable: uses Kubernetes-native generation tracking
  • Works for any spec change (images, resources, configs)
  • Handles transient states: correctly identifies an upgrade while a rollout is in progress
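
For reference, a minimal sketch of this kind of check, assuming a controller-runtime client; the function name and signature are illustrative, not this PR's actual code:

// Sketch only: detect a pending spec change on an existing StatefulSet.
package controller

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// isUpgradeSketch reports whether an existing StatefulSet has a pending spec
// change: the controller has not yet observed the latest Generation, or the
// rollout has not yet converged on the new revision.
func isUpgradeSketch(ctx context.Context, c client.Client, namespace, stsName string) bool {
    sts := &appsv1.StatefulSet{}
    if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: stsName}, sts); err != nil {
        // No existing StatefulSet (or lookup failure): treat as initial deployment.
        return false
    }
    return sts.Generation != sts.Status.ObservedGeneration ||
        sts.Status.CurrentRevision != sts.Status.UpdateRevision
}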

2. Controller Ordering (getControllersInOrder())

Dynamically switches controller execution order based on the deployment scenario (see the sketch after this list):

  • Upgrade scenario: [be, cn, fe, feproxy] (BE-first ordering)
  • Initial deployment: [fe, be, cn, feproxy] (FE-first ordering)
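
A minimal sketch of this switch; the subController interface and reconcilerSketch struct below are placeholders standing in for the operator's real sub-controller and reconciler types:

// Sketch only: scenario-based controller ordering.
package controller

// subController is a placeholder for the operator's sub-controller type.
type subController interface {
    GetControllerName() string
}

type reconcilerSketch struct {
    feController, beController, cnController, feProxyController subController
}

// getControllersInOrder returns BE-first ordering for upgrades and FE-first
// ordering for initial deployments, following the StarRocks upgrade guideline.
func (r *reconcilerSketch) getControllersInOrder(isUpgrade bool) []subController {
    if isUpgrade {
        return []subController{r.beController, r.cnController, r.feController, r.feProxyController}
    }
    return []subController{r.feController, r.beController, r.cnController, r.feProxyController}
}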

3. Component Readiness Validation (isComponentReady())

Multi-layer validation that prevents premature component progression: it avoids race conditions, ensures rollout stability, and adds detailed logging for debugging.

Logic Flow

Implements the waiting logic directly in the reconciliation loop (a condensed code sketch follows the diagram):

Reconcile() called
    ↓
Get controllers in order based on isUpgrade()
    ↓
For each controller in order:
    ↓
    ├─ If upgrade && feController
    │   └─ Check BE/CN ready? → If NO, wait and requeue
    │
    ├─ Sync controller (create/update resources)
    │
    ├─ If initial && feController  
    │   └─ Check FE ready? → If NO, wait and requeue
    │
    └─ If upgrade && (beController || cnController)
        └─ Check component ready? → If NO, wait and requeue
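
A condensed code sketch of the same flow; the component struct and the ready callback are placeholders for the PR's sub-controllers and isComponentReady(), not the actual implementation:

// Sketch only: upgrade-aware reconcile loop with wait-and-requeue.
package controller

import "context"

type component struct {
    name string
    sync func(ctx context.Context) error
}

func reconcileSketch(ctx context.Context, upgrade bool, ordered []component,
    ready func(ctx context.Context, name string) bool) (requeue bool, err error) {
    for _, c := range ordered {
        // Upgrade: hold the FE back until BE and CN rollouts have converged.
        if upgrade && c.name == "fe" && (!ready(ctx, "be") || !ready(ctx, "cn")) {
            return true, nil // requeue and retry later
        }
        if err := c.sync(ctx); err != nil {
            return false, err
        }
        // Initial deployment: do not move past the FE until it is ready.
        if !upgrade && c.name == "fe" && !ready(ctx, "fe") {
            return true, nil
        }
        // Upgrade: each BE/CN rollout must complete before the next component.
        if upgrade && (c.name == "be" || c.name == "cn") && !ready(ctx, c.name) {
            return true, nil
        }
    }
    return false, nil
}

Returning requeue simply asks controller-runtime to run the reconcile again later, so the operator never blocks while waiting for a component.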

End-to-End Test Results

Test Case 1: Initial FE+BE Deployment (v3.1.0)

Expected: FE-first ordering (FE must be ready before BE starts)

Timeline:
  T0:       FE Pod Created - 2025-10-09 07:29:58 UTC
  T0+48s:   BE Pod Created - 2025-10-09 07:30:46 UTC

Operator Logs:
  - "initial deployment: waiting for FE to be ready before creating BE/CN"
  - Component progression: feController → wait for FE ready → beController

Verification:
  - FE StatefulSet created first
  - BE StatefulSet created 48 seconds later (after FE became ready)

Test Case 2: Version Upgrade (v3.1.0 → v3.1.8)

Expected: BE-first ordering (BE must complete before FE starts)

Timeline:
  T0:       BE Pod Created - 2025-10-09 07:44:45 UTC
  T0+6s:    FE Pod Created - 2025-10-09 07:44:51 UTC

Operator Logs:
  - "component not ready: StatefulSet spec change not yet observed"
  - "component not ready: StatefulSet rollout in progress"
  - "component not ready: no ready endpoints"
  - "upgrade: waiting for component rollout to complete before proceeding"

Verification:
  - BE StatefulSet updated first (detected generation change)
  - BE pod rolled out to v3.1.8
  - FE update waited for BE rollout completion
  - FE pod rolled out to v3.1.8 after BE was ready

Test Case 3: Configuration Change (Memory: 2Gi → 4Gi)

Expected: BE-first ordering (config changes treated as upgrades)

Timeline:
  T0:       BE Pod Created - 2025-10-09 07:47:05 UTC
  T0+6s:    FE Pod Created - 2025-10-09 07:47:11 UTC

Operator Logs:
  - "component not ready: no ready endpoints"
  - "upgrade: waiting for component rollout to complete before proceeding"

Verification:
  - Configuration change correctly detected as upgrade scenario
  - BE rolled out first with new memory limits (4Gi)
  - FE waited for BE readiness before rolling out
  - Both components running with 4Gi memory

Verification

# Check initial deployment uses FE-first (wait message)
kubectl logs -n <namespace> deployment/kube-starrocks-operator | grep "initial deployment: waiting for FE"

# Check upgrade uses BE-first (wait message)
kubectl logs -n <namespace> deployment/kube-starrocks-operator | grep "upgrade: waiting for component rollout"

# Verify component readiness details
kubectl logs -n <namespace> deployment/kube-starrocks-operator | grep "component not ready"

# Check StatefulSet rollout sequence with timestamps
kubectl get statefulsets -n <namespace> -o json | jq -r '.items[] | "\(.metadata.name): \(.status.currentRevision) -> \(.status.updateRevision)"'

# Verify pod creation/recreation timestamps
kubectl get pods -n <namespace> -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp

Checklist

For operator, please complete the following checklist:

  • run make generate to generate the code.
  • run golangci-lint run to check the code style (0 issues).
  • run make test to run UT (all controller tests passing).
  • run make manifests to update the yaml files of CRD.

For helm chart, please complete the following checklist:

  • make sure you have updated the values.yaml
    file of the starrocks chart.
  • In the scripts directory, run bash create-parent-chart-values.sh to update the values.yaml file of the parent
    chart (kube-starrocks chart).

CLAassistant commented Oct 9, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ yandongxiao
❌ jmjm15x


jmjm15x seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

yandongxiao (Collaborator) commented:

The upgrade sequence you mentioned is indeed a problem, as it doesn't follow the rules. I'd like to ask, have you encountered any issues during this upgrade process?

jmjm15x (Author) commented Oct 9, 2025

Not with this approach; I encountered a critical race condition during upgrades with the previous implementation I tried (#704). When the operator updated a StatefulSet's spec, it would immediately check component readiness using only endpoint availability. However, endpoints don't immediately reflect the new state; the old pods remain "ready" for a few seconds while Kubernetes starts the rollout. This caused the FE to upgrade prematurely, before BE/CN completed their rollouts.

Fix: Implemented isComponentReady() with the following validation (a sketch is shown after the list):

  1. Service endpoints exist
  2. StatefulSet controller observed the spec change (ObservedGeneration check)
  3. Rollout is complete (currentRevision == updateRevision)
  4. All replicas are ready
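
For illustration, a minimal sketch of these four checks, assuming a controller-runtime client; the function name, signature, and the Endpoints lookup are my assumptions, not the exact code in this PR:

// Sketch only: multi-layer component readiness check.
package controller

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

func isComponentReadySketch(ctx context.Context, c client.Client, namespace, serviceName, stsName string) bool {
    // 1. Service endpoints exist and at least one address is ready.
    eps := &corev1.Endpoints{}
    if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: serviceName}, eps); err != nil {
        return false
    }
    hasReadyAddress := false
    for _, subset := range eps.Subsets {
        if len(subset.Addresses) > 0 {
            hasReadyAddress = true
            break
        }
    }
    if !hasReadyAddress {
        return false
    }

    sts := &appsv1.StatefulSet{}
    if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: stsName}, sts); err != nil {
        return false
    }
    // 2. The StatefulSet controller has observed the latest spec change.
    if sts.Generation != sts.Status.ObservedGeneration {
        return false
    }
    // 3. The rollout has converged on the new revision.
    if sts.Status.CurrentRevision != sts.Status.UpdateRevision {
        return false
    }
    // 4. All desired replicas are ready.
    return sts.Spec.Replicas == nil || sts.Status.ReadyReplicas == *sts.Spec.Replicas
}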

Implementation approach: I kept the existing logic flow and controller structure intact, only enhancing the readiness checks for robustness. This ensures backward compatibility while fixing the race condition.

I included logs from a few E2E tests showing the proper sequencing (BE/CN → FE) in the description.

yandongxiao (Collaborator) commented:

The upgrade sequence you mentioned is indeed a problem, as it doesn't follow the rules. I'd like to ask, have you encountered any issues during this upgrade process?

@jmjm15x What I want to express here is, when you use the current version of the operator and upgrade FE and BE simultaneously, did you encounter any issues? Currently, we have not received any other reports of issues caused by simultaneous FE/BE upgrades.

jmjm15x (Author) commented Oct 16, 2025

The upgrade sequence you mentioned is indeed a problem, as it doesn't follow the rules. I'd like to ask, have you encountered any issues during this upgrade process?

@jmjm15x What I want to express here is, when you use the current version of the operator and upgrade FE and BE simultaneously, did you encounter any issues? Currently, we have not received any other reports of issues caused by simultaneous FE/BE upgrades.

@yandongxiao, sorry for the misunderstanding. Yes, I’ve seen instability during upgrades in our clusters. Our current mitigation is using custom scripts to enforce BE-first ordering for stability, but this workaround risks disrupting the operator workflow.

jmjm15x force-pushed the bugfix/fe-be-upgrade-sequence branch from 16c0162 to 7063994 on October 16, 2025 07:17
jmjm15x added 2 commits October 16, 2025 00:21
Fixes upgrade sequence issues and prevents premature component updates

Key changes:
- Add isUpgrade() detection based on StatefulSet existence
- Implement getControllersInOrder() for scenario-based sequencing
- Add isComponentReady() with endpoint, generation, rollout, and replica checks
- Detect and log corrupted state (BE without FE) with recovery attempt

Signed-off-by: jmjm15x <jmjm15x@gmail.com>
only

Previously, any StatefulSet existence triggered BE-first ordering. Now only actual image changes trigger upgrade ordering, preventing unnecessary use of the upgrade path for all changes.

Remove the redundant checks in the reconcile
method

Signed-off-by: jmjm15x <jmjm15x@gmail.com>
jmjm15x force-pushed the bugfix/fe-be-upgrade-sequence branch from 7063994 to abc305e on October 16, 2025 07:21
yandongxiao (Collaborator) commented:

Please rebase your code onto the latest main branch. Some auto-tests were not being executed, and I have fixed that.

* [Enhancement] Support arrow_flight_port

Signed-off-by: yandongxiao <yandongxiao@starrocks.com>

* [BugFix] fix failed test cases and add test cases for arrow flight

Signed-off-by: yandongxiao <yandongxiao@starrocks.com>

---------

Signed-off-by: yandongxiao <yandongxiao@starrocks.com>
jmjm15x (Author) commented Oct 20, 2025

Please rebase your code onto the latest main branch. Some auto-tests were not being executed, and I have fixed that.

Rebased from the main branch.

yandongxiao (Collaborator) commented:

Please fix the failed test cases; I think you can run make test on your local computer.

yandongxiao (Collaborator) commented:

Another question: I think this PR should exclude the third PR.

yandongxiao (Collaborator) commented:

CLA assistant check: Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. 1 out of 2 committers have signed the CLA. ✅ yandongxiao ❌ jmjm15x

jmjm15x seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@jmjm15x Please sign the CLA.

Signed-off-by: jmjm15x <jmjm15x@gmail.com>
jmjm15x (Author) commented Oct 22, 2025

Please fix the failed test cases; I think you can run make test on your local computer.

Fixed the broken tests in the last commit.

jmjm15x (Author) commented Oct 22, 2025

Another question: I think this PR should exclude the third PR.

@yandongxiao what do you mean by 3rd PR?

jmjm15x (Author) commented Oct 22, 2025

@yandongxiao I noticed a typo in my CLA signature; that's why it's pending. Could you revoke it so I can sign again?

yandongxiao (Collaborator) commented:

@yandongxiao I noticed a typo in my CLA signature; that's why it's pending. Could you revoke it so I can sign again?

@kevincai

yandongxiao (Collaborator) commented:

Another question: I think this PR should exclude the third PR.

@yandongxiao what do you mean by 3rd PR?

Sorry, there are errors in what I wrote. Now the PR contains four commits, but one commit is from yandongxiao.

jmjm15x (Author) commented Oct 22, 2025

Another question: I think this PR should exclude the third PR.

@yandongxiao what do you mean by 3rd PR?

Sorry, there are errors in what I wrote. Now the PR contains four commits, but one commit is from yandongxiao.

I rebased from main, and I think that's the reason for the 3rd commit.

yandongxiao (Collaborator) left a comment


some names need to be updated

feSts := &appsv1.StatefulSet{}
feExists := kubeClient.Get(ctx, types.NamespacedName{
Namespace: cluster.Namespace,
Name: cluster.Name + "-fe",

use load.Name(cluster.Name, cluster.Spec.StarRocksFeSpec)

beSts := &appsv1.StatefulSet{}
beExists := kubeClient.Get(ctx, types.NamespacedName{
Namespace: cluster.Namespace,
Name: cluster.Name + "-be",

use load.Name(cluster.Name, cluster.Spec.StarRocksBeSpec)

return true // Component not configured, consider it ready
}
serviceName = rutils.ExternalServiceName(cluster.Name, cluster.Spec.StarRocksFeSpec)
statefulSetName = cluster.Name + "-fe"

use load.Name(cluster.Name, cluster.Spec.StarRocksFeSpec), and you can pass a nil pointer for the second parameter.

return true
}
serviceName = rutils.ExternalServiceName(cluster.Name, cluster.Spec.StarRocksBeSpec)
statefulSetName = cluster.Name + "-be"

the same issue

return true
}
serviceName = rutils.ExternalServiceName(cluster.Name, cluster.Spec.StarRocksCnSpec)
statefulSetName = cluster.Name + "-cn"

the same issue

Namespace: cluster.Namespace,
Name: cluster.Name + "-be",
}, beSts) == nil


I think the CN check is missing here

// Corrupted state safeguard: BE exists but FE doesn't (invalid configuration).
// Treat as initial deployment so FE is reconciled first.
// Rationale: FE is a prerequisite for BE/CN; prioritizing FE allows recovery without misordering.
if beExists && !feExists {

this duplicates the following !feExists condition

return false
}

return checkForImageChanges(ctx, kubeClient, cluster)

The above code detects whether the sts exists, and checkForImageChanges compares their images. My suggestion is: can we merge them together?

  1. If the FE spec exists in the cluster, check whether the sts exists, then check the image.


// After syncing, check if we need to wait for this component to be ready before proceeding
// Initial deployment: Wait for FE to be ready before creating BE/CN
if !isUpgradeScenario && controllerName == r.FeController.GetControllerName() {

This brings up a potential issue: for example, if the FE fails to start for some reason, such as excessive metadata or a long startup time, the probe will fail. In that case, users would surely want to modify the probe time, but the logic here prevents that.


In the sub-controller logic there is a fe.CheckFEReady(ctx, be.Client, src.Namespace, src.Name) check; if the FE is not ready, BE/CN will stop reconciling.


If it is an upgrade scenario, then after the FE image is updated, the Operator will no longer consider it an upgrade scenario. So your logic here waits for the FE until it becomes Ready on the next sync. I don't think this operation is really necessary. What do you think?
