[Bugfix] Implement upgrade-aware controller ordering for FE/BEs/CNs #707
jmjm15x wants to merge 4 commits into StarRocks:main
Conversation
jmjm15x seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Force-pushed from 6b24c5d to 3aaaa15
The upgrade sequence you mentioned is indeed a problem, as it doesn't follow the rules. I'd like to ask: have you encountered any issues during this upgrade process?
Not with this approach; I encountered a critical race condition during upgrades with the previous implementation I tried (#704). When the operator updated a StatefulSet's spec, it would immediately check component readiness using only endpoint availability. However, endpoints don't immediately reflect the new state: the old pods remain "ready" for a few seconds while Kubernetes starts the rollout. This caused FE to upgrade prematurely, before BE/CN completed their rollouts. Fix: implemented more robust readiness checks.
Implementation approach: I kept the existing logic flow and controller structure intact, only enhancing the readiness checks for robustness. This ensures backward compatibility while fixing the race condition. I included logs from a few E2E tests with proper sequencing (BE/CN → FE) in the description.
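(For illustration, a minimal sketch of the endpoints-only pattern that is racy here; names and imports are assumptions, not code from this PR. Checked right after the StatefulSet spec is patched, the Endpoints object still lists the old pods, so the check reports ready before the rollout has actually started.)

```go
// Assumes: corev1 "k8s.io/api/core/v1", types "k8s.io/apimachinery/pkg/types",
//          client "sigs.k8s.io/controller-runtime/pkg/client"
// Racy endpoints-only readiness (sketch): returns true while the old pods still
// back the Service, i.e. during the lag before the rollout begins.
func endpointsOnlyReady(ctx context.Context, c client.Client, namespace, serviceName string) bool {
	ep := &corev1.Endpoints{}
	if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: serviceName}, ep); err != nil {
		return false
	}
	for _, subset := range ep.Subsets {
		if len(subset.Addresses) > 0 {
			return true // may still be the *old* pods' addresses
		}
	}
	return false
}
```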
@jmjm15x What I want to express here is: when you use the current version of the operator and upgrade FE and BE simultaneously, did you encounter any issues? Currently, we have not received any other reports of issues caused by simultaneous FE/BE upgrades.
@yandongxiao, sorry for the misunderstanding. Yes, I've seen instability during upgrades in our clusters. Our current mitigation is using custom scripts to enforce BE-first ordering for stability, but this workaround risks disrupting the operator workflow.
Force-pushed from 16c0162 to 7063994
Fixes upgrade sequence issues and prevents premature component updates

Key changes:
- Add isUpgrade() detection based on StatefulSet existence
- Implement getControllersInOrder() for scenario-based sequencing
- Add isComponentReady() with endpoint, generation, rollout, and replica checks
- Detect and log corrupted state (BE without FE) with recovery attempt

Signed-off-by: jmjm15x <jmjm15x@gmail.com>
Trigger upgrade ordering only on actual image changes

Previously, any StatefulSet existence triggered BE-first ordering. Now only actual image changes trigger upgrade ordering, preventing unnecessary use of the upgrade path for all changes. Remove the redundant checks in the reconcile method.

Signed-off-by: jmjm15x <jmjm15x@gmail.com>
Force-pushed from 7063994 to abc305e
Please rebase your code onto the latest main branch. Some auto-tests were not being executed, and I have fixed that.
* [Enhancement] Support arrow_flight_port
  Signed-off-by: yandongxiao <yandongxiao@starrocks.com>
* [BugFix] fix failed test cases and add test cases for arrow flight
  Signed-off-by: yandongxiao <yandongxiao@starrocks.com>
---------
Signed-off-by: yandongxiao <yandongxiao@starrocks.com>
Rebased from the main branch.
Please fix the failed test cases; I think you can run
Another question: I think this PR should exclude the third PR.
@jmjm15x Please sign the CLA. |
Signed-off-by: jmjm15x <jmjm15x@gmail.com>
Fixed the broken tests in the last commit.
@yandongxiao what do you mean by 3rd PR? |
@yandongxiao I noticed a typo in my CLA signature; that's why it's pending. Could you revoke it so I can sign again?
Sorry, there was an error in what I wrote. The PR now contains four commits, but one commit is from yandongxiao.
I rebased from main, and I think that's the reason for the 3rd commit.
yandongxiao left a comment
some names need to be updated
feSts := &appsv1.StatefulSet{}
feExists := kubeClient.Get(ctx, types.NamespacedName{
    Namespace: cluster.Namespace,
    Name:      cluster.Name + "-fe",
use load.Name(cluster.Name, cluster.Spec.StarRocksFeSpec)
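(Illustrative sketch of the suggested change; the call is the one suggested above, while the surrounding variable names and the helper's import path are assumed.)

```go
// Sketch: derive the FE StatefulSet name from the spec instead of hardcoding "-fe".
feSts := &appsv1.StatefulSet{}
feExists := kubeClient.Get(ctx, types.NamespacedName{
    Namespace: cluster.Namespace,
    Name:      load.Name(cluster.Name, cluster.Spec.StarRocksFeSpec), // load: operator helper package (path assumed)
}, feSts) == nil
```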
beSts := &appsv1.StatefulSet{}
beExists := kubeClient.Get(ctx, types.NamespacedName{
    Namespace: cluster.Namespace,
    Name:      cluster.Name + "-be",
use load.Name(cluster.Name, cluster.Spec.StarRocksBeSpec)
    return true // Component not configured, consider it ready
}
serviceName = rutils.ExternalServiceName(cluster.Name, cluster.Spec.StarRocksFeSpec)
statefulSetName = cluster.Name + "-fe"
use load.Name(cluster.Name, cluster.Spec.StarRocksFeSpec), and you can pass a nil pointer for the second parameter.
    return true
}
serviceName = rutils.ExternalServiceName(cluster.Name, cluster.Spec.StarRocksBeSpec)
statefulSetName = cluster.Name + "-be"
    return true
}
serviceName = rutils.ExternalServiceName(cluster.Name, cluster.Spec.StarRocksCnSpec)
statefulSetName = cluster.Name + "-cn"
    Namespace: cluster.Namespace,
    Name:      cluster.Name + "-be",
}, beSts) == nil
I think the CN check is missing here.
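(A hedged sketch of the parallel CN lookup being asked for, mirroring the FE/BE checks quoted above; the "-cn" suffix follows the existing naming in this diff.)

```go
// Only look up the CN StatefulSet when a CN spec is configured.
cnExists := false
if cluster.Spec.StarRocksCnSpec != nil {
    cnSts := &appsv1.StatefulSet{}
    cnExists = kubeClient.Get(ctx, types.NamespacedName{
        Namespace: cluster.Namespace,
        Name:      cluster.Name + "-cn",
    }, cnSts) == nil
}
```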
// Corrupted state safeguard: BE exists but FE doesn't (invalid configuration).
// Treat as initial deployment so FE is reconciled first.
// Rationale: FE is a prerequisite for BE/CN; prioritizing FE allows recovery without misordering.
if beExists && !feExists {
This duplicates the following !feExists condition.
    return false
}

return checkForImageChanges(ctx, kubeClient, cluster)
The above code detects whether the sts exists, while checkForImageChanges compares the images. My suggestion is to merge them together:
- if the cluster's FE spec exists, check whether the sts exists, then check the image.
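(A rough sketch of what the merged per-component check could look like; the helper name and signature are illustrative, not code from this PR. The caller would skip a component entirely when its spec is nil.)

```go
// Assumes: appsv1 "k8s.io/api/apps/v1", apierrors "k8s.io/apimachinery/pkg/api/errors",
//          types "k8s.io/apimachinery/pkg/types", client "sigs.k8s.io/controller-runtime/pkg/client"
// componentNeedsUpgrade (hypothetical helper): StatefulSet existence and image
// comparison folded into one pass, as suggested above.
func componentNeedsUpgrade(ctx context.Context, kubeClient client.Client,
    namespace, stsName, desiredImage string) (bool, error) {
    sts := &appsv1.StatefulSet{}
    err := kubeClient.Get(ctx, types.NamespacedName{Namespace: namespace, Name: stsName}, sts)
    if apierrors.IsNotFound(err) {
        return false, nil // no StatefulSet yet: initial deployment, not an upgrade
    }
    if err != nil {
        return false, err
    }
    if len(sts.Spec.Template.Spec.Containers) == 0 {
        return false, nil
    }
    // Assumes the component container is the first container in the pod template.
    return sts.Spec.Template.Spec.Containers[0].Image != desiredImage, nil
}
```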
// After syncing, check if we need to wait for this component to be ready before proceeding
// Initial deployment: Wait for FE to be ready before creating BE/CN
if !isUpgradeScenario && controllerName == r.FeController.GetControllerName() {
This brings up a potential issue. For example, FE may fail to start for certain reasons, such as excessive metadata or a long startup time, causing the probe to fail. In this case, users would surely want to modify the probe time, but the logic here prevents it.
In the sub-controller logic, there is a fe.CheckFEReady(ctx, be.Client, src.Namespace, src.Name) check; if FE is not ready, BE/CN will stop reconciling.
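(Roughly what that gate looks like inside a sub-controller's sync, sketched from the call quoted above; the surrounding variable and logger names are assumptions.)

```go
// BE/CN sub-controllers stop reconciling until FE reports ready.
if !fe.CheckFEReady(ctx, be.Client, src.Namespace, src.Name) {
    logger.Info("FE is not ready, skip reconciling BE for now")
    return nil // picked up again on the next sync
}
```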
If it is an upgrade scenario, then after the FE image is updated, the Operator will no longer consider it an upgrade scenario. So your logic here waits for FE until it becomes ready on the next sync. I don't think this operation is really necessary. What do you think?
Description
Add upgrade sequencing control to the StarRocks Kubernetes Operator to ensure proper component ordering during both initial deployments and upgrades. Previously, the operator always used FE-first ordering, which is correct for initial deployments but incorrect for upgrades. According to StarRocks guidelines:
From official documentation:
Solution
Implemented a comprehensive upgrade detection and sequencing mechanism with robust component readiness validation to prevent premature progression between components.
Key Changes
1. Upgrade Detection (isUpgrade())

Detects upgrade scenarios by checking if StatefulSets exist with pending changes; compares Generation vs ObservedGeneration to detect spec changes.

Why this approach?
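(A hedged sketch of the detection described above; the function name follows this PR's description, while the cluster type alias and the helper listing component StatefulSets are assumptions.)

```go
// Assumes: appsv1 "k8s.io/api/apps/v1", types "k8s.io/apimachinery/pkg/types",
//          client "sigs.k8s.io/controller-runtime/pkg/client",
//          srapi = the operator's StarRocksCluster API package (alias assumed)
func isUpgrade(ctx context.Context, kubeClient client.Client, cluster *srapi.StarRocksCluster) bool {
    // componentStatefulSetNames is a hypothetical helper returning the expected
    // StatefulSet names for the FE/BE/CN specs that are configured.
    for _, name := range componentStatefulSetNames(cluster) {
        sts := &appsv1.StatefulSet{}
        if err := kubeClient.Get(ctx, types.NamespacedName{
            Namespace: cluster.Namespace, Name: name,
        }, sts); err != nil {
            continue // StatefulSet missing: initial deployment for this component
        }
        if sts.Generation != sts.Status.ObservedGeneration {
            return true // pending spec change the controller has not observed yet
        }
    }
    return false
}
```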
2. Controller Ordering (getControllersInOrder())

Dynamically switches controller execution order based on the deployment scenario:
- Upgrade: [be, cn, fe, feproxy] (BE-first ordering)
- Initial deployment: [fe, be, cn, feproxy] (FE-first ordering)

3. Component Readiness Validation (isComponentReady())

Multi-layer validation to prevent premature component progression by avoiding race conditions, ensuring rollout stability, and providing enhanced logging for debugging.
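(Hedged sketches of the two pieces described above. Function names follow this description and the reconcile snippet quoted in the review; the reconciler and controller field names, the SubController type, and the readiness signature are assumptions.)

```go
// Assumes: appsv1 "k8s.io/api/apps/v1", corev1 "k8s.io/api/core/v1"

// Scenario-based controller ordering (sketch).
func (r *StarRocksClusterReconciler) getControllersInOrder(isUpgrade bool) []SubController {
    if isUpgrade {
        // Upgrades: BE/CN roll first, FE last.
        return []SubController{r.BeController, r.CnController, r.FeController, r.FeProxyController}
    }
    // Initial deployments: FE first, then BE/CN.
    return []SubController{r.FeController, r.BeController, r.CnController, r.FeProxyController}
}

// Multi-layer readiness (sketch): endpoints alone are not trusted; generation,
// rollout revision, and replica counts are checked as well.
func isComponentReady(sts *appsv1.StatefulSet, endpoints *corev1.Endpoints) bool {
    hasAddresses := false
    for _, subset := range endpoints.Subsets {
        if len(subset.Addresses) > 0 {
            hasAddresses = true
            break
        }
    }
    switch {
    case !hasAddresses:
        return false // no serving endpoints at all
    case sts.Status.ObservedGeneration < sts.Generation:
        return false // spec change not observed yet
    case sts.Status.UpdateRevision != sts.Status.CurrentRevision:
        return false // rollout still replacing pods
    case sts.Spec.Replicas == nil:
        return false
    default:
        return sts.Status.ReadyReplicas == *sts.Spec.Replicas &&
            sts.Status.UpdatedReplicas == *sts.Spec.Replicas
    }
}
```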
Logic Flow
Implements waiting logic directly in the reconciliation loop:
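(A sketch of how that waiting logic might sit in the loop, following the diff quoted in the review above; the requeue interval, the Sync method name, and the readiness wrapper are assumptions.)

```go
// Assumes: ctrl "sigs.k8s.io/controller-runtime", "time", plus the sketches above.
isUpgradeScenario := isUpgrade(ctx, r.Client, cluster)
for _, controller := range r.getControllersInOrder(isUpgradeScenario) {
    if err := controller.SyncCluster(ctx, cluster); err != nil { // method name assumed
        return ctrl.Result{}, err
    }
    controllerName := controller.GetControllerName()
    // Initial deployment: wait for FE before creating BE/CN.
    // Upgrade: wait for BE/CN before touching FE.
    mustWait := (!isUpgradeScenario && controllerName == r.FeController.GetControllerName()) ||
        (isUpgradeScenario && controllerName != r.FeController.GetControllerName())
    if mustWait && !r.componentIsReady(ctx, cluster, controllerName) { // hypothetical wrapper over isComponentReady
        return ctrl.Result{RequeueAfter: 10 * time.Second}, nil // interval assumed
    }
}
```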
End-to-End Test Results
Test Case 1: Initial FE+BE Deployment (v3.1.0)
Expected: FE-first ordering (FE must be ready before BE starts)
Test Case 2: Version Upgrade (v3.1.0 → v3.1.8)
Expected: BE-first ordering (BE must complete before FE starts)
Test Case 3: Configuration Change (Memory: 2Gi → 4Gi)
Expected: BE-first ordering (config changes treated as upgrades)
Verification
Checklist
For operator, please complete the following checklist:
- Run make generate to generate the code.
- Run golangci-lint run to check the code style (0 issues).
- Run make test to run UT (all controller tests passing).
- Run make manifests to update the yaml files of CRD.

For helm chart, please complete the following checklist:
- file of starrocks chart.
- In the scripts directory, run bash create-parent-chart-values.sh to update the values.yaml file of the parent chart (kube-starrocks chart).