Skip to content

Commit a44c119

Browse files
committed
Remove zero leading and update checklist
Signed-off-by: Heba Elayoty <[email protected]>
1 parent 6b49ddf commit a44c119

File tree

1 file changed

+65
-85
lines changed
  • keps/sig-scheduling/5471-enable-sla-based-scheduling

1 file changed

+65
-85
lines changed

keps/sig-scheduling/5471-enable-sla-based-scheduling/README.md

Lines changed: 65 additions & 85 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,6 @@
1818
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
1919
- [Risks and Mitigations](#risks-and-mitigations)
2020
- [Scheduler Performance Regression](#scheduler-performance-regression)
21-
- [Edge Cases in Numeric Parsing](#edge-cases-in-numeric-parsing)
2221
- [Taint Misconfiguration Detection](#taint-misconfiguration-detection)
2322
- [Controller Hot-Loop When Feature Gate is Disabled](#controller-hot-loop-when-feature-gate-is-disabled)
2423
- [Cross-SIG Impact](#cross-sig-impact)
@@ -69,7 +68,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
6968
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
7069
- [ ] (R) Production readiness review completed
7170
- [ ] (R) Production readiness review approved
72-
- [ ] "Implementation History" section is up-to-date for milestone
71+
- [x] "Implementation History" section is up-to-date for milestone
7372
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
7473
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
7574

@@ -405,7 +404,7 @@ spec:
405404

406405
- **Non-Numeric Taint Values**: When a pod toleration uses `Lt` or `Gt` operators, it only matches taints with numeric values. If a node has a taint with a non-numeric value, the toleration will not match, and the pod cannot schedule on that node.
407406

408-
**Example**:
407+
**Example**:
409408
- Node taint: `node.kubernetes.io/sla=high:NoSchedule`
410409
- Pod toleration: `{key: "node.kubernetes.io/sla", operator: "Gt", value: "900"}`
411410
- **Result**: Toleration does not match and pod cannot schedule on this node
@@ -432,20 +431,6 @@ spec:
432431
- Consider caching parsed values in scheduler data structures if performance issues arise
433432
- Feature gate allows disabling if performance problems occur
434433

435-
#### Edge Cases in Numeric Parsing
436-
437-
**Risk**: Unexpected behavior with edge cases like integer overflow, leading zeros, or malformed input could cause scheduling failures. Leading zeros in values (e.g., `"0950"`) could create user confusion about whether values are treated as strings or numbers.
438-
439-
**Mitigation**:
440-
441-
- Use Go's standard `strconv.ParseInt()` with well-defined error handling
442-
- Comprehensive unit tests covering edge cases (overflow, underflow, malformed strings, leading zeros)
443-
- API validation rejects pods with unparseable values rather than silently failing
444-
- **API validation explicitly rejects values with leading zeros** when using numeric operators to eliminate confusion
445-
- Clear error messages help users identify and fix configuration issues
446-
- Documentation clearly states that leading zeros are not permitted for numeric operators
447-
- **Performance validation via scheduler-perf tests** to ensure no measurable scheduling latency degradation from integer parsing overhead
448-
449434
#### Taint Misconfiguration Detection
450435

451436
**Risk**: Node taints intended for numeric comparison may contain non-numeric values (e.g., `node.kubernetes.io/sla=high` instead of `node.kubernetes.io/sla=950`). Since taint values are not validated at node registration time, these misconfigurations are only detected during scheduling when a pod with `Lt`/`Gt` tolerations attempts to match. This can lead to pods remaining in `Pending` state without clear indication of the root cause.
@@ -558,12 +543,6 @@ func validateTolerations(tolerations []core.Toleration, fldPath *field.Path) fie
558543
toleration.Value, "value must be a valid integer for numeric operators"))
559544
continue
560545
}
561-
562-
// Reject values with leading zeros to prevent confusion
563-
if len(toleration.Value) > 1 && toleration.Value[0] == '0' && toleration.Value != "0" {
564-
allErrors = append(allErrors, field.Invalid(idxPath.Child("value"),
565-
toleration.Value, "leading zeros are not allowed in numeric values (use '950' instead of '0950')"))
566-
}
567546
}
568547
}
569548
return allErrors
@@ -625,7 +604,7 @@ N/A
625604

626605
##### Unit tests
627606

628-
All core changes must be covered by unit tests, in both Taint API, validation, and scheduler sides. Tests must specifically cover leading zeros behavior (e.g., `"0950"` vs `"950"`):
607+
All core changes must be covered by unit tests, in both Taint API, validation, and scheduler sides.
629608

630609
- `staging/src/k8s.io/api/core/v1/toleration_test.go`: Sep-16-2025 - 66.7%
631610
- `staging/src/k8s.io/component-helpers/scheduling/corev1/helpers_test.go`: Sep-16-2025 - 100%
@@ -649,7 +628,7 @@ Update the following integration tests to include new operators:
649628
- Dynamic taint addition/removal
650629
- Pod rescheduling after taint changes
651630
- Integration with NodeAffinity
652-
- Feature gate on/off
631+
- Feature gate on/off
653632

654633
##### e2e tests
655634

@@ -682,9 +661,11 @@ The existing e2e tests will be extended to cover the new taints cases introduced
682661
### Upgrade / Downgrade Strategy
683662

684663
#### Upgrade
664+
685665
Enable the feature gate in kube-apiserver first then kube-scheduler. This ensures the API server can accept and validate pods with the new operators before the kube-scheduler tries to process them.
686666

687667
#### Downgrade
668+
688669
Disable the feature gate in in kube-scheduler then kube-apiserver. Since we want to stop the kube-scheduler from processing the new operators first, then stop the API server from accepting new pods with those operators. This prevents the scheduler from trying to handle features the API server would reject.
689670

690671
**What happens when the scheduler doesn't recognize Gt/Lt operators:**
@@ -695,7 +676,7 @@ When the feature gate is disabled and the scheduler encounters a pod with `Gt`/`
695676
- Pod is considered to have untolerated taints
696677
- Filter returns `UnschedulableAndUnresolvable` status
697678
- Pod remains in Pending state.
698-
- Feature gate on/off test cases
679+
- Feature gate on/off test cases
699680

700681
### Version Skew Strategy
701682

@@ -728,12 +709,12 @@ Impact on existing pods with Gt/Lt operators when feature is disabled:
728709

729710
1. **Already-running pods**: Continue running normally. The kubelet doesn't need to re-evaluate tolerations for running pods.
730711

731-
2. **Unscheduled/pending pods**:
712+
2. **Unscheduled/pending pods**:
732713
- Remain in the cluster but cannot be scheduled
733714
- The scheduler's TaintToleration plugin won't recognize Gt/Lt operators and will treat them as non-matching
734715
- These pods will remain in Pending state with events indicating untolerated taints
735716

736-
3. **New pod creation**:
717+
3. **New pod creation**:
737718
- API server validation will **reject** new pods with Gt/Lt operators
738719
- Error: `spec.tolerations[].operator: Unsupported value: "Gt": supported values: "Equal", "Exists"`
739720

@@ -744,6 +725,7 @@ Impact on existing pods with Gt/Lt operators when feature is disabled:
744725
###### What happens if we reenable the feature if it was previously rolled back?
745726

746727
Extended toleration operators will be respected again:
728+
747729
- Existing pods with Gt/Lt operators in etcd become valid and schedulable
748730
- New pods can be created with Gt/Lt operators
749731
- The scheduler will properly evaluate numeric comparisons
@@ -768,8 +750,9 @@ Tests have been added in the integration tests. See [Integration tests](#integra
768750
3. Workload controllers (Deployments, StatefulSets, etc.):
769751
- If the pod template uses Gt/Lt operators, the controller cannot create new pods
770752
- Rolling updates will fail
771-
753+
772754
**Recommended rollback procedure to prevent hot loop**:
755+
773756
1. Update identified workloads to use `Equal` or remove numeric tolerations
774757
2. Delete pending pods that use `Lt`/`Gt` operators
775758
3. Disable feature gate in kube-scheduler first, then kube-apiserver
@@ -901,67 +884,64 @@ N/A
901884
## Implementation History
902885

903886
- 2025-08-11: Initial KEP
887+
- 2025-10-15: KEP for alpha is merged
888+
- 2025-10-20: Update checklist and remove zero leading risk to match affinity behavior as agreed
904889

905890
## Drawbacks
906891

907892
## Alternatives
908893

909894
There are many different alternatives were considered:
910895

911-
1. **New Dedicated SLA API Resource:** Create `SLAPolicy` CRD
912-
- **Pros:** Clean separation, rich policy definitions.
913-
- **Cons:** New API surface, additional complexity, breaks unified taint/toleration model.
914-
2. **Custom Scheduler Plugin:** Use scheduling plugin with SLA-aware logic, [placement-policy-scheduler-plugins](https://github.com/Azure/placement-policy-scheduler-plugins)
915-
- **Pros:** Full scheduling control, rich logic possible
916-
- **Cons:**
917-
- Out-of-tree scheduler plugin to maintain and manage
918-
- Doesn't leverage existing taint/toleration infrastructure.
919-
3. **Node Labels + Enhanced NodeAffinity:** Use labels instead of taints, extend NodeAffinity matching.
920-
- **Pros:** Leverages existing label system.
921-
- **Cons:**
922-
- No default push-back behavior
923-
- No eviction semantics
924-
- Labels aren't meant for operational constraints.
925-
926-
4. **Add Separate `NumValue int64` Field:** Add a dedicated numeric field alongside the existing `Value string` field in Taint/Toleration structs.
927-
- **Pros:**
928-
- Eliminates parsing overhead and errors
929-
- Type-safe integer handling
930-
- No concerns about leading zeros or malformed values
931-
- Better performance for numeric comparisons
932-
- **Cons:**
933-
- Not aesthetically pleasing API design with dual fields
934-
- Users might set wrong field or both fields accidentally
935-
- Complex validation logic for field combinations
936-
- Memory/storage overhead for additional field
937-
- API complexity and documentation burden
938-
939-
5.**Use Existing `Equal` Operator with Numeric Values (No New Operators):**
940-
941-
Instead of introducing `Lt`/`Gt`, use the existing `Equal` operator with numeric taint values. For example:
942-
- Node: `node.kubernetes.io/sla=950:NoSchedule`
943-
- Pod: `{key: "node.kubernetes.io/sla", operator: "Equal", value: "950"}`
944-
945-
**Pros:**
946-
- No API changes needed
947-
948-
**Cons:**
949-
- Pods must specify exact SLA values, not ranges. A pod cannot say "accept any node with SLA > 950"
950-
- Multiple tolerations required: If nodes have varying SLA values (e.g., 950, 960, 970, 980, 990), pods need separate `Equal` tolerations for each value they're willing to accept:
951-
```yaml
952-
tolerations:
953-
- key: node.kubernetes.io/sla
954-
operator: Equal
955-
value: "950"
956-
- key: node.kubernetes.io/sla
957-
operator: Equal
958-
value: "960"
959-
- key: node.kubernetes.io/sla
960-
operator: Equal
961-
value: "970"
962-
# ... and so on
963-
```
964-
- Poor semantics for "best effort" workloads since you can't easily express "I'll take any spot/preemptible node regardless of SLA" without enumerating all possible low-SLA values
965-
- Changes to node SLA classification schemes require updating all pod manifests
966-
896+
1. **New Dedicated SLA API Resource:** Create `SLAPolicy` CRD
897+
- **Pros:** Clean separation, rich policy definitions.
898+
- **Cons:** New API surface, additional complexity, breaks unified taint/toleration model.
899+
2. **Custom Scheduler Plugin:** Use scheduling plugin with SLA-aware logic, [placement-policy-scheduler-plugins](https://github.com/Azure/placement-policy-scheduler-plugins)
900+
- **Pros:** Full scheduling control, rich logic possible
901+
- **Cons:**
902+
- Out-of-tree scheduler plugin to maintain and manage
903+
- Doesn't leverage existing taint/toleration infrastructure.
904+
3. **Node Labels + Enhanced NodeAffinity:** Use labels instead of taints, extend NodeAffinity matching.
905+
- **Pros:** Leverages existing label system.
906+
- **Cons:**
907+
- No default push-back behavior
908+
- No eviction semantics
909+
- Labels aren't meant for operational constraints.
910+
911+
4. **Add Separate `NumValue int64` Field:** Add a dedicated numeric field alongside the existing `Value string` field in Taint/Toleration structs.
912+
- **Pros:**
913+
- Eliminates parsing overhead and errors
914+
- Type-safe integer handling
915+
- Better performance for numeric comparisons
916+
- **Cons:**
917+
- Not aesthetically pleasing API design with dual fields
918+
- Users might set wrong field or both fields accidentally
919+
- Complex validation logic for field combinations
920+
- Memory/storage overhead for additional field
921+
- API complexity and documentation burden
922+
923+
5.**Use Existing `Equal` Operator with Numeric Values (No New Operators):** Instead of introducing `Lt`/`Gt`, use the existing `Equal` operator with numeric taint values. For example:
924+
1. Node: `node.kubernetes.io/sla=950:NoSchedule`
925+
2. Pod: `{key: "node.kubernetes.io/sla", operator: "Equal", value: "950"}`
926+
- **Pros:**
927+
- No API changes needed
928+
- **Cons:**
929+
- Pods must specify exact SLA values, not ranges. A pod cannot say "accept any node with SLA > 950"
930+
- Multiple tolerations required: If nodes have varying SLA values (e.g., 950, 960, 970, 980, 990), pods need separate `Equal` tolerations for each value they're willing to accept:
931+
```yaml
932+
tolerations:
933+
- key: node.kubernetes.io/sla
934+
operator: Equal
935+
value: "950"
936+
- key: node.kubernetes.io/sla
937+
operator: Equal
938+
value: "960"
939+
- key: node.kubernetes.io/sla
940+
operator: Equal
941+
value: "970"
942+
...
943+
```
944+
- Poor semantics for "best effort" workloads since you can't easily express "I'll take any spot/preemptible node regardless of SLA" without enumerating all possible low-SLA values
945+
- Changes to node SLA classification schemes require updating all pod manifests
946+
967947
## Infrastructure Needed (Optional)

0 commit comments

Comments
 (0)