Skip to content

Commit 7ab48a9

Browse files
committed
KEP-3751: add error handling
1 parent 5ae3e6e commit 7ab48a9

File tree

3 files changed

+107
-7
lines changed

3 files changed

+107
-7
lines changed

keps/sig-storage/3751-volume-attributes-class/README.md

Lines changed: 107 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,8 @@
3434
- [Create PVC and Create Volume](#create-pvc-and-create-volume)
3535
- [Delete PVC](#delete-pvc)
3636
- [Modify PVC](#modify-pvc)
37-
- [Implementation & Handling Failure](#implementation--handling-failure)
37+
- [Handle failures](#handle-failures)
38+
- [Implementation](#implementation)
3839
- [Test Plan](#test-plan)
3940
- [Prerequisite testing updates](#prerequisite-testing-updates)
4041
- [Unit tests](#unit-tests)
@@ -400,10 +401,42 @@ Note:
400401

401402
The parameters in VolumeAttributesClass are opaque and immutable. This gives different cloud providers flexibility but at the same time it is up to the specific driver to verify the values set in the parameters. The parameters from VolumeAttributesClass associated with a volume are mutable because they are coming from different VolumeAttributesClasses.
402403

404+
**Deal with Volume Reverted to nil VAC**
405+
406+
After reverting VAC name to nil, it will not be fully reverted to the previous state, because Kubernetes does not actually know the previous state of the volume. The volume will keep occupying the quota of previously specified VAC. Now user can:
407+
* If there is no quota, user may not care about this, and just leave it there
408+
* User can set a (potentially different) VAC name to try again
409+
* Admin can check the volume status manually and reset PVC.status.modifyStatus to nil to fully revert the change.
410+
411+
This may also confuse the higher level controller (e.g. [KEP-4650](https://github.com/kubernetes/enhancements/issues/4650)) when reconciling PVC. Higher level controllers can also special casing nil VAC names to skip waiting for the modification.
412+
403413
### Risks and Mitigations
404414

415+
**Parameter Conflicts**
416+
405417
Users may configure both parameters in StorageClass and parameters in VolumeAttributesClass differently. The mitigation here is to provide a guideline of solving conflicts. Drivers should return an error if they are conflicting. If the MODIFY_VOLUME capability is present, VolumeAttributesClass should be honored.
406418

419+
**Quota Abuse**
420+
421+
For administrators who want to set up quota on the VACs to control the expense, he should set up each VAC to fully specify the state of the volume. That is to say, each VAC should fully overwrite changes to every other VAC.
422+
423+
Consider this bad setup that violates the previous requirement.
424+
* VAC A: `{"bandwidth": "100Mbps"}`
425+
* VAC B: `{"bandwidth": "200Mbps", "iops": "2000"}`
426+
427+
Users can switch from A to B then back to A to get 2000 iops without using his quota on VAC B.
428+
429+
If this requirement is met, Kubernetes promises to administrators: every PVC with `Status.ModifyVolumeStatus == nil` is properly covered by quota.
430+
431+
While Kubernetes will work with storage providers to prevent quota abuse at best effort, the state of volumes with non-nil ModifyVolumeStatus is undefined.
432+
Consider when a volume is modified from VAC A to B, C, D, E in sequence, all failed with Internal error, the actual state of the volume can be any combination of the 5 VACs, but we cannot charge the quota of all 5 VACs for this volume.
433+
434+
To mitigate this, admin should properly setup to VACs and actively monitor the number of volumes with non-nil ModifyVolumeStatus, and driving them to the final state.
435+
436+
CSI drivers should try to avoid making volume stuck at partially modified states to improve user experience and avoid quota abuse.
437+
438+
CSI drivers should try their best to verify all the parameters before applying any modifications. It should make light-weight and reversible changes first and non-reversible changes last, so that if something still goes wrong in the middle, users can revert the volume to the previous state.
439+
407440
## Proposed Changes
408441

409442
As part of this proposal, we are proposing:
@@ -429,7 +462,7 @@ A. Count of bound/unbound PVCs per VolumeAttributesClass similar to [existing PV
429462

430463
Prior to this enhancement, we loop through all PVCs, check if `pvc.Status.Phase == v1.VolumeBound` and increment the relevant metric only on `namespace` dimension. When the feature flag is enabled, new metrics will take into account `namespace`, `storage_class`, and `volume_attribute_class`.
431464

432-
With these additional labels, cluster operators can alarm on these new metrics to detect PVCs that are not able to bind. With the additional StorageClass and VolumeAttributesClass name labels, cluster operators can more easily check whether VolumeAttributeClass or StorageClass object misconfiguration is the cause of these binding issues.
465+
With these additional labels, cluster operators can alarm on these new metrics to detect PVCs that are not able to bind. With the additional StorageClass and VolumeAttributesClass name labels, cluster operators can more easily check whether VolumeAttributesClass or StorageClass object misconfiguration is the cause of these binding issues.
433466

434467
```
435468
boundPVCCountWithVACDesc = metrics.NewDesc(
@@ -683,11 +716,78 @@ spec:
683716

684717
ModifyVolume is only allowed on bound PVCs. Under the ModifyVolume call, it will pass in the mutable parameters and do the update operation based on the `VolumeAttributesClass` parameters.
685718

686-
![ModifyVolume Flow Diagram](./VolumeAttributesClass-ModifyVolume-Flow-v3.png)
687-
688-
### Implementation & Handling Failure
689-
690-
VolumeAttributesClass parameters can be considered as best-effort parameters, the CSI driver should report the status of bad parameters as INVALID_ARGUMENT and the volume would fall back to a workable default configuration.
719+
##### Handle failures
720+
721+
The design principle of failure handling:
722+
**Acurate state sync**:
723+
when `pvc.Status.CurrentVolumeAttributesClassName == pvc.Spec.VolumeAttributesClassName && pvc.Status.ModifyVolumeStatus == nil`,
724+
the PVC reaches the specified state, the volume is guaranteed to have user specified parameters applied. More formally: `ControllerModifyVolume` has been called with parameters in the specified VAC, returned OK, and no other `ControllerModifyVolume` has been called after that.
725+
726+
This is important for user to ensure proper function of workload.
727+
This also helps to ensure the volumes are properly covered by quota when they reaches the stable state.
728+
729+
To balance the predictable and intuitive system behavior with the quota abuse possibility,
730+
we classify the errors returned by `ControllerModifyVolume` into 3 categories and handle them differently:
731+
- Non-final errors (such as `DeadlineExceeded`), indicating volume modification is likely still in-progress.
732+
In this case, We will always retry the request with the same parameters to allow the operation to complete.
733+
This policy safeguards against potential quota abuse that can occur if users time their requests strategically.
734+
- Final errors (such as `Internal`), indicating storage provider failed to modify the volume and likely no longer processing the request.
735+
In this case, We allow changing the parameters to recover from the error.
736+
- Infeasible Errors (e.g., `InvalidArgument`): This is a subset of final errors indicating the request itself is invalid and is not likely to succeed when retried.
737+
We will retry at lower frequency.
738+
If the `Spec.VolumeAttributesClassName` is set to nil, we will not retry the request.
739+
740+
For details, please refer to the flow chart below.
741+
742+
##### Implementation
743+
744+
Flow diagram describes how external-resizer modifies the volume:
745+
![ModifyVolume Flow Diagram](./VolumeAttributesClass-Modify-Flow.jpg)
746+
where
747+
* `spec`: `pvc.spec.volumeAttributesClassName`
748+
* `cur`: `pvc.status.currentVolumeAttributesClassName`
749+
* `target`: `pvc.status.modifyVolumeState.targetVolumeAttributesClassName`
750+
* `status`: `pvc.status.modifyVolumeState.status`. resetting status to nil also implies resetting `target` to nil.
751+
752+
To further verify the correctness, this tables lists all possible states and the corresponding actions:
753+
754+
| \# | spec | cur | status | target | uncertain | action | OK | final | infeasible | non-final |
755+
| -: | --- | --- | ------ | --- | ----- | -: | -: | -: | -: | -: |
756+
| 1 | nil | nil | nil | N/A | false | 1 | | | | |
757+
| 2 | nil | nil | InProg | A | \* | A | 4 | 2 | 3 | 2 |
758+
| 3 | nil | A | Infeas | A | \* | 3 | | | | |
759+
| 4 | nil | A | nil | N/A | false | 4 | | | | |
760+
| 5 | A | nil | nil | N/A | false | 6 | | | | |
761+
| 6 | A | nil | InProg | A | false | A | 9 | 6 | 8 | 7 |
762+
| 7 | A | nil | InProg | A | true | A | 9 | 6 | 8 | 7 |
763+
| 8 | A | nil | Infeas | A | false | A | 9 | 6 | 8 | 7 |
764+
| 9 | A | A | nil | N/A | false | 9 | | | | |
765+
| 10 | B | nil | InProg | A | false | 6 | | | | |
766+
| 11 | B | nil | InProg | A | true | A | 13 | 10 | 12 | 11 |
767+
| 12 | B | nil | Infeas | A | false | 6 | | | | |
768+
| 13 | B | A | nil | N/A | false | 14 | | | | |
769+
| 14 | B | A | InProg | B | false | B | 9 | 14 | 16 | 15 |
770+
| 15 | B | A | InProg | B | true | B | 9 | 14 | 16 | 15 |
771+
| 16 | B | A | Infeas | B | false | B | 9 | 14 | 16 | 15 |
772+
| 17 | C | A | InProg | B | false | 14 | | | | |
773+
| 18 | C | A | InProg | B | true | B | 13 | 17 | 19 | 18 |
774+
| 19 | C | A | Infeas | B | false | 14 | | | | |
775+
| 20 | A | A | InProg | B | false | 23 | | | | |
776+
| 21 | A | A | InProg | B | true | B | 13 | 20 | 22 | 21 |
777+
| 22 | A | A | Infeas | B | false | 23 | | | | |
778+
| 23 | A | A | InProg | A | false | A | 9 | 23 | 25 | 24 |
779+
| 24 | A | A | InProg | A | true | A | 9 | 23 | 25 | 24 |
780+
| 25 | A | A | Infeas | A | false | A | 9 | 23 | 25 | 24 |
781+
| 26 | B | A | InProg | A | false | 14 | | | | |
782+
| 27 | B | A | InProg | A | true | A | 13 | 26 | 28 | 27 |
783+
| 28 | B | A | Infeas | A | false | 14 | | | | |
784+
785+
Letter A in the action column means call ControllerModifyVolume with parameters of VAC A.
786+
787+
Number 1 in the last five columns means we should transit to state 1 in this case.
788+
If action equals to self, we have reached the final state (1, 3, 4, 9) and should stop reconcile until user modifies the spec again.
789+
790+
One can verify this contains all states by arbitrarily changing the spec and verify it will still hit a listed state.
691791

692792
### Test Plan
693793

242 KB
Loading

0 commit comments

Comments
 (0)