| 1 | +# Design |
| 2 | + |
| 3 | +This document describes the interaction between `EtcdCluster` custom resources and other Kubernetes |
| 4 | +primitives and gives an overview of the underlying implementation. |
| 5 | + |
| 6 | +## Reconciliation flowchart |
| 7 | + |
| 8 | +```mermaid |
| 9 | +flowchart TD |
| 10 | + Start(Start) --> A[Ensure service.] |
| 11 | + A --> AA{Are there any\nendpoints?} |
| 12 | + AA --> |Yes| AAA[Connect to the cluster\nand fetch all statuses.] |
| 13 | + AAA --> |Got some response| AAAA{All reachable\nmembers have the\nsame cluster ID?} |
| 14 | + AAAA --> |Yes| AAAAA{Is cluster\nin quorum?} |
| 15 | + AAAAA --> |Yes| AAAAAA{Are all members \nmanaged by the operator?} |
| 16 | + AAAAAA --> |Yes| AAAAAAA["` |
| 17 | + Promote any learners. |
| 18 | + Ensure ConfigMap with initial cluster matching existing members and cluster state = existing. |
| 19 | + Ensure StatefulSet with replicas = max member ordinal + 1. |
| 20 | + `"] |
| 21 | + AAAAAAA --> |OK| AAAAAAAA{Are all\nmembers healthy?} |
| 22 | + AAAAAAAA --> |Yes| AAAAAAAAA{Are all STS pods present\nin the member list?} |
| 23 | + AAAAAAAAA --> |Yes| AAAAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} |
| 24 | + AAAAAAAAAA -->|Yes| AAAAAAAAAAA[Set cluster\nstatus to ready.] |
| 25 | + AAAAAAAAAAA --> HappyStop([Stop]) |
| 26 | + |
| 27 | + AAAAAAAAAA --> |No, desired\nsize larger| AAAAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] |
| 28 | + AAAAAAAAAAB --> ScaleUpStop([Stop]) |
| 29 | + |
| 30 | + AAAAAAAAAA --> |No, desired\nsize smaller| AAAAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.] |
| 31 | + AAAAAAAAAAC --> ScaleDownStop([Stop]) |
| 32 | + |
| 33 | + AAAAAAAAAA --> |EtcdCluster replicas=0\nSTS replicas=1| AAAAAAAAAAD[Decrement\nSTS to zero] |
| 34 | + AAAAAAAAAAD --> ScaleToZeroStop([Stop]) |
| 35 | + |
| 36 | + AAAAAAAA --> |No| AAAAAAAAB1[On timeout evict member.] |
| 37 | + AAAAAAAAB1 --> AAAAAAAAB2[Delete PVC, ensure ConfigMap with\nmembers + this one and delete pod.] |
| 38 | + |
| 39 | + AAAAAAAAA --> |No| AAAAAAAAB2 |
| 40 | + |
| 41 | + AAAAAAA -->|Error| AAAAAAAB([Requeue]) |
| 42 | + |
| 43 | + AAAAAA --> |No| AAAAAAB([Not implemented,\nstop.]) |
| 44 | + |
| 45 | + AAAAA --> |No| AAAAAB([Quorum Loss Detected: |
| 46 | + 1. Check for temporary issues: |
| 47 | + - Network partitions |
| 48 | + - Pod scheduling problems |
| 49 | + 2. If temporary, wait for recovery |
| 50 | + 3. If permanent: |
| 51 | + - Alert operators |
| 52 | + - Document disaster recovery steps |
| 53 | + - Consider backup restoration]) |
| 54 | + |
| 55 | + AAAA --> |No| AAAAB[Cluster is in\nsplit-brain. Set\nerror status.] |
| 56 | + AAAAB --> AAAABStop([Stop]) |
| 57 | + |
| 58 | + AAA --> |No members\nreached| AAAB{Is the STS\npresent?} |
| 59 | + AAAB --> |Yes| AAABA{"`Does it have the correct pod spec?`"} |
| 60 | + AAABA --> |Yes| AAABAA(["`The StatefulSet cannot be ready, since its readiness and liveness probes must be failing. Wait for it to recover or for user intervention.`"]) |
| 61 | + AAABA --> |No| AAABAB["`Patch the podspec`"] |
| 62 | + |
| 63 | + AAAB --> |No| AAABB(["`The StatefulSet was probably deleted with cascade=orphan. Recreate it and re-evaluate on the next reconcile.`"]) |
| 64 | + |
| 65 | + AA --> |No| AAB{Is the STS\npresent?} |
| 66 | + AAB --> |Yes| AABA{Does it have the\ncorrect pod spec?} |
| 67 | + AABA --> |Yes| AABAA{Is it\nready?} |
| 68 | + AABAA --> |Yes| AABAAA{Then it must have\nspec.replicas==0\n Is EtcdCluster\n.spec.replicas==0?} |
| 69 | + AABAAA --> |Yes| AABAAAA([Cluster successfully\nscaled to zero, stop.]) |
| 70 | + AABAAA --> |No| AABAAAB["` |
| 71 | + Ensure ConfigMap with initial cluster state = new, |
| 72 | + initial cluster peers with single member name-0, |
| 73 | + increment STS size. |
| 74 | + `"] |
| 75 | + |
| 76 | + AABAA --> |No| AABAAB([Stop and wait, either\nit will turn ready soon\nand the next reconcile\nwill move things along,\nor user intervention is\nneeded]) |
| 77 | + |
| 78 | + AABA --> |No| AABAB[Patch the podspec] |
| 79 | + |
| 80 | + AAB --> |No| AABB[Create ConfigMap with initial\ncluster state new and initial\ncluster according to\nspec.replicas, then create\nStatefulSet.] |
| 81 | +``` |
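
The branching in the flowchart above can be read as a pure decision function: given a snapshot of what one reconcile pass observed, return the single next action. The following is a minimal sketch of that reading, not the operator's actual code. The `Observed` fields, the `next_action` function, and the action names are all hypothetical and exist only for illustration. The side-effecting step on the healthy path (promote learners, ensure ConfigMap and StatefulSet, requeue on error) is reduced to a comment, and checking the scale-to-zero case before the general scale-down case is an assumption about how the two overlapping edges are ordered.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """Hypothetical snapshot of what one reconcile pass has observed."""
    has_endpoints: bool = False         # does the Service have any endpoints?
    sts_present: bool = False           # does the StatefulSet exist?
    pod_spec_correct: bool = False      # does its pod spec match the desired one?
    sts_ready: bool = False             # is the StatefulSet ready?
    members_reached: bool = False       # did any etcd member answer the status call?
    same_cluster_id: bool = False       # do all reachable members share a cluster ID?
    has_quorum: bool = False
    all_managed: bool = False           # are all members managed by the operator?
    all_healthy: bool = False
    all_pods_are_members: bool = False  # is every STS pod in the member list?
    desired_replicas: int = 0           # EtcdCluster size
    sts_replicas: int = 0               # StatefulSet size


def next_action(o: Observed) -> str:
    """Return an illustrative action name for the branch the flowchart takes."""
    if not o.has_endpoints:
        if not o.sts_present:
            return "create-configmap-and-statefulset"
        if not o.pod_spec_correct:
            return "patch-pod-spec"
        if not o.sts_ready:
            return "wait"
        # A ready StatefulSet without endpoints must have zero replicas.
        if o.desired_replicas == 0:
            return "scaled-to-zero"
        return "bootstrap-first-member"

    if not o.members_reached:
        if not o.sts_present:
            return "recreate-statefulset"
        if not o.pod_spec_correct:
            return "patch-pod-spec"
        return "wait"

    if not o.same_cluster_id:
        return "set-split-brain-error"
    if not o.has_quorum:
        return "handle-quorum-loss"
    if not o.all_managed:
        return "not-implemented"

    # The flowchart promotes learners and ensures the ConfigMap and
    # StatefulSet here, requeueing on error; this sketch omits that step.
    if not o.all_healthy:
        return "evict-then-reset-member"
    if not o.all_pods_are_members:
        return "reset-member"

    if o.desired_replicas == o.sts_replicas:
        return "set-ready"
    if o.desired_replicas > o.sts_replicas:
        return "scale-up"
    # Assumption: the scale-to-zero edge takes priority over plain scale-down.
    if o.desired_replicas == 0 and o.sts_replicas == 1:
        return "scale-to-zero"
    return "scale-down"


# With nothing observed yet, the first pass creates the initial objects.
print(next_action(Observed()))
```

Keeping the decision separate from the Kubernetes and etcd API calls makes each branch of the flowchart testable with plain in-memory values.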