Description
- Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.12.2
- Where do you run it - cloud or metal? Kubernetes or OpenShift? Cloud, K8s, GCP & AWS
- Are you running Postgres Operator in production? yes
- Type of issue? Bug report
We encountered an issue where, with two labels defined in node_readiness_label,
postgresql clusters with an empty nodeAffinity constantly showed a drift in the postgresql StatefulSet, which triggered a recreation of the STS and a switchover of the database.
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"info","msg":"reason: new statefulset's pod affinity does not match the current one","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"replacing statefulset","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
here's the full debug log:
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"info","msg":"statefulset platform-harbor/ccs-harbor-postgres is not in the desired state and needs to be updated","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- terminationMessagePath: /dev/termination-log,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- terminationMessagePolicy: File,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- terminationMessagePath: /dev/termination-log,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- terminationMessagePolicy: File,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- restartPolicy: Always,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- dnsPolicy: ClusterFirst,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- serviceAccount: postgres-pod,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- key: postgres_ready,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+ key: nodepool,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- true","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+ platform","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- key: nodepool,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+ key: postgres_ready,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- platform","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+ true","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- schedulerName: default-scheduler,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- kind: PersistentVolumeClaim,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- apiVersion: v1,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- status: {","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- phase: Pending","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- }","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+ status: {}","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"- revisionHistoryLimit: 10,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
The above was reproduced on multiple clusters & postgres instances with the following setup:
OperatorConfiguration:

configuration:
  kubernetes:
    node_readiness_label:
      nodepool: "platform"
      postgres_ready: "true"
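For completeness, this is roughly how that setting looks as a full OperatorConfiguration manifest in our setup (the resource name is illustrative; everything else in our configuration is omitted):

apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgres-operator   # illustrative name
configuration:
  kubernetes:
    node_readiness_label:
      nodepool: "platform"
      postgres_ready: "true"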
The cluster spec is very basic and does not set spec.nodeAffinity.
The produced StatefulSet appeared to be correct:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: nodepool
          operator: In
          values:
          - platform
        - key: postgres_ready
          operator: In
          values:
          - "true"
However, per the log above, the operator constantly detects a diff and triggers a recreation of the postgres cluster STS. From what I can tell it seems to want to flip the order of the keys in the match expressions (see the sketch below):

- remove key: postgres_ready in first position
- add key: nodepool in first position
- remove key: nodepool in second position
- add key: postgres_ready in second position

which is the order in which the keys are defined in node_readiness_label? But I might be misreading the debug log on this.
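To spell out my reading of the "-"/"+" lines, the operator appears to keep toggling between these two orderings of the same match expressions (this is a sketch of my interpretation, not actual operator output):

# ordering the diff appears to remove
- key: postgres_ready
  operator: In
  values:
  - "true"
- key: nodepool
  operator: In
  values:
  - platform

# ordering the diff appears to add
- key: nodepool
  operator: In
  values:
  - platform
- key: postgres_ready
  operator: In
  values:
  - "true"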
We currently worked around this issue by removing the nodepool=platform label from node_readiness_label and defining it explicitly in the postgresql spec's nodeAffinity field; this solved the issue for us.
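For reference, a rough sketch of that workaround in the cluster manifest (only the relevant part of the spec is shown; node_readiness_label now only carries postgres_ready: "true"):

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: ccs-harbor-postgres
  namespace: platform-harbor
spec:
  # nodepool=platform moved out of node_readiness_label and pinned here instead
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: nodepool
          operator: In
          values:
          - platform

With only a single label left in node_readiness_label, the operator no longer reports the affinity diff and the STS stays stable.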