
Multiple labels in node_readiness_label & empty postgresql spec nodeAffinity triggers a perma-diff in cluster statefulset #2931

@dkulchinsky

  • Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.12.2
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Cloud, K8s, GCP & AWS
  • Are you running Postgres Operator in production? yes
  • Type of issue? Bug report

We encountered an issue: when two labels are defined in node_readiness_label, postgresql clusters with an empty nodeAffinity constantly show a drift in the cluster's StatefulSet, which triggers a recreation of the STS and a switchover of the database.

postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"info","msg":"reason: new statefulset's pod affinity does not match the current one","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"replacing statefulset","pkg":"cluster","time":"2025-07-01T22:34:45Z"}

Here's the full debug log:

postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"info","msg":"statefulset platform-harbor/ccs-harbor-postgres is not in the desired state and needs to be updated","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-          terminationMessagePath: /dev/termination-log,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-          terminationMessagePolicy: File,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-          terminationMessagePath: /dev/termination-log,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-          terminationMessagePolicy: File,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-      restartPolicy: Always,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-      dnsPolicy: ClusterFirst,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-      serviceAccount: postgres-pod,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-                    key: postgres_ready,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+                    key: nodepool,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-                      true","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+                      platform","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-                    key: nodepool,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+                    key: postgres_ready,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-                      platform","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+                      true","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-      schedulerName: default-scheduler,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-      kind: PersistentVolumeClaim,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-      apiVersion: v1,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-      status: {","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-        phase: Pending","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-      }","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"+      status: {}","pkg":"cluster","time":"2025-07-01T22:34:45Z"}
postgres-operator-5cc587dcd5-bxdct postgres-operator {"cluster-name":"platform-harbor/ccs-harbor-postgres","level":"debug","msg":"-  revisionHistoryLimit: 10,","pkg":"cluster","time":"2025-07-01T22:34:45Z"}

The above was reproduced on multiple clusters and postgres instances with the following setup.

OperatorConfiguration:

configuration:
  kubernetes:
    node_readiness_label:
      nodepool: "platform"
      postgres_ready: "true"

The cluster spec is very basic, without spec.nodeAffinity.

The produced StatefulSet appeared to be correct:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodepool
                operator: In
                values:
                - platform
              - key: postgres_ready
                operator: In
                values:
                - "true"

However, per the log above, the operator constantly detects a diff and triggers a recreation of the postgres cluster STS. From what I can tell, it seems to want to flip the order of the keys in the match expressions:

remove key: postgres_ready in first position
add key: nodepool in first position
remove key: nodepool in second position
add key: postgres_ready in second position

That appears to be the order in which the keys are defined in node_readiness_label, but I might be misreading the debug log on this.

We currently work around this issue by removing the nodepool=platform label from node_readiness_label and defining it explicitly in the postgresql spec's nodeAffinity field; this solved the issue for us.
