
K8SPXC-1800 | run prepare job for PITR #2449

Open
mayankshah1607 wants to merge 6 commits into main from K8SPXC-1800

Conversation

Member

@mayankshah1607 mayankshah1607 commented Apr 29, 2026

CHANGE DESCRIPTION

Problem:
The prepare job does not run when PITR is specified. As a result, PXC pods do not come up for cross-cluster replication, or when passwords have changed since the backup was taken.

Cause:
The prepare job is skipped in the PITR step.

Solution:
Run the prepare job for PITR restores.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PXC version?
  • Does the change support oldest and newest supported Kubernetes version?

Signed-off-by: Mayank Shah <mayank.shah@percona.com>
@pull-request-size bot added the size/L (100-499 lines) label Apr 29, 2026
Comment thread e2e-tests/pitr/run
echo ${gtid}
}


Contributor


[shfmt] reported by reviewdog 🐶

Suggested change

Contributor

Copilot AI left a comment


Pull request overview

Updates the PXC restore controller so PITR restores run the “prepare cluster” job (needed for cross-cluster replication and password-changed backups), and extends PITR E2E coverage to exercise restore-from-source onto a freshly recreated cluster with changed system user passwords.

Changes:

  • Move PITR execution to occur after the “Preparing Cluster” stage by running it from the RestorePrepareCluster state handler.
  • Extend PITR E2E scripts to support dynamic root passwords and add a new case restoring onto a recreated cluster.
  • Update PITR test cluster manifest finalizers to ensure PVC cleanup during cluster deletion/recreate flows.
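The reordering in the first bullet can be sketched as a tiny state machine. This is a minimal illustration, not the operator's actual code: the `RestoreState` type and `next` function are hypothetical, and only the state names come from this PR.

```go
package main

import "fmt"

// Hypothetical sketch of the restore flow described above: with this
// change, the PITR job runs only after the "Preparing Cluster" stage
// (the prepare job) has completed.
type RestoreState string

const (
	RestoreRestoring      RestoreState = "Restoring"
	RestorePrepareCluster RestoreState = "Preparing Cluster"
	RestorePITR           RestoreState = "Point-in-time recovering"
	RestoreStartCluster   RestoreState = "Starting Cluster"
)

// next returns the state that follows s, branching on whether the
// restore requested point-in-time recovery.
func next(s RestoreState, pitr bool) RestoreState {
	switch s {
	case RestoreRestoring:
		return RestorePrepareCluster
	case RestorePrepareCluster:
		if pitr {
			return RestorePITR // PITR now runs after the prepare job
		}
		return RestoreStartCluster
	case RestorePITR:
		return RestoreStartCluster
	}
	return s
}

func main() {
	fmt.Println(next(RestoreRestoring, true)) // prints: Preparing Cluster
}
```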

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
pkg/controller/pxcrestore/controller.go Adjust restore state machine so PITR is triggered after the prepare job completes.
e2e-tests/pitr/run Add password randomization helper and a new PITR restore-on-new-cluster scenario.
e2e-tests/pitr/conf/pitr.yml Add PVC-deletion finalizers to support delete/recreate in PITR E2E.
e2e-tests/functions Pass root password into PITR recovery checks and wait for new intermediate restore states.
Comments suppressed due to low confidence (1)

pkg/controller/pxcrestore/controller.go:355

  • PITR restore for clusters with version < 1.18.0 appears to be broken: PITR job creation/state transition was moved into reconcileStatePrepareCluster, but reconcileStateRestore routes <1.18.0 restores directly to RestoreStartCluster (skipping RestorePrepareCluster), so RestorePITR is never reached and the PITR job won't run. Consider keeping the PITR branching in reconcileStateRestore for <1.18.0 (or otherwise ensuring PITR flows still create/run the PITR job when PrepareCluster is skipped).
	if cluster.CompareVersionWith("1.18.0") >= 0 {
		log.Info("preparing cluster", "cluster", cr.Spec.PXCCluster)
		cr.Status.State = api.RestorePrepareCluster
	} else {
		log.Info("starting cluster", "cluster", cr.Spec.PXCCluster)
		cr.Status.State = api.RestoreStartCluster
	}
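One way to read the reviewer's suggestion, as a hedged sketch: keep the legacy PITR branch in the path taken by clusters that skip the prepare stage. The `nextState` helper below is hypothetical, and the plain string comparison is only a crude stand-in for `cluster.CompareVersionWith`.

```go
package main

import "fmt"

// Hypothetical sketch of the fallback the reviewer proposes: clusters
// older than 1.18.0 route directly to "Starting Cluster" and never
// reach the prepare stage, so the PITR branch must stay in this
// pre-prepare path for them.
func nextState(version string, pitr bool) string {
	// Stand-in for cluster.CompareVersionWith("1.18.0") >= 0; lexical
	// comparison only works here because the sample versions share a
	// common format.
	if version >= "1.18.0" {
		return "Preparing Cluster" // PITR is handled after the prepare job
	}
	if pitr {
		return "Point-in-time recovering" // legacy branch kept for old clusters
	}
	return "Starting Cluster"
}

func main() {
	fmt.Println(nextState("1.17.0", true)) // prints: Point-in-time recovering
}
```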


Comment thread e2e-tests/functions
Comment on lines +2282 to 2286
wait_backup_restore ${restore} "Restoring"

wait_backup_restore ${restore} "Preparing Cluster"

wait_backup_restore ${restore} "Point-in-time recovering"
Comment thread e2e-tests/pitr/run
Comment on lines +281 to +283
desc "[CASE 4] delete PXC cluster ${cluster} and wait for PVCs"
kubectl_bin delete pxc ${cluster}
desc "[CASE 4] wait for PVCs (app.kubernetes.io/instance=${cluster}) to be deleted"
Member Author


kubectl delete blocks until the object is gone

@JNKPercona
Collaborator

Test Name Result Time
auto-tuning-8-0 passed 00:00:00
allocator-8-0 passed 00:00:00
allocator-8-4 passed 00:00:00
backup-storage-tls-8-0 passed 00:00:00
cross-site-8-0 passed 00:00:00
cross-site-proxysql-8-0 passed 00:00:00
cross-site-proxysql-8-4 passed 00:00:00
custom-users-8-0 passed 00:00:00
demand-backup-cloud-8-0 passed 00:00:00
demand-backup-cloud-8-4 passed 00:00:00
demand-backup-cloud-pxb-8-0 passed 00:00:00
demand-backup-encrypted-with-tls-5-7 passed 00:00:00
demand-backup-encrypted-with-tls-8-0 passed 00:00:00
demand-backup-encrypted-with-tls-8-4 passed 00:00:00
demand-backup-encrypted-with-tls-pxb-5-7 passed 00:00:00
demand-backup-encrypted-with-tls-pxb-8-0 passed 00:00:00
demand-backup-encrypted-with-tls-pxb-8-4 passed 00:00:00
demand-backup-8-0 passed 00:00:00
demand-backup-flow-control-8-0 passed 00:00:00
demand-backup-flow-control-8-4 passed 00:00:00
demand-backup-parallel-8-0 passed 00:00:00
demand-backup-parallel-8-4 passed 00:00:00
demand-backup-without-passwords-8-0 passed 00:00:00
demand-backup-without-passwords-8-4 passed 00:00:00
extra-pvc-8-0 passed 00:00:00
haproxy-5-7 passed 00:00:00
haproxy-8-0 passed 00:00:00
haproxy-8-4 passed 00:00:00
init-deploy-5-7 passed 00:00:00
init-deploy-8-0 passed 00:00:00
limits-8-0 passed 00:00:00
monitoring-2-0-8-0 passed 00:00:00
monitoring-pmm3-8-0 passed 00:00:00
monitoring-pmm3-8-4 passed 00:00:00
one-pod-5-7 passed 00:00:00
one-pod-8-0 passed 00:00:00
pitr-8-0 passed 00:00:00
pitr-8-4 passed 00:00:00
pitr-pxb-8-0 passed 00:00:00
pitr-pxb-8-4 passed 00:00:00
pitr-gap-errors-8-0 passed 00:00:00
pitr-gap-errors-8-4 passed 00:00:00
proxy-protocol-8-0 passed 00:00:00
proxy-switch-8-0 passed 00:00:00
proxysql-sidecar-res-limits-8-0 passed 00:00:00
proxysql-scheduler-8-0 passed 00:00:00
pvc-resize-5-7 passed 00:00:00
pvc-resize-8-0 passed 00:00:00
recreate-8-0 passed 00:00:00
restore-to-encrypted-cluster-8-0 passed 00:00:00
restore-to-encrypted-cluster-8-4 passed 00:00:00
restore-to-encrypted-cluster-pxb-8-0 passed 00:00:00
restore-to-encrypted-cluster-pxb-8-4 passed 00:00:00
scaling-proxysql-8-0 passed 00:00:00
scaling-8-0 passed 00:00:00
scheduled-backup-5-7 passed 00:00:00
scheduled-backup-8-0 passed 00:00:00
scheduled-backup-8-4 passed 00:00:00
security-context-8-0 passed 00:00:00
smart-update1-8-0 passed 00:00:00
smart-update1-8-4 passed 00:00:00
smart-update2-8-0 passed 00:00:00
smart-update2-8-4 passed 00:00:00
smart-update3-8-0 passed 00:00:00
sst-retry-limit-8-0 passed 00:00:00
sst-retry-limit-8-4 passed 00:00:00
storage-8-0 passed 00:00:00
tls-issue-cert-manager-ref-8-0 passed 00:00:00
tls-issue-cert-manager-8-0 passed 00:00:00
tls-issue-self-8-0 passed 00:00:00
upgrade-consistency-8-0 passed 00:00:00
upgrade-consistency-8-4 passed 00:00:00
upgrade-haproxy-5-7 passed 00:00:00
upgrade-haproxy-8-0 passed 00:00:00
upgrade-proxysql-5-7 passed 00:00:00
upgrade-proxysql-8-0 passed 00:00:00
users-5-7 passed 00:27:30
users-8-0 passed 00:00:00
users-scheduler-8-4 passed 00:00:00
validation-hook-8-0 passed 00:00:00
Summary Value
Tests Run 80/80
Job Duration 00:51:23
Total Test Time 00:27:30

commit: b47912f
image: perconalab/percona-xtradb-cluster-operator:PR-2449-b47912f7


Labels

size/L 100-499 lines

3 participants