Skip to content

Conversation

mgencur
Copy link

@mgencur mgencur commented Aug 20, 2025

This commit introduces a complete HCP (Hosted Control Plane) backup and restore testing framework with support for both newly created and existing HostedCluster environments.

Key Features Added:

New Test Infrastructure

  • Add hcp_full_backup_restore_suite_test.go: Test suite for full HCP backup/restore scenarios with data plane
  • Support for two operational modes:
    • create: Creates new HostedCluster for testing (existing behavior)
    • external: Uses pre-existing HostedCluster with data plane

Enhanced Configuration Options

  • Add Makefile variables for HCP test configuration:
    • HC_NAME: Specifies HostedCluster name for existing mode

Improved Test Architecture

  • Refactor runHCPBackupAndRestore() function for unified handling of both modes
  • Retrieve Kubeconfig for guest cluster from the management cluster on demand (before backup, and later before restore as it changes)
  • Add guest cluster verification functions (PreBackupVerifyGuest, PostRestoreVerifyGuest)
  • Separate log gathering and DPA resource cleanup into reusable functions
  • Enhanced error handling and validation for both control plane and guest cluster

Documentation Updates

  • Add testing documentation for HCP scenarios
  • Include examples for running tests against existing HostedControlPlane
  • Document environment variable configuration options

Build System Improvements

  • Add conditional must-gather build based on SKIP_MUST_GATHER flag

Technical Implementation:

The implementation supports testing both scenarios where OADP needs to:

  1. Create a new HostedCluster and test backup/restore (existing functionality)
  2. Work with an existing HostedCluster that already has workloads and data plane

This enables comprehensive testing of HCP backup/restore functionality in realistic production-like environments where clusters already exist and contain user workloads.

🤖 PR description Generated with Claude Code and modified.

Why the changes were made

  • The test suite should support validations in data plane for hosted clusters after "restore". This can be further extended to support ROSA (there are some todos as hints in this PR) where some resources already exist in the cluster (like DataProtectionApplication)

How to test the changes made

  • See the new instructions in TESTING.md
  • The new tests only run if -hc_backup_restore_mode=external.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 20, 2025
@openshift-ci openshift-ci bot requested review from kaovilai and sseago August 20, 2025 06:50
Copy link

openshift-ci bot commented Aug 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mgencur
Once this PR has been reviewed and has the lgtm label, please assign sseago for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This commit introduces a complete HCP (Hosted Control Plane) backup and restore testing framework
with support for both newly created and existing HostedCluster environments.

- Add `hcp_full_backup_restore_suite_test.go`: Complete test suite for full HCP backup/restore scenarios
- Support for two operational modes:
  - `create`: Creates new HostedCluster for testing (existing behavior)
  - `existing`: Uses pre-existing HostedCluster with data plane

- Add Makefile variables for HCP test configuration:
  - `HC_BACKUP_RESTORE_MODE`: Controls test execution mode (create/existing)
  - `HC_NAME`: Specifies HostedCluster name for existing mode
  - `HC_KUBECONFIG`: Path to guest cluster kubeconfig for existing mode
- Pass HCP configuration parameters to e2e test execution

- Refactor `runHCPBackupAndRestore()` function for unified handling of both modes
- Add guest cluster verification functions (`PreBackupVerifyGuest`, `PostRestoreVerifyGuest`)
- Separate log gathering and DPA resource cleanup into reusable functions
- Enhanced error handling and validation for both control plane and guest cluster

- Add support for kubeconfig-based guest cluster operations
- Implement pre/post backup verification for guest cluster resources
- Add namespace creation/validation tests for guest cluster functionality

- Add `GetHostedCluster()` method to retrieve existing HostedCluster objects
- Add `ClientGuest` field to `HCHandler` for guest cluster operations
- Improve error message formatting in DPA helpers

- Add comprehensive testing documentation for HCP scenarios
- Include examples for running tests against existing HostedControlPlane
- Document environment variable configuration options

- Add conditional must-gather build based on `SKIP_MUST_GATHER` flag
- Enhanced e2e test parameter passing for HCP configurations

The implementation supports testing both scenarios where OADP needs to:
1. Create a new HostedCluster and test backup/restore (existing functionality)
2. Work with an existing HostedCluster that already has workloads and data plane

This enables comprehensive testing of HCP backup/restore functionality in realistic
production-like environments where clusters already exist and contain user workloads.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 20, 2025
@@ -0,0 +1,94 @@
package e2e_test
Copy link
Author

@mgencur mgencur Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you suggest a better name for this file? Maybe hcp_external_cluster_backup_restore_suite_test.go ?
Or hcp_existing_cluster_backup_restore_suite_test.go

func runHCPBackupAndRestore(brCase HCPBackupRestoreCase, updateLastBRcase func(brCase HCPBackupRestoreCase), h *libhcp.HCHandler) {
const (
HCModeCreate HCBackupRestoreMode = "create" // Create new HostedCluster for test
HCModeExisting HCBackupRestoreMode = "existing" // Get existing HostedCluster
Copy link
Author

@mgencur mgencur Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be external instead of existing...

@mgencur
Copy link
Author

mgencur commented Aug 20, 2025

The work can be further extended for testing with ROSA, here's the initial draft: https://github.com/mgencur/oadp-operator/pull/1/files (just showing the idea, I sent this PR against my own branch).

Copy link
Contributor

@jparrill jparrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the PR still on going? I'm missing some verifications at multiple levels. In general looks good apart from the comments mentioned in the review.

Said that, I would put these new tests under a different flag, something like TEST_HCP_NEW to avoid impact the current tests until we have them properly set.

Makefile Outdated
Comment on lines 782 to 784
-hc_backup_restore_mode=$(HC_BACKUP_RESTORE_MODE) \
-hc_name=$(HC_NAME) \
-hc_kubeconfig=$(HC_KUBECONFIG) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would isolate those 3 new flags, and bring those up only when the TEST_HCP == true. This way we would not affect the other tests reqs

}
}

func postBackupVerifyGuest() VerificationFunctionGuest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those ones will be reused, adding more checks? or the verification is just that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is just the beginning. It's basically a smoke test for the guest cluster. We can add checks over time, in separate PRs.

Are the PR still on going? I'm missing some verifications at multiple levels.

What other verifications are you missing? Just curious if it's specific to the guest cluster or something more general.

@weshayutin
Copy link
Contributor

@kaovilai FYI

- Replace HC_BACKUP_RESTORE_MODE with TEST_HCP_EXTERNAL flag
- Rename "existing" mode to "external" for clarity
- Move HCP external test args to separate HCP_EXTERNAL_ARGS variable
- Rename hcp_full_backup_restore_suite_test.go to hcp_external_cluster_backup_restore_suite_test.go
- Update test labels from "hcp" to "hcp_external" for external cluster tests
- Simplify Makefile by removing unused HC mode variables from main test-e2e target
- Update documentation to reflect new external cluster test configuration
@mgencur
Copy link
Author

mgencur commented Aug 21, 2025

@jparrill , tried to address your review comments.

@mgencur
Copy link
Author

mgencur commented Aug 21, 2025

/hold
After doing restore, I will need to get a new kubeconfig for the guest cluster because the old one doesn't work anymore.
Let me figure this out.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2025
mgencur and others added 2 commits August 21, 2025 14:08
…rieval

- Remove HC_KUBECONFIG flag and related global variables from test suite
- Remove hardcoded crClientForHC global client initialization
- Add GetHostedClusterKubeconfig() method to dynamically retrieve kubeconfig from HostedCluster status
- Update pre/post backup verification to create client on-demand using retrieved kubeconfig
- Clean up Makefile to remove HC_KUBECONFIG parameter handling
- Simplify HCHandler by removing ClientGuest field

This change improves test reliability by ensuring the guest cluster client
is always created with the current kubeconfig rather than relying on
potentially stale configuration passed via flags.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@mgencur
Copy link
Author

mgencur commented Aug 21, 2025

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2025
Copy link

openshift-ci bot commented Aug 21, 2025

@mgencur: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.20-e2e-test-hcp-aws 94f5f0b link false /test 4.20-e2e-test-hcp-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants