Add OADP toolset for managing Velero backups and restores#122

Open
shubham-pampattiwar wants to merge 6 commits into openshift:main from shubham-pampattiwar:feature/oadp-toolset

Conversation

@shubham-pampattiwar
Member

shubham-pampattiwar commented Jan 29, 2026

Summary

Adds OADP (OpenShift API for Data Protection) toolset to the kubernetes-mcp-server, enabling AI agents to create, monitor, and manage data protection workflows.

Tools (8 consolidated action-based tools)

Following the kubevirt pattern, each tool uses an action parameter to select the operation:

| Tool | Actions | Description |
|------|---------|-------------|
| `oadp_backup` | list, get, create, delete, logs | Manage Velero backups |
| `oadp_restore` | list, get, create, delete, logs | Manage Velero restores |
| `oadp_schedule` | list, get, create, delete, pause, unpause | Manage backup schedules |
| `oadp_storage_location` | list, get | View BSL and VSL storage locations |
| `oadp_dpa` | list, get | View DataProtectionApplication status |
| `oadp_repository` | list, get | View backup repositories |
| `oadp_data_mover` | list, get | View DataUpload/DataDownload status |
| `oadp_data_protection_test` | list, get, create, delete | Manage data protection tests |

Example Usage

// List all backups
{"name": "oadp_backup", "arguments": {"action": "list"}}

// Create a backup
{"name": "oadp_backup", "arguments": {"action": "create", "name": "my-backup", "includedNamespaces": ["app-ns"]}}

// Get backup logs
{"name": "oadp_backup", "arguments": {"action": "logs", "name": "my-backup"}}

// Create a scheduled backup
{"name": "oadp_schedule", "arguments": {"action": "create", "name": "daily-backup", "schedule": "0 3 * * *", "includedNamespaces": ["app-ns"]}}

Test plan

  • Unit tests pass (go test -v ./pkg/toolsets/oadp/...)
  • Build passes with no lint errors (make build && make lint)
  • Tested with mcp-inspector against live OpenShift cluster with OADP installed
  • 8/8 evals pass with 24/24 assertions (mcpchecker)

Changes in this PR

  • Add OADP toolset with 8 consolidated tools using action parameter pattern
  • Add 8 eval tasks for testing OADP functionality
  • Update README and docs/OADP.md documentation

@openshift-ci bot requested review from Cali0707 and matzew on January 29, 2026 07:28
@matzew
Member

matzew commented Jan 29, 2026

/assign matzew

@weshayutin

woot! thanks @shubham-pampattiwar

@Cali0707

/assign @Cali0707

@Cali0707

@shubham-pampattiwar thanks for contributing a new toolset! A few thoughts:

Provides 90 tools covering all 23 CRDs shipped by OADP

Our experience with the core toolset and others so far is that, in general, we want to minimize the number of tools in the server so that we don't overwhelm the model's context. E.g. Cursor will return a warning to users when somewhere in the realm of 20-60 tools are loaded (and this is providing 90!)

IMO we should be working to drastically reduce the number of tools provided by this toolset. A few things that can likely help:

  1. For any CRUD operations on CRs, there are the resources_<create|update|delete|list> tools from the core toolset, which can likely replace all your CR CRUD tools. Maybe we need to provide the model with more context on how to use your CRs specifically, but that is something we should look for in evals
  2. For operations that involve more than a simple CRUD operation against the kube API server, or which need to use verbs not exposed through the generic resource tools (I'm thinking of something like these: oadp_data_upload_list/get/cancel, oadp_data_download_list/get/cancel, specifically the cancel bit), let's try to make the verbs part of the parameters of the tool.

For example, in the kubevirt toolset they made the action to be done to a VM a parameter of the tool, rather than having one tool for each action:

"action": {
	Type: "string",
	Enum: []any{string(ActionStart), string(ActionStop), string(ActionRestart)},
	Description: "The lifecycle action to perform: 'start' (changes runStrategy to Always), 'stop' (changes runStrategy to Halted), or 'restart' (stops then starts the VM)",
},

Happy to discuss further on how to reduce the count of the tools while still ensuring agents can take all the actions they need to!
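
To make the suggestion above concrete, here is a minimal sketch of one tool routing on an `action` argument instead of exposing one tool per verb. Everything in it is hypothetical: the `Action` constants, the `handler` type, `dispatch`, and the placeholder handler bodies are illustration only and are not taken from the kubevirt toolset or from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Action names the operation selected by the tool's "action" argument.
type Action string

const (
	ActionList   Action = "list"
	ActionGet    Action = "get"
	ActionCancel Action = "cancel"
)

// handler is a hypothetical per-action function. A real handler would call
// the Kubernetes API; these only return placeholder strings.
type handler func(args map[string]any) (string, error)

// dispatch reads "action" from the tool arguments and routes the call to the
// matching handler, returning an error the agent can read when the action is
// missing or unsupported.
func dispatch(handlers map[Action]handler, args map[string]any) (string, error) {
	raw, ok := args["action"].(string)
	if !ok || raw == "" {
		return "", errors.New("missing required string argument: action")
	}
	h, ok := handlers[Action(raw)]
	if !ok {
		return "", fmt.Errorf("unsupported action %q", raw)
	}
	return h(args)
}

// dataMoverHandlers is an illustrative handler table for a single
// consolidated tool. ActionCancel is deliberately left unregistered.
func dataMoverHandlers() map[Action]handler {
	return map[Action]handler{
		ActionList: func(map[string]any) (string, error) {
			return "listed data uploads", nil
		},
		ActionGet: func(args map[string]any) (string, error) {
			name, _ := args["name"].(string)
			if name == "" {
				return "", errors.New("get requires a name")
			}
			return "got " + name, nil
		},
	}
}

func main() {
	out, err := dispatch(dataMoverHandlers(), map[string]any{"action": "get", "name": "upload-1"})
	fmt.Println(out, err)
}
```

With this shape, adding a verb means registering one more handler and one more enum value rather than one more tool, which is what keeps the tool count low.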

@shubham-pampattiwar
Member Author

Hey @Cali0707, thanks for the feedback! Makes sense - 90 tools is definitely too many.

Here's my plan to reduce to ~9 tools using the action parameter pattern like kubevirt:

┌───────────────────────┬─────────────────────────────┬──────────────────────────────────────────┐                                                                       
│         Tool          │          Resources          │                 Actions                  │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_backup           │ Backup                      │ list, get, create, delete, logs          │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_restore          │ Restore                     │ list, get, create, delete, logs          │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_schedule         │ Schedule                    │ list, get, create, update, delete, pause │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_dpa              │ DataProtectionApplication   │ list, get, create, update, delete        │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_storage_location │ BSL + VSL (type param)      │ list, get, create, update, delete        │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_data_mover       │ DataUpload + DataDownload   │ list, get, cancel                        │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_repository       │ BackupRepository            │ list, get, delete                        │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_non_admin        │ All NonAdmin* resources     │ list, get, create, delete, approve       │                                                                       
├───────────────────────┼─────────────────────────────┼──────────────────────────────────────────┤                                                                       
│ oadp_vm_restore       │ VM discovery + file restore │ list, get, create, delete                │                                                                       
└───────────────────────┴─────────────────────────────┴──────────────────────────────────────────┘                                                                       

Less common CRDs (DeleteBackupRequest, DownloadRequest, ServerStatusRequest, PodVolumeBackup/Restore) can use generic resources_* tools.

Let me know if this approach works!

@matzew
Member

matzew commented Jan 30, 2026

Reducing the number of tools sounds like a good idea, and leveraging core tools (e.g. resources_*) for generic operations seems good as well.

It would also be good to provide evals, like other toolsets do (kubevirt, for instance).

We use mcpchecker, which now also has quickstarts to get you going!

https://github.com/mcpchecker

@openshift-ci

openshift-ci bot commented Feb 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shubham-pampattiwar
Once this PR has been reviewed and has the lgtm label, please ask for approval from cali0707. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shubham-pampattiwar
Member Author

Addressed review feedback - refactored from 90 individual tools to 8 consolidated tools using the action parameter pattern (like kubevirt).

Changes:

  • oadp_backup - actions: list, get, create, delete, logs
  • oadp_restore - actions: list, get, create, delete, logs
  • oadp_schedule - actions: list, get, create, delete, pause, unpause
  • oadp_storage_location - actions: list, get
  • oadp_dpa - actions: list, get
  • oadp_repository - actions: list, get
  • oadp_data_mover - actions: list, get
  • oadp_data_protection_test - actions: list, get, create, delete

Also added 8 eval tasks - all passing (8/8 tasks, 24/24 assertions).

Ready for re-review @Cali0707 @matzew

@shubham-pampattiwar
Member Author

Gentle ping @Cali0707 @matzew - any feedback on the refactored approach?

@@ -0,0 +1,134 @@
package oadp

As far as I can tell, nothing in this file is referenced in the toolset - maybe we can remove?

@@ -0,0 +1,55 @@
package oadp

As far as I can tell, nothing in this file is referenced in the toolset - maybe we can remove?

@@ -0,0 +1,33 @@
package oadp

As far as I can tell, nothing in this file is referenced in the toolset - maybe we can remove?

@@ -0,0 +1,33 @@
package oadp

As far as I can tell, nothing in this file is referenced in the toolset - maybe we can remove?

@@ -0,0 +1,29 @@
package oadp

As far as I can tell, nothing in this file is referenced in the toolset - maybe we can remove?

}

// GetTools returns all tools provided by this toolset.
// The toolset provides 10 consolidated tools covering all OADP CRDs:

Suggested change
// The toolset provides 10 consolidated tools covering all OADP CRDs:
// The toolset provides 8 consolidated tools covering all OADP CRDs:

Comment on lines +59 to +61
// GetBackupLogs retrieves backup logs
// Note: In a real implementation, this would create a DownloadRequest and fetch logs from object storage
// For now, we return the backup's status information as a simplified version

As far as I can tell (and from this comment) - this does not actually retrieve logs. IMO this would confuse the agent, as it is executing a "logs" action. So, we should do one of:

  1. remove the logs action
  2. rename the logs action to something more intuitive
  3. Actually retrieve the logs here
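
As a rough illustration of the second option, the sketch below shows what an honestly named status action could return: a one-line summary built from the object's `status` block rather than anything resembling logs. It assumes the object has been decoded into a generic map and that the status carries `phase`, `errors`, and `warnings` fields; `summarizeStatus` and `toInt` are hypothetical names, not functions from this PR.

```go
package main

import "fmt"

// summarizeStatus builds a short status line from a Backup or Restore object
// decoded into a generic map. The field names are assumptions of this sketch.
func summarizeStatus(obj map[string]any) string {
	status, _ := obj["status"].(map[string]any)
	phase, _ := status["phase"].(string)
	if phase == "" {
		phase = "Unknown" // status not populated yet
	}
	return fmt.Sprintf("phase=%s errors=%d warnings=%d",
		phase, toInt(status["errors"]), toInt(status["warnings"]))
}

// toInt accepts the numeric types a decoded JSON or YAML object commonly
// carries and falls back to zero for anything else, including nil.
func toInt(v any) int64 {
	switch n := v.(type) {
	case int64:
		return n
	case int:
		return int64(n)
	case float64:
		return int64(n)
	default:
		return 0
	}
}

func main() {
	backup := map[string]any{
		"status": map[string]any{
			"phase":    "PartiallyFailed",
			"errors":   int64(2),
			"warnings": int64(1),
		},
	}
	fmt.Println(summarizeStatus(backup))
}
```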


// GetRestoreLogs retrieves restore logs
// Note: Similar to backup logs, this returns status information
func GetRestoreLogs(ctx context.Context, client dynamic.Interface, namespace, name string) (string, error) {

As far as I can tell (and from this comment) - this does not actually retrieve logs. IMO this would confuse the agent, as it is executing a "logs" action. So, we should do one of:

  1. remove the logs action
  2. rename the logs action to something more intuitive
  3. Actually retrieve the logs here

Comment on lines +112 to +115
namespace := oadp.DefaultOADPNamespace
if v, ok := params.GetArguments()["namespace"].(string); ok && v != "" {
	namespace = v
}

There are some helpers for retrieving string/bool values from the params: api.OptionalString(params, "namespace", oadp.DefaultOADPNamespace) - maybe use those here (and on other tools) ?
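
For readers unfamiliar with this kind of helper, the sketch below shows the general shape of the pattern. It is a stand-in, not the real `api.OptionalString`: `optionalString` here takes a plain argument map instead of the server's params type, and its treatment of empty strings simply mirrors the manual snippet above and may differ from the actual helper.

```go
package main

import "fmt"

// defaultNamespace mirrors the DefaultOADPNamespace value shown in the PR.
const defaultNamespace = "openshift-adp"

// optionalString is a hypothetical stand-in for a helper such as
// api.OptionalString: it returns the string stored under key, or fallback
// when the key is absent, not a string, or empty.
func optionalString(args map[string]any, key, fallback string) string {
	if v, ok := args[key].(string); ok && v != "" {
		return v
	}
	return fallback
}

func main() {
	// No namespace supplied: the default is used.
	fmt.Println(optionalString(map[string]any{}, "namespace", defaultNamespace))
	// Explicit namespace supplied: it wins over the default.
	fmt.Println(optionalString(map[string]any{"namespace": "backups"}, "namespace", defaultNamespace))
}
```

Centralizing the lookup keeps every tool's default handling identical, which is the main reason to prefer it over repeating the type assertion in each handler.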

Comment on lines +253 to +269
// parseLabelSelector parses a label selector string like "app=myapp,env=prod" into a map
func parseLabelSelector(selector string) map[string]string {
	result := make(map[string]string)
	if selector == "" {
		return result
	}

	pairs := splitIgnoreEmpty(selector, ',')
	for _, pair := range pairs {
		kv := splitIgnoreEmpty(pair, '=')
		if len(kv) == 2 {
			result[kv[0]] = kv[1]
		}
	}
	return result
}

This seems to be dropping any non-equality selector expressions. Maybe we should do either:

  1. return errors when we see a non-equality selector (so that the agent is informed that some of the selector is not being considered)
  2. use full selector syntax support here
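
A minimal sketch of the first option, using only the standard library: parse `key=value` terms and return an error for anything else instead of silently dropping it. `parseEqualitySelector` is a hypothetical name and this is not the code that landed in the PR. The second option would normally mean delegating to a full selector parser, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEqualitySelector parses selectors such as "app=myapp,env=prod" into a
// map. It rejects inequality, set-based, and existence terms with an error so
// the caller knows part of the selector would otherwise be ignored.
func parseEqualitySelector(selector string) (map[string]string, error) {
	result := make(map[string]string)
	for _, term := range strings.Split(selector, ",") {
		term = strings.TrimSpace(term)
		if term == "" {
			continue
		}
		// "!" covers "!=" and "!key"; parentheses cover "in (...)" and
		// "notin (...)", including fragments left over from the comma split.
		if strings.ContainsAny(term, "()!") {
			return nil, fmt.Errorf("unsupported selector term %q: only key=value is supported", term)
		}
		key, value, found := strings.Cut(term, "=")
		if !found {
			return nil, fmt.Errorf("unsupported selector term %q: expected key=value", term)
		}
		value = strings.TrimPrefix(value, "=") // accept "==" as equality too
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		if key == "" || strings.Contains(value, "=") {
			return nil, fmt.Errorf("malformed selector term %q", term)
		}
		result[key] = value
	}
	return result, nil
}

func main() {
	labels, err := parseEqualitySelector("app=myapp,env=prod")
	fmt.Println(labels, err)

	_, err = parseEqualitySelector("env in (prod,staging)")
	fmt.Println(err)
}
```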

@Cali0707

@shubham-pampattiwar the toolset looks much better now! I left a few comments around the code, but overall this is looking great

@openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Feb 10, 2026
@openshift-merge-robot removed the needs-rebase label on Feb 11, 2026
@shubham-pampattiwar
Member Author

Addressed review comments:

Removed unused files:

  • pkg/oadp/nonadmin.go, vmrestore.go, cloudstorage.go, downloadrequest.go, podvolume.go, serverstatus.go, deletebackuprequest.go (and tests)

Code changes:

  • Renamed logs action to status for backup/restore tools (returns status info, not actual logs - avoids confusing the agent)
  • Using api.OptionalString() helper instead of manual param parsing
  • Fixed parseLabelSelector to return error on non-equality selectors (e.g., !=, in, notin)
  • Fixed comment: 8 tools instead of 10

Rebased on main to resolve merge conflict with netedge toolset.

All tests passing.

@shubham-pampattiwar
Member Author

Ready for re-review @Cali0707 @matzew - addressed all comments.

@shubham-pampattiwar
Member Author

shubham-pampattiwar commented Feb 26, 2026

Hi @Cali0707 @matzew - friendly follow-up on this PR. All review comments have been addressed. Would appreciate a re-review when you get a chance. Thanks!

@openshift-merge-robot added the needs-rebase label on Feb 26, 2026

This adds an OADP (OpenShift API for Data Protection) toolset that
enables AI agents to manage backup and restore operations on OpenShift
clusters through the MCP server.

New tools (21 total):
- Backup: list, get, create, delete, logs
- Restore: list, get, create, delete, logs
- Schedule: list, get, create, delete, pause
- Storage Locations: BSL and VSL list/get
- DataProtectionApplication: list, get

Fixes: OADP-7194

Add documentation for the OADP (OpenShift API for Data Protection)
toolset including all 21 tools for managing Velero backups, restores,
schedules, storage locations, and DataProtectionApplications.

Complete OADP toolset implementation covering all CRDs shipped by OADP:

Velero v1 CRDs:
- BackupRepository: list, get, delete
- DeleteBackupRequest: list, get
- DownloadRequest: list, get, create, delete
- PodVolumeBackup: list, get
- PodVolumeRestore: list, get
- ServerStatusRequest: list, get, create, delete

Velero v2alpha1 CRDs:
- DataUpload: list, get, cancel
- DataDownload: list, get, cancel

OADP CRDs:
- CloudStorage: list, get, create, delete
- DataProtectionTest: list, get, create, delete
- DPA: create, update, delete (added to existing list, get)

NAC CRDs:
- NonAdminBackup: list, get, create, delete
- NonAdminRestore: list, get, create, delete
- NonAdminBackupStorageLocation: list, get, create, update, delete
- NonAdminBackupStorageLocationRequest: list, get, approve
- NonAdminDownloadRequest: list, get, create, delete

VM CRDs:
- VirtualMachineBackupsDiscovery: list, get, create, delete
- VirtualMachineFileRestore: list, get, create, delete

Existing CRD updates:
- BSL: create, update, delete (added to existing list, get)
- VSL: create, update, delete (added to existing list, get)
- Schedule: update (added to existing list, get, create, delete, pause)

Includes comprehensive unit tests for all CRUD operations.

- Update README.md OADP section with all 90 tools covering 23 CRDs
- Add docs/OADP.md with detailed toolset documentation
- Update docs/README.md to include OADP and Observability links

Per PR review feedback, refactor OADP tools to use action parameter
pattern (like kubevirt) instead of individual tools per operation.

Changes:
- Consolidate 90 individual tools into 8 action-based tools:
  - oadp_backup (list, get, create, delete, logs)
  - oadp_restore (list, get, create, delete, logs)
  - oadp_schedule (list, get, create, delete, pause, unpause)
  - oadp_storage_location (list, get for BSL and VSL)
  - oadp_dpa (list, get)
  - oadp_repository (list, get)
  - oadp_data_mover (list, get for uploads/downloads)
  - oadp_data_protection_test (list, get, create, delete)
- Add 8 eval tasks for OADP toolset testing
- Update documentation to reflect new tool structure
- Fix eval scripts to use explicit Velero API groups
  (backup.velero.io, restore.velero.io, schedule.velero.io)
  to avoid conflicts with OpenShift built-in resources

All 8/8 evals pass with 24/24 assertions.

- Remove unused pkg/oadp/ files (nonadmin, vmrestore, cloudstorage,
  downloadrequest, podvolume, serverstatus, deletebackuprequest)
- Rename 'logs' action to 'status' for backup/restore tools
  (returns status info, not actual logs)
- Use api.OptionalString helper instead of manual param parsing
- Fix parseLabelSelector to return error on non-equality selectors
- Update comment: 8 tools instead of 10
- Update tests to expect 'status' action

@openshift-merge-robot removed the needs-rebase label on Feb 26, 2026
@openshift-ci

openshift-ci bot commented Feb 26, 2026

@shubham-pampattiwar: all tests passed!

@Cali0707

@shubham-pampattiwar I'm curious - did you run the evals with just the core toolset enabled vs. this new toolset? Most of the tools seem to be CRUD tools, and on frontier models they can often succeed with the generic resource tools for many domains (but not all, which is why evals are so useful)

Annotations: api.ToolAnnotations{
Title: "OADP: Backup",
ReadOnlyHint: ptr.To(false),
DestructiveHint: ptr.To(false),

Wouldn't this be true since there could be a delete action?

Annotations: api.ToolAnnotations{
Title: "OADP: Backup Repository",
ReadOnlyHint: ptr.To(false),
DestructiveHint: ptr.To(false),

Wouldn't this be true since there could be a delete action?

Annotations: api.ToolAnnotations{
Title: "OADP: Data Mover",
ReadOnlyHint: ptr.To(false),
DestructiveHint: ptr.To(false),

Wouldn't this be true since there could be a cancel action?
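
One way to keep the hint from drifting out of sync with the action list is to derive it from the actions a tool exposes. The sketch below is only an illustration of that idea: `destructiveHint` and the `destructive` set are hypothetical, and whether `cancel` should count as destructive is exactly the open question raised in the comments above.

```go
package main

import "fmt"

// destructive lists the actions this sketch treats as destructive. Including
// "cancel" is a judgment call, not something the PR settles.
var destructive = map[string]bool{"delete": true, "cancel": true}

// destructiveHint reports whether any action exposed by a tool is destructive,
// so the annotation can be computed from the same list as the action enum.
func destructiveHint(actions []string) bool {
	for _, a := range actions {
		if destructive[a] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(destructiveHint([]string{"list", "get", "create", "delete", "status"}))
	fmt.Println(destructiveHint([]string{"list", "get"}))
}
```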


| Tool | CRDs Covered | Actions |
|------|--------------|---------|
| `oadp_backup` | Backup | list, get, create, delete, logs |

I think this got switched to status (probably relevant in other places in the docs too)

Suggested change
| `oadp_backup` | Backup | list, get, create, delete, logs |
| `oadp_backup` | Backup | list, get, create, delete, status |

Comment on lines +51 to +55
s.Run("returns empty list when no backup repositories exist", func() {
	list, err := ListBackupRepositories(s.ctx, s.client, DefaultOADPNamespace, metav1.ListOptions{})
	s.NoError(err)
	s.Empty(list.Items)
})

I think this may break in the future, since the objects created within one s.Run are persisted within the test - so this would only work when run first

Probably worth verifying the rest of the unit tests that are doing similar as well
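
The ordering problem described above goes away if each subtest starts from its own empty state. The sketch below shows that idea with the standard library only: `fakeStore`, `subtest`, and `runIsolated` are hypothetical stand-ins for the fake client and the suite's `s.Run` blocks, not code from this PR. In the real suite, the equivalent would be rebuilding the client before each subtest.

```go
package main

import "fmt"

// fakeStore is a hypothetical in-memory stand-in for the fake client used by
// the tests; it only remembers the names of created objects.
type fakeStore struct{ items []string }

func (s *fakeStore) Create(name string) { s.items = append(s.items, name) }
func (s *fakeStore) List() []string     { return s.items }

// subtest pairs a name with a body that receives its own fresh store.
type subtest struct {
	name string
	run  func(s *fakeStore) error
}

// runIsolated gives every subtest a new store, so no subtest can observe
// objects created by another one, whatever the execution order.
func runIsolated(tests []subtest) map[string]error {
	results := make(map[string]error, len(tests))
	for _, tc := range tests {
		results[tc.name] = tc.run(&fakeStore{})
	}
	return results
}

// cases deliberately puts the "empty list" check after a subtest that creates
// an object, the order that would break with shared state.
func cases() []subtest {
	return []subtest{
		{"lists created repository", func(s *fakeStore) error {
			s.Create("repo-1")
			if n := len(s.List()); n != 1 {
				return fmt.Errorf("expected 1 item, got %d", n)
			}
			return nil
		}},
		{"returns empty list when nothing exists", func(s *fakeStore) error {
			if n := len(s.List()); n != 0 {
				return fmt.Errorf("expected empty list, got %d items", n)
			}
			return nil
		}},
	}
}

func main() {
	for name, err := range runIsolated(cases()) {
		fmt.Println(name, err)
	}
}
```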

DefaultOADPNamespace = "openshift-adp"
)

var (

I don't think all of the GVRs defined here are currently being used, maybe we can clean up the unused ones?

},
"labelSelector": {
Type: "string",
Description: "Label selector to filter backups (for list action)",

Since we seem to only support equality based labels (given the parse function included here), should we add something to this description?
