@gman0 gman0 commented Oct 3, 2025

Summary

This PR adds the Virtual resources (VRs) feature and modifies the Replication virtual workspace to be in line with VRs.

Unlike regular, CRD-backed resources, VRs don't reside in a workspace, and they don't consume additional etcd storage. They exist at a virtual workspace endpoint; that endpoint is aggregated with the rest of the GVRs in a workspace and exposed to the client as such. This way, producers can project resources to consumers, and their changes to the objects are visible immediately (that is, as soon as the object is propagated to the cache, and from there to the consumer's shard and informers).

A virtual resource is consumable through the usual APIExport - APIBinding relationship:

apiVersion: apis.kcp.io/v1alpha2
kind: APIExport
metadata:
  name: cowboys-cr
spec:
  resources:
  - name: cowboys
    group: wildwest.dev
    schema: today.cowboys.wildwest.dev
    storage:
      virtual:
        reference:
          apiGroup: cache.kcp.io
          kind: CachedResourceEndpointSlice
          name: cowboys
        identityHash: cd2eb0837c9b4c962c22d2ff8b5441b7b45805887f051d39bf133b583baf6860
---
apiVersion: apis.kcp.io/v1alpha2
kind: APIBinding
metadata:
  name: cowboys-from-cache
spec:
  reference:
    export:
      path: root:provider-rw:consumer-rw
      name: cowboys-cr

The export defines a resource with storage.virtual, and refers to an endpoint slice object in reference by API group, Kind, and name of that object. Binding that export then exposes it in the consumer's workspace.

Binding

Binding a VR uses the same code as with CRD-storage resources: a bound CRD is created from the APIResourceSchema, and that CRD must become Established before the resource shows up in discovery and can be used. The only difference is that VRs don't set any APIBinding.Status.BoundResources[].StorageVersions[] -- this is the only user-visible way to distinguish VRs from other resources. It was necessary to leave StorageVersions as nil so that binding deletion works correctly: the crdcleanup controller shouldn't touch the virtual resources.
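A minimal sketch of the check described above, for readers who want to tell VRs apart on the client side. The `BoundResource` type here is a simplified, hypothetical stand-in for an entry of APIBinding.Status.BoundResources (the real kcp type has more fields), and `isVirtual` is an illustrative helper, not something this PR adds.

```go
package main

import "fmt"

// BoundResource is a simplified, hypothetical stand-in for one entry of
// APIBinding.Status.BoundResources. Only the fields needed here are modeled.
type BoundResource struct {
	Group           string
	Resource        string
	StorageVersions []string
}

// isVirtual applies the rule from the description: virtual resources
// leave StorageVersions unset, CRD-storage resources populate it.
func isVirtual(r BoundResource) bool {
	return len(r.StorageVersions) == 0
}

func main() {
	bound := []BoundResource{
		{Group: "wildwest.dev", Resource: "cowboys"},
		{Group: "wildwest.dev", Resource: "sheriffs", StorageVersions: []string{"v1alpha1"}},
	}
	for _, r := range bound {
		fmt.Printf("%s.%s virtual=%v\n", r.Resource, r.Group, isVirtual(r))
	}
}
```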

Discovery and OpenAPI

Since VRs still use bound CRDs, group discovery and OpenAPI work out-of-the-box, without changes. Version discovery, however, required some work: see pkg/server/aggregatingcrdversiondiscovery. A new apiserver was added to handle aggregation of CRDs (system/apibinding/normal) with VRs.

This was necessary because the apiextensions apiserver would gladly advertise all bound CRDs, even those that belong to VRs, but all APIResources would get the same verbs. This is a problem because a virtual workspace that backs a VR can offer a different set of verbs. AggregatingCRDVersionDiscovery lists CRDs using the apiBindingAwareCRDLister, and then modifies the resulting APIResourceList so that all entries have the correct verbs.

How does it know which bound CRDs are exported as VRs? apiBindingAwareCRDLister was modified to decorate them with the apis.kcp.io/schema-storage annotation. The annotation is only set in memory by the lister; it is never persisted in etcd.
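The verb rewriting can be sketched roughly as follows. This is only an illustration under stated assumptions: `crd` and `apiResource` are hypothetical, simplified stand-ins for the apiextensions and metav1 types, the mere presence of the annotation is taken to mean "virtual" (the actual annotation value format is not described here), and `defaultCRDVerbs` only approximates what a CRD-backed resource advertises.

```go
package main

import "fmt"

// schemaStorageAnnotation is the annotation named in the description.
const schemaStorageAnnotation = "apis.kcp.io/schema-storage"

// crd and apiResource are hypothetical, simplified stand-ins for
// CustomResourceDefinition and metav1.APIResource.
type crd struct {
	Plural      string
	Annotations map[string]string
}

type apiResource struct {
	Name  string
	Verbs []string
}

// defaultCRDVerbs approximates the verbs advertised for CRD-backed resources.
var defaultCRDVerbs = []string{"delete", "deletecollection", "get", "list", "patch", "create", "update", "watch"}

// buildResourceList returns one apiResource per CRD. CRDs carrying the
// schema-storage annotation get their verbs from virtualVerbs (keyed by
// plural name, an assumption); all others get the default CRD verbs.
func buildResourceList(crds []crd, virtualVerbs map[string][]string) []apiResource {
	out := make([]apiResource, 0, len(crds))
	for _, c := range crds {
		verbs := defaultCRDVerbs
		if _, isVirtual := c.Annotations[schemaStorageAnnotation]; isVirtual {
			verbs = virtualVerbs[c.Plural]
		}
		out = append(out, apiResource{Name: c.Plural, Verbs: verbs})
	}
	return out
}

func main() {
	crds := []crd{
		{Plural: "cowboys", Annotations: map[string]string{schemaStorageAnnotation: "virtual"}},
		{Plural: "sheriffs"},
	}
	readOnly := map[string][]string{"cowboys": {"get", "list", "watch"}}
	for _, r := range buildResourceList(crds, readOnly) {
		fmt.Println(r.Name, r.Verbs)
	}
}
```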

Another approach would be to modify the apiextensions apiserver to handle the discovery, but I wasn't sure how to do that cleanly without making a lot of changes. So for now, order of resolving a discovery request is like so:

  • MiniAggregator handles group discovery (no changes here)
  • AggregatingCRDVersionDiscovery handles /apis/<Group>/<Version>:
    • Only if the group is not reserved; reserved groups are *.k8s.io and *.kubernetes.io. This also prevents hijacking discovery for /apis/apiextensions.k8s.io/v1, which is handled by the apiextensions apiserver. I'm not completely sure this is the best way to do it.
    • The handler code is very similar to the one in apiextensions, except that a verbsProvider supplies the correct verbs for the CRD.
  • The next delegate is the VR apiserver.
  • If nothing matches, the request is delegated to apiextensions.
  • If nothing matches there either, the NotFound handler responds.
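The resolution order above can be sketched as a simple chain of matchers. Everything here is illustrative: the `handler` type, `newChain`, and the path-matching rules are assumptions made for the example (the real servers are delegating apiservers, not string matchers), and treating the bare domains k8s.io and kubernetes.io as reserved alongside their subdomains is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// isReservedGroup reports whether group is k8s.io, kubernetes.io,
// or one of their subdomains.
func isReservedGroup(group string) bool {
	for _, suffix := range []string{"k8s.io", "kubernetes.io"} {
		if group == suffix || strings.HasSuffix(group, "."+suffix) {
			return true
		}
	}
	return false
}

// handler is a hypothetical matcher: serve returns true if it handled the path.
type handler struct {
	name  string
	serve func(path string) bool
}

func splitPath(path string) []string {
	return strings.Split(strings.Trim(path, "/"), "/")
}

// newChain builds the delegation order from the list above.
// virtualGroups names the API groups served as virtual resources.
func newChain(virtualGroups map[string]bool) []handler {
	return []handler{
		{"MiniAggregator", func(p string) bool {
			parts := splitPath(p)
			return parts[0] == "apis" && len(parts) <= 2
		}},
		{"AggregatingCRDVersionDiscovery", func(p string) bool {
			parts := splitPath(p)
			return len(parts) == 3 && parts[0] == "apis" && !isReservedGroup(parts[1])
		}},
		{"VirtualResources", func(p string) bool {
			parts := splitPath(p)
			return len(parts) > 3 && parts[0] == "apis" && virtualGroups[parts[1]]
		}},
		{"apiextensions", func(p string) bool {
			return splitPath(p)[0] == "apis"
		}},
	}
}

// resolve walks the chain and returns the name of the first handler that
// serves the path, or "NotFound" if none does.
func resolve(chain []handler, path string) string {
	for _, h := range chain {
		if h.serve(path) {
			return h.name
		}
	}
	return "NotFound"
}

func main() {
	chain := newChain(map[string]bool{"wildwest.dev": true})
	for _, p := range []string{"/apis", "/apis/wildwest.dev/v1alpha1", "/apis/apiextensions.k8s.io/v1"} {
		fmt.Println(p, "->", resolve(chain, p))
	}
}
```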

Resource handling

The VR apiserver checks whether the resource is defined as a VR in the export, either by looking up the associated APIBinding or through the export identity. If it is, it retrieves the endpoint URL from the endpoint slice and proxies the request there.

The handling VW must be able to decode this URL: <VW endpoint>:<APIExport identity>/<Path...>, e.g. https://shard-1.kcp.io:7443/services/replication/<CachedResource cluster>/<CachedResource>:<APIExport identity>/apis/wildwest.dev/v1alpha1/cowboys/namespaces/default/cowboy-1, where <APIExport identity> is the identity of the export owning the schema of the VR.
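To make the URL shape concrete, here is a rough sketch of how a handling VW could decode the replication example above. `vrTarget` and `parseReplicationPath` are hypothetical names for illustration only, and the cluster name and shortened identity in `main` are made-up placeholder values.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// vrTarget holds the pieces encoded in the replication VW path.
type vrTarget struct {
	Cluster        string // <CachedResource cluster>
	CachedResource string // <CachedResource>
	Identity       string // <APIExport identity>
	Rest           string // remaining API path, e.g. /apis/wildwest.dev/...
}

const replicationPrefix = "/services/replication/"

// parseReplicationPath decodes
// /services/replication/<cluster>/<name>:<identity>/<path...>.
func parseReplicationPath(p string) (vrTarget, error) {
	if !strings.HasPrefix(p, replicationPrefix) {
		return vrTarget{}, fmt.Errorf("path %q is not under %s", p, replicationPrefix)
	}
	parts := strings.SplitN(strings.TrimPrefix(p, replicationPrefix), "/", 3)
	if len(parts) < 2 || parts[0] == "" {
		return vrTarget{}, fmt.Errorf("path %q is missing the cluster or resource segment", p)
	}
	name, identity, ok := strings.Cut(parts[1], ":")
	if !ok || name == "" || identity == "" {
		return vrTarget{}, fmt.Errorf("segment %q is not <name>:<identity>", parts[1])
	}
	target := vrTarget{Cluster: parts[0], CachedResource: name, Identity: identity}
	if len(parts) == 3 {
		target.Rest = "/" + parts[2]
	}
	return target, nil
}

func main() {
	// Placeholder cluster name and identity, for illustration only.
	raw := "https://shard-1.kcp.io:7443/services/replication/provider-cluster/cowboys:cd2eb083/apis/wildwest.dev/v1alpha1/cowboys"
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	target, err := parseReplicationPath(u.Path)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", target)
}
```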

I'm not completely sure this is ok, but I didn't find a better way to handle local and wildcard requests. The export is the source of the schema (#3553 didn't work well enough), and since a VR (a CachedResource for example) can be exported by multiple exports, there must be a way to refer to a specific one in the request.

  • To match a specific APIExport against an identity, the export must define the requested GVR, and that GVR must be storage.virtual. Then the code considers it as a good match.
  • There is a caveat where there could be an export that matches, but the schema for that resource is wrong. I have the impression this is normal and expected. Is that so?
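The matching rule from the first bullet could look roughly like this. The types are simplified, hypothetical stand-ins for the APIExport API (in particular, the export's identity is modeled as a plain `IdentityHash` field), and matching is done on group and resource only, since the resources list in the export example above carries no version.

```go
package main

import "fmt"

// resourceSchema and apiExport are hypothetical, simplified stand-ins
// for the APIExport types; only the fields needed for matching are modeled.
type resourceSchema struct {
	Group   string
	Name    string // plural resource name, e.g. "cowboys"
	Virtual bool   // true when the resource uses storage.virtual
}

type apiExport struct {
	Name         string
	IdentityHash string
	Resources    []resourceSchema
}

// matchesExport reports whether the export is a good match for the request:
// the identity must be the export's own, the export must define the requested
// group/resource, and that resource must use virtual storage.
func matchesExport(e apiExport, identity, group, resource string) bool {
	if e.IdentityHash != identity {
		return false
	}
	for _, r := range e.Resources {
		if r.Group == group && r.Name == resource && r.Virtual {
			return true
		}
	}
	return false
}

func main() {
	export := apiExport{
		Name:         "cowboys-cr",
		IdentityHash: "cd2eb083", // shortened placeholder value
		Resources:    []resourceSchema{{Group: "wildwest.dev", Name: "cowboys", Virtual: true}},
	}
	fmt.Println(matchesExport(export, "cd2eb083", "wildwest.dev", "cowboys"))
	fmt.Println(matchesExport(export, "other", "wildwest.dev", "cowboys"))
}
```

Note that this says nothing about whether the schema referenced by the matching export is the right one, which is the caveat raised in the second bullet.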

What's missing in this PR

The code is behind the CacheAPIs feature flag, which is disabled by default. Also, at the moment the virtual workspaces server must run as a separate process: the tls struct expects the certs to be in opts.Extra.ShardVirtualWorkspaceCAFile, ... This is probably easy to fix, but I haven't tested it properly yet.

  • Missing validation for APIExport's storage.virtual
    • To be added in an admission plugin
    • Maybe also disallow storage.virtual if the feature flag is disabled?
  • CachedResourceEndpointSlice.Spec.CachedResource is missing an optional Path in the reference
    • CachedResource objects would need an admission plugin to insert kcp.io/path annotation
  • If a provider creates an object that's replicated with CachedResource in a namespace that the consumer doesn't have, that object is accessible regardless.
    • How do we want to handle this? The current behavior is obviously undesired. One option is to simply check for the namespace and serve the objects only if the consumer has it.
  • The CachedResource reconciler replicates only partial metadata; see "bug: CachedObjects are populated with PartialObjectMetadata" (#3478).
    • We need a new dynamic informer with full objects
  • There are changes in the APIExport API
    • We'll need to coordinate version bump, conversion...
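For the namespace question in the list above, the "check for the namespace" option could be sketched like this. It is not implemented in this PR; `object` and `visibleTo` are hypothetical names, and always serving cluster-scoped objects (empty namespace) is an assumption.

```go
package main

import "fmt"

// object is a hypothetical, minimal view of a replicated object.
type object struct {
	Namespace string
	Name      string
}

// visibleTo keeps only the objects whose namespace exists in the consumer's
// workspace. Cluster-scoped objects (empty namespace) are always kept,
// which is an assumption of this sketch.
func visibleTo(objs []object, consumerNamespaces map[string]bool) []object {
	var out []object
	for _, o := range objs {
		if o.Namespace == "" || consumerNamespaces[o.Namespace] {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	replicated := []object{
		{Namespace: "default", Name: "cowgirl"},
		{Namespace: "provider-only", Name: "hidden"},
	}
	fmt.Println(visibleTo(replicated, map[string]bool{"default": true}))
}
```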

What Type of PR Is This?

/kind api-change
/kind feature

Related Issue(s)

Fixes #3487

Release Notes

* Changes in APIExport API: resource schema storage `virtual`
* Added Virtual resources support

@kcp-ci-bot kcp-ci-bot added the release-note, kind/api-change, dco-signoff: yes, and kind/feature labels Oct 3, 2025
@kcp-ci-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign embik for approval. For more information see the Kubernetes Code Review Process.

@kcp-ci-bot kcp-ci-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 3, 2025
@gman0 gman0 force-pushed the virtual-resources branch from 0c6691f to 992fc97 Compare October 3, 2025 03:46

gman0 commented Oct 3, 2025

Testing

Prerequisites:

  • make test-run-sharded-server
  • Have binding/CRD for cowboys.v1alpha1.wildwest.dev ready

Producer:

  • Create a CachedResource, along with the Secret its identity refers to:
    apiVersion: v1
    kind: Secret
    metadata:
      name: cowboys-cr
      namespace: default
    stringData:
      key: xxx
    ---
    apiVersion: cache.kcp.io/v1alpha1
    kind: CachedResource
    metadata:
      name: cowboys
    spec:
      group: wildwest.dev
      version: v1alpha1
      resource: cowboys
      # schema: today.cowboys.wildwest.dev
      # For easier testing...
      identity:
        secretRef:
          name: cowboys-cr
          namespace: default
  • Create an export:
    apiVersion: apis.kcp.io/v1alpha2
    kind: APIExport
    metadata:
      name: cowboys-cr
    spec:
      resources:
      - name: cowboys
        group: wildwest.dev
        schema: today.cowboys.wildwest.dev
        storage:
          virtual:
            reference:
              apiGroup: cache.kcp.io
              kind: CachedResourceEndpointSlice
              name: cowboys
            identityHash: cd2eb0837c9b4c962c22d2ff8b5441b7b45805887f051d39bf133b583baf6860
  • Create a Cowboy:
    apiVersion: wildwest.dev/v1alpha1
    kind: Cowboy
    metadata:
      name: cowgirl
    spec:
      intent: hehe

Consumer:

  • Bind the export:
    apiVersion: apis.kcp.io/v1alpha2
    kind: APIBinding
    metadata:
      name: cowboys-from-cache
    spec:
      reference:
        export:
          path: root
          name: cowboys-cr
  • Run kubectl api-resources and kubectl get cowboys

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces Virtual Resources (VRs), a new feature that allows resources to be projected from virtual workspaces without consuming additional etcd storage. VRs are exposed through APIExport-APIBinding relationships with virtual storage configuration instead of CRD storage.

Key changes include:

  • Addition of virtual storage configuration in APIExport API
  • New virtual resources server for handling VR requests
  • Enhanced version discovery aggregation for mixed CRD/virtual storage resources
  • Updated replication virtual workspace infrastructure

Reviewed Changes

Copilot reviewed 68 out of 68 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| test/e2e/virtualresources/cachedresources/vr_cachedresources_test.go | Comprehensive E2E test for virtual resources using cached resources |
| test/e2e/virtual/replication/virtualworkspace_test.go | Removed existing replication virtual workspace test |
| test/e2e/fixtures/wildwest/bootstrap.go | Added CRD helper function for test fixtures |
| sdk/apis/apis/v1alpha2/types_apiexport.go | Extended APIExport API with virtual storage configuration |
| pkg/server/virtualresources/server.go | New server for handling virtual resource requests via proxy |
| pkg/server/aggregatingcrdversiondiscovery/server.go | New version discovery server that handles mixed storage types |
| pkg/virtual/replication/builder/unwrap.go | Enhanced replication builder with cluster-aware unwrapping |

"github.com/kcp-dev/kcp/test/e2e/framework"
)

const XXX_timeout = math.MaxInt64
Copilot AI Oct 3, 2025

This appears to be a placeholder constant with an inappropriate name and value. Replace 'XXX_timeout' with a meaningful name and use an appropriate timeout value instead of math.MaxInt64.

Suggested change
- const XXX_timeout = math.MaxInt64
+ const defaultTestTimeout = 30 * time.Second

Contributor Author

This is a leftover from when I was testing the test. I'll get rid of it.

obj := innerObj.DeepCopy()
setCluster(obj, cluster)
if !w.safeWrite(watch.Event{
Type: watch.Added,
Copilot AI Oct 3, 2025

In the UpdateFunc handler, the event type should be watch.Modified, not watch.Added. This will cause incorrect watch event types to be sent to clients.

Suggested change
- Type: watch.Added,
+ Type: watch.Modified,

obj := innerObj.DeepCopy()
setCluster(obj, cluster)
if !w.safeWrite(watch.Event{
Type: watch.Added,
Copilot AI Oct 3, 2025

In the DeleteFunc handler, the event type should be watch.Deleted, not watch.Added. This will cause incorrect watch event types to be sent to clients.

Suggested change
- Type: watch.Added,
+ Type: watch.Deleted,

// of an exported virtual resource.
const virtualResourceAPIExportIdentityKey virtualResourceAPIExportIdentityKeyType = "VirtualResourceAPIExportIdentity"

// WithVirtualWorkspaceName adds the VirtualWorkspace name to the context.
Copilot AI Oct 3, 2025

The comment incorrectly describes the function as adding VirtualWorkspace name, but the function actually adds APIExport identity. Update the comment to match the function's actual purpose.

Suggested change
- // WithVirtualWorkspaceName adds the VirtualWorkspace name to the context.
+ // WithVirtualResourceAPIExportIdentity adds the APIExport identity to the context.

@mjudeikis mjudeikis left a comment

Did a first round. Manual testing will follow next :)

return crd, nil
}

func AllCRDs() ([]*apiextensionsv1.CustomResourceDefinition, error) {
Contributor

Needs a Go doc comment; this is a public function.

Contributor

And it's not quite clear what the function does. GetAllCRDs?

Contributor Author

This was a hacky way to make DynamicRESTMapper resolve resources coming from system CRDs (e.g. the CachedResourceEndpointSlice). It's superseded by #3630 and once it gets in, this change (+ all other rest mapper changes in this PR) won't be necessary anymore.

return urls, nil
}

func FindOneURL(prefix string, urls []string) (string, error) {
Contributor

go docs.

APIBindingByIdentityAndGroupResource = "apibinding-byIdentityGroupResource"
)

func IndexAPIBindingByIdentityGroupResource(obj interface{}) ([]string, error) {
Contributor

go doc

return result, nil
}

func IndexAPIExportByVirtualResourceIdentities(obj interface{}) ([]string, error) {
Contributor

go docs.

)
},

knownVirtualResourceVerbs: map[string][]string{
Contributor

I feel this needs a code comment explaining why we need the verbsProvider. I know we talked about this, but other readers might not be aware of the reasons.

knownVirtualResourceStatusVerbs: map[string][]string{
"CachedResourceEndpointSlice.cache.kcp.io": {"get"},
},
knownVirtualResourceScaleVerbs: map[string][]string{
Contributor

I'm also wondering whether we should have a "catch-all", where if somebody adds a custom subresource we grant get automatically, if this is even possible.

Contributor Author

We could do real discovery, and see what verbs are supported by the VW.

Contributor Author

Maybe we leave it for later? We either hard-code get as you suggested, or do discovery, but in either case it's a functionality that has no use-case for now. It's fairly easy to implement it, but it would be pretty complex to write a test for it. The whole virtual resources codebase is ready for custom *EndpointSlice, but I think we need to iron out specifics of how we want users to add their own virtual resources / backing virtual workspaces etc.

sliceMapping, err := s.drm.ForCluster(apiExportCluster).RESTMapping(schema.GroupKind{
Group: ptr.Deref(virtual.Reference.APIGroup, ""),
Kind: virtual.Reference.Kind,
}, "v1alpha1") // HACK: we need to be able to discover the latest version.
Contributor

what is this?

Contributor Author

See the comment in #3620 (comment) -- this is to be dropped, because #3630 adds support for version defaulting.


func virtualResourceURLWithCluster(vwURL, apiExportIdentity string, clusterNameOrWildcard string) string {
// Formats the URL like so:
// <Virtual resource VW endpoint>:<APIExport identity>/clusters/<Target cluster>
Contributor

Can you add a full example of how this would look?

// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="CachedResource reference must not be changed"
CachedResource CachedResourceReference `json:"cachedResource"`

// partition (optional) points to a partition that is used for filtering the endpoints
Contributor

Suggested change
- // partition (optional) points to a partition that is used for filtering the endpoints
+ // partition points to a partition that is used for filtering the endpoints

@mjudeikis

I pushed 1 commit to your branch (my mistake, but let's keep it), where I started wiring in the example for VirtualMachines.
16c0acc

But I'm having a hard time making it work :/

$ kubectl get instances.machines.svm.io
Error from server (InternalError): Internal error occurred: error resolving resource: no matches for kind "CachedResourceEndpointSlice" in version "cache.kcp.io/v1alpha1"


gman0 commented Oct 8, 2025

/retest

gman0 added 3 commits October 9, 2025 12:32
…3553'

CachedResourceSchemaSource is not needed anymore. APIExport holds the
reference to the schema for virtual resources.

On-behalf-of: @SAP [email protected]
Signed-off-by: Robert Vasek <[email protected]>
@gman0 gman0 force-pushed the virtual-resources branch from 16c0acc to 154927d Compare October 9, 2025 11:24
gman0 and others added 4 commits October 13, 2025 10:31
Virtual resources add a way to project resources from a provider
to consumer clusters using APIExports and APIBindings.

On-behalf-of: @SAP [email protected]
Signed-off-by: Robert Vasek <[email protected]>
On-behalf-of: @SAP [email protected]
Signed-off-by: Robert Vasek <[email protected]>
@gman0 gman0 force-pushed the virtual-resources branch from 154927d to 2536889 Compare October 13, 2025 08:32
@kcp-ci-bot

@gman0: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kcp-test-e2e-shared | 2536889 | link | true | /test pull-kcp-test-e2e-shared |
| pull-kcp-test-e2e-sharded | 2536889 | link | true | /test pull-kcp-test-e2e-sharded |
| pull-kcp-test-e2e | 2536889 | link | true | /test pull-kcp-test-e2e |
| pull-kcp-test-e2e-multiple-runs | 2536889 | link | true | /test pull-kcp-test-e2e-multiple-runs |

Labels

  • dco-signoff: yes (Indicates the PR's author has signed the DCO.)
  • kind/api-change (Categorizes issue or PR as related to adding, removing, or otherwise changing an API)
  • kind/feature (Categorizes issue or PR as related to a new feature.)
  • release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)

Development

Successfully merging this pull request may close these issues.

feature: Virtual resources aggregation

3 participants