feat: add ambient mode support to kubeflow controllers #628

Closed
madmecodes wants to merge 5 commits into kubeflow:notebooks-v1 from madmecodes:feat/ambient-mode-support

Conversation

@madmecodes

Motivation

This PR makes the Kubeflow controllers (notebook-controller, tensorboard-controller, and pvcviewer-controller)
compatible with Kubernetes Gateway API, which is required for Istio ambient mode support.

Why this change is needed

Istio ambient mode is a sidecar-less service mesh data plane that aims to simplify operations and reduce
per-workload overhead. However, it requires different routing resources than traditional Istio sidecar
mode:

  • Sidecar mode (current): Uses Istio-specific VirtualService resources for L7 routing
  • Ambient mode (new): Uses Kubernetes-native HTTPRoute resources from the Gateway API for L7 routing
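
To make the difference concrete, here is a minimal sketch of the kind of HTTPRoute a controller might emit in ambient mode. It is an illustration only, not the code from this PR: the helper `buildHTTPRoute`, the path prefix, and the example names are all hypothetical, and the manifest is built as a plain map so the sketch needs only the standard library. The `v1beta1` API version is assumed from the type name that appears in the controller log later in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildHTTPRoute returns a Gateway API HTTPRoute manifest as a generic map.
// Hypothetical helper for illustration; the PR itself may build the object differently.
func buildHTTPRoute(name, namespace, gatewayName, gatewayNamespace, pathPrefix, serviceName string, servicePort int) map[string]any {
	return map[string]any{
		"apiVersion": "gateway.networking.k8s.io/v1beta1",
		"kind":       "HTTPRoute",
		"metadata": map[string]any{
			"name":      name,
			"namespace": namespace,
		},
		"spec": map[string]any{
			// parentRefs attaches the route to the target Gateway.
			"parentRefs": []any{
				map[string]any{"name": gatewayName, "namespace": gatewayNamespace},
			},
			"rules": []any{
				map[string]any{
					// Match requests by URL path prefix.
					"matches": []any{
						map[string]any{
							"path": map[string]any{"type": "PathPrefix", "value": pathPrefix},
						},
					},
					// Forward matching requests to the backing Service.
					"backendRefs": []any{
						map[string]any{"name": serviceName, "port": servicePort},
					},
				},
			},
		},
	}
}

func main() {
	// All values below are made-up examples.
	route := buildHTTPRoute(
		"notebook-my-notebook", "kubeflow-user",
		"kubeflow-gateway", "kubeflow",
		"/notebook/kubeflow-user/my-notebook",
		"my-notebook", 80,
	)
	out, err := json.MarshalIndent(route, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

In sidecar mode the equivalent routing intent would be expressed as an Istio VirtualService instead; the match-and-forward structure is similar, but the resource kind and API group differ.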

What this PR accomplishes

  1. Adds Gateway API support: Controllers can now create and manage HTTPRoute resources alongside existing
    VirtualService support
  2. Maintains backward compatibility: Default behavior remains unchanged (sidecar mode with VirtualService)
  3. Proper separation of concerns: USE_ISTIO controls VirtualService creation, USE_GATEWAY_API controls HTTPRoute
    creation
  4. Configurable deployment: Environment variables allow switching between modes without code changes

Configuration

  • Sidecar mode (default): USE_ISTIO=true, USE_GATEWAY_API=false
  • Ambient mode: USE_ISTIO=false, USE_GATEWAY_API=true
  • Gateway configuration: K8S_GATEWAY_NAME and K8S_GATEWAY_NAMESPACE specify the target Gateway resource
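
The configuration above could be read at controller startup roughly as follows. This is a hedged sketch rather than the PR's implementation: `routingConfig`, `envBool`, and `loadRoutingConfig` are hypothetical names, the defaults mirror the sidecar-mode defaults listed above, and the check that rejects an unset Gateway name or namespace is an assumption added for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// routingConfig holds the routing-related settings described in the PR.
type routingConfig struct {
	UseIstio         bool
	UseGatewayAPI    bool
	GatewayName      string
	GatewayNamespace string
}

// lookupFunc has the same signature as os.LookupEnv so it can be faked in tests.
type lookupFunc func(key string) (string, bool)

// envBool parses a boolean variable, returning fallback when it is unset or invalid.
func envBool(lookup lookupFunc, key string, fallback bool) bool {
	raw, ok := lookup(key)
	if !ok {
		return fallback
	}
	value, err := strconv.ParseBool(raw)
	if err != nil {
		return fallback
	}
	return value
}

// loadRoutingConfig reads the four variables named in the PR description.
func loadRoutingConfig(lookup lookupFunc) (routingConfig, error) {
	cfg := routingConfig{
		UseIstio:      envBool(lookup, "USE_ISTIO", true),        // sidecar mode default
		UseGatewayAPI: envBool(lookup, "USE_GATEWAY_API", false), // ambient mode is opt-in
	}
	cfg.GatewayName, _ = lookup("K8S_GATEWAY_NAME")
	cfg.GatewayNamespace, _ = lookup("K8S_GATEWAY_NAMESPACE")

	// Assumption: an HTTPRoute needs a parent Gateway, so fail early if it is unspecified.
	if cfg.UseGatewayAPI && (cfg.GatewayName == "" || cfg.GatewayNamespace == "") {
		return cfg, errors.New("USE_GATEWAY_API=true requires K8S_GATEWAY_NAME and K8S_GATEWAY_NAMESPACE")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadRoutingConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the parsing logic testable without touching the process environment.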

With this change, the Kubeflow controllers can run in both traditional sidecar environments and ambient mesh
deployments.

Reference: kubeflow/kubeflow#7736

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
@github-project-automation github-project-automation bot moved this to Needs Triage in Kubeflow Notebooks Sep 30, 2025
@google-oss-prow google-oss-prow bot added the area/controller area - related to controller components label Sep 30, 2025
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign thesuperzapper for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout
Member

@kimwnasptd @orfeas-k for review.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity.
It will be closed if no further activity occurs.
Thank you for your contributions.

Members may comment /lifecycle frozen to prevent this pull request from being marked as stale.

@juliusvonkohout
Member

@kimwnasptd @orfeas-k

@monotek

monotek commented Dec 1, 2025

How is authentication handled here?
For example notebooks behind oauth2 proxy?

Suppose the waypoint proxy is in another namespace ("istio-system" for example). In that case, the profile controller should likely also be adjusted to be able to add this namespace to the "ns-owner-access-istio" AuthorizationPolicy's "source.namespace" list?

I guess the notebook controller would also need to add an AuthorizationPolicy per service?
At least that's currently our solution (implemented via Kyverno mutate / create).

@kimwnasptd
Member

I'm trying to run the Notebook Controller locally from this PR, but I am getting the following error in the controller logs:

1.7659585291018312e+09  ERROR   controller-runtime.source       kind must be registered to the Scheme   {"error": "no kind is registered for the type v1beta1.HTTPRoute in scheme \"pkg/runtime/scheme.go:100\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/source/source.go:142
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
        /go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:233
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
        /go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:660
k8s.io/apimachinery/pkg/util/wait.poll
        /go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:594
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
        /go/pkg/mod/k8s.io/apimachinery@v0.24.1/pkg/util/wait/wait.go:545
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/source/source.go:132

I've ensured that the following env vars are set:

  1. USE_GATEWAY_API is set to true
  2. USE_ISTIO is set to false

@madmecodes any ideas?

@kimwnasptd
Member

A first approach that makes this error go away is to register HTTPRoute with the scheme directly, but I am not sure why this is needed for HTTPRoute and not for VirtualService.

// main.go
func init() {
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))

	utilruntime.Must(nbv1.AddToScheme(scheme))
	utilruntime.Must(nbv1alpha1.AddToScheme(scheme))
	utilruntime.Must(nbv1beta1.AddToScheme(scheme))
	utilruntime.Must(gwapiv1beta1.AddToScheme(scheme))  // <-- ADDED

	//+kubebuilder:scaffold:scheme
}

@madmecodes do you also encounter this error?
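
One possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis: if the controller handles VirtualService as an unstructured object (which declares its own group, version, and kind) but handles HTTPRoute as a typed Go struct, only the typed struct needs a scheme entry to be mapped to a kind. The toy model below illustrates that lookup behavior using only the standard library. It is not controller-runtime or apimachinery code, and every type and function in it (`gvk`, `scheme`, `typedRoute`, `untyped`, `kindFor`) is hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// gvk is a simplified group/version/kind triple.
type gvk struct{ Group, Version, Kind string }

// typedRoute stands in for a typed API struct such as HTTPRoute.
type typedRoute struct{}

// untyped stands in for an unstructured object that declares its own kind.
type untyped struct{ declared gvk }

// scheme maps Go types to kinds, loosely mimicking what a runtime scheme does.
type scheme struct{ kinds map[reflect.Type]gvk }

func newScheme() *scheme { return &scheme{kinds: map[reflect.Type]gvk{}} }

// register records the kind for the Go type of obj.
func (s *scheme) register(obj any, kind gvk) { s.kinds[reflect.TypeOf(obj)] = kind }

// kindFor resolves the kind of obj. Untyped objects answer for themselves;
// typed objects must have been registered first.
func (s *scheme) kindFor(obj any) (gvk, error) {
	if u, ok := obj.(untyped); ok {
		return u.declared, nil
	}
	kind, ok := s.kinds[reflect.TypeOf(obj)]
	if !ok {
		return gvk{}, fmt.Errorf("no kind is registered for the type %T", obj)
	}
	return kind, nil
}

func main() {
	s := newScheme()

	// The untyped object resolves without any registration.
	vs := untyped{declared: gvk{Group: "networking.istio.io", Version: "v1alpha3", Kind: "VirtualService"}}
	kind, _ := s.kindFor(vs)
	fmt.Println("untyped object resolves to:", kind.Kind)

	// The typed object fails until it is registered.
	if _, err := s.kindFor(typedRoute{}); err != nil {
		fmt.Println("before registration:", err)
	}
	s.register(typedRoute{}, gvk{Group: "gateway.networking.k8s.io", Version: "v1beta1", Kind: "HTTPRoute"})
	kind, _ = s.kindFor(typedRoute{})
	fmt.Println("after registration:", kind.Kind)
}
```

If that assumption holds for this PR, registering the Gateway API types in `init()` as shown above would be the expected fix rather than a workaround.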

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity.
It will be closed if no further activity occurs.
Thank you for your contributions.

Members may comment /lifecycle frozen to prevent this pull request from being marked as stale.

@github-actions

This pull request has been automatically closed because it has not had recent activity.
You can reopen the PR if you want.

@github-actions github-actions bot closed this Mar 10, 2026
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in Kubeflow Notebooks Mar 10, 2026
@christian-heusel

/lifecycle frozen
/reopen

@google-oss-prow google-oss-prow bot reopened this Mar 10, 2026
@google-oss-prow

@christian-heusel: Reopened this PR.

Details

In response to this:

/lifecycle frozen
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@google-oss-prow

@christian-heusel: The lifecycle/frozen label cannot be applied to Pull Requests.


@juliusvonkohout
Member

@christian-heusel that is continued in #865

@christian-heusel

Yeah sorry, I confused the two PRs, thanks for cleaning up @juliusvonkohout 🤗

Labels

  • area/controller (area - related to controller components)
  • area/v1 (area - version - kubeflow notebooks v1)
  • size/XL

Projects

Status: Done

5 participants