Contributor

@madmecodes commented Jul 15, 2025

Motivation

This PR makes the Kubeflow profile-controller compatible with Istio ambient mode, enabling Kubeflow to leverage service mesh architecture for improved performance and simplified operations.

Why this change is needed

Istio ambient mode represents a fundamental shift in service mesh architecture that provides simplified operations and improved performance. However, it requires different configuration and resources compared to traditional Istio sidecar mode:

  • Sidecar mode (current): Uses sidecar containers in each pod with istio-injection: enabled namespace labels and VirtualService resources for L7 routing (with the Istio CNI plugin, the Envoy sidecar is still present but the privileged istio-init container is not)
  • Ambient mode (new): Uses waypoint proxies with ambient-specific namespace labels (istio.io/dataplane-mode: ambient) and Gateway API resources for L7 routing

The profile-controller is responsible for configuring namespaces and network policies for Kubeflow user profiles. To support ambient mode, it needs to:

  1. Apply the correct namespace labels for ambient data plane mode
  2. Configure waypoint proxy usage for L7 traffic processing
  3. Create appropriate AuthorizationPolicy resources for waypoint traffic
  4. Manage Gateway API resources alongside existing Istio resources

What this PR accomplishes

  • Adds ambient mode support: Profile-controller can now configure namespaces for ambient mode with proper labeling (istio.io/dataplane-mode: ambient, istio.io/use-waypoint, etc.)
  • Waypoint proxy management: Creates and configures waypoint proxies using Gateway API when needed
  • L4 Authorization Policies: Implements waypoint-specific AuthorizationPolicy resources for secure traffic routing
  • Maintains backward compatibility: Default behavior remains unchanged (sidecar mode), existing deployments continue to work
  • Configurable deployment: Environment variables allow switching between modes without code changes
  • Proper separation of concerns: SERVICE_MESH_MODE controls the mesh architecture, waypoint settings control proxy configuration

Configuration

  • Sidecar mode (default): SERVICE_MESH_MODE=istio-sidecar
  • Ambient mode: SERVICE_MESH_MODE=istio-ambient with optional waypoint configuration:
    • WAYPOINT_NAME: Name of waypoint proxy (default: waypoint)
    • WAYPOINT_NAMESPACE: Waypoint namespace (optional, defaults to profile namespace)
    • CREATE_WAYPOINT: Whether to create waypoint if it doesn't exist (default: false)

This change is a critical component for enabling Kubeflow to work seamlessly in ambient mesh deployments, complementing the Gateway API support added to other Kubeflow controllers in PR #7736.

Configuration Options

New environment variables and CLI flags:

  • SERVICE_MESH_MODE: istio-sidecar (default) or istio-ambient
  • WAYPOINT_NAME: Name of waypoint proxy (default: waypoint)
  • WAYPOINT_NAMESPACE: Waypoint namespace (optional, defaults to profile namespace)
  • CREATE_WAYPOINT: Whether to create waypoint if it doesn't exist (default: false)
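
For illustration, the defaults listed above could be resolved roughly as follows. This is a minimal sketch, not the code in this PR: the MeshConfig type and the loadMeshConfig function are hypothetical names, and the lookup function is passed in so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"strconv"
)

// MeshConfig is a hypothetical holder for the settings described above.
type MeshConfig struct {
	Mode              string
	WaypointName      string
	WaypointNamespace string // empty means "use the profile namespace"
	CreateWaypoint    bool
}

// loadMeshConfig reads the settings through the supplied lookup function
// (os.Getenv would be the natural choice in a real binary) and applies
// the documented defaults.
func loadMeshConfig(getenv func(string) string) (MeshConfig, error) {
	cfg := MeshConfig{
		Mode:              "istio-sidecar",
		WaypointName:      "waypoint",
		WaypointNamespace: getenv("WAYPOINT_NAMESPACE"),
	}
	if v := getenv("SERVICE_MESH_MODE"); v != "" {
		cfg.Mode = v
	}
	if cfg.Mode != "istio-sidecar" && cfg.Mode != "istio-ambient" {
		return MeshConfig{}, fmt.Errorf("unsupported SERVICE_MESH_MODE %q", cfg.Mode)
	}
	if v := getenv("WAYPOINT_NAME"); v != "" {
		cfg.WaypointName = v
	}
	if v := getenv("CREATE_WAYPOINT"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return MeshConfig{}, fmt.Errorf("invalid CREATE_WAYPOINT %q: %w", v, err)
		}
		cfg.CreateWaypoint = b
	}
	return cfg, nil
}

func main() {
	// Simulated environment, equivalent to the exports shown under "Deployment".
	env := map[string]string{
		"SERVICE_MESH_MODE": "istio-ambient",
		"WAYPOINT_NAME":     "my-waypoint",
	}
	cfg, err := loadMeshConfig(func(k string) string { return env[k] })
	fmt.Println(cfg, err)
}
```

Rejecting unknown mode values up front is a design choice of this sketch; the PR itself may handle them differently.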

Deployment

# Deploy with ambient mode
kustomize build config/overlays/kubeflow-ambient | kubectl apply -f -

# Or set environment variables manually
export SERVICE_MESH_MODE=istio-ambient
export WAYPOINT_NAME=my-waypoint

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kimwnasptd for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@orfeas-k left a comment

Thank you for the PR, @madmecodes. The changes look promising, but some things are missing.

For visibility, it'd be useful to include in the PR, somewhere before the summary of changes, the features (from below) plus the goal of the PR, explaining why we need it (make profile-controller compatible with Istio's ambient mode, which requires Gateway API compatibility). Let's reiterate what the profile-controller needs to do in order to achieve this.

In Istio ambient, L7 AuthorizationPolicies are attached to a waypoint, so a waypoint is needed to secure the namespace. When profile-controller works in ambient mode, it therefore needs to:

  1. Accept arguments about waypoint, waypoint-namespace and create-waypoint (instead of the gateway ones)
  2. Create the (existing) L7 AuthorizationPolicy as it already does, but attach it to the waypoint provided.
  3. Create a second L4 AuthorizationPolicy to allow traffic from the waypoint's principal to all services in the profile namespace.
  4. Label the profile namespace with the appropriate labels:
    istio.io/use-waypoint=<waypoint-name>
    istio.io/use-waypoint-namespace=<waypoint-namespace>
    istio.io/ingress-use-waypoint=true
  5. If create-waypoint is set to true, create a waypoint in the profile namespace.
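
The labeling step in the list above could be sketched like this. The function name ambientNamespaceLabels is hypothetical, and falling back to the profile namespace when no waypoint namespace is given is an assumption that mirrors the WAYPOINT_NAMESPACE default from the PR description; the istio.io/dataplane-mode label is taken from the PR description as well.

```go
package main

import "fmt"

// ambientNamespaceLabels returns the ambient-mode labels for a profile
// namespace. When waypointNamespace is empty, the profile namespace is
// used instead (assumed default, see the lead-in).
func ambientNamespaceLabels(profileNamespace, waypointName, waypointNamespace string) map[string]string {
	if waypointNamespace == "" {
		waypointNamespace = profileNamespace
	}
	return map[string]string{
		"istio.io/dataplane-mode":         "ambient",
		"istio.io/use-waypoint":           waypointName,
		"istio.io/use-waypoint-namespace": waypointNamespace,
		"istio.io/ingress-use-waypoint":   "true",
	}
}

func main() {
	// "alice" is a placeholder profile namespace.
	for k, v := range ambientNamespaceLabels("alice", "waypoint", "") {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```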

Comment on lines 28 to 33
- "-service-mesh-mode"
- $(SERVICE_MESH_MODE)
- "-gateway-name"
- $(GATEWAY_NAME)
- "-gateway-namespace"
- $(GATEWAY_NAMESPACE)
Contributor

In the code, the gateway and gateway-namespace are not used at all. The HTTPRoute is applied during deployment (instead of a VirtualService) and not by the profile-controller binary AFAIK.

The AuthorizationPolicy that the profile-controller already creates is what needs to be changed to secure profile namespaces in Istio ambient mode. To achieve this, it needs to be attached to a waypoint, since this is an L7 AuthorizationPolicy (see this example).

This is done using a targetRef instead of a selector. The targetRef in that case is indeed of kind: Gateway but the gateway has a gateway-class: waypoint. Thus, using the waypoint naming in the code is more clear, since it actually targets a different resource than the HTTPRoute.
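
To make the targetRef point concrete, here is a rough sketch of the shape such a policy could take, built as a plain map and printed as JSON. It assumes the targetRefs field of the Istio AuthorizationPolicy API with the Gateway API group; the policy name and namespace are placeholders, and the rules are deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// waypointAuthorizationPolicy builds an AuthorizationPolicy skeleton that is
// attached to a waypoint through targetRefs rather than a workload selector.
// Rules are omitted; only the attachment is illustrated.
func waypointAuthorizationPolicy(namespace, waypointName string) map[string]interface{} {
	return map[string]interface{}{
		"apiVersion": "security.istio.io/v1",
		"kind":       "AuthorizationPolicy",
		"metadata": map[string]interface{}{
			"name":      "example-l7-policy", // hypothetical name
			"namespace": namespace,
		},
		"spec": map[string]interface{}{
			"targetRefs": []interface{}{
				map[string]interface{}{
					"group": "gateway.networking.k8s.io",
					"kind":  "Gateway", // the waypoint is a Gateway resource
					"name":  waypointName,
				},
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(waypointAuthorizationPolicy("user-profile", "waypoint"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```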

Contributor

I'm not too familiar with overlays, but I think we should be able to split the kubeflow overlay into two flavours to avoid manifest duplication and only change what's needed for sidecar and ambient.

Comment on lines 10 to 15
env:
- name: SERVICE_MESH_MODE
  value: ambient
- name: GATEWAY_NAME
  value: kubeflow-gateway
- name: GATEWAY_NAMESPACE
Contributor

Other environment variables are fetched from the configmap that is generated here: https://github.com/madmecodes/dashboard/blob/2ecf65c9c42aaf4ccdbbc5913c18ac78899982fe/components/profile-controller/config/manager/kustomization.yaml#L5-L14. To keep a single source of truth and ensure those manifests are configurable from a higher level, it'd be nice to follow what's already done.

Comment on lines 139 to 140
if r.ServiceMeshMode == "ambient" {
// In ambient mode, disable sidecar injection but enable ambient mesh
Contributor

Instead of including this if statement twice in the code, I think we could leverage the existing setNamespaceLabels() calls, probably by creating a new object that includes the Istio labels plus defaultKubeflowNamespaceLabels and passing it to those calls.
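
A possible shape for that combined object, as a sketch only: mergeLabels is a hypothetical helper, and the label keys in main are placeholders rather than the controller's actual defaults.

```go
package main

import "fmt"

// mergeLabels returns a new map holding every entry of base, extended or
// overridden by the entries of extra. Neither input map is modified.
func mergeLabels(base, extra map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(extra))
	for k, v := range base {
		merged[k] = v
	}
	for k, v := range extra {
		merged[k] = v
	}
	return merged
}

func main() {
	// Placeholder stand-ins for the default Kubeflow labels and the mesh labels.
	defaults := map[string]string{"example.com/default-label": "true"}
	mesh := map[string]string{"istio.io/dataplane-mode": "ambient"}

	// The merged map could then be handed to a single label-setting call.
	fmt.Println(mergeLabels(defaults, mesh))
}
```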

Signed-off-by: madmecodes <[email protected]>
Contributor

@orfeas-k left a comment

EDIT: review in progress, submitted those comments by accident

Comment on lines +811 to +814
delete(ns.Labels, "istio.io/dataplane-mode")
delete(ns.Labels, "istio.io/use-waypoint")
delete(ns.Labels, "istio.io/use-waypoint-namespace")
delete(ns.Labels, "istio.io/ingress-use-waypoint")
Contributor

good catch, kudos!

Comment on lines 531 to 535
// In ambient mode, we still use selector but target the waypoint workload
// TODO: Once Istio supports targetRef in AuthorizationPolicy, update this
if r.ServiceMeshMode == "istio-ambient" {
// For now, keep the selector-based approach for ambient mode
// The waypoint will be created separately and policies will apply to namespace workloads
Contributor

Istio in ambient mode already supports targetRef. That's how the policy can be bound to all traffic passing through the whole namespace: https://github.com/orfeas-k/kubeflow-istio-ambient-demo/blob/main/use-case-c-programmatic-access-using-k8s-token/ns-owner-access-istio-ap-l7-waypoint.yaml#L38-L41.

Contributor

@orfeas-k left a comment

Good job @madmecodes! We're close; I've left some comments. As a side note, I would appreciate a response to each comment explaining how it is addressed in the pushed changes, because I needed to scan the changes to figure out the rationale.

}

// Set service mesh labels based on mode
r.setServiceMeshLabels(ns, instance)
Contributor

As a follow up to #127 (comment), can't this and setNamespaceLabels be one call? I don't see the benefit in doing two calls in the code to set namespace labels.

Contributor Author

Created setNamespaceLabelsAndServiceMesh() that handles both default labels and service mesh labels in one call

- name: WAYPOINT_NAMESPACE
  value: ""
- name: CREATE_WAYPOINT
  value: "true"
Contributor

Here it defaults to true, while here it defaults to false. Let's keep the default as false everywhere.

Contributor Author

Changed ambient patch to consistently use "false"

}

// updateL4AuthorizationPolicy creates L4 AuthorizationPolicy to allow traffic from waypoint to services
func (r *ProfileReconciler) updateL4AuthorizationPolicy(profileIns *profilev1.Profile) error {
Contributor

good job, this looks good!
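
For readers following along, the principal such an L4 policy would allow can be sketched as below. The format matches the principal strings already used in this file (for example cluster.local/ns/kubeflow/sa/ml-pipeline-ui); the helper name waypointPrincipal is hypothetical, and it is an assumption here that the waypoint runs under a service account named after the waypoint.

```go
package main

import "fmt"

// waypointPrincipal builds the source principal string for a workload
// identity in the form <trust-domain>/ns/<namespace>/sa/<service-account>.
func waypointPrincipal(trustDomain, namespace, serviceAccount string) string {
	return fmt.Sprintf("%s/ns/%s/sa/%s", trustDomain, namespace, serviceAccount)
}

func main() {
	// Assumes a waypoint named "waypoint" in the placeholder namespace "alice".
	fmt.Println(waypointPrincipal("cluster.local", "alice", "waypoint"))
	// prints: cluster.local/ns/alice/sa/waypoint
}
```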

"cluster.local/ns/kubeflow/sa/ml-pipeline-ui")

return istioSecurity.AuthorizationPolicy{
policy := istioSecurity.AuthorizationPolicy{
Contributor

Is this change necessary?

Contributor Author

Reverted to direct return since no policy modifications were actually needed

- name: WAYPOINT_NAMESPACE
  value: ""
- name: CREATE_WAYPOINT
  value: "true"
Contributor

I don't think we need those here since we have the envFrom in the deployment and the same envvars in the configmap, right?

Contributor Author

Kept only SERVICE_MESH_MODE=istio-ambient override, removed redundant WAYPOINT_NAME, WAYPOINT_NAMESPACE, and CREATE_WAYPOINT

Contributor

When I tried to apply this I got

The HTTPRoute "profiles-kfam" is invalid: 
* spec.hostnames[0]: Invalid value: "*": spec.hostnames[0] in body should match '^(\*\.)?[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
* spec.parentRefs[0].namespace: Invalid value: "$(GATEWAY_NAMESPACE)": spec.parentRefs[0].namespace in body should match '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
* spec.rules[0].backendRefs[0].namespace: Invalid value: "$(PROFILES_NAMESPACE)": spec.rules[0].backendRefs[0].namespace in body should match '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
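
The validation failures can be reproduced offline by checking candidate values against the two patterns quoted in the error message. This is a small sketch using only the regular expressions shown above; the sample values are the ones from this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Patterns copied from the HTTPRoute validation error above.
	hostnamePattern  = regexp.MustCompile(`^(\*\.)?[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)
	namespacePattern = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)
)

func main() {
	// A bare "*" is rejected; a wildcard must be followed by ".<label>".
	for _, h := range []string{"*", "kubeflow.local", "*.kubeflow.local"} {
		fmt.Printf("hostname %q valid: %v\n", h, hostnamePattern.MatchString(h))
	}
	// Unsubstituted kustomize variables such as $(GATEWAY_NAMESPACE) are rejected.
	for _, ns := range []string{"$(GATEWAY_NAMESPACE)", "$(PROFILES_NAMESPACE)", "istio-system", "kubeflow"} {
		fmt.Printf("namespace %q valid: %v\n", ns, namespacePattern.MatchString(ns))
	}
}
```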

Contributor Author

Regarding the HTTPRoute "profiles-kfam" validation errors quoted above: I tried it, and there is some issue with injecting the variable values. If I hardcode them as follows, the manifests validate and install:

- $(GATEWAY_NAMESPACE) to istio-system
- $(PROFILES_NAMESPACE) to kubeflow

This is how I am installing:

  1. kind create cluster --name profile-controller-test
  2. kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml
  3. kustomize build /Users/themadme/gsoc/manifests/common/istio/istio-crds/base | kubectl apply -f -

Finally:
kustomize build config/overlays/kubeflow-ambient | kubectl apply --dry-run=client -f -
Result: All Resources Validated Successfully

Am I missing anything?

  - Fix HTTPRoute hostname validation (*.kubeflow.local to kubeflow.local)
  - Add ambient mode CLI arguments to manager container
  - Disable sidecar injection for ambient deployments
  - Separate KFAM and manager container arguments
  - Add SERVICE_MESH_MODE environment variable support

Signed-off-by: madmecodes <[email protected]>
@madmecodes madmecodes force-pushed the feat/profile-controller-ambient-mode-support branch from 04904a1 to 29274e3 Compare September 10, 2025 05:17
@madmecodes madmecodes requested a review from orfeas-k September 10, 2025 05:44
@madmecodes
Contributor Author

@Orfeas Kourkakis I have tested the Dashboard PR in a GKE cluster with ambient support.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity.
It will be closed if no further activity occurs.
Thank you for your contributions.

Members may comment /lifecycle frozen to prevent this pull request from being marked as stale.
