feat: add Gateway API support for v1 controllers#865
feat: add Gateway API support for v1 controllers#865kimwnasptd wants to merge 17 commits intokubeflow:notebooks-v1from
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
2e68509 to
76db9b8
Compare
andyatmiami
left a comment
There was a problem hiding this comment.
@kimwnasptd - in effort to give you some feedback as you have patiently been waiting - I reviewed all the files EXCEPT the 3 *_controller.go files..
I will look to get the _controller.go files reviewed ASAP - but realized this weekend is actually pretty busy for me - so unlikely i'll have the focus time to really reason through code... but I at least went through everything else to drop various thoughts/comments.
I wholly realize we are looking to kinda wind down notebooks-v1 - so I wouldn't necessarily force any of the comments left thus far - so you are welcome to push back/disagree and we can leave code as-is... at minimum - please provide justification so we can be transparent on why we did/didn't act on any given comment.
(i also haven't tested code - yet - again, simply due to my time/availability today.. but will certainly be planning a 2nd review + actual testing)
| github.com/onsi/gomega v1.18.1 | ||
| github.com/prometheus/client_golang v1.12.1 | ||
| k8s.io/api v0.24.1 | ||
| k8s.io/apimachinery v0.24.1 | ||
| k8s.io/client-go v0.24.1 | ||
| sigs.k8s.io/controller-runtime v0.12.1 | ||
| sigs.k8s.io/gateway-api v0.5.1 |
There was a problem hiding this comment.
Question moreso for @thesuperzapper... do we want to try to get a little more aggressive on these API versions if we are already going to be upgrading them?
Here is a list of latest releases for changed packages:
- github.com/onsi/gomega v1.39.1
- github.com/prometheus/client_golang v1.23.2
- k8s.io/api v0.35.0
- k8s.io/apimachinery v0.35.0
- k8s.io/client-go v0.35.0
- sigs.k8s.io/controller-runtime v0.23.1
- sigs.k8s.io/gateway-api v1.4.1
( this comment will likely apply to all *controller lib's go.mod files )
There was a problem hiding this comment.
The only "problematic" one here is the sigs.k8s.io/controller-runtime one. I tried to bump it into a newer version but it ultimately requires more code changes on the core of the controller, as the newer controller-runtime has slightly different function signatures in some cases.
In the end I ended up reverting it, and this is the main reason that some components are also using the v1beta1 (v0.5.1) package for Gateway API
There was a problem hiding this comment.
Let me give it a go to bump the packages to those exact versions and test out the changes.
If they look good, then it will be a nice addition for future proofing ourselves for CVEs also.
For me the main goal would be to bump the sigs.k8s.io/gateway-api v1.4.1 so that the Gateway API code is as much aligned as possible across the controllers.
There was a problem hiding this comment.
Tough luck.
I tried to do the above combination, updating everything aside sigs.k8s.io/controller-runtime but because I was bumping sigs.k8s.io/gateway-api it was also then bumping sigs.k8s.io/controller-runtime to v0.22.1.
So I then tried to update everything else aside those two packages
require (
github.com/go-logr/logr v1.4.3
github.com/onsi/ginkgo v1.16.5
github.com/onsi/gomega v1.39.1
github.com/prometheus/client_golang v1.23.1
k8s.io/api v0.35.0
k8s.io/apimachinery v0.35.0
k8s.io/client-go v0.35.0
sigs.k8s.io/controller-runtime v0.12.1
sigs.k8s.io/gateway-api v0.5.1
)
But I end up getting build errors. Most probably because there's some mismatch between the sigs.k8s.io/controller-runtime v0.12.1 version and the k8s.io/client-go v0.35.0 packages:
------
> [builder 7/7] RUN CGO_ENABLED=0 GOOS=linux go build -a -o manager main.go:
9.317 # sigs.k8s.io/controller-runtime/pkg/cache
9.317 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/cache/informer_cache.go:144:9: cannot use i.Informer (variable of interface type "k8s.io/client-go/tools/cache".SharedIndexInformer) as Informer value in return statement: "k8s.io/client-go/tools/cache".SharedIndexInformer does not implement Informer (wrong type for method AddEventHandler)
9.317 have AddEventHandler("k8s.io/client-go/tools/cache".ResourceEventHandler) ("k8s.io/client-go/tools/cache".ResourceEventHandlerRegistration, error)
9.317 want AddEventHandler("k8s.io/client-go/tools/cache".ResourceEventHandler)
9.317 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.1/pkg/cache/informer_cache.go:158:9: cannot use i.Informer (variable of interface type "k8s.io/client-go/tools/cache".SharedIndexInformer) as Informer value in return statement: "k8s.io/client-go/tools/cache".SharedIndexInformer does not implement Informer (wrong type for method AddEventHandler)
9.317 have AddEventHandler("k8s.io/client-go/tools/cache".ResourceEventHandler) ("k8s.io/client-go/tools/cache".ResourceEventHandlerRegistration, error)
9.317 want AddEventHandler("k8s.io/client-go/tools/cache".ResourceEventHandler)
------
There was a problem hiding this comment.
Kind worried how we are evolving this file.. as it opens up the door to combinations of environment variables that are obviously non-sensical and/or hard to reason about...
What would it mean if someone configured the following:
USE_ISTIO=true
USE_GATEWAY_API=true
ISTIO_GATEWAY=kubeflow/kubeflow-gateway
K8S_GATEWAY_NAME=kubeflow-gateway
K8S_GATEWAY_NAMESPACE=kubeflow
I would probably something like this (but I also wouldn't die on this hill)
GATEWAY_VENDOR=<istio/k8s>
GATEWAY_FQN=...
USE_ISTIO=true
ISTIO_GATEWAY=kubeflow/kubeflow-gateway
- then for backwards compatibility:
- if
USE_ISTIO: true- thenGATEWAY_VENDOR: istio - if
USE_ISTIO: true- thenGATEWAY_FQNis set toISTIO_GATEWAY(if its defined)
- if
I think this gives us sufficient backwards compatibility while also avoiding potentially undefined config combinations...
Additionally - it seems the gateway name/namspace for GatewayAPI cares about the individual pieces (not the FQN) - but given a namspace cannot contain / in its name - splitting apart this singular env var seems straightforward enough..
WDYT?
There was a problem hiding this comment.
But can't we still end up with non-sensical and/or hard to reason about scenarios with the proposed names?
For example, what would it mean if someone configured the following:
USE_ISTIO=true
GATEWAY_VENDOR=k8s
ISTIO_GATEWAY=kubeflow/kubeflow-gateway
GATEWAY_FQN=kubeflow/istio-ingressgatewaySo IIUC, as long as the older env vars keep the same name (which they should, to preserve backwards compatibility) then we can end up with conflicting scenarios with the env vars.
My suggestion is the following:
- We acknowledge that there can be such conflicts
- We agree on what the schema for the new env vars will be
- By default, if the new env vars are not set, the controller code should behave the old way (creating VirtualServices).
- We add some login in the
main.goof the controller to error out immediately with a helpful log message, if such a conflicted configuration is found (USE_ISTIOtrue andUse_GATEWAY_APIto true) - Add comments about that also in the manifests
WDYT?
There was a problem hiding this comment.
Fair point @kimwnasptd!
I guess my other "concern" - albeit its a minor one - is in adding new environment variables that are also "vendor-specific*
USE_ISTIO=true
USE_GATEWAY_API=true
ISTIO_GATEWAY=kubeflow/kubeflow-gateway
K8S_GATEWAY_NAME=kubeflow-gateway
K8S_GATEWAY_NAMESPACE=kubeflow
If for some reason (and this is likely wildly unlikely) - a 3rd ( ❗ ) alternative rose to prominence - we'd then need to add 2-3 new environment variables again.. as we are scoping env vars to the "vendor"
Whereas in this mode - we break that pattern - using "generic" names + a "discriminator" (i.e. enum in GATEWAY_VENDOR) to capture the data..
USE_ISTIO=true
GATEWAY_VENDOR=k8s
ISTIO_GATEWAY=kubeflow/kubeflow-gateway
GATEWAY_FQN=kubeflow/istio-ingressgateway
While for notebooks-v1 exclusively I will absolutely admit this does not matter - setting a pattern here may/could influence what we'd want to do for notebooks-v2
But, I've already probably typed too much on the topic.. I agree with your suggestions in prior comment.. and i'd (personally) lean towards the env vars I am suggesting - but its not a hill I would die on..
I am comfortable that you have internalized my feedback - and am happy to then defer to whatever implementation you feel best.
There was a problem hiding this comment.
Yes, I see your point.
The only reason I preferred the USE_GATEWAY_API is because (theoretically) the goal of the Gateway API was to be the standard way on describing Gateways / Routing in K8s.
Let's get some feedback from @thesuperzapper
| selector: | ||
| matchLabels: | ||
| app: notebook-controller | ||
| kustomize.component: notebook-controller |
There was a problem hiding this comment.
apologies if I am being dense - but what is the underling motivation behind adding this?
is it required to be defined for GatewayAPI uses cases?
There was a problem hiding this comment.
Ah, good catch. This should be redundant as we are already setting those in
Removing the changes.
| @@ -0,0 +1,6 @@ | |||
| USE_ISTIO=true | |||
There was a problem hiding this comment.
same comment as here:
| envFrom: | ||
| - configMapRef: | ||
| name: config |
There was a problem hiding this comment.
this was beautiful simplification ( long time coming 🤣 )
There was a problem hiding this comment.
I feel like maybe we should take this effort to align various package dependency versions that exist across different go.mod files for the various controllers with this repo.
This then also factors into my discussion on if/how aggressively we wanted to bump up dependencies
But, I can also wholly appreciate this is scope creep - so feel free to push back.
There was a problem hiding this comment.
same comment as here:
There was a problem hiding this comment.
I feel like maybe we should take this effort to align various package dependency versions that exist across different go.mod files for the various controllers with this repo.
This then also factors into my discussion on if/how aggressively we wanted to bump up dependencies
But, I can also wholly appreciate this is scope creep - so feel free to push back.
|
Thank you so much for your time reviewing this one @andyatmiami! |
5351b9c to
2243bf5
Compare
andyatmiami
left a comment
There was a problem hiding this comment.
Sorry for delay - finally had time/focus to step through the controller changes... appreciate your time in crafting this new feature.
Happy to have conversations on any of these points if you feel it necessary - and also feel empowered to 'overrule' me should you feel any suggestion unnecessary.
I'll try to make sure @thesuperzapper looks over this too to capture his feedback so you don't get further slow-dripped on requests for change.
ℹ️ I realize the VirtualService "framework" (and the controllers as a whole) do NOT presently have unit tests... so it seems (possibly?) unfair to ask tests be contributed in the context of this PR. But just in the sake of transparency - its very unfortunate w.r.t the immaturity of tests here - as this new code contribution (and its significant impact in controller capabilities) could/would REALLY benefit now and moving forward with having tests give us some peace of mind.
- That being said - I know we are looking to get
notebooks-v1in a more strict "maintenance mode" ASAP (if/asnotebooks-v2is GA'd).. so might be a moot point and/or not burn us in the future.
| // Reconcile VirtualService if we use ISTIO. | ||
| if os.Getenv("USE_ISTIO") == "true" { | ||
| err = r.reconcileVirtualService(instance) | ||
| if err != nil { | ||
| return ctrl.Result{}, err | ||
| } | ||
| } | ||
|
|
||
| // Reconcile HTTPRoute if we use Gateway API. | ||
| if os.Getenv("USE_GATEWAY_API") == "true" { | ||
| err = r.reconcileHTTPRoute(instance) | ||
| if err != nil { | ||
| return ctrl.Result{}, err | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Other controllers seem to have this logic wrapping in an if/else such that only ONE of these env vars would ever take effect.
But here in notebook-controller we are seemingly supporting reconciling BOTH "modes" at one? Is this intentional?
Feels like this should also be an if/else...
There was a problem hiding this comment.
Not sure why they implemented it to be different in this case, but I can't think of any reason to not use the same structure as the other places.
| // watch Istio virtual service | ||
| if os.Getenv("USE_ISTIO") == "true" { | ||
| virtualService := &unstructured.Unstructured{} | ||
| virtualService.SetAPIVersion("networking.istio.io/v1alpha3") | ||
| virtualService.SetKind("VirtualService") | ||
| builder.Owns(virtualService) | ||
| } | ||
|
|
||
| // watch HTTPRoute | ||
| if os.Getenv("USE_GATEWAY_API") == "true" { | ||
| builder.Owns(&gwapiv1beta1.HTTPRoute{}) | ||
| } | ||
|
|
There was a problem hiding this comment.
Similar observation with mutually exclusive if statements as raised here:
| func generateHTTPRoute(instance *v1beta1.Notebook) (*gwapiv1beta1.HTTPRoute, error) { | ||
| name := instance.Name | ||
| namespace := instance.Namespace | ||
| pathPrefix := fmt.Sprintf("/notebook/%s/%s", namespace, name) |
There was a problem hiding this comment.
I wonder if we should factor the generation of this path out into its own function. Particularly because of the similar-but-different structure in VirtualService:
As I understand it... VirtualService uses simple string prefix matching - whereas HttpRoute is segment-aware. As such - we WANT a trailing slash for VirtualService but NOT for HttpRoute
A simple helper function driven off our env variables (which specify what "mode" we are in) will help keep this aligned and also help future maintainers understand this relationship.
There was a problem hiding this comment.
Nah, the backslash at the end should also be used in the HTTPRoute as well. While indeed the path prefix of httproutes is different, still all routes that will be accessed are going to be /notebook/test-ns/test-nm/*
I pushed a commit in which I introduced the common function
fa51c8a
|
|
||
|
|
||
| func (r *PVCViewerReconciler) reconcileHTTPRoute(ctx context.Context, log logr.Logger, viewer *kubefloworgv1alpha1.PVCViewer, commonLabels map[string]string) error { | ||
| prefix := fmt.Sprintf("%s/%s/%s", viewer.Spec.Networking.BasePrefix, viewer.Namespace, viewer.Name) |
There was a problem hiding this comment.
Similar comment as in:
| } | ||
|
|
||
| func generateHTTPRoute(tb *tensorboardv1alpha1.Tensorboard) (*gwapiv1beta1.HTTPRoute, error) { | ||
| prefix := fmt.Sprintf("/tensorboard/%s/%s", tb.Namespace, tb.Name) |
There was a problem hiding this comment.
similar comment as in:
There was a problem hiding this comment.
If a deployment switches from USE_GATEWAY_API=true to USE_ISTIO=true (or vice versa), the previously created routing resource (HTTPRoute or VirtualService) is orphaned. The owner references will eventually clean it up when the parent resource is deleted, but as long as the parent exists, the old routing resource remains and could still be actively routing traffic -- potentially conflicting with the new one.
I'm not sure how we want to handle this (and/or the existing behavior of just ignoring this paradigm during a reconcile loop is sufficient)
But its at least probably worth discussing and maybe adding an inline comment (or something) in the reconcile function calling this out..
pvcviewer and tensorboard - but for the sake of brevity I will just leave this comment here.
There was a problem hiding this comment.
This is a very good point to sync on.
The current behavior follows the KServe pattern, in which it's not officially supported to switch between the two. Which in practical terms mean that VirtualServices will be leftover.
Fully agree to document this. I can definitely add this to the reconcile function, and also potentially to the READMEs of the components. Do you have other thoughts on this @andyatmiami
| "kind": "VirtualService", | ||
| }, | ||
| } | ||
| httpRouteTemplate = &gwapiv1.HTTPRoute{} |
There was a problem hiding this comment.
strictly speaking - we don't need to define this as a var and could just reference it inline.. but I can appreciate since the virtualservice counterpart is defined here (and needs to be a var) - co-locating has its benefits.
There was a problem hiding this comment.
yeh, I'm also OK with both. Happy to pick one if you have a preference
There was a problem hiding this comment.
could we add a trailing newline to make the computers happy ?
There was a problem hiding this comment.
oh interesing! what do you mean by that? 😅
There was a problem hiding this comment.
could we add a trailing newline to make the computers happy ?
|
|
||
| httpRoute := &gwapiv1beta1.HTTPRoute{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: virtualServiceName(name, namespace), |
There was a problem hiding this comment.
totally minor nit you can ignore if you choose..
but having an httproute create its name through a function named virtualServiceName just feels a little off ..
routeResourceName - or something similarly vendor-agnostic - might be nice 🤷
There was a problem hiding this comment.
now it's too late to unsee, so I'm making the change 😄
81a0f25 to
41c45e0
Compare
|
/ok-to-test |
Signed-off-by: madmecodes <ayushguptadev1@gmail.com> Signed-off-by: kimwnasptd <kimonas.sotirchos@canonical.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com> Signed-off-by: kimwnasptd <kimonas.sotirchos@canonical.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com> Signed-off-by: kimwnasptd <kimonas.sotirchos@canonical.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com> Signed-off-by: kimwnasptd <kimonas.sotirchos@canonical.com>
Signed-off-by: Kimonas Sotirchos <kimonas.sotirchos@canonical.com>
Signed-off-by: Kimonas Sotirchos <kimonas.sotirchos@canonical.com>
Signed-off-by: Kimonas Sotirchos <kimonas.sotirchos@canonical.com>
Signed-off-by: Kimonas Sotirchos <kimonas.sotirchos@canonical.com>
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
41c45e0 to
f6d6d67
Compare
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
d67c2ef to
f608aa1
Compare
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
f608aa1 to
fa51c8a
Compare
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
eba3925 to
a24da99
Compare
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
515db6d to
c102e46
Compare
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
|
Thank you so much for the detailed review @andyatmiami! I fully agree with your sentiment on the unit tests. I'll see if I can find some time to add some basic unit tests regarding the HTTPRoute and VirtualSerices. The goal will be though to not need to introduce any new packages, just to ensure we generate the expected resources. Although I can't promise I'll make it until next week, as my schedule is a bit hectic. The rest should be ready for another pass! |
Signed-off-by: Kimonas Sotirchos <kimwnasptd@gmail.com>
|
Also a note, I added the |
This PR is the clone of #628, with some extra fixes.
The PR only extends the PVCViewer, Notebook and Tensorboard Controllers to have 3 new env vars, for controlling if the Gateway API should be used instead of Istio's resources.
Extra Changes
Specifically, this PR introduces the following changes on top of the original PR:
NOTE: I've made minimal updates to
go.mod, as to not interfere with the current efforts of fixing CVEs.Review Notes
v1version ofgateway-apimodule, while the other two thev1beta1. This was what was initially done in the PR, which I guess was in order to avoid updating thecontroller-runtimemodule (which needs code changes)cc @deusebio
/cc @thesuperzapper
/cc @andyatmiami