-
Notifications
You must be signed in to change notification settings - Fork 118
Expand schduling.CycleState to the request lifecycle #1062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: liu-cong The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/assign @nirrozenbaum /assign @kfswain |
This was a 3-way DM including myself, it wasn't really decided. The thought process was to emulate Additionally, in that DM I voiced these opinions to try to save time on this PR in favor of other efforts (such as deprecating the DecisionTreeFilter), which was ignored.
I do not think this is a good idea. And I even have some questions about how we handle CycleState now. We allow for multiple profiles to be selected at one time and ran (not quite in parallel, but in the same Regardless, my suggestion was to create an issue around this topic so that we could discuss this and design it into our system. And if a PR is needed, then we could cut it with intent. Otherwise we are just reactionary, which i dont think we need to be in a:
|
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This seems to be questions on the existing CycleState management in general, which makes sense to me. Has this been discussed when we made the multi-profile change? Any context on that? |
Just want to clarify, didn't mean to say this was "decided". It was just a quick discussion initially between Nir and I and we think this approach works. And I'd like the continuing discussion to be in the open and show how this actually works in code so I put this PR. I think you prefer discussing this in an issue, that sounds good as well.
The CycleState was meant for plugins to communicate between different extension points without needing to add new interfaces. The way I see it, is a tradeoff between safety and flexibility, and should only be used when it's really necessary. I agree the RequestContext is growing too large and we should manage it properly. However I think the CycleState serves a different purpose though. I don't know a perfect answer but given it's used very sparsely, I have less concerns on it than the RequestContext.
Perhaps I missed this or this was in a different thread, I didn't see a mention of the DecisionTreeFilter deprecation, and I think they are unrelated if I am not mistaken. |
Is this really needed? why can't we do that in Scorer directly? Asking just to explore a path to not block the refactoring of prefix-aware scheduling on the decision about the scope of CycleState. |
The state sharing today:
The state sharing is to avoid duplicate calculation of the chunk hashes. We can avoid state sharing with re-computation, but that will double the overhead. |
Thanks @liu-cong I am supportive of having CycleState to be passed across layers, I think of it as a request processing state.
I don't think we should deep copy. It is up to the extension point to correctly handle the state they create, but we need to clearly state the expectations on how each extension point may get called (that filter and score extensions may get invoked multiple times for the same request). A plugin could "namespace", using the profile name (which we should pass as an argument to extensions that run under a profile), its state if it doesn't wish to share the state across profile runs for example. |
I have the same view as well. But I'd like to confirm with @kfswain , do you have concerns on this PR itself, which expands the scope of the CycleState, or the CycleState management in generally, especially with multi-profiles? If it's more about the CycleState management in general, can we move forward with this PR and discuss the CycleState management in a separate issue? I do agree we need to clarify how the CycleState is managed with multi profiles, as @ahg-g pointed out. I will create an issue to discuss that, whether it's just a documentation or actual changes. |
@kfswain @nirrozenbaum any objections to the above? |
Yes, I still think this is a bad idea. If we want an export field, we should add an export field, and think through the communication points. Heavy coupling of our system via a monolithic object is counter to what the director is used for. The director and request control library is intended to pass only the data that each subsystem I'm an advocate for moving quickly, but in this case it seems unnecessary. Ive attached the relevant conversation points. This conversation, along with this quote
And the existence of this issue: #967 Leave me asking larger questions about the intent of this PR |
What you are suggesting means a plugin should not implement extension points across layers, which is fair as a guiding principle for now, but it means we need to duplicate some extension points (the ones across layers, like PostCycle). This means we should keep PostCycle and remove the deprecation notice in https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/framework/plugins.go#L72 |
I would recommend changing the name as well (e.g., |
Yes, and I'm happy to discuss the merits of this argument, but I currently see no need to discuss it under time pressure. I agree with some sort of mechanism, within the scheduling system, to allow cleanup. Your suggested names: Post/Pre Scheduling, seem reasonable. To me, duplication is a worthy cost to reduce coupling. But again, happy to discuss. Perhaps there are more elegant approaches, we could investigate that also |
Actually, if we do this, how would we implement updating the prefix cache index at PostResponse? |
We could make a simple subscription mechanism. A scheduling plugin would subscribe to a certain data key with no knowledge of how that key will arrive there, the subscription list becomes part of a Profile. The director will know to include those keys from the unstructured datastore, and include in the CycleState. That would actually cleanly decouple datacollection from any source, and should work well with @elevran 's pluggable Data Layer design. |
And if we are worried about validation, we can have the data layer plugins specify their emission field (we probably need to do this anyway to ensure that any Data Layer plugin isnt competing over the same key). Then on plugin registration, we validate that all requested subscriptions have an emission, and fail on startup if its untenable. Or we could forgo validation, but this seems an easy check and keeps the user safe. |
I share the same concern that having a cycle state across all layers opens the door for inappropriate use of it and going around the well defined interfaces between the layers. I think a plugin should be able to implement extension points across layers, BUT the way the data is shared between the extension points should be through well defined interfaces and not through a general key value map. |
I am not following every detail of the above proposal, but involving the data layer in request scoped state may be an overkill, CycleState is already the simple subscription mechanism for request scoped state. In any case, happy to postpone the discussion for now, but @liu-cong how important is it to move updating the index to PostResponse? |
I am not sure what well defined interfaces mean in this context; handling request scoped state known only to the plugin itself, and only the said plugin generates and consumes means we need to offer a mechanism for abstract state sharing across layers. We can come up with complex architectures to handle this, but that seems unnecessary... |
I think the key point was mentioned by @kfswain:
I suggest we take few days to try and come up with a proposal that everyone are happy with (or at least ok with) 🙂 |
CycleState should not be populated with all data in the datastore. So in order for the request control lib to know what to populate it with, we have to be explicit about our subscriptions. I actually like the subscription model, it let's us reintroduce type safety. Data Plugins A,B, & C register with their emission key(s) & their associated types. Subscriptions (from any layer honestly) specify the key they are expecting, and the type, and we can validate that all on startup, so there should be no runtime surprises. That also lets data plugins be reused between many different plugins if needed |
I think we can live with keeping prefix use PostCycle until we design a better solution, while keeping the Deprecated comment so others won’t use this extension point |
I will reserve further comments until we have a concrete proposal, I agree we can postpone that for later, I wanted to understand if there is an obvious simple alternative to CycleState for passing request scoped state between the extension points of a plugin... |
SG, I'll take that AI |
Appreciate the discussion! There is no time pressure in this (I hope I didn't indicate that this is urgent). As @nirrozenbaum said, the intention was to cleanup the PostCycle. But given we are having a lot questions, happy to postpone and evaluate other options. Let me open an issue. |
should we close this PR? |
The
CycleState
was initially introduced as a flexible mechanism for scheduling plugins to share state. Currently the prefix plugin uses this to share the prompt chunk hashes between Scorer and PostCycle extension points.We have since expanded the plugin framework beyond the scheduler. And we have decided to deprecate the
PostCycle
extension point in favor of thePreRequest
. Expanding theCycleState
to the entire request lifecycle allows plugins to share state beyond those extension points defined in scheduler.I discussed this refactor with @nirrozenbaum in Slack.
Changes:
cycle_state.go
topkg/epp/plugins/cycle_state.go
CycleState
param toPreRequest
This is a PURE REFACTOR. No change of logic.
@kfswain Pls also comment.