Skip to content

Expand schduling.CycleState to the request lifecycle #1062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

liu-cong
Copy link
Contributor

The CycleState was initially introduced as a flexible mechanism for scheduling plugins to share state. Currently the prefix plugin uses this to share the prompt chunk hashes between Scorer and PostCycle extension points.

We have since expanded the plugin framework beyond the scheduler. And we have decided to deprecate the PostCycle extension point in favor of the PreRequest. Expanding the CycleState to the entire request lifecycle allows plugins to share state beyond those extension points defined in scheduler.

I discussed this refactor with @nirrozenbaum in Slack.

Changes:

  • Moved cycle_state.go to pkg/epp/plugins/cycle_state.go
  • Added CycleState param to PreRequest
  • Updated references accordingly

This is a PURE REFACTOR. No change of logic.

@kfswain Pls also comment.

Copy link

netlify bot commented Jun 25, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 3f40e6a
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/685b6dcb19bce200083fdfe8
😎 Deploy Preview https://deploy-preview-1062--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 25, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: liu-cong
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 25, 2025
@liu-cong
Copy link
Contributor Author

/assign @nirrozenbaum

/assign @kfswain

@kfswain
Copy link
Collaborator

kfswain commented Jun 25, 2025

I discussed this refactor with @nirrozenbaum in Slack.

This was a 3-way DM including myself, it wasn't really decided. The thought process was to emulate RequestContext which is a very large struct that had way too much scope. It's intentionally been contained to the Director and I would honestly like to decompose it further.

Additionally, in that DM I voiced these opinions to try to save time on this PR in favor of other efforts (such as deprecating the DecisionTreeFilter), which was ignored.

Expanding the CycleState to the entire request lifecycle allows plugins to share state beyond those extension points defined in scheduler.

I do not think this is a good idea. And I even have some questions about how we handle CycleState now. We allow for multiple profiles to be selected at one time and ran (not quite in parallel, but in the same Pick set) is it expected that data is shared between those cycles? Order isn't necessarily guaranteed, so we might want to deep copy the cycle state, and if so, how do we reconcile the differences and merge? Should we?

Regardless, my suggestion was to create an issue around this topic so that we could discuss this and design it into our system. And if a PR is needed, then we could cut it with intent. Otherwise we are just reactionary, which i dont think we need to be in a:

PURE REFACTOR. No change of logic.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 25, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@liu-cong
Copy link
Contributor Author

I do not think this is a good idea. And I even have some questions about how we handle CycleState now. We allow for multiple profiles to be selected at one time and ran (not quite in parallel, but in the same Pick set) is it expected that data is shared between those cycles? Order isn't necessarily guaranteed, so we might want to deep copy the cycle state, and if so, how do we reconcile the differences and merge? Should we?

This seems to be questions on the existing CycleState management in general, which makes sense to me. Has this been discussed when we made the multi-profile change? Any context on that?

@liu-cong
Copy link
Contributor Author

This was a 3-way DM including myself, it wasn't really decided.

Just want to clarify, didn't mean to say this was "decided". It was just a quick discussion initially between Nir and I and we think this approach works. And I'd like the continuing discussion to be in the open and show how this actually works in code so I put this PR.

I think you prefer discussing this in an issue, that sounds good as well.

The thought process was to emulate RequestContext which is a very large struct that had way too much scope. It's intentionally been contained to the Director and I would honestly like to decompose it further.

The CycleState was meant for plugins to communicate between different extension points without needing to add new interfaces. The way I see it, is a tradeoff between safety and flexibility, and should only be used when it's really necessary. I agree the RequestContext is growing too large and we should manage it properly. However I think the CycleState serves a different purpose though. I don't know a perfect answer but given it's used very sparsely, I have less concerns on it than the RequestContext.

Additionally, in that DM I voiced these opinions to try to save time on this PR in favor of other efforts (such as deprecating the DecisionTreeFilter), which was ignored.

Perhaps I missed this or this was in a different thread, I didn't see a mention of the DecisionTreeFilter deprecation, and I think they are unrelated if I am not mistaken.

@ahg-g
Copy link
Contributor

ahg-g commented Jun 25, 2025

Currently the prefix plugin uses this to share the prompt chunk hashes between Scorer and PostCycle extension points.

Is this really needed? why can't we do that in Scorer directly? Asking just to explore a path to not block the refactoring of prefix-aware scheduling on the decision about the scope of CycleState.

@liu-cong
Copy link
Contributor Author

Is this really needed? why can't we do that in Scorer directly? Asking just to explore a path to not block the refactoring of prefix-aware scheduling on the decision about the scope of CycleState.

The state sharing today:

  • When a request comes in, we calculate its chunk hashes in the Scorer extension point, and score based the max prefix cache match.
  • We cache those chunk hashes in the CycleState
  • When we reach the PostCycle (to be replaced by the PreRequest), we update the LRU with those hashes and the pod we send the request to.

The state sharing is to avoid duplicate calculation of the chunk hashes. We can avoid state sharing with re-computation, but that will double the overhead.

@ahg-g
Copy link
Contributor

ahg-g commented Jun 26, 2025

Thanks @liu-cong

I am supportive of having CycleState to be passed across layers, I think of it as a request processing state.

We allow for multiple profiles to be selected at one time and ran (not quite in parallel, but in the same Pick set) is it expected that data is shared between those cycles? Order isn't necessarily guaranteed, so we might want to deep copy the cycle state, and if so, how do we reconcile the differences and merge? Should we?

I don't think we should deep copy. It is up to the extension point to correctly handle the state they create, but we need to clearly state the expectations on how each extension point may get called (that filter and score extensions may get invoked multiple times for the same request). A plugin could "namespace", using the profile name (which we should pass as an argument to extensions that run under a profile), its state if it doesn't wish to share the state across profile runs for example.

@liu-cong
Copy link
Contributor Author

I am supportive of having CycleState to be passed across layers, I think of it as a request processing state.

I have the same view as well. But I'd like to confirm with @kfswain , do you have concerns on this PR itself, which expands the scope of the CycleState, or the CycleState management in generally, especially with multi-profiles?

If it's more about the CycleState management in general, can we move forward with this PR and discuss the CycleState management in a separate issue?

I do agree we need to clarify how the CycleState is managed with multi profiles, as @ahg-g pointed out. I will create an issue to discuss that, whether it's just a documentation or actual changes.

@ahg-g
Copy link
Contributor

ahg-g commented Jun 27, 2025

@kfswain @nirrozenbaum any objections to the above?

@kfswain
Copy link
Collaborator

kfswain commented Jun 27, 2025

Yes, I still think this is a bad idea. If we want an export field, we should add an export field, and think through the communication points.

Heavy coupling of our system via a monolithic object is counter to what the director is used for. The director and request control library is intended to pass only the data that each subsystem

I'm an advocate for moving quickly, but in this case it seems unnecessary.
image

Ive attached the relevant conversation points. This conversation, along with this quote

Perhaps I missed this or this was in a different thread, I didn't see a mention of the DecisionTreeFilter deprecation, and I think they are unrelated if I am not mistaken.
Source: #1062 (comment)

And the existence of this issue: #967

Leave me asking larger questions about the intent of this PR

@ahg-g
Copy link
Contributor

ahg-g commented Jun 27, 2025

What you are suggesting means a plugin should not implement extension points across layers, which is fair as a guiding principle for now, but it means we need to duplicate some extension points (the ones across layers, like PostCycle). This means we should keep PostCycle and remove the deprecation notice in https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/framework/plugins.go#L72

@ahg-g
Copy link
Contributor

ahg-g commented Jun 27, 2025

I would recommend changing the name as well (e.g., PostScheduling), and we may also need a PreScheduling extension point to allow computing state shared across profile runs.

@kfswain
Copy link
Collaborator

kfswain commented Jun 27, 2025

What you are suggesting means a plugin should not implement extension points across layers

Yes, and I'm happy to discuss the merits of this argument, but I currently see no need to discuss it under time pressure.

I agree with some sort of mechanism, within the scheduling system, to allow cleanup. Your suggested names: Post/Pre Scheduling, seem reasonable. To me, duplication is a worthy cost to reduce coupling. But again, happy to discuss. Perhaps there are more elegant approaches, we could investigate that also

@ahg-g
Copy link
Contributor

ahg-g commented Jun 27, 2025

What you are suggesting means a plugin should not implement extension points across layers, which is fair as a guiding principle for now, but it means we need to duplicate some extension points (the ones across layers, like PostCycle). This means we should keep PostCycle and remove the deprecation notice in https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/framework/plugins.go#L72

Actually, if we do this, how would we implement updating the prefix cache index at PostResponse?

@kfswain
Copy link
Collaborator

kfswain commented Jun 27, 2025

Actually, if we do this, how would we implement updating the prefix cache index at PostResponse?

We could make a simple subscription mechanism. A scheduling plugin would subscribe to a certain data key with no knowledge of how that key will arrive there, the subscription list becomes part of a Profile. The director will know to include those keys from the unstructured datastore, and include in the CycleState. That would actually cleanly decouple datacollection from any source, and should work well with @elevran 's pluggable Data Layer design.

@kfswain
Copy link
Collaborator

kfswain commented Jun 27, 2025

And if we are worried about validation, we can have the data layer plugins specify their emission field (we probably need to do this anyway to ensure that any Data Layer plugin isnt competing over the same key). Then on plugin registration, we validate that all requested subscriptions have an emission, and fail on startup if its untenable. Or we could forgo validation, but this seems an easy check and keeps the user safe.

@nirrozenbaum
Copy link
Contributor

I share the same concern that having a cycle state across all layers opens the door for inappropriate use of it and going around the well defined interfaces between the layers.

I think a plugin should be able to implement extension points across layers, BUT the way the data is shared between the extension points should be through well defined interfaces and not through a general key value map.

@ahg-g
Copy link
Contributor

ahg-g commented Jun 27, 2025

I am not following every detail of the above proposal, but involving the data layer in request scoped state may be an overkill, CycleState is already the simple subscription mechanism for request scoped state.

In any case, happy to postpone the discussion for now, but @liu-cong how important is it to move updating the index to PostResponse?

@ahg-g
Copy link
Contributor

ahg-g commented Jun 27, 2025

I think a plugin should be able to implement extension points across layers, BUT the way the data is shared between the extension points should be through well defined interfaces and not through a general key value map.

I am not sure what well defined interfaces mean in this context; handling request scoped state known only to the plugin itself, and only the said plugin generates and consumes means we need to offer a mechanism for abstract state sharing across layers. We can come up with complex architectures to handle this, but that seems unnecessary...

@nirrozenbaum
Copy link
Contributor

I think the key point was mentioned by @kfswain:

but I currently see no need to discuss it under time pressure.

I suggest we take few days to try and come up with a proposal that everyone are happy with (or at least ok with) 🙂

@kfswain
Copy link
Collaborator

kfswain commented Jun 27, 2025

CycleState should not be populated with all data in the datastore. So in order for the request control lib to know what to populate it with, we have to be explicit about our subscriptions.

I actually like the subscription model, it let's us reintroduce type safety.

Data Plugins A,B, & C register with their emission key(s) & their associated types. Subscriptions (from any layer honestly) specify the key they are expecting, and the type, and we can validate that all on startup, so there should be no runtime surprises. That also lets data plugins be reused between many different plugins if needed

@nirrozenbaum
Copy link
Contributor

nirrozenbaum commented Jun 27, 2025

I think we can live with keeping prefix use PostCycle until we design a better solution, while keeping the Deprecated comment so others won’t use this extension point

@ahg-g
Copy link
Contributor

ahg-g commented Jun 27, 2025

I will reserve further comments until we have a concrete proposal, I agree we can postpone that for later, I wanted to understand if there is an obvious simple alternative to CycleState for passing request scoped state between the extension points of a plugin...

@kfswain
Copy link
Collaborator

kfswain commented Jun 27, 2025

until we have a concrete proposal

SG, I'll take that AI

@liu-cong
Copy link
Contributor Author

In any case, happy to postpone the discussion for now, but @liu-cong how important is it to move updating the index to PostResponse?
I think we can live with keeping prefix use PostCycle until we design a better solution, while keeping the Deprecated comment so others won’t use this extension point

Appreciate the discussion!

There is no time pressure in this (I hope I didn't indicate that this is urgent). As @nirrozenbaum said, the intention was to cleanup the PostCycle. But given we are having a lot questions, happy to postpone and evaluate other options.

Let me open an issue.

@nirrozenbaum
Copy link
Contributor

should we close this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants