Expand scheduling.CycleState to the request lifecycle #1062

liu-cong · 2025-06-25T03:32:25Z

The CycleState was initially introduced as a flexible mechanism for scheduling plugins to share state. Currently the prefix plugin uses this to share the prompt chunk hashes between Scorer and PostCycle extension points.

We have since expanded the plugin framework beyond the scheduler, and we have decided to deprecate the PostCycle extension point in favor of PreRequest. Expanding the CycleState to the entire request lifecycle allows plugins to share state beyond those extension points defined in scheduler.

I discussed this refactor with @nirrozenbaum in Slack.

Changes:

- Moved cycle_state.go to pkg/epp/plugins/cycle_state.go
- Added CycleState param to PreRequest
- Updated references accordingly

This is a PURE REFACTOR. No change of logic.

@kfswain Pls also comment.

netlify · 2025-06-25T03:32:30Z

✅ Deploy Preview for gateway-api-inference-extension ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `3f40e6a` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/gateway-api-inference-extension/deploys/685b6dcb19bce200083fdfe8 |
| 😎 Deploy Preview | https://deploy-preview-1062--gateway-api-inference-extension.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

k8s-ci-robot · 2025-06-25T03:32:32Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: liu-cong
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

liu-cong · 2025-06-25T03:33:07Z

/assign @nirrozenbaum

/assign @kfswain

kfswain · 2025-06-25T15:06:01Z

> I discussed this refactor with @nirrozenbaum in Slack.

This was a 3-way DM including myself, it wasn't really decided. The thought process was to emulate RequestContext which is a very large struct that had way too much scope. It's intentionally been contained to the Director and I would honestly like to decompose it further.

Additionally, in that DM I voiced these opinions to try to save time on this PR in favor of other efforts (such as deprecating the DecisionTreeFilter), which was ignored.

> Expanding the CycleState to the entire request lifecycle allows plugins to share state beyond those extension points defined in scheduler.

I do not think this is a good idea, and I even have some questions about how we handle CycleState now. We allow multiple profiles to be selected at one time and run (not quite in parallel, but in the same Pick set). Is it expected that data is shared between those cycles? Order isn't necessarily guaranteed, so we might want to deep copy the cycle state, and if so, how do we reconcile the differences and merge? Should we?

Regardless, my suggestion was to create an issue around this topic so that we could discuss this and design it into our system. And if a PR is needed, then we could cut it with intent. Otherwise we are just being reactionary, which I don't think we need to be for a:

> PURE REFACTOR. No change of logic.

k8s-ci-robot · 2025-06-25T15:06:10Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

liu-cong · 2025-06-25T17:41:59Z

> I do not think this is a good idea. And I even have some questions about how we handle CycleState now. We allow for multiple profiles to be selected at one time and ran (not quite in parallel, but in the same Pick set) is it expected that data is shared between those cycles? Order isn't necessarily guaranteed, so we might want to deep copy the cycle state, and if so, how do we reconcile the differences and merge? Should we?

These seem to be questions about the existing CycleState management in general, which makes sense to me. Was this discussed when we made the multi-profile change? Any context on that?

liu-cong · 2025-06-25T18:08:20Z

> This was a 3-way DM including myself, it wasn't really decided.

Just want to clarify, I didn't mean to say this was "decided". It was just a quick discussion, initially between Nir and me, and we thought this approach works. I'd like the discussion to continue in the open and to show how this actually works in code, so I opened this PR.

I think you prefer discussing this in an issue, that sounds good as well.

> The thought process was to emulate RequestContext which is a very large struct that had way too much scope. It's intentionally been contained to the Director and I would honestly like to decompose it further.

The CycleState was meant for plugins to communicate between different extension points without needing to add new interfaces. The way I see it, it is a tradeoff between safety and flexibility, and it should only be used when it's really necessary. I agree the RequestContext is growing too large and we should manage it properly. However, I think the CycleState serves a different purpose. I don't have a perfect answer, but given it's used very sparsely, I have fewer concerns about it than about the RequestContext.

> Additionally, in that DM I voiced these opinions to try to save time on this PR in favor of other efforts (such as deprecating the DecisionTreeFilter), which was ignored.

Perhaps I missed this or this was in a different thread, I didn't see a mention of the DecisionTreeFilter deprecation, and I think they are unrelated if I am not mistaken.

ahg-g · 2025-06-25T19:20:13Z

> Currently the prefix plugin uses this to share the prompt chunk hashes between Scorer and PostCycle extension points.

Is this really needed? why can't we do that in Scorer directly? Asking just to explore a path to not block the refactoring of prefix-aware scheduling on the decision about the scope of CycleState.

liu-cong · 2025-06-25T20:22:08Z

> Is this really needed? why can't we do that in Scorer directly? Asking just to explore a path to not block the refactoring of prefix-aware scheduling on the decision about the scope of CycleState.

The state sharing today:

1. When a request comes in, we calculate its chunk hashes in the Scorer extension point and score based on the max prefix cache match.
2. We cache those chunk hashes in the CycleState.
3. When we reach the PostCycle (to be replaced by the PreRequest), we update the LRU with those hashes and the pod we send the request to.

The state sharing is to avoid duplicate calculation of the chunk hashes. We can avoid state sharing with re-computation, but that will double the overhead.
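
To make the flow above concrete, here is a minimal Go sketch. It is illustrative only and not the repository's code: the `CycleState` type, the `hashesKey` name, the chunk size of 4, the method signatures, and the plain map standing in for the LRU are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CycleState is a hypothetical stand-in for the per-request key/value store.
type CycleState struct{ data map[string]any }

func NewCycleState() *CycleState { return &CycleState{data: map[string]any{}} }

func (c *CycleState) Write(key string, v any) { c.data[key] = v }

func (c *CycleState) Read(key string) (any, bool) {
	v, ok := c.data[key]
	return v, ok
}

const hashesKey = "prefix/chunk-hashes" // hypothetical key name

// chunkHashes hashes fixed-size chunks of the prompt, chaining in the previous
// hash so that an equal hash at position i implies an equal prefix up to i.
func chunkHashes(prompt string, chunkSize int) []uint64 {
	var hashes []uint64
	var prev uint64
	for start := 0; start+chunkSize <= len(prompt); start += chunkSize {
		h := fnv.New64a()
		fmt.Fprintf(h, "%d:", prev)
		h.Write([]byte(prompt[start : start+chunkSize]))
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

// PrefixPlugin keeps a map from chunk hash to the pod that last served it.
// A real implementation would bound this with an LRU.
type PrefixPlugin struct{ index map[uint64]string }

// Score computes the hashes once, caches them in the CycleState, and returns
// the number of leading chunks already indexed for each pod.
func (p *PrefixPlugin) Score(cs *CycleState, prompt string, pods []string) map[string]int {
	hashes := chunkHashes(prompt, 4)
	cs.Write(hashesKey, hashes)
	scores := make(map[string]int, len(pods))
	for _, pod := range pods {
		for _, h := range hashes {
			if p.index[h] != pod {
				break
			}
			scores[pod]++
		}
	}
	return scores
}

// PostCycle reuses the cached hashes instead of recomputing them and records
// the pod that was picked for this request.
func (p *PrefixPlugin) PostCycle(cs *CycleState, pickedPod string) {
	v, _ := cs.Read(hashesKey)
	hashes, ok := v.([]uint64)
	if !ok {
		return // Score did not run for this request; nothing to record.
	}
	for _, h := range hashes {
		p.index[h] = pickedPod
	}
}

func main() {
	p := &PrefixPlugin{index: map[uint64]string{}}
	cs := NewCycleState()
	fmt.Println(p.Score(cs, "abcdefgh", []string{"pod-a"}))
	p.PostCycle(cs, "pod-a")
	// A later request sharing the first two chunks now matches pod-a.
	fmt.Println(p.Score(NewCycleState(), "abcdefgh-more", []string{"pod-a", "pod-b"}))
}
```

Without the cached entry, `PostCycle` would have to call `chunkHashes` again, which is the doubled overhead mentioned above.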

ahg-g · 2025-06-26T03:13:19Z

Thanks @liu-cong

I am supportive of having CycleState to be passed across layers, I think of it as a request processing state.

> We allow for multiple profiles to be selected at one time and ran (not quite in parallel, but in the same Pick set) is it expected that data is shared between those cycles? Order isn't necessarily guaranteed, so we might want to deep copy the cycle state, and if so, how do we reconcile the differences and merge? Should we?

I don't think we should deep copy. It is up to each extension point to correctly handle the state it creates, but we need to clearly state the expectations on how each extension point may get called (for example, that filter and score extensions may get invoked multiple times for the same request). A plugin that doesn't wish to share state across profile runs could "namespace" its state using the profile name (which we should pass as an argument to extensions that run under a profile).
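
A rough Go sketch of what such namespacing could look like. Everything here is hypothetical: the `CycleState` type, the `namespacedKey` helper, the key layout, and the profile names are assumptions for illustration, not existing APIs.

```go
package main

import "fmt"

// CycleState is a hypothetical stand-in for the per-request key/value store.
type CycleState struct{ data map[string]any }

func NewCycleState() *CycleState { return &CycleState{data: map[string]any{}} }

func (c *CycleState) Write(key string, v any) { c.data[key] = v }

func (c *CycleState) Read(key string) (any, bool) {
	v, ok := c.data[key]
	return v, ok
}

// namespacedKey scopes a plugin's state key to a single profile run, so runs
// of the same plugin under different profiles do not overwrite each other.
func namespacedKey(profile, plugin, key string) string {
	return profile + "/" + plugin + "/" + key
}

func main() {
	cs := NewCycleState()

	// The same plugin is invoked once per profile for a single request.
	for _, profile := range []string{"prefill", "decode"} {
		cs.Write(namespacedKey(profile, "my-scorer", "result"), profile+"-result")
	}

	// State meant to be shared across profile runs uses an un-namespaced key.
	cs.Write("my-scorer/shared", "computed-once")

	v, _ := cs.Read(namespacedKey("prefill", "my-scorer", "result"))
	fmt.Println(v)
}
```

The plugin, not the framework, decides which keys are per-profile and which are shared, which matches the idea that each extension point owns the state it creates.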

liu-cong · 2025-06-26T14:43:02Z

> I am supportive of having CycleState to be passed across layers, I think of it as a request processing state.

I have the same view as well. But I'd like to confirm with @kfswain: do you have concerns about this PR itself, which expands the scope of the CycleState, or about the CycleState management in general, especially with multi-profiles?

If it's more about the CycleState management in general, can we move forward with this PR and discuss the CycleState management in a separate issue?

I do agree we need to clarify how the CycleState is managed with multiple profiles, as @ahg-g pointed out. I will create an issue to discuss that, whether it ends up being just documentation or actual changes.

ahg-g · 2025-06-27T14:24:35Z

@kfswain @nirrozenbaum any objections to the above?

kfswain · 2025-06-27T14:58:10Z

Yes, I still think this is a bad idea. If we want an export field, we should add an export field, and think through the communication points.

Heavy coupling of our system via a monolithic object is counter to what the director is used for. The director and request control library are intended to pass only the data that each subsystem needs.

I'm an advocate for moving quickly, but in this case it seems unnecessary.

I've attached the relevant conversation points. This conversation, along with this quote:

> Perhaps I missed this or this was in a different thread, I didn't see a mention of the DecisionTreeFilter deprecation, and I think they are unrelated if I am not mistaken.
> Source: #1062 (comment)

and the existence of this issue: #967

leave me asking larger questions about the intent of this PR.

ahg-g · 2025-06-27T15:17:18Z

What you are suggesting means a plugin should not implement extension points across layers, which is fair as a guiding principle for now, but it means we need to duplicate some extension points (the ones across layers, like PostCycle). This means we should keep PostCycle and remove the deprecation notice in https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/framework/plugins.go#L72

ahg-g · 2025-06-27T15:24:31Z

I would recommend changing the name as well (e.g., PostScheduling), and we may also need a PreScheduling extension point to allow computing state shared across profile runs.

kfswain · 2025-06-27T15:28:20Z

> What you are suggesting means a plugin should not implement extension points across layers

Yes, and I'm happy to discuss the merits of this argument, but I currently see no need to discuss it under time pressure.

I agree with some sort of mechanism, within the scheduling system, to allow cleanup. Your suggested names: Post/Pre Scheduling, seem reasonable. To me, duplication is a worthy cost to reduce coupling. But again, happy to discuss. Perhaps there are more elegant approaches, we could investigate that also

ahg-g · 2025-06-27T15:28:48Z

> What you are suggesting means a plugin should not implement extension points across layers, which is fair as a guiding principle for now, but it means we need to duplicate some extension points (the ones across layers, like PostCycle). This means we should keep PostCycle and remove the deprecation notice in https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/framework/plugins.go#L72

Actually, if we do this, how would we implement updating the prefix cache index at PostResponse?

kfswain · 2025-06-27T15:35:19Z

> Actually, if we do this, how would we implement updating the prefix cache index at PostResponse?

We could make a simple subscription mechanism. A scheduling plugin would subscribe to a certain data key with no knowledge of how that key will arrive there; the subscription list becomes part of a Profile. The director will know to pull those keys from the unstructured datastore and include them in the CycleState. That would actually cleanly decouple data collection from any source, and should work well with @elevran's pluggable Data Layer design.

kfswain · 2025-06-27T15:40:13Z

And if we are worried about validation, we can have the data layer plugins specify their emission field (we probably need to do this anyway to ensure that Data Layer plugins aren't competing over the same key). Then on plugin registration, we validate that all requested subscriptions have an emission, and fail on startup if it's untenable. Or we could forgo validation, but this seems an easy check and keeps the user safe.

nirrozenbaum · 2025-06-27T15:52:46Z

I share the same concern that having a cycle state across all layers opens the door for inappropriate use of it and going around the well defined interfaces between the layers.

I think a plugin should be able to implement extension points across layers, BUT the way the data is shared between the extension points should be through well defined interfaces and not through a general key value map.

ahg-g · 2025-06-27T15:52:55Z

I am not following every detail of the above proposal, but involving the data layer in request-scoped state may be overkill; CycleState is already the simple subscription mechanism for request-scoped state.

In any case, happy to postpone the discussion for now, but @liu-cong how important is it to move updating the index to PostResponse?

ahg-g · 2025-06-27T16:00:57Z

> I think a plugin should be able to implement extension points across layers, BUT the way the data is shared between the extension points should be through well defined interfaces and not through a general key value map.

I am not sure what "well defined interfaces" means in this context. Handling request-scoped state that is known only to the plugin itself, and that only that plugin generates and consumes, means we need to offer a mechanism for abstract state sharing across layers. We can come up with complex architectures to handle this, but that seems unnecessary...

nirrozenbaum · 2025-06-27T16:03:38Z

I think the key point was mentioned by @kfswain:

> but I currently see no need to discuss it under time pressure.

I suggest we take a few days to try and come up with a proposal that everyone is happy with (or at least ok with) 🙂

kfswain · 2025-06-27T16:05:02Z

CycleState should not be populated with all data in the datastore. So in order for the request control lib to know what to populate it with, we have to be explicit about our subscriptions.

I actually like the subscription model, it lets us reintroduce type safety.

Data Plugins A, B, & C register with their emission key(s) & their associated types. Subscriptions (from any layer honestly) specify the key they are expecting, and the type, and we can validate all of that on startup, so there should be no runtime surprises. That also lets data plugins be reused between many different plugins if needed.

nirrozenbaum · 2025-06-27T16:05:28Z

I think we can live with keeping prefix use PostCycle until we design a better solution, while keeping the Deprecated comment so others won’t use this extension point

ahg-g · 2025-06-27T16:15:46Z

I will reserve further comments until we have a concrete proposal, I agree we can postpone that for later, I wanted to understand if there is an obvious simple alternative to CycleState for passing request scoped state between the extension points of a plugin...

kfswain · 2025-06-27T17:06:42Z

> until we have a concrete proposal

SG, I'll take that AI

liu-cong · 2025-06-27T19:14:03Z

> In any case, happy to postpone the discussion for now, but @liu-cong how important is it to move updating the index to PostResponse?
>
> I think we can live with keeping prefix use PostCycle until we design a better solution, while keeping the Deprecated comment so others won’t use this extension point

Appreciate the discussion!

There is no time pressure on this (I hope I didn't indicate that this is urgent). As @nirrozenbaum said, the intention was to clean up the PostCycle. But given we are having a lot of questions, I'm happy to postpone and evaluate other options.

Let me open an issue.

nirrozenbaum · 2025-06-29T07:45:53Z

should we close this PR?

Expand scheduling.CycleState to the request lifecycle

3f40e6a

k8s-ci-robot added the cncf-cla: yes label Jun 25, 2025

k8s-ci-robot requested review from danehans and robscott June 25, 2025 03:32

k8s-ci-robot added the size/L label Jun 25, 2025

k8s-ci-robot assigned kfswain and nirrozenbaum Jun 25, 2025

k8s-ci-robot added the needs-rebase label Jun 25, 2025

liu-cong mentioned this pull request Jun 27, 2025

Mechanism for plugins to share state between extension points #1084

Open
