-
Notifications
You must be signed in to change notification settings - Fork 118
Update lora affinity to be a scorer. #1121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rlakhtakia The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @rlakhtakia. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
||
// LoraAffinityScorer scores list of candidate pods based on KV cache utilization. | ||
type LoraAffinityScorer struct { | ||
name plugins.TypedName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typedNamed
if active { | ||
scores[pod] = 1 | ||
} else if len(pod.GetMetrics().ActiveModels)+len(pod.GetMetrics().WaitingModels) < pod.GetMetrics().MaxActiveModels { | ||
scores[pod] = 0.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did you select these numbers? is it based on some tests? performance comparison? intuition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1; how does this compare to the approach we have in the decision tree plugin.
@kaushikmitr pls take a look as well
name string | ||
request *types.LLMRequest | ||
pods []types.Pod | ||
expectedScoresPod map[int]float64 // Map of pod index to expected score |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use a map[NamespacedName]float64 (map from pod identifier to expected score)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good suggestion. I wonder if map[string]float64
might be better. We can easily get the pod's name from the target pod directly, which would avoid needing to declare NamespacedName repeatedly in the test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Score function we get back map[Pod]float64, where the Pod representation includes NamespacedName, Address and Labels.
I assume we can either use NamespacedName.String() or Address. both will do the work.
/ok-to-test |
cc @mayabar |
This is great, this will help us move off of the decision tree plugin |
In the current Lora affinity algorithm we have loraaffinitythreshold = 0.99. We use this to prefer a pod that already has the Lora loaded over a pod that does not have the Lora loaded but has room with 99% probability. This ensures that we maintain Lora affinity while spreading it out at high Lora traffic. I am guessing scoring can conceptually achieve that but it seems we are always picking the highest score. |
7a8ba08
to
f8f767d
Compare
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
var _ framework.Scorer = &LoraAffinityScorer{} | ||
|
||
// LoraAffinityScorerFactory defines the factory function for LoraAffinityScorer. | ||
func LoraAffinityScorerFactory(name string, _ json.RawMessage, _ plugins.Handle) (plugins.Plugin, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: perhaps allow setting DefaultLoraAffinityScorerWeight via configuration parameters in the config file?
We're trying to avoid using environment variables (e.g., added in runner.go).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are planning to have GIE default weight for scorers. if no weight specified in the config the weight will default to 1 automatically by the config api code.
this will be pushed in a separate, not in scope of this PR of course.
Do you have some benchmark numbers to share? |
#967
Changes: Added new scorer for v2 scheduler.