Skip to content

Conversation

@Thenujan-Nagaratnam
Copy link
Contributor

@Thenujan-Nagaratnam Thenujan-Nagaratnam commented Jan 9, 2026

  • Model round robin
  • Model weighted round robin

Summary by CodeRabbit

  • New Features

    • Added Model Round Robin load balancing policy: distributes requests cyclically across configured LLM models with automatic suspension and recovery of failed models.
    • Added Model Weighted Round Robin policy: enables weighted distribution across models based on assigned weights, with configurable failure suspension.
    • Both policies support flexible model extraction from request payload, headers, query parameters, or path.
  • Documentation

    • Comprehensive guides for both new load balancing policies covering configuration, deployment, testing, and use cases.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 9, 2026

Walkthrough

This pull request introduces two new load-balancing policies for LLM models: Model Round-Robin and Model Weighted Round-Robin. The implementation includes comprehensive documentation, YAML policy definitions, Go implementations with request/response handling and model suspension logic, and updates to the gateway controller transformer to embed template parameters into policy configurations.

Changes

Cohort / File(s) Summary
Documentation
docs/ai-gateway/llm/load-balancing/model-round-robin.md, docs/ai-gateway/llm/load-balancing/model-weighted-round-robin.md
Complete reference guides for two load-balancing policies, including configuration schemas, operational workflows, request model extraction methods (payload, header, queryParam, pathParam), suspension behavior, and use cases.
Policy Definitions
gateway/gateway-controller/default-policies/model-round-robin-v0.1.0.yaml, gateway/gateway-controller/default-policies/model-weighted-round-robin-v0.1.0.yaml
YAML schema definitions for both policies specifying models array (with required model name and optional endpoint), suspendDuration configuration, and validation constraints (HTTP/HTTPS URL patterns, minimum values).
Model Round-Robin Implementation
gateway/policies/model-round-robin/v0.1.0/go.mod, gateway/policies/model-round-robin/v0.1.0/policy-definition.yaml, gateway/policies/model-round-robin/v0.1.0/roundrobin.go
Full round-robin policy implementation with cyclic model selection, failure detection (5xx/429 status codes), configurable suspension duration, request modification (payload/header/query/path), and thread-safe model state management.
Model Weighted Round-Robin Implementation
gateway/policies/model-weighted-round-robin/v0.1.0/go.mod, gateway/policies/model-weighted-round-robin/v0.1.0/policy-definition.yaml, gateway/policies/model-weighted-round-robin/v0.1.0/weightedroundrobin.go
Weighted distribution policy with pre-computed weighted sequence, proportional model selection based on weights, suspension tracking, extraction from multiple request locations (JSONPath, regex path patterns), and request routing modification.
Gateway Transformer Updates
gateway/gateway-controller/pkg/utils/llm_transformer.go
Enhanced to extract and merge template-derived parameters (requestModel, responseModel, token counts) from LLM provider templates into policy configurations during proxy and provider transformation.
Policy Manifest
gateway/policies/policy-manifest-lock.yaml
Registry entry additions for model-round-robin and model-weighted-round-robin policies with version v0.1.0 and file paths.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Gateway as Gateway<br/>(Policy)
    participant ModelSelector as Model Selection<br/>Logic
    participant RequestModifier as Request<br/>Modifier
    participant Models as LLM Models

    Client->>Gateway: HTTP Request
    Gateway->>ModelSelector: Identify model from request<br/>(payload/header/query/path)
    ModelSelector->>ModelSelector: Extract model identifier<br/>using configured location
    ModelSelector->>ModelSelector: Select next available model<br/>(round-robin or weighted)
    ModelSelector-->>Gateway: Selected model
    Gateway->>RequestModifier: Route to selected model
    RequestModifier->>RequestModifier: Modify request<br/>(update payload/header/query/path)
    RequestModifier-->>Gateway: Modified request
    Gateway->>Models: Forward to selected model
    Models-->>Gateway: Response (status, body)
    Gateway->>Gateway: Check response status<br/>(>= 500 or 429?)
    alt Failure Response
        Gateway->>ModelSelector: Suspend failed model<br/>for configured duration
        ModelSelector->>ModelSelector: Mark model unavailable
    end
    Gateway-->>Client: Return response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Round and round the models go,

Weighted paths let traffic flow,

When one stumbles, rest take load,

Hopping smooth along the road!

Balance, surge, and suspend with grace—

Load-balancing wins the race! 🏁

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is incomplete and lacks critical sections required by the template such as Purpose, Goals, Approach, User stories, Documentation, Automation tests, Security checks, Samples, Related PRs, and Test environment. Complete the PR description using the template format with detailed information about the purpose, goals, approach, documentation links, test coverage, security verification, and test environment details.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adding model-based load balancing policies (round robin and weighted round robin).
Docstring Coverage ✅ Passed Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @docs/ai-gateway/llm/load-balancing/model-weighted-round-robin.md:
- Line 32: Fix the typo in the documentation table: replace "atleast 1" with "at
least 1" in the `weight` description (the table cell that reads "Must be atleast
1."). Update the sentence so it reads "Must be at least 1." ensuring spacing and
capitalization match the surrounding style.

In @gateway/policies/model-round-robin/v0.1.0/roundrobin.go:
- Around line 458-489: modifyModelInPayload currently swallows JSON
unmarshal/set/marshal errors and returns empty modifications while
MetadataKeySelectedModel may already be set, causing inconsistency; change
modifyModelInPayload to return an explicit failure RequestAction when any of the
JSON operations fail (instead of policy.UpstreamRequestModifications{}), e.g.,
construct and return a failure response with a 500 status and explanatory
message so the request is aborted/handled upstream; reference the
modifyModelInPayload function and ensure downstream code (e.g., OnResponse logic
that checks MetadataKeySelectedModel) can rely on the modification either
succeeding or the request having been failed.

In @gateway/policies/model-weighted-round-robin/v0.1.0/weightedroundrobin.go:
- Line 12: Fix the typo in the license header of weightedroundrobin.go by
inserting the missing words "express or" so the sentence reads "WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied."; update the
header text in the file's top comment block (same block containing "WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either implied.") to match the correct
phrasing used in the round-robin policy.
🧹 Nitpick comments (10)
gateway/gateway-controller/default-policies/model-round-robin-v0.1.0.yaml (2)

21-24: Consider strengthening the endpoint URL validation pattern.

The current pattern '^https?://' only validates the URL protocol prefix, potentially allowing malformed URLs like http:// or https://invalid. Consider using a more comprehensive URL validation pattern or relying on runtime validation.

♻️ Suggested enhancement
           endpoint:
             type: string
             description: The endpoint URL for the model (optional)
-            pattern: '^https?://'
+            pattern: '^https?://[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*'

Alternatively, document that full URL validation occurs at runtime.


28-34: Consider if suspendDuration=0 is the right default for production.

The default of 0 means failed model invocations won't be tracked or suspended, potentially causing repeated failures to the same model. For production environments, a non-zero default (e.g., 60 or 300 seconds) might provide better resilience.

gateway/gateway-controller/default-policies/model-weighted-round-robin-v0.1.0.yaml (3)

22-25: Consider strengthening the endpoint URL validation pattern.

Similar to model-round-robin, the pattern '^https?://' only validates the URL protocol prefix. Consider using more comprehensive URL validation or document that runtime validation handles malformed URLs.


26-32: Consider allowing weight=0 for temporarily disabling models.

The minimum weight of 1 prevents temporarily disabling a model without removing it from the configuration. Setting minimum: 0 would allow operators to set weight to 0 for graceful model removal from the rotation while keeping it in the config for future re-enablement.

♻️ Suggested change
           weight:
             type: integer
             description: |
               The weight assigned to this model for distribution.
               Higher weights mean more requests will be routed to this model.
               Weight is relative to total weight of all models.
+              A weight of 0 effectively disables the model.
-            minimum: 1
+            minimum: 0

37-43: Consider if suspendDuration=0 is the right default for production.

Same concern as in model-round-robin: the default of 0 means failed model invocations won't be suspended, potentially causing repeated failures. Consider a non-zero default for better resilience.

gateway/gateway-controller/pkg/utils/llm_transformer.go (1)

596-606: Add documentation and consider validation for parameter merging.

The mergeParams function merges pathEntry.Params (user-defined policy parameters) with templateParams (system-level extraction configuration), with template parameters taking precedence on key conflicts.

This merge order is intentional and correct—a parallel mergeParams implementation in gateway/policy-engine/internal/registry/registry.go documents the same semantics: "Runtime params override init params when keys conflict." Since template parameters define required system behavior (e.g., how to extract requestModel, responseModel, token counts from responses), they should not be overridable by user-defined policy parameters.

In current usage, parameter sets don't overlap: template parameters (requestModel, responseModel, promptTokens, etc.) are distinct from user policy parameters (embeddingService, models, providers, similarity-threshold, etc.). However, for robustness, consider:

  1. Adding a comment explaining that template params intentionally override user params
  2. Adding validation to log or error if parameter name conflicts are detected, preventing silent overwrites
gateway/policies/model-round-robin/v0.1.0/roundrobin.go (1)

404-436: Consider caching compiled regex patterns.

The regex pattern is compiled on every request (regexp.Compile(regexPattern)). Since the pattern comes from policy configuration and doesn't change per request, consider compiling and caching it during GetPolicy initialization to avoid repeated compilation overhead.

♻️ Suggested approach

Cache the compiled regex in ModelRoundRobinPolicy when requestModel.location is "pathParam":

 type ModelRoundRobinPolicy struct {
 	currentIndex    int
 	mu              sync.Mutex
 	suspendedModels map[string]time.Time
 	params          ModelRoundRobinPolicyParams
+	compiledPathRegex *regexp.Regexp // Cached regex for pathParam extraction
 }

Then compile it once in GetPolicy and reuse it in extractModelFromPath and modifyModelInPathParam.

gateway/policies/model-weighted-round-robin/v0.1.0/weightedroundrobin.go (3)

107-252: Significant code duplication with round-robin policy.

The parseParams function shares substantial logic with gateway/policies/model-round-robin/v0.1.0/roundrobin.go, including validation of models, suspendDuration, and requestModel. Consider extracting the common parsing logic into a shared utility package to reduce maintenance burden and ensure consistency.


335-371: Consider removing unused sequenceMu or documenting its purpose.

The sequenceMu RWMutex is declared but only used with RLock/RUnlock in selectNextAvailableWeightedModel. Since weightedSequence is set once during GetPolicy initialization and never modified afterward (no Lock() calls exist), the mutex appears unnecessary. Either:

  1. Remove sequenceMu if weightedSequence is truly immutable after initialization, or
  2. Document why it exists (e.g., for future dynamic reconfiguration support)

373-401: Consider adding a maximum weight validation or alternative weighting algorithm to prevent excessive sequence memory usage.

The current implementation pre-computes a weighted sequence by repeating each model weight times. Without a maximum weight constraint (currently only weight >= 1 is enforced), very large weights could create large sequences in memory. For example, 10 models with weight: 1000 each would create a sequence with ~10,000 pointer entries.

Consider one of these approaches:

  • Add a maximum weight validation in parseParams (simplest)
  • Normalize weights by their GCD to reduce sequence size while preserving proportions
  • Use Smooth Weighted Round-Robin (SWRR): maintains dynamic current_weight per model; on each selection, increment by static weight, pick max, decrement by total. Produces smooth distribution without pre-computation but adds O(n) per-selection overhead.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f082675 and d3704ba.

⛔ Files ignored due to path filters (2)
  • gateway/policies/model-round-robin/v0.1.0/go.sum is excluded by !**/*.sum
  • gateway/policies/model-weighted-round-robin/v0.1.0/go.sum is excluded by !**/*.sum
📒 Files selected for processing (12)
  • docs/ai-gateway/llm/load-balancing/model-round-robin.md
  • docs/ai-gateway/llm/load-balancing/model-weighted-round-robin.md
  • gateway/gateway-controller/default-policies/model-round-robin-v0.1.0.yaml
  • gateway/gateway-controller/default-policies/model-weighted-round-robin-v0.1.0.yaml
  • gateway/gateway-controller/pkg/utils/llm_transformer.go
  • gateway/policies/model-round-robin/v0.1.0/go.mod
  • gateway/policies/model-round-robin/v0.1.0/policy-definition.yaml
  • gateway/policies/model-round-robin/v0.1.0/roundrobin.go
  • gateway/policies/model-weighted-round-robin/v0.1.0/go.mod
  • gateway/policies/model-weighted-round-robin/v0.1.0/policy-definition.yaml
  • gateway/policies/model-weighted-round-robin/v0.1.0/weightedroundrobin.go
  • gateway/policies/policy-manifest-lock.yaml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-18T08:49:08.028Z
Learnt from: RakhithaRR
Repo: wso2/api-platform PR: 504
File: gateway/policies/mcp-auth/v0.1.0/policy-definition.yaml:118-191
Timestamp: 2025-12-18T08:49:08.028Z
Learning: Ensure the policy engine resolves configuration paths in a case-insensitive manner when matching keys (e.g., jwtauth_v010 should match JWTAuth_v010 regardless of case). In reviews, look for references to path resolution logic and verify it normalizes keys (e.g., to lower or upper case) before comparison. Add tests to cover mixed-case Helm values and config keys and document this behavior in the policy engine's docs.

Applied to files:

  • gateway/policies/policy-manifest-lock.yaml
  • gateway/policies/model-round-robin/v0.1.0/policy-definition.yaml
  • gateway/policies/model-weighted-round-robin/v0.1.0/policy-definition.yaml
📚 Learning: 2025-12-19T07:14:39.045Z
Learnt from: nimsara66
Repo: wso2/api-platform PR: 521
File: gateway/gateway-controller/pkg/utils/llm_transformer.go:229-243
Timestamp: 2025-12-19T07:14:39.045Z
Learning: In the gateway LLM configuration flow, validation is performed in pkg/config/llm_validator.go before transformation in pkg/utils/llm_transformer.go, ensuring that required fields like upstream.Auth.Header and upstream.Auth.Value are non-nil before the transformer accesses them.

Applied to files:

  • gateway/gateway-controller/pkg/utils/llm_transformer.go
📚 Learning: 2025-12-06T11:57:08.456Z
Learnt from: BLasan
Repo: wso2/api-platform PR: 140
File: gateway/gateway-controller/pkg/utils/api_deployment.go:88-105
Timestamp: 2025-12-06T11:57:08.456Z
Learning: In the gateway-controller Go codebase (file: gateway/gateway-controller/pkg/utils/api_deployment.go), only the async/websub API kind is currently supported. async/websocket and async/sse API kinds are not yet implemented.

Applied to files:

  • gateway/gateway-controller/pkg/utils/llm_transformer.go
🧬 Code graph analysis (1)
gateway/policies/model-weighted-round-robin/v0.1.0/weightedroundrobin.go (4)
gateway/policies/model-round-robin/v0.1.0/roundrobin.go (1)
  • RequestModelConfig (54-57)
sdk/gateway/policy/v1alpha/context.go (2)
  • RequestContext (56-70)
  • Body (4-16)
sdk/gateway/policy/v1alpha/action.go (3)
  • RequestAction (4-7)
  • ImmediateResponse (32-37)
  • UpstreamRequestModifications (16-24)
sdk/utils/jsonpath.go (2)
  • ExtractStringValueFromJsonpath (14-37)
  • SetValueAtJSONPath (114-212)
🪛 Gitleaks (8.30.0)
docs/ai-gateway/llm/load-balancing/model-round-robin.md

[high] 57-59: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.

(curl-auth-header)

docs/ai-gateway/llm/load-balancing/model-weighted-round-robin.md

[high] 72-74: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.

(curl-auth-header)

🪛 LanguageTool
docs/ai-gateway/llm/load-balancing/model-round-robin.md

[style] ~46-~46: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ured) and stores it for reference. 3. Model Modification: The policy modifies the...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

docs/ai-gateway/llm/load-balancing/model-weighted-round-robin.md

[grammar] ~32-~32: Ensure spelling is correct
Context: ... to total weight of all models. Must be atleast 1. | | endpoint | string | No | The e...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~49-~49: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ation and stores it for reference. 4. Model Modification: The policy modifies the...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🔇 Additional comments (32)
gateway/policies/model-weighted-round-robin/v0.1.0/go.mod (1)

1-5: LGTM! Consistent with model-round-robin module.

The module declaration is consistent with the model-round-robin policy, using the same Go version and SDK dependency.

gateway/policies/policy-manifest-lock.yaml (1)

84-90: LGTM! Policy manifest entries are correctly formatted.

The new policy entries for model-round-robin and model-weighted-round-robin follow the established pattern and are consistent with existing policies in the manifest.

gateway/policies/model-round-robin/v0.1.0/go.mod (1)

3-5: Go version 1.23.0 and SDK v0.3.0 are verified as valid and compatible.

Go 1.23.0 was released on August 13, 2024 as a stable version. SDK v0.3.0 is the internal wso2/api-platform/sdk module version used consistently across 20+ policy modules in the repository, with successful dependency resolution confirmed by the go.sum file containing the resolved hash for v0.3.0.

gateway/policies/model-round-robin/v0.1.0/policy-definition.yaml (1)

1-40: LGTM! Well-structured policy definition.

The schema validation rules are appropriate:

  • Required models array with at least one entry ensures valid configuration
  • URL pattern validation for optional endpoint field
  • suspendDuration with sensible default (0) and minimum constraint
  • Clear descriptions for all parameters
gateway/policies/model-weighted-round-robin/v0.1.0/policy-definition.yaml (1)

1-49: LGTM! Schema is consistent with round-robin variant.

The weighted round-robin schema appropriately extends the base round-robin policy:

  • Required weight field with minimum value of 1 ensures valid weight distribution
  • Consistent validation rules across both policy variants
  • Clear parameter descriptions
docs/ai-gateway/llm/load-balancing/model-round-robin.md (1)

1-210: LGTM! Comprehensive and well-structured documentation.

The documentation thoroughly covers:

  • Policy features and capabilities
  • Configuration parameters with clear tables
  • Multiple practical examples
  • Model suspension behavior
  • Various use cases
  • Request model extraction from different locations

The static analysis hints about the auth token and sentence structure are false positives in this context - the <openai-apikey> is an appropriate placeholder, and the technical writing style is clear.

gateway/gateway-controller/pkg/utils/llm_transformer.go (4)

50-64: LGTM! Proper error handling for template retrieval.

The template retrieval logic includes:

  • Type assertion validation with both ok and nil checks
  • Comprehensive error wrapping with context
  • Proper propagation of errors from nested calls

209-219: LGTM! Consistent template retrieval pattern.

The implementation follows the same pattern as transformProxy, ensuring consistency across both transform paths.


535-594: LGTM! Clean parameter extraction logic.

The buildTemplateParams function:

  • Properly validates nil template input
  • Consistently extracts location and identifier for each template parameter
  • Gracefully handles optional fields by checking for nil
  • Returns empty map when no template parameters are configured, which is appropriate

165-165: LGTM! Consistent parameter merging across all policy attachment points.

The merged parameters are consistently applied at all three policy attachment locations:

  • Line 165: transformProxy
  • Line 373: transformProvider (AllowAll mode)
  • Line 471: transformProvider (DenyAll mode)

This ensures that template parameters are available to policies regardless of the access control mode or whether it's a provider or proxy configuration.

Also applies to: 373-373, 471-471

docs/ai-gateway/llm/load-balancing/model-weighted-round-robin.md (1)

1-224: LGTM! Excellent documentation with clear weight distribution explanations.

The documentation provides:

  • Clear explanation of weighted distribution mechanics
  • Concrete examples showing weight calculations (lines 53-63, 206-213)
  • Comprehensive parameter tables
  • Practical use cases for weighted balancing
  • Detailed model suspension behavior

The weight distribution examples particularly help users understand how the policy works.

gateway/policies/model-round-robin/v0.1.0/roundrobin.go (15)

1-65: LGTM!

The license header, imports, constants, and type definitions are well-structured. The use of sync.Mutex for protecting shared state (currentIndex and suspendedModels) is appropriate for concurrent access.


67-84: LGTM!

The GetPolicy factory function correctly parses parameters and initializes the policy with proper default values.


86-214: LGTM!

The parameter parsing is thorough with clear error messages. The validation covers required fields, type checks, and value constraints appropriately.


216-231: LGTM!

The extractInt function correctly handles the various numeric types that may come from JSON unmarshaling, including the check for fractional parts in float64 values.


233-241: LGTM!

The processing mode configuration is appropriate for a policy that needs to read and potentially modify both request headers and body content.


243-272: LGTM!

The request handling flow is well-structured: extract original model, select next available model via round-robin, and modify the request. The 503 response when all models are unavailable is appropriate.


274-295: LGTM!

The response handling correctly identifies error conditions (5xx and 429) and implements thread-safe model suspension with proper mutex locking.


297-329: LGTM!

The round-robin selection with suspension handling is correctly implemented. The lock protects concurrent access, and expired suspensions are properly cleaned up.


331-355: LGTM!

Clean dispatch pattern for extracting the model from different request locations.


357-369: LGTM!

Proper null checks before attempting JSON path extraction.


371-402: LGTM!

The query parameter extraction properly handles URL decoding and query string parsing.


438-456: LGTM!

Clean dispatch pattern mirroring the extraction logic.


491-498: LGTM!

Simple and correct header modification.


500-549: LGTM!

The query parameter modification correctly handles both existing and new query strings, and properly uses the :path pseudo-header for Envoy compatibility.


551-616: LGTM!

The path parameter modification correctly uses regex capture group indices for precise replacement while preserving the query string. The same regex caching suggestion from the extraction function applies here.

gateway/policies/model-weighted-round-robin/v0.1.0/weightedroundrobin.go (6)

40-75: LGTM!

The type definitions appropriately extend the round-robin concept with weight support. Using a separate RWMutex for the weighted sequence is a good design choice for reducing lock contention.


77-105: LGTM!

The factory function correctly initializes the weighted sequence during policy creation.


254-279: LGTM!

The extractInt and Mode implementations are correct. These are also candidates for the shared utility extraction mentioned earlier.


281-333: LGTM!

The request/response handling follows the same pattern as the round-robin policy, correctly adapted for weighted selection.


403-509: LGTM with prior notes.

The extraction methods are correct. The same regex caching suggestion from the round-robin review applies to extractModelFromPath here. These methods are also candidates for shared utility extraction given the duplication.


511-689: LGTM with prior notes.

The modification methods mirror the round-robin implementation. The same observation about silent failure in modifyModelInPayload applies here as well.

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `model` | string | Yes | The AI model name to use for load balancing. |
| `weight` | integer | Yes | The weight assigned to this model for distribution. Higher weights mean more requests will be routed to this model. Weight is relative to total weight of all models. Must be atleast 1. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix the typo: "atleast" should be "at least".

📝 Proposed fix
-| `weight` | integer | Yes | The weight assigned to this model for distribution. Higher weights mean more requests will be routed to this model. Weight is relative to total weight of all models. Must be atleast 1. |
+| `weight` | integer | Yes | The weight assigned to this model for distribution. Higher weights mean more requests will be routed to this model. Weight is relative to total weight of all models. Must be at least 1. |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| `weight` | integer | Yes | The weight assigned to this model for distribution. Higher weights mean more requests will be routed to this model. Weight is relative to total weight of all models. Must be atleast 1. |
| `weight` | integer | Yes | The weight assigned to this model for distribution. Higher weights mean more requests will be routed to this model. Weight is relative to total weight of all models. Must be at least 1. |
🧰 Tools
🪛 LanguageTool

[grammar] ~32-~32: Ensure spelling is correct
Context: ... to total weight of all models. Must be atleast 1. | | endpoint | string | No | The e...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In @docs/ai-gateway/llm/load-balancing/model-weighted-round-robin.md at line 32,
Fix the typo in the documentation table: replace "atleast 1" with "at least 1"
in the `weight` description (the table cell that reads "Must be atleast 1.").
Update the sentence so it reads "Must be at least 1." ensuring spacing and
capitalization match the surrounding style.

Comment on lines +458 to +489
// modifyModelInPayload modifies the model in request body using JSONPath
func (p *ModelRoundRobinPolicy) modifyModelInPayload(ctx *policy.RequestContext, newModel string, jsonPath string) policy.RequestAction {
if ctx.Body == nil || ctx.Body.Content == nil {
return policy.UpstreamRequestModifications{}
}

// Parse request body
var payloadData map[string]interface{}
if err := json.Unmarshal(ctx.Body.Content, &payloadData); err != nil {
slog.Debug("ModelRoundRobin: Error unmarshaling request body", "error", err)
return policy.UpstreamRequestModifications{}
}

// Update model in payload
if err := utils.SetValueAtJSONPath(payloadData, jsonPath, newModel); err != nil {
slog.Debug("ModelRoundRobin: Error setting model in request body", "jsonPath", jsonPath, "error", err)
return policy.UpstreamRequestModifications{}
}

// Marshal back to JSON
updatedPayload, err := json.Marshal(payloadData)
if err != nil {
slog.Debug("ModelRoundRobin: Error marshaling updated request body", "error", err)
return policy.UpstreamRequestModifications{}
}

slog.Debug("ModelRoundRobin: Modified request model in payload", "originalModel", ctx.Metadata[MetadataKeyOriginalModel], "newModel", newModel, "jsonPath", jsonPath)

return policy.UpstreamRequestModifications{
Body: updatedPayload,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Silent failure may cause unexpected behavior.

When JSON unmarshaling, path setting, or marshaling fails, the function logs at debug level and returns empty modifications. This means the request proceeds with the original model unchanged, but the policy has already recorded a different model in metadata (MetadataKeySelectedModel). This mismatch could cause issues in OnResponse where the wrong model might get suspended.

Consider returning an error response (e.g., 500) when the modification fails to maintain consistency between the selected model and what's actually sent upstream.

🤖 Prompt for AI Agents
In @gateway/policies/model-round-robin/v0.1.0/roundrobin.go around lines 458 -
489, modifyModelInPayload currently swallows JSON unmarshal/set/marshal errors
and returns empty modifications while MetadataKeySelectedModel may already be
set, causing inconsistency; change modifyModelInPayload to return an explicit
failure RequestAction when any of the JSON operations fail (instead of
policy.UpstreamRequestModifications{}), e.g., construct and return a failure
response with a 500 status and explanatory message so the request is
aborted/handled upstream; reference the modifyModelInPayload function and ensure
downstream code (e.g., OnResponse logic that checks MetadataKeySelectedModel)
can rely on the modification either succeeding or the request having been
failed.

*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either implied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo in license header.

The license text is missing "express or" in this line. Compare with the round-robin policy's license which correctly states: "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied."

📝 Proposed fix
- *  distributed under the License is distributed on an "AS IS" BASIS,
- *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either implied.
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @gateway/policies/model-weighted-round-robin/v0.1.0/weightedroundrobin.go at
line 12, Fix the typo in the license header of weightedroundrobin.go by
inserting the missing words "express or" so the sentence reads "WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied."; update the
header text in the file's top comment block (same block containing "WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either implied.") to match the correct
phrasing used in the round-robin policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant