feat: add vllm rerank support with pricing and logging integration #1723
Conversation
📝 Walkthrough
Adds end-to-end rerank support for vLLM: request/response converters and validation, provider call with /v1/rerank → /rerank fallback, usage parsing, pricing/logging/UI and DB migration, tests, and documentation updates.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Converter
    participant vLLM
    Client->>Gateway: BifrostRerankRequest
    Gateway->>Converter: ToVLLMRerankRequest(bifrostReq)
    Converter-->>Gateway: vLLMRerankRequest (JSON)
    Gateway->>vLLM: POST /v1/rerank (body)
    alt 2xx
        vLLM-->>Gateway: 2xx payload
    else 404/405/501
        Gateway->>vLLM: POST /rerank (retry)
        vLLM-->>Gateway: 2xx payload
    end
    Gateway->>Converter: ToBifrostRerankResponse(payload, docs, returnDocs)
    Converter-->>Gateway: BifrostRerankResponse (usage, results)
    Gateway->>Gateway: enrich ExtraFields (provider, model, latency, raw data)
    Gateway-->>Client: Enriched BifrostRerankResponse
```
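Read as code, the retry branch in the diagram reduces to a few lines of control flow. The following is a self-contained sketch under assumed names; `post` and the request/response stubs are illustrations, not the PR's actual types (the real call is `callVLLMRerankEndpoint`, per the review comments below):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the converter output and provider response.
type rerankRequest struct {
	Query     string
	Documents []string
}
type rerankResponse struct{ Ranked []int }

// post simulates the vLLM HTTP call: this fake server only serves the
// legacy /rerank path, like an older deployment.
func post(path string, req rerankRequest) (rerankResponse, int, error) {
	if path == "/v1/rerank" {
		return rerankResponse{}, 404, errors.New("not found")
	}
	return rerankResponse{Ranked: []int{1, 0}}, 200, nil
}

// isRerankFallbackStatus mirrors the predicate named in the review:
// retry only on 404/405/501.
func isRerankFallbackStatus(status int) bool {
	return status == 404 || status == 405 || status == 501
}

func main() {
	req := rerankRequest{Query: "q", Documents: []string{"a", "b"}}
	resp, status, err := post("/v1/rerank", req)
	if err != nil && isRerankFallbackStatus(status) {
		resp, _, err = post("/rerank", req) // single retry on the legacy path
	}
	fmt.Println(resp, err) // {[1 0]} <nil>
}
```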
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/providers/supported-providers/overview.mdx (1)
18-59: Optional: Add a "Rerank" column to the support matrix for consistency.

The table already has an `Embeddings` column to indicate per-provider support. Rerank has the same partial-coverage pattern (Cohere ✅, Bedrock ✅, Vertex AI ✅, vLLM ✅, others ❌) but is only described in the footnote. Adding a `Rerank` column would let users discover support at a glance, consistent with how all other operations are documented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/providers/supported-providers/overview.mdx` around lines 18 - 59, Add a "Rerank" column to the provider support matrix: update the header row to include "Rerank", add a corresponding cell for each provider row (set ✅ for Cohere, Bedrock, Vertex AI, and vLLM; set ❌ for all other providers), and update the legend/footnote text and the Notes section (where reranking is mentioned) to reference the new "Rerank" column so the table and explanatory text stay consistent.

core/providers/vllm/rerank_test.go (1)
16-49: Consider using `bifrost.Ptr()` for pointer creation.

Lines 17-19 use the address operator (`&`) to create pointer values. The codebase convention prefers `bifrost.Ptr()` for consistency.

♻️ Suggested refactor

```diff
 func TestRerankToVLLMRerankRequest(t *testing.T) {
-	topN := 2
-	maxTokens := 128
-	priority := 5
-
 	req := ToVLLMRerankRequest(&schemas.BifrostRerankRequest{
 		Model: "BAAI/bge-reranker-v2-m3",
 		Query: "what is machine learning",
@@ ..
 		Params: &schemas.RerankParameters{
-			TopN:            &topN,
-			MaxTokensPerDoc: &maxTokens,
-			Priority:        &priority,
+			TopN:            bifrost.Ptr(2),
+			MaxTokensPerDoc: bifrost.Ptr(128),
+			Priority:        bifrost.Ptr(5),
```

Based on learnings: "In the maximhq/bifrost repository, prefer using bifrost.Ptr() to create pointers instead of the address operator (&) even when & would be valid syntactically."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/providers/vllm/rerank_test.go` around lines 16 - 49, Replace raw &-pointer creation in TestRerankToVLLMRerankRequest with the project's pointer helper: use bifrost.Ptr() when constructing the Params values (TopN, MaxTokensPerDoc, Priority) of the schemas.BifrostRerankRequest passed to ToVLLMRerankRequest; update the three places where &topN, &maxTokens, &priority are used so they call bifrost.Ptr(topN), bifrost.Ptr(maxTokens), bifrost.Ptr(priority) respectively to follow the codebase convention.
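For context, helpers like `bifrost.Ptr()` are usually one-line generics; the actual definition in the repository is not shown in this thread, so the following is a sketch of the typical shape:

```go
package bifrost

// Ptr returns a pointer to v. A generic helper like this lets callers write
// bifrost.Ptr(2) directly instead of declaring an intermediate variable
// just to take its address.
func Ptr[T any](v T) *T {
	return &v
}
```

With that shape, `bifrost.Ptr(2)` yields a `*int` without a `topN := 2` declaration, which is what the suggested refactor above relies on.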
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/logging/main.go`:
- Around line 600-601: UpdateLogData currently only includes EmbeddingOutput so
rerank results are dropped; add a new field RerankOutput []schemas.RerankData
(or similar) to UpdateLogData, populate it in the same content-logging block
where EmbeddingOutput is set (use the results from result.RerankResponse ->
iterate ranked items and map index, relevance_score, optional document into
schemas.RerankData), and then wire this new RerankOutput through updateLogEntry
and the log-store layer the same way EmbeddingOutput is passed/stored so the
Logs UI can read structured rerank outputs. Ensure you use the same naming as
UpdateLogData, updateLogEntry, and the storage DTOs so the flow mirrors
EmbeddingOutput end-to-end.
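A minimal sketch of the wiring this prompt asks for; the struct shapes and JSON tags below are assumptions inferred from the comment, not the repository's actual types:

```go
package logging

// RerankData is the structured per-result record the comment describes:
// index, relevance score, and the optional document text.
type RerankData struct {
	Index          int     `json:"index"`
	RelevanceScore float64 `json:"relevance_score"`
	Document       *string `json:"document,omitempty"`
}

// UpdateLogData gains a RerankOutput field alongside EmbeddingOutput
// (all other existing fields elided; EmbeddingOutput's element type is
// a placeholder here).
type UpdateLogData struct {
	EmbeddingOutput []float32    `json:"embedding_output,omitempty"`
	RerankOutput    []RerankData `json:"rerank_output,omitempty"`
}
```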
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ui/app/workspace/logs/sheets/logDetailsSheet.tsx`:
- Around line 553-572: Remove the redundant header div inside the render block
that displays rerank output: locate the conditional block that checks
log.rerank_output && !log.error_details?.error.message and remove the standalone
<div className="mt-4 w-full text-left text-sm font-medium">Rerank Output</div>
so the CollapsibleBox (title={`Rerank Output (${log.rerank_output.length})`}) is
the sole header; keep the CollapsibleBox and its CodeEditor intact to match the
pattern used by list_models_output and avoid removing the similar div used for
embedding_output which relies on LogChatMessageView.
Force-pushed from 1b15f36 to a93a3cb
Actionable comments posted: 2
🧹 Nitpick comments (5)
docs/quickstart/go-sdk/reranking.mdx (2)
72-73: vLLM prefix note is separated from the vLLM example by ~60 lines.

The clarification about omitting the `vllm/` prefix appears at the bottom of the Parameters section, but the vLLM entry (line 11) is the first place a user encounters it. If the note is kept here, consider whether it duplicates or supersedes the suggestion above; otherwise it can remain as a deeper-dive clarification.

No action required if the callout approach above is adopted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/quickstart/go-sdk/reranking.mdx` around lines 72 - 73, The vLLM note about omitting the "vllm/" prefix is placed far from the first mention (the vLLM entry in the Parameters section) which can confuse readers; move or duplicate that clarification so it's immediately visible where "vLLM" is first introduced (the vLLM entry in the Parameters list) — either add a short inline callout right after the vLLM entry or relocate the existing sentence from the bottom of the Parameters section up next to the vLLM entry so the guidance about using the upstream model ID without the "vllm/" prefix is seen at first mention.
9-11: Consider a Mintlify callout for the provider examples block.

The provider/model examples are bare plain text outside any heading, which renders as floating content between the frontmatter description and `## Basic Example`. A `<Note>` or `<Tip>` callout would integrate more cleanly with the surrounding Mintlify MDX layout and also anchor the `vllm/`-prefix note (currently far away at line 72) right where users first see the vLLM entry.

📝 Suggested format

```diff
-Provider/model examples:
-- Cohere: `Provider: schemas.Cohere`, `Model: "rerank-v3.5"`
-- vLLM: `Provider: schemas.VLLM`, `Model: "BAAI/bge-reranker-v2-m3"`
+<Note>
+**Provider/model examples**
+- Cohere: `Provider: schemas.Cohere`, `Model: "rerank-v3.5"`
+- vLLM: `Provider: schemas.VLLM`, `Model: "BAAI/bge-reranker-v2-m3"` — use the upstream model ID without the `vllm/` prefix used in Gateway HTTP requests
+</Note>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/quickstart/go-sdk/reranking.mdx` around lines 9 - 11, Wrap the bare provider/model examples into a Mintlify callout (e.g., <Note> or <Tip>) so they render as an integrated block before "## Basic Example"; specifically place the Cohere example ("Provider: schemas.Cohere", "Model: \"rerank-v3.5\"") and the vLLM example ("Provider: schemas.VLLM", "Model: \"BAAI/bge-reranker-v2-m3\"") inside the callout and move the existing vLLM prefix note (currently separated) next to the vLLM entry within that same callout for context.

ui/app/workspace/logs/sheets/logDetailsSheet.tsx (1)
580-597: Inconsistent indentation — rerank block is one level deeper than sibling output blocks.

Compare with the `embedding_output` block (line 565) and `list_models_output` (line 505): those are at 6-tab indent inside the `{log.status !== "processing" && (` fragment, but the rerank block here is at 7 tabs. This will likely be caught by Prettier, but worth aligning now.

Also, the `<>...</>` fragment wrapper (lines 581, 597) is unnecessary since there's only one child element (`CollapsibleBox`).

🎨 Suggested indentation fix

```diff
 )}
-{log.rerank_output && !log.error_details?.error.message && (
-	<>
-		<CollapsibleBox
-			title={`Rerank Output (${log.rerank_output.length})`}
-			onCopy={() => JSON.stringify(log.rerank_output, null, 2)}
-		>
-			<CodeEditor
-				className="z-0 w-full"
-				shouldAdjustInitialHeight={true}
-				maxHeight={450}
-				wrap={true}
-				code={JSON.stringify(log.rerank_output, null, 2)}
-				lang="json"
-				readonly={true}
-				options={{ scrollBeyondLastLine: false, lineNumbers: "off", alwaysConsumeMouseWheel: false }}
-			/>
-		</CollapsibleBox>
-	</>
-)}
+{log.rerank_output && !log.error_details?.error.message && (
+	<CollapsibleBox
+		title={`Rerank Output (${log.rerank_output.length})`}
+		onCopy={() => JSON.stringify(log.rerank_output, null, 2)}
+	>
+		<CodeEditor
+			className="z-0 w-full"
+			shouldAdjustInitialHeight={true}
+			maxHeight={450}
+			wrap={true}
+			code={JSON.stringify(log.rerank_output, null, 2)}
+			lang="json"
+			readonly={true}
+			options={{ scrollBeyondLastLine: false, lineNumbers: "off", alwaysConsumeMouseWheel: false }}
+		/>
+	</CollapsibleBox>
+)}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui/app/workspace/logs/sheets/logDetailsSheet.tsx` around lines 580 - 597, The rerank output block is indented one level deeper than sibling blocks and uses an unnecessary fragment wrapper; move the entire rerank_output conditional block (the {log.rerank_output && !log.error_details?.error.message && ( ... )} section that contains CollapsibleBox and CodeEditor) up one indentation level to match the embedding_output and list_models_output blocks, and remove the surrounding <>...</> fragment so CollapsibleBox is the direct child of the conditional. Ensure the conditional, CollapsibleBox title/onCopy, and CodeEditor props remain unchanged.

core/providers/vllm/rerank.go (1)
11-14: Redundant nil guard in converter function.

Per codebase convention, nil validation of request objects belongs in the Bifrost core layer (`core/bifrost.go`). `CheckContextAndGetRequestBody` in `vllm.go` already handles the case where the converter returns `nil`. The guard here is redundant. Based on learnings, "Nil validation for request objects should occur at the Bifrost core layer. Provider-layer methods can assume the request has been validated and should not include redundant nil checks."

♻️ Proposed refactor

```diff
 func ToVLLMRerankRequest(bifrostReq *schemas.BifrostRerankRequest) *vLLMRerankRequest {
-	if bifrostReq == nil {
-		return nil
-	}
-
 	vllmReq := &vLLMRerankRequest{
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/providers/vllm/rerank.go` around lines 11 - 14, Remove the redundant nil guard in the provider converter function ToVLLMRerankRequest: since request validation is performed in the Bifrost core (see CheckContextAndGetRequestBody in vllm.go), assume bifrostReq (*schemas.BifrostRerankRequest) is non-nil and return a constructed *vLLMRerankRequest directly; update ToVLLMRerankRequest to stop checking bifrostReq == nil and simply map fields from bifrostReq into the new vLLMRerankRequest.

core/providers/vllm/vllm.go (1)
285-291: Fallback path silently discards the primary call's latency.

When the fallback is triggered, `latency` is overwritten with only the second call's duration. The time consumed by the first (failed) attempt is lost, so the reported latency understates the actual end-to-end duration seen by the caller.

♻️ Proposed fix — accumulate latency across both calls

```diff
 responsePayload, rawRequest, rawResponse, responseBody, statusCode, latency, bifrostErr := provider.callVLLMRerankEndpoint(ctx, key, request, resolvedPath, jsonData)
 if bifrostErr != nil && !hasPathOverride && isRerankFallbackStatus(statusCode) {
-	responsePayload, rawRequest, rawResponse, responseBody, statusCode, latency, bifrostErr = provider.callVLLMRerankEndpoint(ctx, key, request, "/rerank", jsonData)
+	var fallbackLatency time.Duration
+	responsePayload, rawRequest, rawResponse, responseBody, statusCode, fallbackLatency, bifrostErr = provider.callVLLMRerankEndpoint(ctx, key, request, "/rerank", jsonData)
+	latency += fallbackLatency
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/providers/vllm/vllm.go` around lines 285 - 291, The current fallback flow overwrites latency from the first call when callVLLMRerankEndpoint is retried, so update the logic to accumulate durations instead of discarding the first call's time: capture the first call's latency variable returned from provider.callVLLMRerankEndpoint, and if the fallback branch (hasPathOverride false and isRerankFallbackStatus(statusCode)) triggers and you invoke callVLLMRerankEndpoint a second time, add the first latency to the second (e.g., latency = firstLatency + secondLatency) before any error handling or return. Ensure you reference the existing variables latency, bifrostErr, hasPathOverride, isRerankFallbackStatus and the callVLLMRerankEndpoint invocation so the total end-to-end duration is preserved in all code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/providers/vllm/rerank.go`:
- Around line 122-130: The current fallback uses zero-value checks on
promptTokens and completionTokens which incorrectly triggers when vLLM
legitimately returns 0; instead check existence of keys in usageMap before
falling back: for promptTokens, use a map lookup like _, ok :=
usageMap["prompt_tokens"] and only call schemas.SafeExtractInt on "input_tokens"
if "prompt_tokens" is missing, and similarly only fallback to "output_tokens" if
"completion_tokens" is absent; update the logic around usageMap, promptTokens,
completionTokens and keep using schemas.SafeExtractInt for extraction.
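A self-contained sketch of the key-existence check this prompt describes; `usageMap` and the extraction helper are simplified stand-ins for the repository's actual code (`toInt` approximates the `schemas.SafeExtractInt` helper named above):

```go
package main

import "fmt"

// toInt is a simplified stand-in for the schemas.SafeExtractInt helper
// mentioned in the comment.
func toInt(v interface{}) int {
	switch n := v.(type) {
	case float64: // JSON numbers decode to float64
		return int(n)
	case int:
		return n
	}
	return 0
}

func main() {
	usageMap := map[string]interface{}{"prompt_tokens": 0.0, "output_tokens": 7.0}

	// Fall back to the alternate key only when the primary key is absent,
	// so a legitimate 0 is preserved.
	promptTokens := 0
	if v, ok := usageMap["prompt_tokens"]; ok {
		promptTokens = toInt(v)
	} else if v, ok := usageMap["input_tokens"]; ok {
		promptTokens = toInt(v)
	}

	completionTokens := 0
	if v, ok := usageMap["completion_tokens"]; ok {
		completionTokens = toInt(v)
	} else if v, ok := usageMap["output_tokens"]; ok {
		completionTokens = toInt(v)
	}

	fmt.Println(promptTokens, completionTokens) // 0 7
}
```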
In `@docs/providers/supported-providers/vllm.mdx`:
- Around line 146-159: The curl example places return_documents at the top level
but BifrostRerankRequest only accepts provider, model, query, documents, params,
and fallbacks, so move "return_documents": true into the params object; update
the example payload to include a "params": { "return_documents": true } field so
the gateway's JSON deserializer recognizes it and the rerank behavior is applied
correctly.
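For illustration, the corrected request body from this comment would look roughly like the following; the model ID comes from the PR's examples, while the query and documents are placeholders:

```json
{
  "model": "vllm/BAAI/bge-reranker-v2-m3",
  "query": "what is machine learning",
  "documents": ["Document one...", "Document two..."],
  "params": {
    "return_documents": true
  }
}
```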
🧹 Nitpick comments (2)
core/providers/vllm/rerank.go (1)
36-114: Well-structured validation with good edge-case handling.

The response parser is thorough: it correctly validates index bounds, deduplicates, prioritizes `relevance_score` over `score` (without falling back when `relevance_score` is legitimately `0.0`), and produces clear error messages.

One optional nit on Line 100:

♻️ Use schemas.Ptr for consistency

```diff
 if returnDocuments {
-	doc := documents[index]
-	result.Document = &doc
+	result.Document = schemas.Ptr(documents[index])
 }
```

Based on learnings: "prefer using bifrost.Ptr() to create pointers instead of the address operator (&) even when & would be valid syntactically." Since this is a `core/providers` subpackage that shouldn't import the `core` package, `schemas.Ptr` (already used elsewhere in this file's sibling, e.g., `vllm.go:356`) is the appropriate equivalent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/providers/vllm/rerank.go` around lines 36 - 114, The code uses the address operator (&doc) to set result.Document inside ToBifrostRerankResponse; replace this with the helper pointer constructor to match project conventions — use schemas.Ptr(doc) when assigning result.Document (referencing the local variable doc and the result variable in ToBifrostRerankResponse) so the file stays consistent with other uses of schemas.Ptr instead of taking addresses directly.
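The 0.0-preserving score selection the comment praises comes down to checking key presence rather than value truthiness. A self-contained sketch with an assumed map shape, not the PR's actual parser:

```go
package main

import "fmt"

func main() {
	// One ranked item as decoded from the provider payload (shape assumed).
	item := map[string]interface{}{"index": 0, "relevance_score": 0.0, "score": 0.87}

	// Prefer relevance_score whenever the key is present (even at 0.0);
	// consult the legacy score field only when it is absent.
	var score float64
	if v, exists := item["relevance_score"]; exists {
		score, _ = v.(float64)
	} else if v, exists := item["score"]; exists {
		score, _ = v.(float64)
	}
	fmt.Println(score) // prints 0; the legacy 0.87 is ignored
}
```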
core/providers/vllm/vllm.go (1)

211-259: Fasthttp lifecycle is correct; consider a return struct for readability.

`AcquireRequest`/`AcquireResponse` with deferred `Release` follows the expected pattern. The seven return values are all consumed by the caller but make the signature hard to parse at a glance. A small result struct could improve readability.

♻️ Optional: result struct to tame the return signature

```go
type rerankEndpointResult struct {
	payload     map[string]interface{}
	rawRequest  interface{}
	rawResponse interface{}
	body        []byte
	statusCode  int
	latency     time.Duration
}
```

Then `callVLLMRerankEndpoint` returns `(*rerankEndpointResult, *schemas.BifrostError)`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/providers/vllm/vllm.go` around lines 211 - 259, The function callVLLMRerankEndpoint currently returns seven loose values which harms readability; refactor by introducing a result struct (e.g., rerankEndpointResult with fields payload map[string]interface{}, rawRequest interface{}, rawResponse interface{}, body []byte, statusCode int, latency time.Duration) and change callVLLMRerankEndpoint to return (*rerankEndpointResult, *schemas.BifrostError). Update all return sites in callVLLMRerankEndpoint to construct and return the struct (including on error paths where appropriate) and update callers to unpack the struct instead of the seven separate values; keep existing logic around providerUtils.MakeRequestWithContext, providerUtils.CheckAndDecodeBody, HandleVLLMResponse and the fasthttp request/response lifecycle unchanged.

Summary
Adds end-to-end rerank support for the vLLM provider and wires rerank into pricing/logging paths so requests are routed, parsed, costed, and visible in Logs UI.
This supports vLLM deployments that expose either `/v1/rerank` or `/rerank`.

What changed
1) vLLM provider rerank support
- `core/providers/vllm/models.go`, `core/providers/vllm/rerank.go`: new `VLLMProvider.Rerank(...)`
- Calls `/v1/rerank` first, falling back to `/rerank` on `404`/`405`/`501`
- `results` handling supports `relevance_score` and legacy `score`, with no `score` fallback when `relevance_score: 0.0`
- Usage parsing reads `prompt_tokens`/`completion_tokens` with `input_tokens`/`output_tokens` fallback, and returns `nil` usage when all values are zero
2) Pricing integration for rerank

- `framework/modelcatalog/utils.go`: `normalizeRequestType` -> "rerank"
- `framework/modelcatalog/main.go`: `GetPricingEntryForModel` scan list includes rerank
- `framework/modelcatalog/pricing.go`: `CalculateCost` reads `result.RerankResponse.Usage` (the `mode` column already supports this)

3) Logging integration for rerank
- `PreLLMHook` now logs rerank params
- `extractInputHistory` maps the rerank query into `input_history` for log message rendering (sketched below)
- `PostLLMHook` now captures rerank token usage when present
- `ui/lib/constants/logs.ts`: `RequestTypes`, `RequestTypeLabels`, `RequestTypeColors`
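A hedged sketch of the input-history mapping mentioned above; the message shape is an assumption for illustration, not the plugin's actual type:

```go
package logging

// logMessage is a simplified stand-in for the log entry's message type.
type logMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// rerankInputHistory renders a rerank query the way extractInputHistory is
// described to: as a single user-style message for the Logs UI.
func rerankInputHistory(query string) []logMessage {
	return []logMessage{{Role: "user", Content: query}}
}
```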
4) Tests

- `core/providers/vllm/rerank_test.go`, including `relevance_score: 0.0` behavior

Type of change
Affected areas
Validation performed