
feat: add vllm rerank support with pricing and logging integration #1723

Open
jerkeyray wants to merge 6 commits into main from feat/vllm-rerank-provider

Conversation

jerkeyray (Contributor) commented Feb 21, 2026

Summary

Adds end-to-end rerank support for the vLLM provider and wires rerank into the pricing and logging paths, so requests are routed, parsed, costed, and visible in the Logs UI.

This supports vLLM deployments that expose either /v1/rerank or /rerank.

What changed

1) vLLM provider rerank support

  • Added vLLM rerank request model and converters:
    • core/providers/vllm/models.go
    • core/providers/vllm/rerank.go
  • Implemented VLLMProvider.Rerank(...):
    • primary endpoint: /v1/rerank
    • fallback endpoint: /rerank
    • fallback triggers only on 404/405/501
    • fallback is skipped when request path override is explicitly provided
  • Added robust response parsing/validation:
    • missing/invalid results handling
    • duplicate index detection
    • out-of-range index detection
    • supports relevance_score and legacy score
    • avoids score fallback when relevance_score: 0.0
  • Added usage parsing support for:
    • prompt_tokens / completion_tokens
    • input_tokens / output_tokens
    • returns nil usage when all values are zero
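
The score-precedence rule above is subtle enough to warrant a sketch. The helper below is illustrative (extractScore is an assumed name, not code from this PR), but it shows why key-presence checks, rather than zero-value checks, keep a legitimate relevance_score of 0.0 from falling back to the legacy score field:

```go
package main

import "fmt"

// extractScore sketches the precedence rule: prefer "relevance_score"
// whenever the key is present, even when its value is 0.0, and fall
// back to the legacy "score" key only when "relevance_score" is absent.
func extractScore(result map[string]interface{}) (float64, bool) {
	if v, ok := result["relevance_score"]; ok {
		f, ok := v.(float64)
		return f, ok // present but non-numeric counts as invalid
	}
	if v, ok := result["score"]; ok {
		f, ok := v.(float64)
		return f, ok
	}
	return 0, false
}

func main() {
	// relevance_score: 0.0 must win over a non-zero legacy score.
	fmt.Println(extractScore(map[string]interface{}{"relevance_score": 0.0, "score": 0.9}))
	// The legacy score is used only when relevance_score is missing.
	fmt.Println(extractScore(map[string]interface{}{"score": 0.42}))
}
```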

2) Pricing integration for rerank

  • Added rerank normalization and lookup support:
    • framework/modelcatalog/utils.go (normalizeRequestType -> "rerank")
    • framework/modelcatalog/main.go (GetPricingEntryForModel scan list includes rerank)
  • Added rerank usage cost path:
    • framework/modelcatalog/pricing.go (CalculateCost reads result.RerankResponse.Usage)
  • No DB schema changes required (existing mode column already supports this).
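
A minimal sketch of the cost path described above, with assumed names (RerankUsage and calculateRerankCost are illustrative, not the PR's actual identifiers): the pricing entry supplies per-token rates, the rerank response's usage supplies the counts, and a nil-usage guard mirrors the provider behavior of returning no usage when nothing was reported.

```go
package main

import "fmt"

// RerankUsage is an illustrative shape for the token counts read from
// result.RerankResponse.Usage.
type RerankUsage struct {
	PromptTokens     int
	CompletionTokens int
}

// calculateRerankCost applies per-token rates from a pricing entry to
// the reported usage. A nil usage yields zero cost, matching the
// provider-side guard that returns nil usage when all values are zero.
func calculateRerankCost(usage *RerankUsage, inputRate, outputRate float64) float64 {
	if usage == nil {
		return 0
	}
	return float64(usage.PromptTokens)*inputRate + float64(usage.CompletionTokens)*outputRate
}

func main() {
	cost := calculateRerankCost(&RerankUsage{PromptTokens: 1000}, 0.000001, 0.000002)
	fmt.Printf("%.6f\n", cost)
}
```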

3) Logging integration for rerank

  • PreLLMHook now logs rerank params
  • extractInputHistory maps rerank query into input_history for log message rendering
  • PostLLMHook now captures rerank token usage when present
  • Logs UI supports rerank in type filter/badge mappings:
    • ui/lib/constants/logs.ts (RequestTypes, RequestTypeLabels, RequestTypeColors)

4) Tests

  • Added comprehensive rerank unit tests:
    • core/providers/vllm/rerank_test.go
      • nil request conversion
      • duplicate index / out-of-range validation
      • empty results
      • relevance_score: 0.0 behavior
      • zero-usage parsing guard
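
The duplicate-index and out-of-range cases those tests cover reduce to a validation loop like this sketch (validateResultIndices is an assumed name):

```go
package main

import "fmt"

// validateResultIndices checks that every result index is unique and
// falls within [0, docCount), the two error cases the tests exercise.
func validateResultIndices(indices []int, docCount int) error {
	seen := make(map[int]struct{}, len(indices))
	for _, idx := range indices {
		if idx < 0 || idx >= docCount {
			return fmt.Errorf("rerank result index %d out of range for %d documents", idx, docCount)
		}
		if _, dup := seen[idx]; dup {
			return fmt.Errorf("duplicate rerank result index %d", idx)
		}
		seen[idx] = struct{}{}
	}
	return nil
}

func main() {
	fmt.Println(validateResultIndices([]int{0, 1}, 2)) // nil: valid
	fmt.Println(validateResultIndices([]int{0, 0}, 2)) // error: duplicate index
	fmt.Println(validateResultIndices([]int{5}, 2))    // error: index out of range
}
```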

Type of change

  • Feature
  • Bug fix
  • Breaking change
  • Refactor
  • Docs-only

Affected areas

  • Core (Go)
  • Providers/Integrations
  • Framework (modelcatalog pricing)
  • Plugins (logging)
  • UI (Logs constants)
  • Docs

Validation performed

go test ./core/providers/vllm/... -run 'TestRerank|TestVLLM' -v
go test ./framework/modelcatalog/... -v
go test ./plugins/logging/... -v
go vet ./core/providers/vllm/... ./framework/modelcatalog/... ./plugins/logging/...

Notes:

- TestVLLM is env-gated and skips when VLLM_BASE_URL is not set.
- UI lint in this repo currently fails with a pre-existing config issue (Cannot redefine plugin "import"), unrelated to this change.

## Manual verification

1. Start vLLM with a rerank-capable model.
2. Send rerank request through Bifrost:
    - POST /v1/rerank with model vllm/<upstream-model>
3. Verify fallback behavior by deployment mode:
    - works against /v1/rerank
    - retries /rerank on 404/405/501
4. Verify logs:
    - /api/logs?objects=rerank
    - request query appears in input_history
    - token_usage and cost populated when provider returns usage

## Breaking changes

- [x] No

## Security considerations

No new auth surface introduced. Reuses existing provider key handling and request validation paths.

coderabbitai bot commented Feb 21, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • vLLM reranking added with /v1/rerank (automatic fallback to /rerank) and end-to-end response handling.
    • Rerank results shown in workspace logs UI with score/details and copyable JSON.
  • Documentation

    • Guides and quickstarts updated with vLLM rerank examples, endpoint compatibility, and SDK notes.
    • Provider overview lists reranking support.
  • Tests

    • Comprehensive unit tests for rerank request/response parsing and edge cases.
  • Chores

    • Logging, storage, pricing, and migrations updated to persist and account for rerank output.

Walkthrough

Adds end-to-end rerank support for vLLM: request/response converters and validation, provider call with /v1/rerank → /rerank fallback, usage parsing, pricing/logging/UI and DB migration, tests, and documentation updates.

Changes

Cohort / File(s) Summary
vLLM Rerank Core
core/providers/vllm/models.go, core/providers/vllm/rerank.go, core/providers/vllm/rerank_test.go
Adds vLLMRerankRequest and converters ToVLLMRerankRequest / ToBifrostRerankResponse with strict payload validation, usage parsing, sorting/deduplication, and comprehensive unit tests (success and error cases).
vLLM Provider Implementation
core/providers/vllm/vllm.go, core/providers/vllm/vllm_test.go
Implements rerank call flow: callVLLMRerankEndpoint, isRerankFallbackStatus (404/405/501), endpoint resolution (/v1/rerank with optional override and retry to /rerank), latency capture, response enrichment, and test wiring (RerankModel).
Framework: pricing & request type
framework/modelcatalog/main.go, framework/modelcatalog/pricing.go, framework/modelcatalog/utils.go
Integrates RerankRequest into pricing lookup and cost calculation; normalizes request type to "rerank".
Logging plugin & hooks
plugins/logging/main.go, plugins/logging/utils.go, plugins/logging/operations.go, plugins/logging/pool.go
Adds RerankOutput handling in logging lifecycle: pre/post hooks, usage extraction, update payloads, and pool reset.
Log storage schema & migration
framework/logstore/migrations.go, framework/logstore/tables.go
Adds DB migration for rerank_output column; Log gains RerankOutput (serialized) and RerankOutputParsed (deserialized) with serialization/deserialization and content-summary inclusion.
UI & types
ui/lib/constants/logs.ts, ui/lib/types/logs.ts, ui/app/workspace/logs/sheets/logDetailsSheet.tsx
Adds "rerank" request type, label, color; defines RerankDocument/RerankResult types and LogEntry.rerank_output; renders Rerank Output in log details UI.
Documentation & Quickstarts
docs/providers/supported-providers/overview.mdx, docs/providers/supported-providers/vllm.mdx, docs/quickstart/gateway/reranking.mdx, docs/quickstart/go-sdk/reranking.mdx
Documents vLLM rerank support, endpoint behavior and fallback, provider model examples, and Gateway/SDK configuration guidance.
Tests/config & misc
core/internal/llmtests/..., framework/logstore/*, ui/package.json, go.mod
Wires RerankModel into test config; various wiring across migrations, logging, and packaging updates.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Gateway
    participant Converter
    participant vLLM

    Client->>Gateway: BifrostRerankRequest
    Gateway->>Converter: ToVLLMRerankRequest(bifrostReq)
    Converter-->>Gateway: vLLMRerankRequest (JSON)
    Gateway->>vLLM: POST /v1/rerank (body)
    alt 2xx
        vLLM-->>Gateway: 2xx payload
    else 404/405/501
        Gateway->>vLLM: POST /rerank (retry)
        vLLM-->>Gateway: 2xx payload
    end
    Gateway->>Converter: ToBifrostRerankResponse(payload, docs, returnDocs)
    Converter-->>Gateway: BifrostRerankResponse (usage, results)
    Gateway->>Gateway: enrich ExtraFields (provider, model, latency, raw data)
    Gateway-->>Client: Enriched BifrostRerankResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
I hop through requests with nary a stall,
Scores sorted neat, I answer the call.
Fallbacks ready, indices align,
vLLM and Gateway—rerank on time.
A joyful rabbit hums: metrics and logs all fine.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

- Docstring Coverage ⚠️ Warning: docstring coverage is 21.05%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

- Title check ✅ Passed: the title clearly and concisely describes the main feature (adding vLLM rerank support with pricing and logging integration), which matches the primary changes across the codebase.
- Description check ✅ Passed: the description follows the template structure with all major sections filled: Summary, Changes (detailed breakdown), Type of change, Affected areas, Validation performed, Manual verification, Breaking changes, and Security considerations. Minor gaps in optional sections (no screenshots or recordings, but these are not applicable to a backend feature) do not warrant failure.





This stack of pull requests is managed by Graphite. Learn more about stacking.

github-actions bot commented:

🧪 Test Suite Available

This PR can be tested by a repository admin.

Run tests for PR #1723

jerkeyray changed the title from "feat: add vllm rerank support" to "feat: add vllm rerank support with pricing and logging integration" on Feb 21, 2026
jerkeyray marked this pull request as ready for review on February 21, 2026 at 13:56
coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/providers/supported-providers/overview.mdx (1)

18-59: Optional: Add a "Rerank" column to the support matrix for consistency.

The table already has an Embeddings column to indicate per-provider support. Rerank has the same partial-coverage pattern (Cohere ✅, Bedrock ✅, Vertex AI ✅, vLLM ✅, others ❌) but is only described in the footnote. Adding a Rerank column would let users discover support at a glance, consistent with how all other operations are documented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/providers/supported-providers/overview.mdx` around lines 18 - 59, Add a
"Rerank" column to the provider support matrix: update the header row to include
"Rerank", add a corresponding cell for each provider row (set ✅ for Cohere,
Bedrock, Vertex AI, and vLLM; set ❌ for all other providers), and update the
legend/footnote text and the Notes section (where reranking is mentioned) to
reference the new "Rerank" column so the table and explanatory text stay
consistent.
core/providers/vllm/rerank_test.go (1)

16-49: Consider using bifrost.Ptr() for pointer creation.

Lines 17-19 use the address operator (&) to create pointer values. The codebase convention prefers bifrost.Ptr() for consistency.

♻️ Suggested refactor
 func TestRerankToVLLMRerankRequest(t *testing.T) {
-	topN := 2
-	maxTokens := 128
-	priority := 5
-
 	req := ToVLLMRerankRequest(&schemas.BifrostRerankRequest{
 		Model: "BAAI/bge-reranker-v2-m3",
 		Query: "what is machine learning",
@@ ..
 		Params: &schemas.RerankParameters{
-			TopN:            &topN,
-			MaxTokensPerDoc: &maxTokens,
-			Priority:        &priority,
+			TopN:            bifrost.Ptr(2),
+			MaxTokensPerDoc: bifrost.Ptr(128),
+			Priority:        bifrost.Ptr(5),

Based on learnings: "In the maximhq/bifrost repository, prefer using bifrost.Ptr() to create pointers instead of the address operator (&) even when & would be valid syntactically."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/vllm/rerank_test.go` around lines 16 - 49, Replace raw
&-pointer creation in TestRerankToVLLMRerankRequest with the project's pointer
helper: use bifrost.Ptr() when constructing the Params values (TopN,
MaxTokensPerDoc, Priority) of the schemas.BifrostRerankRequest passed to
ToVLLMRerankRequest; update the three places where &topN, &maxTokens, &priority
are used so they call bifrost.Ptr(topN), bifrost.Ptr(maxTokens),
bifrost.Ptr(priority) respectively to follow the codebase convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/logging/main.go`:
- Around line 600-601: UpdateLogData currently only includes EmbeddingOutput so
rerank results are dropped; add a new field RerankOutput []schemas.RerankData
(or similar) to UpdateLogData, populate it in the same content-logging block
where EmbeddingOutput is set (use the results from result.RerankResponse ->
iterate ranked items and map index, relevance_score, optional document into
schemas.RerankData), and then wire this new RerankOutput through updateLogEntry
and the log-store layer the same way EmbeddingOutput is passed/stored so the
Logs UI can read structured rerank outputs. Ensure you use the same naming as
UpdateLogData, updateLogEntry, and the storage DTOs so the flow mirrors
EmbeddingOutput end-to-end.

---

Nitpick comments:
In `@core/providers/vllm/rerank_test.go`:
- Around line 16-49: Replace raw &-pointer creation in
TestRerankToVLLMRerankRequest with the project's pointer helper: use
bifrost.Ptr() when constructing the Params values (TopN, MaxTokensPerDoc,
Priority) of the schemas.BifrostRerankRequest passed to ToVLLMRerankRequest;
update the three places where &topN, &maxTokens, &priority are used so they call
bifrost.Ptr(topN), bifrost.Ptr(maxTokens), bifrost.Ptr(priority) respectively to
follow the codebase convention.

In `@docs/providers/supported-providers/overview.mdx`:
- Around line 18-59: Add a "Rerank" column to the provider support matrix:
update the header row to include "Rerank", add a corresponding cell for each
provider row (set ✅ for Cohere, Bedrock, Vertex AI, and vLLM; set ❌ for all
other providers), and update the legend/footnote text and the Notes section
(where reranking is mentioned) to reference the new "Rerank" column so the table
and explanatory text stay consistent.

coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ui/app/workspace/logs/sheets/logDetailsSheet.tsx`:
- Around line 553-572: Remove the redundant header div inside the render block
that displays rerank output: locate the conditional block that checks
log.rerank_output && !log.error_details?.error.message and remove the standalone
<div className="mt-4 w-full text-left text-sm font-medium">Rerank Output</div>
so the CollapsibleBox (title={`Rerank Output (${log.rerank_output.length})`}) is
the sole header; keep the CollapsibleBox and its CodeEditor intact to match the
pattern used by list_models_output and avoid removing the similar div used for
embedding_output which relies on LogChatMessageView.

jerkeyray force-pushed the feat/vllm-rerank-provider branch from 1b15f36 to a93a3cb on February 21, 2026 at 15:49
coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
docs/quickstart/go-sdk/reranking.mdx (2)

72-73: vLLM prefix note is separated from the vLLM example by ~60 lines.

The clarification about omitting the vllm/ prefix appears at the bottom of the Parameters section, but the vLLM entry (line 11) is the first place a user encounters it. If the note is kept here, consider whether it duplicates or supersedes the suggestion above; otherwise it can remain as a deeper-dive clarification.

No action required if the callout approach above is adopted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/quickstart/go-sdk/reranking.mdx` around lines 72 - 73, The vLLM note
about omitting the "vllm/" prefix is placed far from the first mention (the vLLM
entry in the Parameters section) which can confuse readers; move or duplicate
that clarification so it's immediately visible where "vLLM" is first introduced
(the vLLM entry in the Parameters list) — either add a short inline callout
right after the vLLM entry or relocate the existing sentence from the bottom of
the Parameters section up next to the vLLM entry so the guidance about using the
upstream model ID without the "vllm/" prefix is seen at first mention.

9-11: Consider a Mintlify callout for the provider examples block.

The provider/model examples are bare plain text outside any heading, which renders as floating content between the frontmatter description and ## Basic Example. A <Note> or <Tip> callout would integrate more cleanly with the surrounding Mintlify MDX layout and also anchor the vllm/-prefix note (currently far away at line 72) right where users first see the vLLM entry.

📝 Suggested format
-Provider/model examples:
-- Cohere: `Provider: schemas.Cohere`, `Model: "rerank-v3.5"`
-- vLLM: `Provider: schemas.VLLM`, `Model: "BAAI/bge-reranker-v2-m3"`
+<Note>
+**Provider/model examples**
+- Cohere: `Provider: schemas.Cohere`, `Model: "rerank-v3.5"`
+- vLLM: `Provider: schemas.VLLM`, `Model: "BAAI/bge-reranker-v2-m3"` — use the upstream model ID without the `vllm/` prefix used in Gateway HTTP requests
+</Note>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/quickstart/go-sdk/reranking.mdx` around lines 9 - 11, Wrap the bare
provider/model examples into a Mintlify callout (e.g., <Note> or <Tip>) so they
render as an integrated block before "## Basic Example"; specifically place the
Cohere example ("Provider: schemas.Cohere", "Model: \"rerank-v3.5\"") and the
vLLM example ("Provider: schemas.VLLM", "Model: \"BAAI/bge-reranker-v2-m3\"")
inside the callout and move the existing vLLM prefix note (currently separated)
next to the vLLM entry within that same callout for context.
ui/app/workspace/logs/sheets/logDetailsSheet.tsx (1)

580-597: Inconsistent indentation — rerank block is one level deeper than sibling output blocks.

Compare with the embedding_output block (line 565) and list_models_output (line 505): those are at 6-tab indent inside the {log.status !== "processing" && ( fragment, but the rerank block here is at 7 tabs. This will likely be caught by Prettier, but worth aligning now.

Also, the <>...</> fragment wrapper (lines 581, 597) is unnecessary since there's only one child element (CollapsibleBox).

🎨 Suggested indentation fix
 						)}
-							{log.rerank_output && !log.error_details?.error.message && (
-								<>
-									<CollapsibleBox
-										title={`Rerank Output (${log.rerank_output.length})`}
-										onCopy={() => JSON.stringify(log.rerank_output, null, 2)}
-									>
-									<CodeEditor
-										className="z-0 w-full"
-										shouldAdjustInitialHeight={true}
-										maxHeight={450}
-										wrap={true}
-										code={JSON.stringify(log.rerank_output, null, 2)}
-										lang="json"
-										readonly={true}
-										options={{ scrollBeyondLastLine: false, lineNumbers: "off", alwaysConsumeMouseWheel: false }}
-									/>
-								</CollapsibleBox>
-							</>
-						)}
+						{log.rerank_output && !log.error_details?.error.message && (
+							<CollapsibleBox
+								title={`Rerank Output (${log.rerank_output.length})`}
+								onCopy={() => JSON.stringify(log.rerank_output, null, 2)}
+							>
+								<CodeEditor
+									className="z-0 w-full"
+									shouldAdjustInitialHeight={true}
+									maxHeight={450}
+									wrap={true}
+									code={JSON.stringify(log.rerank_output, null, 2)}
+									lang="json"
+									readonly={true}
+									options={{ scrollBeyondLastLine: false, lineNumbers: "off", alwaysConsumeMouseWheel: false }}
+								/>
+							</CollapsibleBox>
+						)}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/app/workspace/logs/sheets/logDetailsSheet.tsx` around lines 580 - 597, The
rerank output block is indented one level deeper than sibling blocks and uses an
unnecessary fragment wrapper; move the entire rerank_output conditional block
(the {log.rerank_output && !log.error_details?.error.message && ( ... )} section
that contains CollapsibleBox and CodeEditor) up one indentation level to match
the embedding_output and list_models_output blocks, and remove the surrounding
<>...</> fragment so CollapsibleBox is the direct child of the conditional.
Ensure the conditional, CollapsibleBox title/onCopy, and CodeEditor props remain
unchanged.
core/providers/vllm/rerank.go (1)

11-14: Redundant nil guard in converter function.

Per codebase convention, nil validation of request objects belongs in the Bifrost core layer (core/bifrost.go). CheckContextAndGetRequestBody in vllm.go already handles the case where the converter returns nil. The guard here is redundant. Based on learnings, "Nil validation for request objects should occur at the Bifrost core layer. Provider-layer methods can assume the request has been validated and should not include redundant nil checks."

♻️ Proposed refactor
 func ToVLLMRerankRequest(bifrostReq *schemas.BifrostRerankRequest) *vLLMRerankRequest {
-	if bifrostReq == nil {
-		return nil
-	}
-
 	vllmReq := &vLLMRerankRequest{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/vllm/rerank.go` around lines 11 - 14, Remove the redundant nil
guard in the provider converter function ToVLLMRerankRequest: since request
validation is performed in the Bifrost core (see CheckContextAndGetRequestBody
in vllm.go), assume bifrostReq (*schemas.BifrostRerankRequest) is non-nil and
return a constructed *vLLMRerankRequest directly; update ToVLLMRerankRequest to
stop checking bifrostReq == nil and simply map fields from bifrostReq into the
new vLLMRerankRequest.
core/providers/vllm/vllm.go (1)

285-291: Fallback path silently discards the primary call's latency.

When the fallback is triggered, latency is overwritten with only the second call's duration. The time consumed by the first (failed) attempt is lost, so the reported latency understates the actual end-to-end duration seen by the caller.

♻️ Proposed fix — accumulate latency across both calls
 responsePayload, rawRequest, rawResponse, responseBody, statusCode, latency, bifrostErr := provider.callVLLMRerankEndpoint(ctx, key, request, resolvedPath, jsonData)
 if bifrostErr != nil && !hasPathOverride && isRerankFallbackStatus(statusCode) {
-	responsePayload, rawRequest, rawResponse, responseBody, statusCode, latency, bifrostErr = provider.callVLLMRerankEndpoint(ctx, key, request, "/rerank", jsonData)
+	var fallbackLatency time.Duration
+	responsePayload, rawRequest, rawResponse, responseBody, statusCode, fallbackLatency, bifrostErr = provider.callVLLMRerankEndpoint(ctx, key, request, "/rerank", jsonData)
+	latency += fallbackLatency
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/vllm/vllm.go` around lines 285 - 291, The current fallback
flow overwrites latency from the first call when callVLLMRerankEndpoint is
retried, so update the logic to accumulate durations instead of discarding the
first call's time: capture the first call's latency variable returned from
provider.callVLLMRerankEndpoint, and if the fallback branch (hasPathOverride
false and isRerankFallbackStatus(statusCode)) triggers and you invoke
callVLLMRerankEndpoint a second time, add the first latency to the second (e.g.,
latency = firstLatency + secondLatency) before any error handling or return.
Ensure you reference the existing variables latency, bifrostErr,
hasPathOverride, isRerankFallbackStatus and the callVLLMRerankEndpoint
invocation so the total end-to-end duration is preserved in all code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/providers/vllm/rerank.go`:
- Around line 122-130: The current fallback uses zero-value checks on
promptTokens and completionTokens which incorrectly triggers when vLLM
legitimately returns 0; instead check existence of keys in usageMap before
falling back: for promptTokens, use a map lookup like _, ok :=
usageMap["prompt_tokens"] and only call schemas.SafeExtractInt on "input_tokens"
if "prompt_tokens" is missing, and similarly only fallback to "output_tokens" if
"completion_tokens" is absent; update the logic around usageMap, promptTokens,
completionTokens and keep using schemas.SafeExtractInt for extraction.

In `@docs/providers/supported-providers/vllm.mdx`:
- Around line 146-159: The curl example places return_documents at the top level
but BifrostRerankRequest only accepts provider, model, query, documents, params,
and fallbacks, so move "return_documents": true into the params object; update
the example payload to include a "params": { "return_documents": true } field so
the gateway's JSON deserializer recognizes it and the rerank behavior is applied
correctly.

---

Nitpick comments:
In `@core/providers/vllm/rerank.go`:
- Around line 11-14: Remove the redundant nil guard in the provider converter
function ToVLLMRerankRequest: since request validation is performed in the
Bifrost core (see CheckContextAndGetRequestBody in vllm.go), assume bifrostReq
(*schemas.BifrostRerankRequest) is non-nil and return a constructed
*vLLMRerankRequest directly; update ToVLLMRerankRequest to stop checking
bifrostReq == nil and simply map fields from bifrostReq into the new
vLLMRerankRequest.

In `@core/providers/vllm/vllm.go`:
- Around line 285-291: The current fallback flow overwrites latency from the
first call when callVLLMRerankEndpoint is retried, so update the logic to
accumulate durations instead of discarding the first call's time: capture the
first call's latency variable returned from provider.callVLLMRerankEndpoint, and
if the fallback branch (hasPathOverride false and
isRerankFallbackStatus(statusCode)) triggers and you invoke
callVLLMRerankEndpoint a second time, add the first latency to the second (e.g.,
latency = firstLatency + secondLatency) before any error handling or return.
Ensure you reference the existing variables latency, bifrostErr,
hasPathOverride, isRerankFallbackStatus and the callVLLMRerankEndpoint
invocation so the total end-to-end duration is preserved in all code paths.

In `@docs/quickstart/go-sdk/reranking.mdx`:
- Around line 72-73: The vLLM note about omitting the "vllm/" prefix is placed
far from the first mention (the vLLM entry in the Parameters section) which can
confuse readers; move or duplicate that clarification so it's immediately
visible where "vLLM" is first introduced (the vLLM entry in the Parameters list)
— either add a short inline callout right after the vLLM entry or relocate the
existing sentence from the bottom of the Parameters section up next to the vLLM
entry so the guidance about using the upstream model ID without the "vllm/"
prefix is seen at first mention.
- Around line 9-11: Wrap the bare provider/model examples into a Mintlify
callout (e.g., <Note> or <Tip>) so they render as an integrated block before "##
Basic Example"; specifically place the Cohere example ("Provider:
schemas.Cohere", "Model: \"rerank-v3.5\"") and the vLLM example ("Provider:
schemas.VLLM", "Model: \"BAAI/bge-reranker-v2-m3\"") inside the callout and move
the existing vLLM prefix note (currently separated) next to the vLLM entry
within that same callout for context.

In `@ui/app/workspace/logs/sheets/logDetailsSheet.tsx`:
- Around line 580-597: The rerank output block is indented one level deeper than
sibling blocks and uses an unnecessary fragment wrapper; move the entire
rerank_output conditional block (the {log.rerank_output &&
!log.error_details?.error.message && ( ... )} section that contains
CollapsibleBox and CodeEditor) up one indentation level to match the
embedding_output and list_models_output blocks, and remove the surrounding
<>...</> fragment so CollapsibleBox is the direct child of the conditional.
Ensure the conditional, CollapsibleBox title/onCopy, and CodeEditor props remain
unchanged.

coderabbitai bot left a comment


🧹 Nitpick comments (2)
core/providers/vllm/rerank.go (1)

36-114: Well-structured validation with good edge-case handling.

The response parser is thorough: it correctly validates index bounds, deduplicates, prioritizes relevance_score over score (without falling back when relevance_score is legitimately 0.0), and produces clear error messages.

One optional nit on Line 100:

♻️ Use schemas.Ptr for consistency
 		if returnDocuments {
-			doc := documents[index]
-			result.Document = &doc
+			result.Document = schemas.Ptr(documents[index])
 		}

Based on learnings: "prefer using bifrost.Ptr() to create pointers instead of the address operator (&) even when & would be valid syntactically." Since this is a core/providers subpackage that shouldn't import the core package, schemas.Ptr (already used elsewhere in this file's sibling, e.g., vllm.go:356) is the appropriate equivalent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/vllm/rerank.go` around lines 36 - 114, The code uses the
address operator (&doc) to set result.Document inside ToBifrostRerankResponse;
replace this with the helper pointer constructor to match project
conventions—use schemas.Ptr(doc) when assigning result.Document (referencing the
local variable doc and the result variable in ToBifrostRerankResponse) so the
file stays consistent with other uses of schemas.Ptr instead of taking addresses
directly.
core/providers/vllm/vllm.go (1)

211-259: Fasthttp lifecycle is correct; consider a return struct for readability.

AcquireRequest/AcquireResponse with deferred Release follows the expected pattern. The seven return values are all consumed by the caller but make the signature hard to parse at a glance. A small result struct could improve readability.

♻️ Optional: result struct to tame the return signature
type rerankEndpointResult struct {
	payload     map[string]interface{}
	rawRequest  interface{}
	rawResponse interface{}
	body        []byte
	statusCode  int
	latency     time.Duration
}

Then callVLLMRerankEndpoint returns (*rerankEndpointResult, *schemas.BifrostError).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/vllm/vllm.go` around lines 211 - 259, The function
callVLLMRerankEndpoint currently returns seven loose values which harms
readability; refactor by introducing a result struct (e.g., rerankEndpointResult
with fields payload map[string]interface{}, rawRequest interface{}, rawResponse
interface{}, body []byte, statusCode int, latency time.Duration) and change
callVLLMRerankEndpoint to return (*rerankEndpointResult, *schemas.BifrostError).
Update all return sites in callVLLMRerankEndpoint to construct and return the
struct (including on error paths where appropriate) and update callers to unpack
the struct instead of the seven separate values; keep existing logic around
providerUtils.MakeRequestWithContext, providerUtils.CheckAndDecodeBody,
HandleVLLMResponse and the fasthttp request/response lifecycle unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@core/providers/vllm/rerank.go`:
- Around line 36-114: The code uses the address operator (&doc) to set
result.Document inside ToBifrostRerankResponse; replace this with the helper
pointer constructor to match project conventions—use schemas.Ptr(doc) when
assigning result.Document (referencing the local variable doc and the result
variable in ToBifrostRerankResponse) so the file stays consistent with other
uses of schemas.Ptr instead of taking addresses directly.

In `@core/providers/vllm/vllm.go`:
- Around line 211-259: The function callVLLMRerankEndpoint currently returns
seven loose values which harms readability; refactor by introducing a result
struct (e.g., rerankEndpointResult with fields payload map[string]interface{},
rawRequest interface{}, rawResponse interface{}, body []byte, statusCode int,
latency time.Duration) and change callVLLMRerankEndpoint to return
(*rerankEndpointResult, *schemas.BifrostError). Update all return sites in
callVLLMRerankEndpoint to construct and return the struct (including on error
paths where appropriate) and update callers to unpack the struct instead of the
seven separate values; keep existing logic around
providerUtils.MakeRequestWithContext, providerUtils.CheckAndDecodeBody,
HandleVLLMResponse and the fasthttp request/response lifecycle unchanged.
