
perf(gateway): fix high CPU spike when streaming large image payloads from Google #1707

Open
steebchen wants to merge 2 commits into main from fix-streaming-cpu-spike-google


@steebchen steebchen commented Feb 20, 2026

Summary

  • Fixes high CPU spikes when streaming large image payloads (base64 inlineData) from Google/Gemini providers (google-ai-studio, google-vertex, obsidian), particularly with gemini-3-pro-image-preview
  • Reduces redundant O(n) string operations on multi-MB base64 data from 4-5 per SSE chunk down to 1
  • Eliminates unnecessary .trim() string copies, defers URL construction in extractImages, adds an efficient serializeStreamingChunk that skips JSON.stringify escape scanning on base64 data, and removes duplicate extractTokenUsage calls for Google providers
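The in-place whitespace scan that replaces .trim() can be sketched as follows. This is a hypothetical simplification (mightBeCompleteJsonSketch is not the PR's function): it only compares the first and last non-whitespace characters, and it leaves out the separate large-payload path that the real mightBeCompleteJson takes with start/end indices.

```typescript
// JSON whitespace per RFC 8259: space, tab, line feed, carriage return.
function isJsonWhitespace(code: number): boolean {
  return code === 0x20 || code === 0x09 || code === 0x0a || code === 0x0d;
}

// Hypothetical pre-filter: "might" be a complete JSON object/array.
// Walks inward from both ends instead of allocating a trimmed copy,
// so a multi-MB payload is never duplicated just to inspect two characters.
function mightBeCompleteJsonSketch(text: string): boolean {
  let start = 0;
  let end = text.length - 1;
  // Skip leading whitespace in place.
  while (start <= end && isJsonWhitespace(text.charCodeAt(start))) start++;
  // Skip trailing whitespace in place.
  while (end >= start && isJsonWhitespace(text.charCodeAt(end))) end--;
  if (start > end) return false; // empty or whitespace-only input

  const first = text[start];
  const last = text[end];
  return (first === "{" && last === "}") || (first === "[" && last === "]");
}
```

Because only two characters are compared after the scan, this stays a cheap pre-check; actual parsing is still needed to confirm validity.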

Test plan

  • All 140 existing tests pass (7 test files across chat tools)
  • Added new tests for whitespace handling in mightBeCompleteJson (including large payloads with surrounding whitespace)
  • TypeScript type check passes cleanly
  • Lint-staged hooks pass
  • Verify streaming image generation with gemini-3-pro-image-preview in staging
  • Monitor CPU utilization during streaming image responses post-deploy

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Optimized memory allocation and streaming serialization to reduce large-payload copy overhead and improve streaming performance
    • Deferred image URL construction to lower memory use for large embedded images and speed up image handling
    • Improved token/usage extraction logic across providers to avoid duplicate work and reduce extra allocations
    • Refined large-JSON validation to avoid unnecessary string copies for whitespace-wrapped or very large payloads
  • Tests
    • Added coverage for whitespace and large-payload JSON cases

… from Google

The streaming hot path for Google/Gemini image models (e.g. gemini-3-pro-image-preview)
performed 4-5 redundant O(n) string operations on multi-MB base64 image data per SSE
chunk, causing CPU spikes. This reduces it to 1 O(n) operation.

Key changes:
- mightBeCompleteJson: scan whitespace boundaries in-place instead of .trim() copy
- extractImages: defer URL construction via _mime/_base64 refs instead of eager concat
- serializeStreamingChunk: build JSON directly with base64 data, skip JSON.stringify
  escape scanning on base64 (only A-Za-z0-9+/= chars, no escaping needed)
- writeSSEAndCache/SSE parser: skip debug string copies for payloads >64KB
- Reuse extractTokenUsage result for Google providers instead of calling twice

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 20, 2026 16:47

coderabbitai bot commented Feb 20, 2026

Walkthrough

Optimizes streaming memory use by deferring large string construction, adding a streaming serializer for image-containing chunks, avoiding full-string trims for large JSON, and reusing computed token-usage data; no public API signature changes.

Changes

Cohort / File(s) Summary
Streaming serialization & chat flow
apps/gateway/src/chat/chat.ts, apps/gateway/src/chat/tools/serialize-streaming-chunk.ts
Add serializeStreamingChunk and replace large-path JSON.stringify calls with it; introduce LARGE_SSE_THRESHOLD (65536) to skip heavy debug/logging for big payloads; reuse extracted Google usage across branches to avoid duplicate work.
Image handling & types
apps/gateway/src/chat/tools/extract-images.ts, apps/gateway/src/chat/tools/types.ts
Image objects now store deferred _mime and _base64 fields and use url: "" to avoid building big data URLs; add serializeImageObject to lazily reconstruct data URLs when needed.
JSON completeness checks & tests
apps/gateway/src/chat/tools/might-be-complete-json.ts, apps/gateway/src/chat/tools/might-be-complete-json.spec.ts
Replace full-string trim() with start/end index scanning to avoid allocations on large inputs; update large-path helper signature to accept indices; add tests for whitespace and large-payload cases (120KB+).
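A minimal sketch of the deferred image objects described above, assuming Gemini-style parts with inlineData.mimeType and inlineData.data (field names taken from the review discussion on this PR). GeminiPart, DeferredImage and extractImagesDeferred are hypothetical names, not the gateway's code.

```typescript
// Hypothetical input shape: a Gemini-style content part (text or inline image).
interface GeminiPart {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

// Image object carrying the deferred fields instead of a built data URL.
interface DeferredImage {
  type: "image_url";
  image_url: { url: string; _mime?: string; _base64?: string };
}

function extractImagesDeferred(parts: GeminiPart[]): DeferredImage[] {
  const images: DeferredImage[] = [];
  for (const part of parts) {
    if (!part.inlineData) continue; // text parts carry no image
    const { mimeType, data } = part.inlineData;
    images.push({
      type: "image_url",
      // url stays "" until a consumer needs the full data URL; assigning the
      // existing base64 string does not build a new multi-MB string here.
      image_url: { url: "", _mime: mimeType, _base64: data },
    });
  }
  return images;
}
```

The trade-off, raised later in the review, is that every consumer reading image_url.url must now know how to rebuild the URL from _mime and _base64.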

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

Suggested labels

auto-merge

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Description check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
Title check: ✅ Passed. The title accurately and concisely summarizes the main objective of the PR: fixing high CPU spikes during streaming of large image payloads from Google providers.
Docstring coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Copilot AI left a comment


Pull request overview

This PR optimizes CPU performance when streaming large image payloads (multi-MB base64 data) from Google/Gemini providers by reducing redundant O(n) string operations. The key optimization strategies are: (1) deferring data URL construction by storing MIME type and base64 data separately, (2) implementing efficient manual JSON serialization that avoids intermediate string concatenation, (3) eliminating unnecessary .trim() calls on large payloads, and (4) removing duplicate extractTokenUsage calls for Google providers.

Changes:

  • Introduced deferred URL construction mechanism with _mime and _base64 fields in ImageObject type
  • Added serializeStreamingChunk function for efficient JSON serialization of chunks with deferred images
  • Refactored mightBeCompleteJson to avoid .trim() allocations on multi-MB strings
  • Optimized debug logging and SSE data collection to skip large payloads
  • Cached extractTokenUsage results for Google providers to avoid redundant calls
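The large-payload guard for debug work can be sketched like this. Only the 65536 threshold comes from the PR; SseDebugState, recordSsePayload and the concatenated debug transcript are assumptions for illustration.

```typescript
// Threshold named in the PR (64 KiB).
const LARGE_SSE_THRESHOLD = 65536;

// Hypothetical debug state kept alongside the stream.
interface SseDebugState {
  transcript: string; // debug-only copy of small raw SSE payloads
  skippedChars: number; // total length of payloads left out of the transcript
}

function recordSsePayload(state: SseDebugState, payload: string): void {
  // payload.length counts UTF-16 code units; for ASCII base64 it equals bytes.
  if (payload.length > LARGE_SSE_THRESHOLD) {
    // Large (typically image) payload: keep only its size and skip the
    // debug-only string building entirely.
    state.skippedChars += payload.length;
    return;
  }
  state.transcript += payload + "\n";
}
```

Payloads of exactly 65536 characters are still recorded, matching the ">64KB" wording in the commit message.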

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File: Description
apps/gateway/src/chat/tools/types.ts: Added internal _mime and _base64 fields to ImageObject for deferred URL construction
apps/gateway/src/chat/tools/serialize-streaming-chunk.ts: New function for efficient JSON serialization avoiding data URL concatenation
apps/gateway/src/chat/tools/might-be-complete-json.ts: Refactored to eliminate .trim() allocations, scan whitespace manually with indices
apps/gateway/src/chat/tools/might-be-complete-json.spec.ts: Added tests for whitespace handling without .trim() copies
apps/gateway/src/chat/tools/extract-images.ts: Modified to store MIME/base64 separately; added unused serializeImageObject function
apps/gateway/src/chat/chat.ts: Integrated optimizations: skip .trim() on large payloads, use serializeStreamingChunk, cache Google usage results, skip debug logging for large payloads


Comment on lines +37 to +49
/**
* Serializes an image object to a JSON string fragment without creating
* an intermediate concatenated data URL. This avoids allocating multi-MB
* strings just to immediately re-serialize them.
*/
export function serializeImageObject(img: ImageObject): string {
const imgUrl = img.image_url as any;
if (imgUrl._mime && imgUrl._base64) {
// Build the JSON directly, embedding the base64 data in place
return `{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`;
}
return JSON.stringify(img);
}
Copilot AI Feb 20, 2026


The exported function serializeImageObject is never used in the codebase. The image serialization for streaming is handled by serializeStreamingChunk in serialize-streaming-chunk.ts instead. Consider removing this unused export to reduce code maintenance burden, or document if it's intended for future use or external consumption.

Suggested change: delete the serializeImageObject function and its doc comment (lines +37 to +49 shown above).

Comment on lines +36 to +38
// Build JSON directly - base64 data is safe (no JSON escaping needed for A-Za-z0-9+/=)
imageJsonParts.push(
`{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`,
Copilot AI Feb 20, 2026


The manual JSON construction assumes MIME types contain only safe characters that don't require JSON escaping. While standard MIME types like "image/png" or "image/jpeg" are safe, this could be problematic if the upstream provider returns malformed or unusual MIME types containing quotes, backslashes, or control characters. Consider validating or sanitizing the MIME type, or using JSON.stringify to escape it before embedding it in the manual JSON string. For example, JSON.stringify({type:"image_url",image_url:{url:`data:${imgUrl._mime};base64,${imgUrl._base64}`}}) would ensure proper escaping while still avoiding the data URL concatenation overhead.

Suggested change
- // Build JSON directly - base64 data is safe (no JSON escaping needed for A-Za-z0-9+/=)
- imageJsonParts.push(
- `{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`,
+ // Build JSON via JSON.stringify so _mime is properly escaped; base64 data is safe (A-Za-z0-9+/=)
+ imageJsonParts.push(
+ JSON.stringify({
+ type: "image_url",
+ image_url: {
+ url: `data:${imgUrl._mime};base64,${imgUrl._base64}`,
+ },
+ }),
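Stringifying the whole object, as in this suggestion, builds the concatenated data URL and makes JSON.stringify check every base64 character for escaping, which is the work the PR set out to remove. A narrower alternative, sketched here as a hypothetical helper (serializeDeferredImage is not in the PR), escapes only the short MIME string and keeps interpolating the base64 data.

```typescript
// Escape only the MIME type; the base64 payload (standard alphabet
// A-Za-z0-9+/=) contains no characters that JSON requires to be escaped.
function serializeDeferredImage(mime: string, base64: string): string {
  // JSON.stringify returns a quoted, fully escaped literal (quotes,
  // backslashes and control characters); strip the surrounding quotes so
  // it can sit inside the larger "url" string.
  const escapedMime = JSON.stringify(mime).slice(1, -1);
  return `{"type":"image_url","image_url":{"url":"data:${escapedMime};base64,${base64}"}}`;
}
```

The escape scan now runs over a handful of MIME characters instead of the multi-MB payload.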

Comment on lines +12 to +78
export function serializeStreamingChunk(chunk: any): string {
// Fast path: no images, use standard serialization
const images: ImageObject[] | undefined = chunk?.choices?.[0]?.delta?.images;
if (!images || images.length === 0) {
return JSON.stringify(chunk);
}

// Check if any images have deferred base64 references
const hasDeferredImages = images.some(
(img) => (img.image_url as any)._base64,
);
if (!hasDeferredImages) {
return JSON.stringify(chunk);
}

// Build the JSON manually to avoid creating intermediate concatenated URL strings.
// We serialize everything except the images array normally, then splice in the
// image data using direct string building.

// Serialize the images array manually
const imageJsonParts: string[] = [];
for (const img of images) {
const imgUrl = img.image_url as any;
if (imgUrl._mime && imgUrl._base64) {
// Build JSON directly - base64 data is safe (no JSON escaping needed for A-Za-z0-9+/=)
imageJsonParts.push(
`{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`,
);
} else {
imageJsonParts.push(JSON.stringify(img));
}
}
const imagesJson = `[${imageJsonParts.join(",")}]`;

// Build the chunk without images, then splice images in
const chunkCopy = {
...chunk,
choices: chunk.choices.map((choice: any, idx: number) => {
if (idx === 0 && choice.delta?.images) {
const { images: _images, ...restDelta } = choice.delta;
return { ...choice, delta: restDelta };
}
return choice;
}),
};

const baseJson = JSON.stringify(chunkCopy);

// Find the delta object in the serialized JSON and insert images
// The delta is inside choices[0], we need to add "images":<imagesJson> to it
const deltaEndIdx = findDeltaObjectEnd(baseJson);
if (deltaEndIdx === -1) {
// Fallback: couldn't find delta object, just do full serialization
// This shouldn't happen but is a safety net
return JSON.stringify(chunk);
}

// Insert images field before the closing } of the delta object
const hasOtherDeltaFields = baseJson[deltaEndIdx - 1] !== "{";
const separator = hasOtherDeltaFields ? "," : "";
return (
baseJson.slice(0, deltaEndIdx) +
separator +
`"images":${imagesJson}` +
baseJson.slice(deltaEndIdx)
);
}
Copilot AI Feb 20, 2026


There's no test coverage for the new serializeStreamingChunk function, which implements complex manual JSON construction logic with string manipulation and brace matching. Given that this is a critical performance optimization path handling multi-MB payloads, and that it has multiple code paths (fast path, deferred images, fallback), comprehensive tests should be added to verify correctness. Consider testing: chunks without images, chunks with deferred images (_mime/_base64), chunks with already-constructed image URLs, chunks with multiple images, chunks with empty delta, and edge cases in the findDeltaObjectEnd function.
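One way to cover these paths is a round-trip check: materialize every deferred URL, serialize with plain JSON.stringify, and require the fast serializer to produce identical output. The sketch below assumes hypothetical Chunk/Choice/Delta types and uses a sentinel-splice serializer in place of the PR's findDeltaObjectEnd (whose body is not shown here); like the PR's function, it only handles images on choices[0].

```typescript
// Hypothetical types; the gateway's real chunk types are not shown in the PR.
interface ImageUrl { url: string; _mime?: string; _base64?: string }
interface ImageObj { type: "image_url"; image_url: ImageUrl }
interface Delta { images?: ImageObj[]; [key: string]: unknown }
interface Choice { delta: Delta; [key: string]: unknown }
interface Chunk { choices: Choice[]; [key: string]: unknown }

// Sentinel that normal text is very unlikely to contain (it includes NUL chars).
const IMAGES_SENTINEL = "\u0000__deferred_images__\u0000";

function imageToJson(img: ImageObj): string {
  const { _mime, _base64 } = img.image_url;
  if (_mime && _base64) {
    const mime = JSON.stringify(_mime).slice(1, -1); // escape only the MIME
    return `{"type":"image_url","image_url":{"url":"data:${mime};base64,${_base64}"}}`;
  }
  return JSON.stringify(img);
}

// Candidate fast serializer: swap images for a sentinel, stringify the small
// remainder, then splice the hand-built images JSON in at the sentinel.
function serializeChunkSketch(chunk: Chunk): string {
  const first = chunk.choices[0];
  const images = first?.delta.images;
  if (first === undefined || images === undefined || !images.some((i) => i.image_url._base64)) {
    return JSON.stringify(chunk);
  }
  // Overwriting an existing key in a spread keeps its original position.
  const delta: Record<string, unknown> = { ...first.delta, images: IMAGES_SENTINEL };
  const copy = { ...chunk, choices: [{ ...first, delta }, ...chunk.choices.slice(1)] };
  const base = JSON.stringify(copy);
  const token = JSON.stringify(IMAGES_SENTINEL);
  const at = base.indexOf(token);
  if (at === -1) return JSON.stringify(chunk); // safety net
  const imagesJson = `[${images.map(imageToJson).join(",")}]`;
  return base.slice(0, at) + imagesJson + base.slice(at + token.length);
}

// Reference ("oracle") output: build every deferred URL eagerly, then let
// plain JSON.stringify do the work.
function materialize(chunk: Chunk): Chunk {
  return {
    ...chunk,
    choices: chunk.choices.map((c): Choice => {
      const imgs = c.delta.images;
      if (imgs === undefined) return c;
      const resolved = imgs.map((img): ImageObj => {
        const { _mime, _base64 } = img.image_url;
        return _mime && _base64
          ? { type: "image_url", image_url: { url: `data:${_mime};base64,${_base64}` } }
          : img;
      });
      return { ...c, delta: { ...c.delta, images: resolved } };
    }),
  };
}
```

Each case Copilot lists (no images, deferred images, already-built URLs, several images, a delta containing only images) becomes one input to the same equality assertion.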

> | null = null;
if (
usedProvider === "google-ai-studio" ||
usedProvider === "google-vertex"
Copilot AI Feb 20, 2026


The optimization to avoid redundant extractTokenUsage calls for Google providers should include "obsidian" alongside "google-ai-studio" and "google-vertex". The PR description mentions obsidian as one of the affected Google/Gemini providers, and the extractTokenUsage function treats obsidian identically to the other Google providers (line 52 in extract-token-usage.ts). Without including obsidian here, the optimization misses this provider and extractTokenUsage is still called twice for obsidian requests (the cached result is never populated here, so the call runs again at line 4037).

Suggested change
- usedProvider === "google-vertex"
+ usedProvider === "google-vertex" ||
+ usedProvider === "obsidian"
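To keep this branch from drifting out of sync with extractTokenUsage, the provider list could live in one shared set. A hypothetical sketch (GOOGLE_USAGE_PROVIDERS, TokenUsage and makeUsageResolver are invented for illustration; only the three provider names come from the PR) that extracts usage once for Google-format providers and reuses the result:

```typescript
// Single source of truth for providers that share the Google usage format.
const GOOGLE_USAGE_PROVIDERS: ReadonlySet<string> = new Set([
  "google-ai-studio",
  "google-vertex",
  "obsidian",
]);

// Assumed usage shape; the gateway's real type is not shown in the PR.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

// extractTokenUsage is injected so the sketch stays self-contained.
function makeUsageResolver(
  usedProvider: string,
  extractTokenUsage: (provider: string) => TokenUsage,
): () => TokenUsage {
  let cached: TokenUsage | null = null;
  return () => {
    if (!GOOGLE_USAGE_PROVIDERS.has(usedProvider)) {
      // Other providers keep their existing one-call-per-use behavior.
      return extractTokenUsage(usedProvider);
    }
    // Google-format providers: extract once, then reuse the same result.
    const usage = cached ?? extractTokenUsage(usedProvider);
    cached = usage;
    return usage;
  };
}
```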

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/gateway/src/chat/tools/extract-images.ts`:
- Around line 42-48: In serializeImageObject, remove the unnecessary `as any`
cast on img.image_url and avoid manual string interpolation for JSON; instead
build a plain object containing the data URL (compose it from
img.image_url._mime and img.image_url._base64) and use JSON.stringify to produce
the final string so values are properly escaped and no injection can occur
(refer to serializeImageObject and img.image_url/_mime/_base64).

In `@apps/gateway/src/chat/tools/serialize-streaming-chunk.ts`:
- Around line 12-55: The serializeStreamingChunk function currently uses `any`
in multiple places (parameter `chunk`, `imgUrl`, and the `choice` mapping) and
inlines `_mime` into a JSON literal without escaping; replace `any` with the
correct types (e.g., use the existing ImageObject type for elements of `images`,
define or reuse a StreamingChunk/Choice/Delta interface for `chunk` and `choice`
in serializeStreamingChunk and in the map callback) so TypeScript checks the
fields, and when constructing the data URL ensure the MIME string is properly
escaped/serialized (e.g., produce an escaped string for `_mime` instead of raw
interpolation) before building the JSON image entry.

Comment on lines +42 to +48
export function serializeImageObject(img: ImageObject): string {
const imgUrl = img.image_url as any;
if (imgUrl._mime && imgUrl._base64) {
// Build the JSON directly, embedding the base64 data in place
return `{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`;
}
return JSON.stringify(img);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n apps/gateway/src/chat/tools/extract-images.ts | head -60

Repository: theopenco/llmgateway

Length of output: 2100


🏁 Script executed:

rg -n "type ImageObject|interface ImageObject" apps/gateway/src/chat/tools/extract-images.ts

Repository: theopenco/llmgateway

Length of output: 46


🏁 Script executed:

rg -n "type ImageObject|interface ImageObject" --type ts --type tsx

Repository: theopenco/llmgateway

Length of output: 91


🏁 Script executed:

cat -n apps/gateway/src/chat/tools/types.ts 2>/dev/null || cat -n apps/gateway/src/chat/tools/types.js

Repository: theopenco/llmgateway

Length of output: 1626


🏁 Script executed:

find apps/gateway/src/chat/tools -name "types.*" -type f

Repository: theopenco/llmgateway

Length of output: 100


Remove the unnecessary as any cast and escape values in the manual JSON string to prevent injection.

The ImageObject type already includes _mime and _base64 as optional properties on image_url, making the cast redundant. Additionally, directly injecting unescaped string values into a JSON template literal can break the output if the value contains quotes or backslashes.

🔧 Suggested fix
 export function serializeImageObject(img: ImageObject): string {
-	const imgUrl = img.image_url as any;
-	if (imgUrl._mime && imgUrl._base64) {
+	const { _mime, _base64 } = img.image_url;
+	if (_mime && _base64) {
+		const safeMime = _mime.replace(/[\\"]/g, "\\$&");
 		// Build the JSON directly, embedding the base64 data in place
-		return `{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`;
+		return `{"type":"image_url","image_url":{"url":"data:${safeMime};base64,${_base64}"}}`;
 	}
 	return JSON.stringify(img);
 }
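The replace(/[\\"]/g, "\\$&") escape covers quotes and backslashes but not control characters such as a newline, which JSON also requires to be escaped. An alternative hinted at in a later comment ("validate it as a token") is to accept only MIME values whose characters never need escaping and fall back to JSON.stringify otherwise. Restricting to image/* and using the RFC 6838 restricted-name character set are assumptions of this sketch; serializeImageWithValidatedMime is a hypothetical name.

```typescript
// RFC 6838 restricted-name for the subtype: a letter or digit, then up to 126
// of ALPHA / DIGIT / ! # $ & - ^ _ . +. Limiting to image/* is an assumption.
const SAFE_IMAGE_MIME = /^image\/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}$/;

function serializeImageWithValidatedMime(mime: string, base64: string): string {
  if (SAFE_IMAGE_MIME.test(mime)) {
    // No allowed character needs JSON escaping, so interpolation is safe.
    return `{"type":"image_url","image_url":{"url":"data:${mime};base64,${base64}"}}`;
  }
  // Unusual MIME value: take the slower but always-correct path.
  return JSON.stringify({
    type: "image_url",
    image_url: { url: `data:${mime};base64,${base64}` },
  });
}
```

Well-formed provider values such as image/png stay on the fast path; anything odd still yields valid JSON.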

Comment on lines +12 to +55
export function serializeStreamingChunk(chunk: any): string {
// Fast path: no images, use standard serialization
const images: ImageObject[] | undefined = chunk?.choices?.[0]?.delta?.images;
if (!images || images.length === 0) {
return JSON.stringify(chunk);
}

// Check if any images have deferred base64 references
const hasDeferredImages = images.some(
(img) => (img.image_url as any)._base64,
);
if (!hasDeferredImages) {
return JSON.stringify(chunk);
}

// Build the JSON manually to avoid creating intermediate concatenated URL strings.
// We serialize everything except the images array normally, then splice in the
// image data using direct string building.

// Serialize the images array manually
const imageJsonParts: string[] = [];
for (const img of images) {
const imgUrl = img.image_url as any;
if (imgUrl._mime && imgUrl._base64) {
// Build JSON directly - base64 data is safe (no JSON escaping needed for A-Za-z0-9+/=)
imageJsonParts.push(
`{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`,
);
} else {
imageJsonParts.push(JSON.stringify(img));
}
}
const imagesJson = `[${imageJsonParts.join(",")}]`;

// Build the chunk without images, then splice images in
const chunkCopy = {
...chunk,
choices: chunk.choices.map((choice: any, idx: number) => {
if (idx === 0 && choice.delta?.images) {
const { images: _images, ...restDelta } = choice.delta;
return { ...choice, delta: restDelta };
}
return choice;
}),

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's read the file to see the actual code
cat -n apps/gateway/src/chat/tools/serialize-streaming-chunk.ts

Repository: theopenco/llmgateway

Length of output: 4373


🏁 Script executed:

#!/bin/bash
# Also check the ImageObject type definition
fd -t f "types.ts|types.tsx" apps/gateway/src/chat/tools/ -x cat -n {}

Repository: theopenco/llmgateway

Length of output: 1626


🏁 Script executed:

#!/bin/bash
# Verify any/as any usage in the file
rg -n "\bas any\b|:\s*any\b" apps/gateway/src/chat/tools/serialize-streaming-chunk.ts

Repository: theopenco/llmgateway

Length of output: 280


Replace any types with proper type definitions and escape _mime in the JSON string.

This module violates the typing guidelines by using any in four places (lines 12, 21, 34, 49) when proper types are already available. Additionally, _mime should be escaped before embedding into the JSON literal on line 38 to prevent breaking the JSON if the MIME type contains special characters.

🔧 Suggested fix
-import type { ImageObject } from "./types.js";
+import type { ImageObject } from "./types.js";
+
+type StreamingChunk = {
+	choices?: Array<{
+		delta?: { images?: ImageObject[] };
+	} & Record<string, unknown>>;
+} & Record<string, unknown>;

-export function serializeStreamingChunk(chunk: any): string {
+export function serializeStreamingChunk(chunk: StreamingChunk): string {
 	// Fast path: no images, use standard serialization
 	const images: ImageObject[] | undefined = chunk?.choices?.[0]?.delta?.images;
 	if (!images || images.length === 0) {
 		return JSON.stringify(chunk);
 	}

 	// Check if any images have deferred base64 references
-	const hasDeferredImages = images.some(
-		(img) => (img.image_url as any)._base64,
-	);
+	const hasDeferredImages = images.some((img) => img.image_url._base64);
 	if (!hasDeferredImages) {
 		return JSON.stringify(chunk);
 	}

 	// Serialize the images array manually
 	const imageJsonParts: string[] = [];
 	for (const img of images) {
-		const imgUrl = img.image_url as any;
-		if (imgUrl._mime && imgUrl._base64) {
+		const { _mime, _base64 } = img.image_url;
+		if (_mime && _base64) {
+			const safeMime = _mime.replace(/[\\"]/g, "\\$&");
 			// Build JSON directly - base64 data is safe (no JSON escaping needed for A-Za-z0-9+/=)
 			imageJsonParts.push(
-				`{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`,
+				`{"type":"image_url","image_url":{"url":"data:${safeMime};base64,${_base64}"}}`,
 			);
 		} else {
 			imageJsonParts.push(JSON.stringify(img));
 		}
 	}
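The suggested diff does not cover the chunk.choices.map((choice: any, idx: number) => ...) callback. A hypothetical sketch of the remaining pieces: a type guard that narrows _mime and _base64 to string without as any, and a typed version of the step that drops images from the first choice. The ImageObject shape mirrors what the review says types.ts contains and may differ from the real file.

```typescript
// Mirror of the reviewed shape: optional deferred fields on image_url.
interface ImageObject {
  type: "image_url";
  image_url: { url: string; _mime?: string; _base64?: string };
}

// Narrowed view in which both deferred fields are known to be present.
type DeferredImageObject = ImageObject & {
  image_url: { _mime: string; _base64: string };
};

// Type guard: callers can read _mime/_base64 as string after this check.
function isDeferredImage(img: ImageObject): img is DeferredImageObject {
  const { _mime, _base64 } = img.image_url;
  return (
    typeof _mime === "string" && _mime.length > 0 &&
    typeof _base64 === "string" && _base64.length > 0
  );
}

interface Delta { images?: ImageObject[]; [key: string]: unknown }
interface Choice { delta?: Delta; [key: string]: unknown }

// Typed replacement for the `choice: any` map: removes `images` from the
// first choice's delta (on a shallow copy) and returns other choices as-is.
function stripFirstChoiceImages(choices: Choice[]): Choice[] {
  return choices.map((choice, idx): Choice => {
    if (idx !== 0 || choice.delta?.images === undefined) return choice;
    const restDelta: Delta = { ...choice.delta };
    delete restDelta.images;
    return { ...choice, delta: restDelta };
  });
}
```

Using delete on a fresh shallow copy avoids an unused destructured binding while leaving the caller's chunk untouched.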

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/gateway/src/chat/tools/extract-images.ts (1)

18-30: ⚠️ Potential issue | 🟠 Major

Empty url field breaks MCP handlers and E2E tests that expect fully-formed data URLs.

The MCP handlers at apps/gateway/src/mcp/mcp.ts (lines 503, 690) explicitly check image.image_url?.url, which will be falsy for empty strings and cause images to be silently skipped. The E2E tests at apps/gateway/src/chat-full.e2e.ts (lines 83–84, 173–174) also fail because empty strings don't match the expected data URL pattern ^data:image\/(png|jpeg|jpg|webp);base64,. Multiple code paths throughout the codebase (images.ts:181, convert-images-to-base64.ts:65) depend on url containing the full data URL.

To address the deferred URL construction goal:

  1. Construct the full URL immediately in extractImages since the data is already available
  2. Or update all consumers (MCP handlers, E2E tests, image processing code) to check _mime and _base64 and reconstruct the URL when url is empty

Additionally, remove the unnecessary as any cast on line 43 of serializeImageObject. The ImageObject.image_url type properly includes the optional _mime and _base64 fields, so the cast violates the coding guideline against using any unless absolutely necessary.
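Option 2 can be sketched as a single resolver that consumers call instead of reading image_url.url directly. resolveImageUrl and collectDataUrls are hypothetical helpers, not code from the PR; collectDataUrls stands in for a consumer such as the MCP handlers.

```typescript
interface ImageObject {
  type: "image_url";
  image_url: { url: string; _mime?: string; _base64?: string };
}

// Returns a usable URL, rebuilding the data URL from the deferred fields when
// `url` was left empty. The concatenation happens only here, on demand.
function resolveImageUrl(img: ImageObject): string | undefined {
  const { url, _mime, _base64 } = img.image_url;
  if (url) return url;
  if (_mime && _base64) return `data:${_mime};base64,${_base64}`;
  return undefined; // genuinely missing image
}

// Example consumer: a bare `if (image.image_url?.url)` check treats the
// deferred "" as missing; resolving first skips only truly empty images.
function collectDataUrls(images: ImageObject[]): string[] {
  const urls: string[] = [];
  for (const image of images) {
    const url = resolveImageUrl(image);
    if (url !== undefined) urls.push(url);
  }
  return urls;
}
```

Option 1 avoids touching consumers at all, but reintroduces the eager concatenation the PR removes; the resolver keeps the cost on the paths that actually need a URL.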

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/extract-images.ts` around lines 18 - 30, The
image objects returned by extractImages currently leave image_url.url as an
empty string which breaks MCP handlers and tests; change the mapping in
extractImages to build the full data URL (e.g.,
`data:${part.inlineData.mimeType};base64,${part.inlineData.data}`) and assign
that to image_url.url while still keeping _mime and _base64, and also remove the
unnecessary `as any` cast in serializeImageObject so image_url is used with its
proper typed shape instead of casting to any.
♻️ Duplicate comments (1)
apps/gateway/src/chat/tools/extract-images.ts (1)

42-48: 🛠️ Refactor suggestion | 🟠 Major

Remove unnecessary as any cast and escape the mime type.

The as any cast violates the coding guideline prohibiting any unless absolutely necessary. If ImageObject.image_url already includes optional _mime and _base64 properties (as noted in the AI summary), the cast is redundant.

Additionally, directly interpolating _mime into the JSON string without escaping could break the output if the value contains quotes or backslashes—though unlikely for standard mime types, it's a latent injection risk.

🔧 Suggested fix
 export function serializeImageObject(img: ImageObject): string {
-	const imgUrl = img.image_url as any;
-	if (imgUrl._mime && imgUrl._base64) {
+	const { _mime, _base64 } = img.image_url;
+	if (_mime && _base64) {
+		const safeMime = _mime.replace(/[\\"]/g, "\\$&");
 		// Build the JSON directly, embedding the base64 data in place
-		return `{"type":"image_url","image_url":{"url":"data:${imgUrl._mime};base64,${imgUrl._base64}"}}`;
+		return `{"type":"image_url","image_url":{"url":"data:${safeMime};base64,${_base64}"}}`;
 	}
 	return JSON.stringify(img);
 }

As per coding guidelines: "Never use any or as any in TypeScript unless absolutely necessary".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/extract-images.ts` around lines 42 - 48, In
serializeImageObject, remove the unnecessary "as any" cast on
ImageObject.image_url and instead narrow/check the type (e.g., verify image_url
is an object and that image_url._mime and image_url._base64 are strings) before
using them; to avoid injection/escaping issues, construct a plain JS object with
the data URL assembled safely (escape the mime by letting JSON.stringify handle
it or validate it as a token) and then return JSON.stringify(obj) rather than
interpolating values directly into a JSON string; update references to
serializeImageObject, ImageObject.image_url, _mime and _base64 accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 474bb414-cdd1-4c9a-8ca5-d0bc547803c5

📥 Commits

Reviewing files that changed from the base of the PR and between 2f8a68e and 87ac628.

📒 Files selected for processing (2)
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/tools/extract-images.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/gateway/src/chat/chat.ts
