feat: add custom moderation mode#1993

Open
steebchen wants to merge 4 commits into main from custom-moderation-mode

Conversation

@steebchen
Member

@steebchen steebchen commented Apr 8, 2026

Summary

  • add a custom content filter method alongside keywords and openai
  • route moderation through LLMGateway /v1/chat/completions using LLM_CONTENT_FILTER_CUSTOM_API_KEY and LLM_CONTENT_FILTER_CUSTOM_MODEL
  • preserve existing fail-open logging behavior and store custom moderation results in the existing gateway moderation log payload shape
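For context, the env-driven configuration described above can be sketched as follows. This is an illustrative reimplementation, not the PR's actual helper: the function name `readCustomContentFilterConfig` is an assumption, with the env var names and error messages modeled on the tests quoted later in this thread.

```typescript
// Hypothetical sketch of the env-driven custom filter config; the real
// helper in check-content-filter.ts may differ in naming and validation.
interface CustomContentFilterConfig {
	apiKey: string;
	model: string;
}

function readCustomContentFilterConfig(
	env: Record<string, string | undefined>,
): CustomContentFilterConfig {
	const apiKey = env.LLM_CONTENT_FILTER_CUSTOM_API_KEY;
	const model = env.LLM_CONTENT_FILTER_CUSTOM_MODEL;
	// Treat empty strings the same as missing values.
	if (!apiKey) {
		throw new Error(
			"LLM_CONTENT_FILTER_CUSTOM_API_KEY environment variable is required for custom content filter",
		);
	}
	if (!model) {
		throw new Error(
			"LLM_CONTENT_FILTER_CUSTOM_MODEL environment variable is required for custom content filter",
		);
	}
	return { apiKey, model };
}
```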

Testing

  • pnpm exec vitest run --no-file-parallelism apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • pnpm exec vitest run --no-file-parallelism apps/gateway/src/chat/tools/check-content-filter.spec.ts apps/gateway/src/chat/tools/custom-content-filter.spec.ts apps/gateway/src/api.spec.ts -t "custom content filter|getCustomContentFilterConfig|returns custom when method is set to custom"
  • pnpm build:core
  • pnpm format
  • pnpm build

Summary by CodeRabbit

  • New Features

    • Added support for custom content filtering: integrate external moderation services, configure custom moderation models and endpoints, and combine internal + custom moderation responses.
    • Content-filter behaviors: explicit "enabled" vs "monitor" handling, graceful "fails open" when config is missing, and model-targeting to skip non‑targeted models.
    • Improved logging and persisted moderation metadata for clearer observability.
  • Tests

    • Added comprehensive tests covering modes, config validation/overrides, request/response parsing (including JSON fences), and logging/outputs.

Copilot AI review requested due to automatic review settings April 8, 2026 18:18

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 01a1cc8f60

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +395 to +401
upstreamResponse = await fetch(getCustomContentFilterUrl(), {
	method: "POST",
	headers: {
		"Content-Type": "application/json",
		Authorization: `Bearer ${config.apiKey}`,
		"X-Client-Request-Id": context.requestId,
	},

P1 Badge Bypass content filter for internal custom moderation requests

Calling fetch(getCustomContentFilterUrl(), ...) here sends moderation traffic back to /v1/chat/completions with no internal bypass signal, so when GATEWAY_URL resolves to this same gateway (the default path via getGatewayPublicBaseUrl) and LLM_CONTENT_FILTER_MODELS is unset or includes LLM_CONTENT_FILTER_CUSTOM_MODEL, the nested request re-enters chat.ts and executes the contentFilterMethod === "custom" branch again, recursively issuing more moderation calls until timeout. This can make custom moderation effectively unusable and create load amplification; the internal moderation request needs an explicit skip path.
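The recursion risk described above comes down to a missing guard on re-entry. A minimal sketch of one possible fix, assuming a hypothetical bypass header name that this PR does not define:

```typescript
// Hypothetical guard: skip gateway moderation for requests the gateway
// itself issued. The header name is illustrative, not part of this PR.
const INTERNAL_MODERATION_HEADER = "x-llmgateway-internal-moderation";

function shouldApplyGatewayContentFilter(
	headers: Map<string, string>,
	contentFilterMode: string,
	modelIsTargeted: boolean,
): boolean {
	if (headers.get(INTERNAL_MODERATION_HEADER) === "1") {
		return false; // internal moderation request: never re-moderate
	}
	return contentFilterMode !== "disabled" && modelIsTargeted;
}
```

The internal moderation fetch would then set this header, so a nested `/v1/chat/completions` call short-circuits instead of re-entering the custom branch.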


@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f98c3ffa-34b1-494d-b232-0ccd34c87e2f

📥 Commits

Reviewing files that changed from the base of the PR and between 3017e46 and d0a319a.

📒 Files selected for processing (5)
  • apps/gateway/src/api.spec.ts
  • apps/gateway/src/chat/tools/check-content-filter.spec.ts
  • apps/gateway/src/chat/tools/check-content-filter.ts
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • apps/gateway/src/chat/tools/custom-content-filter.ts
✅ Files skipped from review due to trivial changes (1)
  • apps/gateway/src/api.spec.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • apps/gateway/src/chat/tools/check-content-filter.spec.ts

Walkthrough

Adds a new "custom" content-filter: configuration helpers, a custom moderation checker that POSTs to an upstream /v1/chat/completions endpoint, integrates results into the chat handler flow, and adds comprehensive tests covering success, overrides, parsing, and failure modes.

Changes

  • Custom Content Filter Implementation (apps/gateway/src/chat/tools/custom-content-filter.ts, custom-content-filter.spec.ts): New module implementing checkCustomContentFilter, Zod/JSON-schema validation, request construction (system prompt + messages + image metadata), timeout/abort handling, parsing of JSON (including json-fenced content), verdict normalization, logging, and tests for request shape, response parsing, base URL override, and fail-open behavior.
  • Content Filter Configuration (apps/gateway/src/chat/tools/check-content-filter.ts, check-content-filter.spec.ts): Adds "custom" to ContentFilterMethod, new CustomContentFilterConfig and helpers (getCustomContentFilterBaseUrl, getCustomContentFilterConfig) for env-driven config, base URL normalization, an includeImages flag, required-env validation, and tests for env handling and normalization.
  • Chat Handler Integration (apps/gateway/src/chat/chat.ts): Calls checkCustomContentFilter when the method is "custom", treats its flagged result as a content-filter match, and changes logging to aggregate gatewayContentFilterResponses from both OpenAI and custom checks.
  • API Integration Tests (apps/gateway/src/api.spec.ts): Adds four tests validating gateway behavior with the custom filter in enabled (block), monitor (observe-only), missing-config (logs error, fails open), and model-not-targeted (skips custom filter) scenarios, asserting logs, persisted fields, and response modifications.
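The walkthrough above mentions parsing verdict JSON including json-fenced content. A minimal sketch of what fence-tolerant parsing can look like; the function name, regex, and verdict shape here are assumptions, not the PR's actual code:

```typescript
// Strip an optional Markdown code fence (```json ... ```) before parsing
// the moderation verdict. The verdict shape is assumed for illustration.
interface ModerationVerdict {
	flagged: boolean;
	reason?: string;
}

function parseVerdict(content: string): ModerationVerdict {
	const fenced = content.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
	const raw = fenced ? fenced[1] : content.trim();
	const parsed = JSON.parse(raw) as ModerationVerdict;
	if (typeof parsed.flagged !== "boolean") {
		throw new Error("verdict missing boolean 'flagged' field");
	}
	return parsed;
}
```

Models often wrap JSON output in fences even when asked not to, so tolerating both forms keeps the filter from failing open on otherwise valid verdicts.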

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ChatHandler as Chat Handler
    participant CustomFilter as Custom Content Filter
    participant ModerationAPI as External Moderation API
    participant Logger

    Client->>ChatHandler: POST /v1/chat/completions
    activate ChatHandler

    ChatHandler->>CustomFilter: checkCustomContentFilter(messages, context, signal)
    activate CustomFilter

    CustomFilter->>CustomFilter: Load env config, build system prompt + payload
    CustomFilter->>ModerationAPI: POST <base>/v1/chat/completions (model, response_format=json_schema)
    activate ModerationAPI
    ModerationAPI-->>CustomFilter: HTTP response (verdict in assistant content)
    deactivate ModerationAPI

    CustomFilter->>CustomFilter: Parse/validate verdict JSON, derive flagged + responses
    CustomFilter-->>ChatHandler: CustomContentFilterCheckResult
    deactivate CustomFilter

    ChatHandler->>ChatHandler: Aggregate gatewayContentFilterResponses (OpenAI + Custom)
    alt flagged
        ChatHandler->>ChatHandler: Null message.content, set finish_reason = "content_filter"
    else not flagged
        ChatHandler->>ChatHandler: Preserve completion content (or monitor behavior)
    end

    ChatHandler->>Logger: Persist logs including gatewayContentFilterResponses
    ChatHandler-->>Client: Completion response
    deactivate ChatHandler

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • #1936: Overlaps with changes to apps/gateway/src/chat/chat.ts and content-filter integration logic.
  • #1922: Adds gateway content-filter response storage and logging propagation, related to aggregation/persistence changes here.
  • #1908: Refactors OpenAI moderation response aggregation and signatures—related to combining OpenAI and custom moderation responses.

Suggested labels

auto-merge

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 5.26%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'feat: add custom moderation mode' is clear, specific, and accurately reflects the main change: adding a new custom content filter method as an alternative to existing moderation approaches.



Copilot AI left a comment


Pull request overview

Adds a new “custom” gateway content-filter method that performs moderation by calling back into the LLMGateway /v1/chat/completions endpoint with a dedicated API key/model, and integrates its results into the existing moderation logging shape.

Changes:

  • Introduces custom as a supported LLM_CONTENT_FILTER_METHOD, with env-based config (LLM_CONTENT_FILTER_CUSTOM_API_KEY, LLM_CONTENT_FILTER_CUSTOM_MODEL).
  • Implements custom moderation execution/parsing + fail-open logging in a new tool module.
  • Wires custom moderation into /v1/chat/completions flow and adds unit + API-level tests covering block/monitor/missing-config/skip-by-model behavior.
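The fail-open behavior mentioned above can be illustrated with a small wrapper: any failure in the moderation check is logged and treated as not flagged, so filter errors never block the original request. The names here are hypothetical, not the PR's actual code:

```typescript
// Hedged sketch of fail-open moderation: errors in the custom check are
// logged and reported as null so the caller proceeds without blocking.
interface FilterResult {
	flagged: boolean;
}

async function failOpen(
	check: () => Promise<FilterResult>,
	log: (message: string) => void,
): Promise<FilterResult | null> {
	try {
		return await check();
	} catch (error) {
		log(`custom content filter failed open: ${String(error)}`);
		return null; // caller treats null as "do not block"
	}
}
```

This trades safety for availability: an outage of the moderation model degrades to unmoderated traffic rather than a gateway outage, which matches the PR's stated goal of preserving the existing fail-open logging behavior.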

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Summary per file:
  • apps/gateway/src/chat/tools/custom-content-filter.ts: New custom moderation implementation that calls gateway chat completions, parses verdict JSON, and logs results/errors.
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts: Unit tests for custom moderation request shape, JSON parsing, and fail-open behavior.
  • apps/gateway/src/chat/tools/check-content-filter.ts: Extends the content-filter method enum and adds an env-driven custom moderation config helper.
  • apps/gateway/src/chat/tools/check-content-filter.spec.ts: Tests for new custom method selection and config validation.
  • apps/gateway/src/chat/chat.ts: Integrates custom moderation into content-filter match logic and log payload aggregation.
  • apps/gateway/src/api.spec.ts: Adds end-to-end API tests for the custom method (block/monitor/missing-config/skip model).


Comment on lines +1913 to +1925
const customContentFilterResult =
	shouldApplyGatewayContentFilter && contentFilterMethod === "custom"
		? await checkCustomContentFilter(
				messages as BaseMessage[],
				{
					requestId,
					organizationId: project.organizationId,
					projectId: project.id,
					apiKeyId: apiKey.id,
				},
				c.req.raw.signal,
			)
		: null;

Copilot AI Apr 8, 2026


Custom moderation calls back into the gateway’s own /v1/chat/completions. If LLM_CONTENT_FILTER_MODELS is unset/empty (meaning “apply to all models”) and LLM_CONTENT_FILTER_METHOD=custom, the moderation request itself will also go through this same gateway content filter path and trigger another custom moderation call, causing an infinite recursion / request storm. Add an explicit internal bypass (e.g., send a dedicated header on the internal moderation fetch and have chat.ts skip gateway content filtering when that header is present), or otherwise ensure the moderation model is always excluded from filtering in a robust, non-config-dependent way.

Comment on lines +395 to +418
upstreamResponse = await fetch(getCustomContentFilterUrl(), {
	method: "POST",
	headers: {
		"Content-Type": "application/json",
		Authorization: `Bearer ${config.apiKey}`,
		"X-Client-Request-Id": context.requestId,
	},
	body: JSON.stringify({
		model: config.model,
		temperature: 0,
		max_tokens: CUSTOM_CONTENT_FILTER_MAX_TOKENS,
		messages: [
			{
				role: "system",
				content: CUSTOM_CONTENT_FILTER_SYSTEM_PROMPT,
			},
			{
				role: "user",
				content: buildCustomContentFilterInput(messages),
			},
		],
	}),
	signal,
});

Copilot AI Apr 8, 2026


The custom moderation fetch hits the gateway’s own chat-completions endpoint. Without an explicit bypass header/flag, this can recurse when custom content filtering is enabled for the moderation model (default LLM_CONTENT_FILTER_MODELS behavior applies filtering to all models). Consider adding an internal-only header to this request and updating the chat-completions handler to skip gateway content filtering when that header is present.

}

if (part.type === "image") {
return `inline-image: media_type=${part.source.media_type}, bytes=${part.source.data.length}`;

Copilot AI Apr 8, 2026


getImageReference() reports bytes=${part.source.data.length} for inline images, but source.data is typically base64 text (string length), not actual bytes. Either rename the field (e.g., base64Length) or compute real byte size if needed to avoid misleading moderation inputs/logs.

Suggested change
return `inline-image: media_type=${part.source.media_type}, bytes=${part.source.data.length}`;
return `inline-image: media_type=${part.source.media_type}, base64Length=${part.source.data.length}`;

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/gateway/src/chat/tools/check-content-filter.ts (1)

47-53: ⚠️ Potential issue | 🟡 Minor

Let explicit LLM_CONTENT_FILTER_METHOD override legacy mode.

At Line 47, legacy LLM_CONTENT_FILTER_MODE=openai takes precedence over LLM_CONTENT_FILTER_METHOD=custom, making custom mode unreachable in mixed env configurations.

🔁 Proposed precedence fix
-	if (envValue === "openai" || legacyModeEnvValue === "openai") {
-		return "openai";
-	}
-
 	if (envValue === "custom") {
 		return "custom";
 	}
+
+	if (envValue === "openai" || legacyModeEnvValue === "openai") {
+		return "openai";
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/check-content-filter.ts` around lines 47 - 53,
The current logic gives legacyModeEnvValue ("LLM_CONTENT_FILTER_MODE")
precedence over envValue ("LLM_CONTENT_FILTER_METHOD"), causing "custom" to be
unreachable when legacyModeEnvValue === "openai"; change the precedence so
envValue is checked first (i.e., evaluate envValue === "custom" or envValue ===
"openai" before checking legacyModeEnvValue), or explicitly prefer envValue when
it is set (use envValue if truthy, otherwise fall back to legacyModeEnvValue) so
that envValue="custom" can override legacyModeEnvValue="openai".
🧹 Nitpick comments (1)
apps/gateway/src/chat/tools/custom-content-filter.spec.ts (1)

34-53: Add a regression assertion for the internal moderation bypass header.

Given custom moderation calls back into /v1/chat/completions, this test should also assert the internal bypass header once implemented, so recursion cannot regress silently.

🧪 Suggested assertion
 				const headers = new Headers(init?.headers);
 				expect(headers.get("authorization")).toBe("Bearer custom-api-key");
 				expect(headers.get("x-client-request-id")).toBe("request-id");
+				expect(headers.get("x-llmgateway-internal-content-filter")).toBe("1");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.spec.ts` around lines 34 -
53, The test needs an assertion that the internal moderation-bypass header is
sent when custom moderation calls back into /v1/chat/completions; inside the
mocked fetch in custom-content-filter.spec.ts (where fetchSpy is created and
headers are checked), add an assertion that
headers.get("<internal-moderation-bypass-header>") equals the same value used by
the implementation (or use the implementation constant, e.g.
INTERNAL_MODERATION_BYPASS_HEADER and its expected value such as "1" or "true")
so recursion cannot regress silently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9e28cf75-15b4-4484-a285-efec64d1977c

📥 Commits

Reviewing files that changed from the base of the PR and between 6b064a1 and 01a1cc8.

📒 Files selected for processing (6)
  • apps/gateway/src/api.spec.ts
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/tools/check-content-filter.spec.ts
  • apps/gateway/src/chat/tools/check-content-filter.ts
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • apps/gateway/src/chat/tools/custom-content-filter.ts

Comment on lines +1913 to +1925
const customContentFilterResult =
	shouldApplyGatewayContentFilter && contentFilterMethod === "custom"
		? await checkCustomContentFilter(
				messages as BaseMessage[],
				{
					requestId,
					organizationId: project.organizationId,
					projectId: project.id,
					apiKeyId: apiKey.id,
				},
				c.req.raw.signal,
			)
		: null;
Contributor


⚠️ Potential issue | 🔴 Critical

Prevent self-recursive moderation requests in custom mode.

At Line 1913, this branch invokes checkCustomContentFilter, and that helper posts back to GATEWAY_URL/v1/chat/completions (apps/gateway/src/chat/tools/custom-content-filter.ts, Lines 384-420) without a bypass marker. That internal request can re-enter the same contentFilterMethod === "custom" path and loop indefinitely.

🔧 Proposed guard + internal marker
+	const isInternalContentFilterRequest =
+		c.req.header("x-llmgateway-internal-content-filter") === "1";
 	const shouldApplyGatewayContentFilter =
+		!isInternalContentFilterRequest &&
 		contentFilterMode !== "disabled" &&
 		shouldApplyContentFilterToModel(requestedModel);

Also add this header on the internal moderation fetch in apps/gateway/src/chat/tools/custom-content-filter.ts:

headers: {
	"Content-Type": "application/json",
	Authorization: `Bearer ${config.apiKey}`,
	"X-Client-Request-Id": context.requestId,
+	"X-LLMGateway-Internal-Content-Filter": "1",
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/chat.ts` around lines 1913 - 1925, The custom
content-filter branch can recurse because checkCustomContentFilter posts back to
the gateway endpoint and re-enters the same contentFilterMethod === "custom"
path; modify the gateway branch that calls checkCustomContentFilter (the code
using shouldApplyGatewayContentFilter/contentFilterMethod and function
checkCustomContentFilter) to detect and skip moderation when a special internal
header is present, and update the internal fetch inside
apps/gateway/src/chat/tools/custom-content-filter.ts (the request to
GATEWAY_URL/v1/chat/completions) to include a unique bypass header (e.g.,
X-Gateway-Internal-Moderation: 1) so the gateway can short-circuit and avoid
re-invoking the custom content filter.

Comment on lines +247 to +263
it("throws when the custom api key is missing", () => {
	delete process.env.LLM_CONTENT_FILTER_CUSTOM_API_KEY;
	process.env.LLM_CONTENT_FILTER_CUSTOM_MODEL = "openai/gpt-5-mini";

	expect(() => getCustomContentFilterConfig()).toThrow(
		"LLM_CONTENT_FILTER_CUSTOM_API_KEY environment variable is required for custom content filter",
	);
});

it("throws when the custom model is missing", () => {
	process.env.LLM_CONTENT_FILTER_CUSTOM_API_KEY = "custom-key";
	delete process.env.LLM_CONTENT_FILTER_CUSTOM_MODEL;

	expect(() => getCustomContentFilterConfig()).toThrow(
		"LLM_CONTENT_FILTER_CUSTOM_MODEL environment variable is required for custom content filter",
	);
});
Contributor


⚠️ Potential issue | 🟡 Minor

Cover blank env-var values in these new config tests.

These cases only assert undefined, but misconfigured deployments usually fail as "". That leaves the new validation path partially untested.

🧪 Suggested test additions
+	it("throws when the custom api key is empty", () => {
+		process.env.LLM_CONTENT_FILTER_CUSTOM_API_KEY = "";
+		process.env.LLM_CONTENT_FILTER_CUSTOM_MODEL = "openai/gpt-5-mini";
+
+		expect(() => getCustomContentFilterConfig()).toThrow(
+			"LLM_CONTENT_FILTER_CUSTOM_API_KEY environment variable is required for custom content filter",
+		);
+	});
+
+	it("throws when the custom model is empty", () => {
+		process.env.LLM_CONTENT_FILTER_CUSTOM_API_KEY = "custom-key";
+		process.env.LLM_CONTENT_FILTER_CUSTOM_MODEL = "";
+
+		expect(() => getCustomContentFilterConfig()).toThrow(
+			"LLM_CONTENT_FILTER_CUSTOM_MODEL environment variable is required for custom content filter",
+		);
+	});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/check-content-filter.spec.ts` around lines 247 -
263, The tests for getCustomContentFilterConfig only cover missing (undefined)
env vars; also add cases where LLM_CONTENT_FILTER_CUSTOM_API_KEY or
LLM_CONTENT_FILTER_CUSTOM_MODEL are present but empty strings (""), because
empty values should be treated the same as missing and cause the same error;
update the two specs ("throws when the custom api key is missing" and "throws
when the custom model is missing") or add new tests to set the respective env
var to "" before calling getCustomContentFilterConfig() and expect the same
toThrow messages.

Comment on lines +139 to +146
function getImageReference(part: MessageContent): string | null {
if (part.type === "image_url") {
return `remote-image: ${part.image_url.url}`;
}

if (part.type === "image") {
return `inline-image: media_type=${part.source.media_type}, bytes=${part.source.data.length}`;
}
Contributor


⚠️ Potential issue | 🟠 Major

Do not send raw image URLs to the moderator.

part.image_url.url is copied verbatim into the prompt, which can leak presigned query params, internal hostnames, or user identifiers to the moderation model even though it only sees the URL as text.

🔐 Safer handling
 function getImageReference(part: MessageContent): string | null {
 	if (part.type === "image_url") {
-		return `remote-image: ${part.image_url.url}`;
+		try {
+			const url = new URL(part.image_url.url);
+			if (url.protocol !== "http:" && url.protocol !== "https:") {
+				return `remote-image: [${url.protocol.replace(":", "")}]`;
+			}
+			return `remote-image: ${url.origin}${url.pathname}`;
+		} catch {
+			return "remote-image: [redacted]";
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.ts` around lines 139 - 146,
The getImageReference function currently copies part.image_url.url verbatim for
part.type === "image_url", which can leak sensitive query params or hostnames;
change it to never include the raw URL string — instead return a sanitized
placeholder (e.g., "remote-image: [redacted]") or a safe summary that omits the
URL/hostname/query (you may include non-identifying metadata like image size or
safe media type if available), and ensure the same treatment is applied for
part.type === "image" if any source fields could leak identifying info; update
getImageReference to reference part.image_url.url only to extract non-sensitive
metadata (if needed) but do not emit the URL itself.

Comment on lines +395 to +418
upstreamResponse = await fetch(getCustomContentFilterUrl(), {
	method: "POST",
	headers: {
		"Content-Type": "application/json",
		Authorization: `Bearer ${config.apiKey}`,
		"X-Client-Request-Id": context.requestId,
	},
	body: JSON.stringify({
		model: config.model,
		temperature: 0,
		max_tokens: CUSTOM_CONTENT_FILTER_MAX_TOKENS,
		messages: [
			{
				role: "system",
				content: CUSTOM_CONTENT_FILTER_SYSTEM_PROMPT,
			},
			{
				role: "user",
				content: buildCustomContentFilterInput(messages),
			},
		],
	}),
	signal,
});

⚠️ Potential issue | 🔴 Critical

Skip content filtering on the internal moderation request.

This POST goes through the same /v1/chat/completions handler that invokes checkCustomContentFilter() in apps/gateway/src/chat/chat.ts:1891-1932. With LLM_CONTENT_FILTER_MODELS unset — the default apply-to-all behavior — or configured to include config.model, the moderation request re-enters this code and recursively moderates itself until timeout, so custom filtering fail-opens instead of classifying the original request.

🔁 One way to break the recursion
```diff
 		upstreamResponse = await fetch(getCustomContentFilterUrl(), {
 			method: "POST",
 			headers: {
 				"Content-Type": "application/json",
 				Authorization: `Bearer ${config.apiKey}`,
 				"X-Client-Request-Id": context.requestId,
+				"X-LLMGateway-Internal-Moderation": "true",
 			},
```

```ts
// apps/gateway/src/chat/chat.ts
const isInternalModerationRequest =
	c.req.header("x-llmgateway-internal-moderation") === "true";

const shouldApplyGatewayContentFilter =
	!isInternalModerationRequest &&
	contentFilterMode !== "disabled" &&
	shouldApplyContentFilterToModel(requestedModel);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.ts` around lines 395 - 418,
The moderation POST is re-entering the gateway's content filter causing
recursion; mark the internal moderation request and skip gateway filtering. Add
the internal-moderation header (e.g., "x-llmgateway-internal-moderation":
"true") to the fetch call created by getCustomContentFilterUrl()/the
upstreamResponse POST so the request can be identified, and ensure the gating
logic in checkCustomContentFilter()/chat.ts uses that header
(isInternalModerationRequest) to bypass applying the gateway content filter to
requests with that header set.
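A minimal, self-contained sketch of that gating logic, with the header name taken from the suggestion above and the mode values assumed from the PR description:

```typescript
// Sketch only: the real check lives in apps/gateway/src/chat/chat.ts and
// also consults shouldApplyContentFilterToModel(). The header name matches
// the suggested X-LLMGateway-Internal-Moderation marker.
type ContentFilterMode = "enabled" | "monitor" | "disabled";

function shouldApplyGatewayContentFilter(
	headers: Record<string, string | undefined>,
	contentFilterMode: ContentFilterMode,
): boolean {
	const isInternalModerationRequest =
		headers["x-llmgateway-internal-moderation"] === "true";
	// Internal moderation calls bypass the filter, breaking the recursion
	return !isInternalModerationRequest && contentFilterMode !== "disabled";
}
```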


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2b640048d


Comment on lines +173 to +175
```ts
if (part.type === "image") {
	return `inline-image: media_type=${part.source.media_type}, bytes=${part.source.data.length}`;
}
```

P1: Send real inline images to custom moderation

When contentFilterMethod === "custom", inline images are converted to metadata (media_type and byte length) instead of image data, so the moderation model cannot inspect the actual pixels. In enabled mode this allows image-only unsafe content to pass as unflagged whenever accompanying text is benign or empty, which undermines the filter for multimodal requests. The moderation request should include the real image payload (or use a moderation endpoint that accepts image inputs) rather than a textual placeholder.
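One way to do that, sketched under the assumption that the moderation endpoint accepts OpenAI-style `image_url` content parts carrying `data:` URLs. The part shapes mirror the snippet above, but the helper name is hypothetical:

```typescript
// Hypothetical helper: forward the actual inline image bytes as a data: URL
// instead of a textual "inline-image: media_type=..., bytes=..." placeholder.
interface InlineImagePart {
	type: "image";
	source: { media_type: string; data: string }; // base64-encoded bytes
}

function toModerationImagePart(part: InlineImagePart): {
	type: "image_url";
	image_url: { url: string };
} {
	return {
		type: "image_url",
		image_url: {
			url: `data:${part.source.media_type};base64,${part.source.data}`,
		},
	};
}
```

This only works if `config.model` is vision-capable; otherwise a dedicated image-moderation endpoint would be needed.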



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
apps/gateway/src/chat/tools/custom-content-filter.ts (1)

324-330: Consider extracting the hardcoded score threshold.

The 0.5 threshold for category scores appears here and again in buildCustomModerationPayload (line 397). Consider extracting to a named constant for consistency and easier tuning.

♻️ Optional: Extract threshold constant
```diff
+const CUSTOM_CONTENT_FILTER_SCORE_THRESHOLD = 0.5;
+
 function getFlaggedCategories(payload: ModerationApiPayload): string[] {
 	// ...
 	for (const [category, score] of Object.entries(
 		result.category_scores ?? {},
 	)) {
-		if (score > 0.5) {
+		if (score > CUSTOM_CONTENT_FILTER_SCORE_THRESHOLD) {
 			categories.add(category);
 		}
 	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.ts` around lines 324 - 330,
Extract the hardcoded 0.5 into a single named constant (e.g.,
CATEGORY_SCORE_THRESHOLD = 0.5) and replace the literal in the category loop
inside custom-content-filter.ts (the for (const [category, score] of
Object.entries(...) { if (score > 0.5) ... }) and the other occurrence in
buildCustomModerationPayload) so both sites reference the same constant for
consistency and easier tuning; ensure the constant is exported or colocated at
the top of the module so both functions use it.
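A runnable version of the extracted-threshold variant. The payload shape is inferred from the diff above (fields beyond `categories` and `category_scores` are assumptions):

```typescript
// Sketch of getFlaggedCategories with the 0.5 literal lifted into a shared
// constant, as the nitpick suggests. Shapes are inferred, not verified
// against the actual gateway types.
const CUSTOM_CONTENT_FILTER_SCORE_THRESHOLD = 0.5;

interface ModerationResult {
	categories?: Record<string, boolean>;
	category_scores?: Record<string, number>;
}

function getFlaggedCategories(results: ModerationResult[]): string[] {
	const categories = new Set<string>();
	for (const result of results) {
		// Explicit boolean flags always count
		for (const [category, hit] of Object.entries(result.categories ?? {})) {
			if (hit) {
				categories.add(category);
			}
		}
		// Scores above the shared threshold also count
		for (const [category, score] of Object.entries(
			result.category_scores ?? {},
		)) {
			if (score > CUSTOM_CONTENT_FILTER_SCORE_THRESHOLD) {
				categories.add(category);
			}
		}
	}
	return [...categories];
}
```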
apps/gateway/src/chat/tools/custom-content-filter.spec.ts (1)

7-227: Consider additional test coverage.

The current tests cover the happy path and missing config scenario well. Consider adding tests for:

  • Network timeout handling
  • Non-2xx upstream responses
  • Invalid/malformed upstream JSON responses
  • Request cancellation via AbortSignal

These would increase confidence in the fail-open behavior under various failure modes.
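To make those failure modes concrete, here is a minimal fail-open wrapper. Names and shapes are illustrative, not the gateway's actual `checkCustomContentFilter`:

```typescript
// Any upstream failure (rejected fetch, non-2xx status, or unparseable
// body) resolves to an unflagged result instead of throwing, which is the
// fail-open behavior the tests above would pin down.
interface UpstreamLike {
	ok: boolean;
	text: () => Promise<string>;
}

async function moderateFailOpen(
	callUpstream: () => Promise<UpstreamLike>,
): Promise<{ flagged: boolean; error?: string }> {
	try {
		const res = await callUpstream();
		if (!res.ok) {
			return { flagged: false, error: "upstream returned non-2xx" };
		}
		const body = JSON.parse(await res.text()) as { flagged?: boolean };
		return { flagged: body.flagged === true };
	} catch (err) {
		// Network timeouts, AbortError, and JSON.parse failures all land here
		return { flagged: false, error: String(err) };
	}
}
```

In a vitest spec, each failure mode would be driven by mocking `global.fetch` to reject, return a 500, or return a non-JSON body, then asserting `flagged === false`.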

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.spec.ts` around lines 7 -
227, Add tests for failure modes of checkCustomContentFilter: (1) simulate a
network timeout by mocking global.fetch to reject (e.g., throw a TypeError or a
custom timeout Error) and assert the function returns fail-open (flagged false),
does not crash, and logs an error; (2) simulate a non-2xx upstream response by
returning a Response with status 500 and optional error body, then assert
fail-open behavior and logged error; (3) simulate malformed/invalid JSON by
returning a 200 Response whose choices.message.content is not valid JSON and
assert the parser falls back safely and returns fail-open while logging; and (4)
test request cancellation by creating an AbortController, passing its signal
into checkCustomContentFilter (where supported) and mocking fetch to reject with
an AbortError or to observe the signal, then assert the function handles
cancellation by returning fail-open and logging. Reference the test file and the
checkCustomContentFilter function to add these cases, mock logger.error to
inspect logs, and ensure expectations mirror existing tests (fetch calls,
result.flagged false, result.responses empty or sanitized).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f8081615-32b3-458e-b58b-08f8ec5996d6

📥 Commits

Reviewing files that changed from the base of the PR and between 01a1cc8 and d2b6400.

📒 Files selected for processing (3)
  • apps/gateway/src/api.spec.ts
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • apps/gateway/src/chat/tools/custom-content-filter.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/gateway/src/api.spec.ts
