feat: add custom moderation mode#1993

Open
steebchen wants to merge 4 commits into main from custom-moderation-mode

Conversation

@steebchen
Member

@steebchen steebchen commented Apr 8, 2026

Summary

  • add a custom content filter method alongside keywords and openai
  • route moderation through LLMGateway /v1/chat/completions using LLM_CONTENT_FILTER_CUSTOM_API_KEY and LLM_CONTENT_FILTER_CUSTOM_MODEL
  • preserve existing fail-open logging behavior and store custom moderation results in the existing gateway moderation log payload shape
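For context, the env-driven configuration described above can be sketched as follows. This is an illustrative reimplementation, not the PR's actual helper: the function name `readCustomContentFilterConfig` is an assumption, with the env var names and error messages modeled on the tests quoted later in this thread.

```typescript
// Hypothetical sketch of the env-driven custom filter config; the real
// helper in check-content-filter.ts may differ in naming and validation.
interface CustomContentFilterConfig {
	apiKey: string;
	model: string;
}

function readCustomContentFilterConfig(
	env: Record<string, string | undefined>,
): CustomContentFilterConfig {
	const apiKey = env.LLM_CONTENT_FILTER_CUSTOM_API_KEY;
	const model = env.LLM_CONTENT_FILTER_CUSTOM_MODEL;
	// Treat empty strings the same as missing values.
	if (!apiKey) {
		throw new Error(
			"LLM_CONTENT_FILTER_CUSTOM_API_KEY environment variable is required for custom content filter",
		);
	}
	if (!model) {
		throw new Error(
			"LLM_CONTENT_FILTER_CUSTOM_MODEL environment variable is required for custom content filter",
		);
	}
	return { apiKey, model };
}
```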

Testing

  • pnpm exec vitest run --no-file-parallelism apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • pnpm exec vitest run --no-file-parallelism apps/gateway/src/chat/tools/check-content-filter.spec.ts apps/gateway/src/chat/tools/custom-content-filter.spec.ts apps/gateway/src/api.spec.ts -t "custom content filter|getCustomContentFilterConfig|returns custom when method is set to custom"
  • pnpm build:core
  • pnpm format
  • pnpm build

Summary by CodeRabbit

  • New Features

    • Added support for custom content filtering: integrate external moderation services, configure custom moderation models and endpoints, and combine internal + custom moderation responses.
    • Content-filter behaviors: explicit "enabled" vs "monitor" handling, graceful "fails open" when config is missing, and model-targeting to skip non‑targeted models.
    • Improved logging and persisted moderation metadata for clearer observability.
  • Tests

    • Added comprehensive tests covering modes, config validation/overrides, request/response parsing (including JSON fences), and logging/outputs.

Copilot AI review requested due to automatic review settings April 8, 2026 18:18

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 01a1cc8f60

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +395 to +401
upstreamResponse = await fetch(getCustomContentFilterUrl(), {
	method: "POST",
	headers: {
		"Content-Type": "application/json",
		Authorization: `Bearer ${config.apiKey}`,
		"X-Client-Request-Id": context.requestId,
	},

P1 Badge Bypass content filter for internal custom moderation requests

Calling fetch(getCustomContentFilterUrl(), ...) here sends moderation traffic back to /v1/chat/completions with no internal bypass signal, so when GATEWAY_URL resolves to this same gateway (the default path via getGatewayPublicBaseUrl) and LLM_CONTENT_FILTER_MODELS is unset or includes LLM_CONTENT_FILTER_CUSTOM_MODEL, the nested request re-enters chat.ts and executes the contentFilterMethod === "custom" branch again, recursively issuing more moderation calls until timeout. This can make custom moderation effectively unusable and create load amplification; the internal moderation request needs an explicit skip path.
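The recursion risk described above comes down to a missing guard on re-entry. A minimal sketch of one possible fix, assuming a hypothetical bypass header name that this PR does not define:

```typescript
// Hypothetical guard: skip gateway moderation for requests the gateway
// itself issued. The header name is illustrative, not part of this PR.
const INTERNAL_MODERATION_HEADER = "x-llmgateway-internal-moderation";

function shouldApplyGatewayContentFilter(
	headers: Map<string, string>,
	contentFilterMode: string,
	modelIsTargeted: boolean,
): boolean {
	if (headers.get(INTERNAL_MODERATION_HEADER) === "1") {
		return false; // internal moderation request: never re-moderate
	}
	return contentFilterMode !== "disabled" && modelIsTargeted;
}
```

The internal moderation fetch would then set this header, so a nested `/v1/chat/completions` call short-circuits instead of re-entering the custom branch.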


@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f98c3ffa-34b1-494d-b232-0ccd34c87e2f

📥 Commits

Reviewing files that changed from the base of the PR and between 3017e46 and d0a319a.

📒 Files selected for processing (5)
  • apps/gateway/src/api.spec.ts
  • apps/gateway/src/chat/tools/check-content-filter.spec.ts
  • apps/gateway/src/chat/tools/check-content-filter.ts
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • apps/gateway/src/chat/tools/custom-content-filter.ts
✅ Files skipped from review due to trivial changes (1)
  • apps/gateway/src/api.spec.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • apps/gateway/src/chat/tools/check-content-filter.spec.ts

Walkthrough

Adds a new "custom" content-filter: configuration helpers, a custom moderation checker that POSTs to an upstream /v1/chat/completions endpoint, integrates results into the chat handler flow, and adds comprehensive tests covering success, overrides, parsing, and failure modes.

Changes

  • Custom Content Filter Implementation (apps/gateway/src/chat/tools/custom-content-filter.ts, custom-content-filter.spec.ts): New module implementing checkCustomContentFilter, Zod/JSON-schema validation, request construction (system prompt + messages + image metadata), timeout/abort handling, parsing of JSON (including json-fenced content), verdict normalization, logging, and tests for request shape, response parsing, base URL override, and fail-open behavior.
  • Content Filter Configuration (apps/gateway/src/chat/tools/check-content-filter.ts, check-content-filter.spec.ts): Adds "custom" to ContentFilterMethod, new CustomContentFilterConfig and helpers (getCustomContentFilterBaseUrl, getCustomContentFilterConfig) for env-driven config, base URL normalization, an includeImages flag, required-env validation, and tests for env handling and normalization.
  • Chat Handler Integration (apps/gateway/src/chat/chat.ts): Calls checkCustomContentFilter when the method is "custom", treats its flagged result as a content-filter match, and changes logging to aggregate gatewayContentFilterResponses from both OpenAI and custom checks.
  • API Integration Tests (apps/gateway/src/api.spec.ts): Adds four tests validating gateway behavior with the custom filter in enabled (block), monitor (observe-only), missing-config (logs error, fails open), and model-not-targeted (skips custom filter) scenarios, asserting logs, persisted fields, and response modifications.
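The walkthrough above mentions parsing verdict JSON including json-fenced content. A minimal sketch of what fence-tolerant parsing can look like; the function name, regex, and verdict shape here are assumptions, not the PR's actual code:

```typescript
// Strip an optional Markdown code fence (```json ... ```) before parsing
// the moderation verdict. The verdict shape is assumed for illustration.
interface ModerationVerdict {
	flagged: boolean;
	reason?: string;
}

function parseVerdict(content: string): ModerationVerdict {
	const fenced = content.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
	const raw = fenced ? fenced[1] : content.trim();
	const parsed = JSON.parse(raw) as ModerationVerdict;
	if (typeof parsed.flagged !== "boolean") {
		throw new Error("verdict missing boolean 'flagged' field");
	}
	return parsed;
}
```

Models often wrap JSON output in fences even when asked not to, so tolerating both forms keeps the filter from failing open on otherwise valid verdicts.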

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ChatHandler as Chat Handler
    participant CustomFilter as Custom Content Filter
    participant ModerationAPI as External Moderation API
    participant Logger

    Client->>ChatHandler: POST /v1/chat/completions
    activate ChatHandler

    ChatHandler->>CustomFilter: checkCustomContentFilter(messages, context, signal)
    activate CustomFilter

    CustomFilter->>CustomFilter: Load env config, build system prompt + payload
    CustomFilter->>ModerationAPI: POST <base>/v1/chat/completions (model, response_format=json_schema)
    activate ModerationAPI
    ModerationAPI-->>CustomFilter: HTTP response (verdict in assistant content)
    deactivate ModerationAPI

    CustomFilter->>CustomFilter: Parse/validate verdict JSON, derive flagged + responses
    CustomFilter-->>ChatHandler: CustomContentFilterCheckResult
    deactivate CustomFilter

    ChatHandler->>ChatHandler: Aggregate gatewayContentFilterResponses (OpenAI + Custom)
    alt flagged
        ChatHandler->>ChatHandler: Null message.content, set finish_reason = "content_filter"
    else not flagged
        ChatHandler->>ChatHandler: Preserve completion content (or monitor behavior)
    end

    ChatHandler->>Logger: Persist logs including gatewayContentFilterResponses
    ChatHandler-->>Client: Completion response
    deactivate ChatHandler

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • #1936: Overlaps with changes to apps/gateway/src/chat/chat.ts and content-filter integration logic.
  • #1922: Adds gateway content-filter response storage and logging propagation, related to aggregation/persistence changes here.
  • #1908: Refactors OpenAI moderation response aggregation and signatures—related to combining OpenAI and custom moderation responses.

Suggested labels

auto-merge

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 5.26%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'feat: add custom moderation mode' is clear, specific, and accurately reflects the main change: adding a new custom content filter method as an alternative to existing moderation approaches.



Copilot AI left a comment


Pull request overview

Adds a new “custom” gateway content-filter method that performs moderation by calling back into the LLMGateway /v1/chat/completions endpoint with a dedicated API key/model, and integrates its results into the existing moderation logging shape.

Changes:

  • Introduces custom as a supported LLM_CONTENT_FILTER_METHOD, with env-based config (LLM_CONTENT_FILTER_CUSTOM_API_KEY, LLM_CONTENT_FILTER_CUSTOM_MODEL).
  • Implements custom moderation execution/parsing + fail-open logging in a new tool module.
  • Wires custom moderation into /v1/chat/completions flow and adds unit + API-level tests covering block/monitor/missing-config/skip-by-model behavior.
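The fail-open behavior mentioned above can be illustrated with a small wrapper: any failure in the moderation check is logged and treated as not flagged, so filter errors never block the original request. The names here are hypothetical, not the PR's actual code:

```typescript
// Hedged sketch of fail-open moderation: errors in the custom check are
// logged and reported as null so the caller proceeds without blocking.
interface FilterResult {
	flagged: boolean;
}

async function failOpen(
	check: () => Promise<FilterResult>,
	log: (message: string) => void,
): Promise<FilterResult | null> {
	try {
		return await check();
	} catch (error) {
		log(`custom content filter failed open: ${String(error)}`);
		return null; // caller treats null as "do not block"
	}
}
```

This trades safety for availability: an outage of the moderation model degrades to unmoderated traffic rather than a gateway outage, which matches the PR's stated goal of preserving the existing fail-open logging behavior.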

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Summary per file:
  • apps/gateway/src/chat/tools/custom-content-filter.ts: New custom moderation implementation that calls gateway chat completions, parses verdict JSON, and logs results/errors.
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts: Unit tests for custom moderation request shape, JSON parsing, and fail-open behavior.
  • apps/gateway/src/chat/tools/check-content-filter.ts: Extends the content-filter method enum and adds an env-driven custom moderation config helper.
  • apps/gateway/src/chat/tools/check-content-filter.spec.ts: Tests for new custom method selection and config validation.
  • apps/gateway/src/chat/chat.ts: Integrates custom moderation into content-filter match logic and log payload aggregation.
  • apps/gateway/src/api.spec.ts: Adds end-to-end API tests for the custom method (block/monitor/missing-config/skip model).


Comment on lines +1913 to +1925
const customContentFilterResult =
	shouldApplyGatewayContentFilter && contentFilterMethod === "custom"
		? await checkCustomContentFilter(
				messages as BaseMessage[],
				{
					requestId,
					organizationId: project.organizationId,
					projectId: project.id,
					apiKeyId: apiKey.id,
				},
				c.req.raw.signal,
			)
		: null;

Copilot AI Apr 8, 2026


Custom moderation calls back into the gateway’s own /v1/chat/completions. If LLM_CONTENT_FILTER_MODELS is unset/empty (meaning “apply to all models”) and LLM_CONTENT_FILTER_METHOD=custom, the moderation request itself will also go through this same gateway content filter path and trigger another custom moderation call, causing an infinite recursion / request storm. Add an explicit internal bypass (e.g., send a dedicated header on the internal moderation fetch and have chat.ts skip gateway content filtering when that header is present), or otherwise ensure the moderation model is always excluded from filtering in a robust, non-config-dependent way.

Comment on lines +395 to +418
upstreamResponse = await fetch(getCustomContentFilterUrl(), {
	method: "POST",
	headers: {
		"Content-Type": "application/json",
		Authorization: `Bearer ${config.apiKey}`,
		"X-Client-Request-Id": context.requestId,
	},
	body: JSON.stringify({
		model: config.model,
		temperature: 0,
		max_tokens: CUSTOM_CONTENT_FILTER_MAX_TOKENS,
		messages: [
			{
				role: "system",
				content: CUSTOM_CONTENT_FILTER_SYSTEM_PROMPT,
			},
			{
				role: "user",
				content: buildCustomContentFilterInput(messages),
			},
		],
	}),
	signal,
});

Copilot AI Apr 8, 2026


The custom moderation fetch hits the gateway’s own chat-completions endpoint. Without an explicit bypass header/flag, this can recurse when custom content filtering is enabled for the moderation model (default LLM_CONTENT_FILTER_MODELS behavior applies filtering to all models). Consider adding an internal-only header to this request and updating the chat-completions handler to skip gateway content filtering when that header is present.

}

if (part.type === "image") {
return `inline-image: media_type=${part.source.media_type}, bytes=${part.source.data.length}`;

Copilot AI Apr 8, 2026


getImageReference() reports bytes=${part.source.data.length} for inline images, but source.data is typically base64 text (string length), not actual bytes. Either rename the field (e.g., base64Length) or compute real byte size if needed to avoid misleading moderation inputs/logs.

Suggested change
return `inline-image: media_type=${part.source.media_type}, bytes=${part.source.data.length}`;
return `inline-image: media_type=${part.source.media_type}, base64Length=${part.source.data.length}`;

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/gateway/src/chat/tools/check-content-filter.ts (1)

47-53: ⚠️ Potential issue | 🟡 Minor

Let explicit LLM_CONTENT_FILTER_METHOD override legacy mode.

At Line 47, legacy LLM_CONTENT_FILTER_MODE=openai takes precedence over LLM_CONTENT_FILTER_METHOD=custom, making custom mode unreachable in mixed env configurations.

🔁 Proposed precedence fix
-	if (envValue === "openai" || legacyModeEnvValue === "openai") {
-		return "openai";
-	}
-
 	if (envValue === "custom") {
 		return "custom";
 	}
+
+	if (envValue === "openai" || legacyModeEnvValue === "openai") {
+		return "openai";
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/check-content-filter.ts` around lines 47 - 53,
The current logic gives legacyModeEnvValue ("LLM_CONTENT_FILTER_MODE")
precedence over envValue ("LLM_CONTENT_FILTER_METHOD"), causing "custom" to be
unreachable when legacyModeEnvValue === "openai"; change the precedence so
envValue is checked first (i.e., evaluate envValue === "custom" or envValue ===
"openai" before checking legacyModeEnvValue), or explicitly prefer envValue when
it is set (use envValue if truthy, otherwise fall back to legacyModeEnvValue) so
that envValue="custom" can override legacyModeEnvValue="openai".
🧹 Nitpick comments (1)
apps/gateway/src/chat/tools/custom-content-filter.spec.ts (1)

34-53: Add a regression assertion for the internal moderation bypass header.

Given custom moderation calls back into /v1/chat/completions, this test should also assert the internal bypass header once implemented, so recursion cannot regress silently.

🧪 Suggested assertion
 				const headers = new Headers(init?.headers);
 				expect(headers.get("authorization")).toBe("Bearer custom-api-key");
 				expect(headers.get("x-client-request-id")).toBe("request-id");
+				expect(headers.get("x-llmgateway-internal-content-filter")).toBe("1");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.spec.ts` around lines 34 -
53, The test needs an assertion that the internal moderation-bypass header is
sent when custom moderation calls back into /v1/chat/completions; inside the
mocked fetch in custom-content-filter.spec.ts (where fetchSpy is created and
headers are checked), add an assertion that
headers.get("<internal-moderation-bypass-header>") equals the same value used by
the implementation (or use the implementation constant, e.g.
INTERNAL_MODERATION_BYPASS_HEADER and its expected value such as "1" or "true")
so recursion cannot regress silently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9e28cf75-15b4-4484-a285-efec64d1977c

📥 Commits

Reviewing files that changed from the base of the PR and between 6b064a1 and 01a1cc8.

📒 Files selected for processing (6)
  • apps/gateway/src/api.spec.ts
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/tools/check-content-filter.spec.ts
  • apps/gateway/src/chat/tools/check-content-filter.ts
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • apps/gateway/src/chat/tools/custom-content-filter.ts

Comment on lines +1913 to +1925
const customContentFilterResult =
	shouldApplyGatewayContentFilter && contentFilterMethod === "custom"
		? await checkCustomContentFilter(
				messages as BaseMessage[],
				{
					requestId,
					organizationId: project.organizationId,
					projectId: project.id,
					apiKeyId: apiKey.id,
				},
				c.req.raw.signal,
			)
		: null;
Contributor


⚠️ Potential issue | 🔴 Critical

Prevent self-recursive moderation requests in custom mode.

At Line 1913, this branch invokes checkCustomContentFilter, and that helper posts back to GATEWAY_URL/v1/chat/completions (apps/gateway/src/chat/tools/custom-content-filter.ts, Lines 384-420) without a bypass marker. That internal request can re-enter the same contentFilterMethod === "custom" path and loop indefinitely.

🔧 Proposed guard + internal marker
+	const isInternalContentFilterRequest =
+		c.req.header("x-llmgateway-internal-content-filter") === "1";
 	const shouldApplyGatewayContentFilter =
+		!isInternalContentFilterRequest &&
 		contentFilterMode !== "disabled" &&
 		shouldApplyContentFilterToModel(requestedModel);

Also add this header on the internal moderation fetch in apps/gateway/src/chat/tools/custom-content-filter.ts:

headers: {
	"Content-Type": "application/json",
	Authorization: `Bearer ${config.apiKey}`,
	"X-Client-Request-Id": context.requestId,
+	"X-LLMGateway-Internal-Content-Filter": "1",
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/chat.ts` around lines 1913 - 1925, The custom
content-filter branch can recurse because checkCustomContentFilter posts back to
the gateway endpoint and re-enters the same contentFilterMethod === "custom"
path; modify the gateway branch that calls checkCustomContentFilter (the code
using shouldApplyGatewayContentFilter/contentFilterMethod and function
checkCustomContentFilter) to detect and skip moderation when a special internal
header is present, and update the internal fetch inside
apps/gateway/src/chat/tools/custom-content-filter.ts (the request to
GATEWAY_URL/v1/chat/completions) to include a unique bypass header (e.g.,
X-Gateway-Internal-Moderation: 1) so the gateway can short-circuit and avoid
re-invoking the custom content filter.

Comment on lines +247 to +263
it("throws when the custom api key is missing", () => {
	delete process.env.LLM_CONTENT_FILTER_CUSTOM_API_KEY;
	process.env.LLM_CONTENT_FILTER_CUSTOM_MODEL = "openai/gpt-5-mini";

	expect(() => getCustomContentFilterConfig()).toThrow(
		"LLM_CONTENT_FILTER_CUSTOM_API_KEY environment variable is required for custom content filter",
	);
});

it("throws when the custom model is missing", () => {
	process.env.LLM_CONTENT_FILTER_CUSTOM_API_KEY = "custom-key";
	delete process.env.LLM_CONTENT_FILTER_CUSTOM_MODEL;

	expect(() => getCustomContentFilterConfig()).toThrow(
		"LLM_CONTENT_FILTER_CUSTOM_MODEL environment variable is required for custom content filter",
	);
});
Contributor


⚠️ Potential issue | 🟡 Minor

Cover blank env-var values in these new config tests.

These cases only assert undefined, but misconfigured deployments usually fail as "". That leaves the new validation path partially untested.

🧪 Suggested test additions
+	it("throws when the custom api key is empty", () => {
+		process.env.LLM_CONTENT_FILTER_CUSTOM_API_KEY = "";
+		process.env.LLM_CONTENT_FILTER_CUSTOM_MODEL = "openai/gpt-5-mini";
+
+		expect(() => getCustomContentFilterConfig()).toThrow(
+			"LLM_CONTENT_FILTER_CUSTOM_API_KEY environment variable is required for custom content filter",
+		);
+	});
+
+	it("throws when the custom model is empty", () => {
+		process.env.LLM_CONTENT_FILTER_CUSTOM_API_KEY = "custom-key";
+		process.env.LLM_CONTENT_FILTER_CUSTOM_MODEL = "";
+
+		expect(() => getCustomContentFilterConfig()).toThrow(
+			"LLM_CONTENT_FILTER_CUSTOM_MODEL environment variable is required for custom content filter",
+		);
+	});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/check-content-filter.spec.ts` around lines 247 -
263, The tests for getCustomContentFilterConfig only cover missing (undefined)
env vars; also add cases where LLM_CONTENT_FILTER_CUSTOM_API_KEY or
LLM_CONTENT_FILTER_CUSTOM_MODEL are present but empty strings (""), because
empty values should be treated the same as missing and cause the same error;
update the two specs ("throws when the custom api key is missing" and "throws
when the custom model is missing") or add new tests to set the respective env
var to "" before calling getCustomContentFilterConfig() and expect the same
toThrow messages.

Comment on lines +139 to +146
function getImageReference(part: MessageContent): string | null {
if (part.type === "image_url") {
return `remote-image: ${part.image_url.url}`;
}

if (part.type === "image") {
return `inline-image: media_type=${part.source.media_type}, bytes=${part.source.data.length}`;
}
Contributor


⚠️ Potential issue | 🟠 Major

Do not send raw image URLs to the moderator.

part.image_url.url is copied verbatim into the prompt, which can leak presigned query params, internal hostnames, or user identifiers to the moderation model even though it only sees the URL as text.

🔐 Safer handling
 function getImageReference(part: MessageContent): string | null {
 	if (part.type === "image_url") {
-		return `remote-image: ${part.image_url.url}`;
+		try {
+			const url = new URL(part.image_url.url);
+			if (url.protocol !== "http:" && url.protocol !== "https:") {
+				return `remote-image: [${url.protocol.replace(":", "")}]`;
+			}
+			return `remote-image: ${url.origin}${url.pathname}`;
+		} catch {
+			return "remote-image: [redacted]";
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.ts` around lines 139 - 146,
The getImageReference function currently copies part.image_url.url verbatim for
part.type === "image_url", which can leak sensitive query params or hostnames;
change it to never include the raw URL string — instead return a sanitized
placeholder (e.g., "remote-image: [redacted]") or a safe summary that omits the
URL/hostname/query (you may include non-identifying metadata like image size or
safe media type if available), and ensure the same treatment is applied for
part.type === "image" if any source fields could leak identifying info; update
getImageReference to reference part.image_url.url only to extract non-sensitive
metadata (if needed) but do not emit the URL itself.

Comment on lines +395 to +418
upstreamResponse = await fetch(getCustomContentFilterUrl(), {
	method: "POST",
	headers: {
		"Content-Type": "application/json",
		Authorization: `Bearer ${config.apiKey}`,
		"X-Client-Request-Id": context.requestId,
	},
	body: JSON.stringify({
		model: config.model,
		temperature: 0,
		max_tokens: CUSTOM_CONTENT_FILTER_MAX_TOKENS,
		messages: [
			{
				role: "system",
				content: CUSTOM_CONTENT_FILTER_SYSTEM_PROMPT,
			},
			{
				role: "user",
				content: buildCustomContentFilterInput(messages),
			},
		],
	}),
	signal,
});

⚠️ Potential issue | 🔴 Critical

Skip content filtering on the internal moderation request.

This POST goes through the same /v1/chat/completions handler that invokes checkCustomContentFilter() in apps/gateway/src/chat/chat.ts:1891-1932. With LLM_CONTENT_FILTER_MODELS unset — the default apply-to-all behavior — or configured to include config.model, the moderation request re-enters this code and recursively moderates itself until timeout, so custom filtering fail-opens instead of classifying the original request.

🔁 One way to break the recursion
```diff
 		upstreamResponse = await fetch(getCustomContentFilterUrl(), {
 			method: "POST",
 			headers: {
 				"Content-Type": "application/json",
 				Authorization: `Bearer ${config.apiKey}`,
 				"X-Client-Request-Id": context.requestId,
+				"X-LLMGateway-Internal-Moderation": "true",
 			},
```

```ts
// apps/gateway/src/chat/chat.ts
const isInternalModerationRequest =
	c.req.header("x-llmgateway-internal-moderation") === "true";

const shouldApplyGatewayContentFilter =
	!isInternalModerationRequest &&
	contentFilterMode !== "disabled" &&
	shouldApplyContentFilterToModel(requestedModel);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.ts` around lines 395 - 418,
The moderation POST is re-entering the gateway's content filter causing
recursion; mark the internal moderation request and skip gateway filtering. Add
the internal-moderation header (e.g., "x-llmgateway-internal-moderation":
"true") to the fetch call created by getCustomContentFilterUrl()/the
upstreamResponse POST so the request can be identified, and ensure the gating
logic in checkCustomContentFilter()/chat.ts uses that header
(isInternalModerationRequest) to bypass applying the gateway content filter to
requests with that header set.
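A minimal, self-contained sketch of that gating logic, with the header name taken from the suggestion above and the mode values assumed from the PR description:

```typescript
// Sketch only: the real check lives in apps/gateway/src/chat/chat.ts and
// also consults shouldApplyContentFilterToModel(). The header name matches
// the suggested X-LLMGateway-Internal-Moderation marker.
type ContentFilterMode = "enabled" | "monitor" | "disabled";

function shouldApplyGatewayContentFilter(
	headers: Record<string, string | undefined>,
	contentFilterMode: ContentFilterMode,
): boolean {
	const isInternalModerationRequest =
		headers["x-llmgateway-internal-moderation"] === "true";
	// Internal moderation calls bypass the filter, breaking the recursion
	return !isInternalModerationRequest && contentFilterMode !== "disabled";
}
```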


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2b640048d


Comment on lines +173 to +175
```ts
if (part.type === "image") {
	return `inline-image: media_type=${part.source.media_type}, bytes=${part.source.data.length}`;
}
```

P1: Send real inline images to custom moderation

When contentFilterMethod === "custom", inline images are converted to metadata (media_type and byte length) instead of image data, so the moderation model cannot inspect the actual pixels. In enabled mode this allows image-only unsafe content to pass as unflagged whenever accompanying text is benign or empty, which undermines the filter for multimodal requests. The moderation request should include the real image payload (or use a moderation endpoint that accepts image inputs) rather than a textual placeholder.
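One way to do that, sketched under the assumption that the moderation endpoint accepts OpenAI-style `image_url` content parts carrying `data:` URLs. The part shapes mirror the snippet above, but the helper name is hypothetical:

```typescript
// Hypothetical helper: forward the actual inline image bytes as a data: URL
// instead of a textual "inline-image: media_type=..., bytes=..." placeholder.
interface InlineImagePart {
	type: "image";
	source: { media_type: string; data: string }; // base64-encoded bytes
}

function toModerationImagePart(part: InlineImagePart): {
	type: "image_url";
	image_url: { url: string };
} {
	return {
		type: "image_url",
		image_url: {
			url: `data:${part.source.media_type};base64,${part.source.data}`,
		},
	};
}
```

This only works if `config.model` is vision-capable; otherwise a dedicated image-moderation endpoint would be needed.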



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
apps/gateway/src/chat/tools/custom-content-filter.ts (1)

324-330: Consider extracting the hardcoded score threshold.

The 0.5 threshold for category scores appears here and again in buildCustomModerationPayload (line 397). Consider extracting to a named constant for consistency and easier tuning.

♻️ Optional: Extract threshold constant
```diff
+const CUSTOM_CONTENT_FILTER_SCORE_THRESHOLD = 0.5;
+
 function getFlaggedCategories(payload: ModerationApiPayload): string[] {
 	// ...
 	for (const [category, score] of Object.entries(
 		result.category_scores ?? {},
 	)) {
-		if (score > 0.5) {
+		if (score > CUSTOM_CONTENT_FILTER_SCORE_THRESHOLD) {
 			categories.add(category);
 		}
 	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.ts` around lines 324 - 330,
Extract the hardcoded 0.5 into a single named constant (e.g.,
CATEGORY_SCORE_THRESHOLD = 0.5) and replace the literal in the category loop
inside custom-content-filter.ts (the for (const [category, score] of
Object.entries(...) { if (score > 0.5) ... }) and the other occurrence in
buildCustomModerationPayload) so both sites reference the same constant for
consistency and easier tuning; ensure the constant is exported or colocated at
the top of the module so both functions use it.
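A runnable version of the extracted-threshold variant. The payload shape is inferred from the diff above (fields beyond `categories` and `category_scores` are assumptions):

```typescript
// Sketch of getFlaggedCategories with the 0.5 literal lifted into a shared
// constant, as the nitpick suggests. Shapes are inferred, not verified
// against the actual gateway types.
const CUSTOM_CONTENT_FILTER_SCORE_THRESHOLD = 0.5;

interface ModerationResult {
	categories?: Record<string, boolean>;
	category_scores?: Record<string, number>;
}

function getFlaggedCategories(results: ModerationResult[]): string[] {
	const categories = new Set<string>();
	for (const result of results) {
		// Explicit boolean flags always count
		for (const [category, hit] of Object.entries(result.categories ?? {})) {
			if (hit) {
				categories.add(category);
			}
		}
		// Scores above the shared threshold also count
		for (const [category, score] of Object.entries(
			result.category_scores ?? {},
		)) {
			if (score > CUSTOM_CONTENT_FILTER_SCORE_THRESHOLD) {
				categories.add(category);
			}
		}
	}
	return [...categories];
}
```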
apps/gateway/src/chat/tools/custom-content-filter.spec.ts (1)

7-227: Consider additional test coverage.

The current tests cover the happy path and missing config scenario well. Consider adding tests for:

  • Network timeout handling
  • Non-2xx upstream responses
  • Invalid/malformed upstream JSON responses
  • Request cancellation via AbortSignal

These would increase confidence in the fail-open behavior under various failure modes.
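To make those failure modes concrete, here is a minimal fail-open wrapper. Names and shapes are illustrative, not the gateway's actual `checkCustomContentFilter`:

```typescript
// Any upstream failure (rejected fetch, non-2xx status, or unparseable
// body) resolves to an unflagged result instead of throwing, which is the
// fail-open behavior the tests above would pin down.
interface UpstreamLike {
	ok: boolean;
	text: () => Promise<string>;
}

async function moderateFailOpen(
	callUpstream: () => Promise<UpstreamLike>,
): Promise<{ flagged: boolean; error?: string }> {
	try {
		const res = await callUpstream();
		if (!res.ok) {
			return { flagged: false, error: "upstream returned non-2xx" };
		}
		const body = JSON.parse(await res.text()) as { flagged?: boolean };
		return { flagged: body.flagged === true };
	} catch (err) {
		// Network timeouts, AbortError, and JSON.parse failures all land here
		return { flagged: false, error: String(err) };
	}
}
```

In a vitest spec, each failure mode would be driven by mocking `global.fetch` to reject, return a 500, or return a non-JSON body, then asserting `flagged === false`.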

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/tools/custom-content-filter.spec.ts` around lines 7 -
227, Add tests for failure modes of checkCustomContentFilter: (1) simulate a
network timeout by mocking global.fetch to reject (e.g., throw a TypeError or a
custom timeout Error) and assert the function returns fail-open (flagged false),
does not crash, and logs an error; (2) simulate a non-2xx upstream response by
returning a Response with status 500 and optional error body, then assert
fail-open behavior and logged error; (3) simulate malformed/invalid JSON by
returning a 200 Response whose choices.message.content is not valid JSON and
assert the parser falls back safely and returns fail-open while logging; and (4)
test request cancellation by creating an AbortController, passing its signal
into checkCustomContentFilter (where supported) and mocking fetch to reject with
an AbortError or to observe the signal, then assert the function handles
cancellation by returning fail-open and logging. Reference the test file and the
checkCustomContentFilter function to add these cases, mock logger.error to
inspect logs, and ensure expectations mirror existing tests (fetch calls,
result.flagged false, result.responses empty or sanitized).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f8081615-32b3-458e-b58b-08f8ec5996d6

📥 Commits

Reviewing files that changed from the base of the PR and between 01a1cc8 and d2b6400.

📒 Files selected for processing (3)
  • apps/gateway/src/api.spec.ts
  • apps/gateway/src/chat/tools/custom-content-filter.spec.ts
  • apps/gateway/src/chat/tools/custom-content-filter.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/gateway/src/api.spec.ts
