
refactor(inference): modular multi-provider AI architecture#2343

Closed
camjac251 wants to merge 1 commit into karakeep-app:main from camjac251:feat/multi-provider-inference

Conversation

@camjac251

Summary

Replace monolithic inference module with provider-specific clients supporting OpenAI, Anthropic Claude, Google Gemini, and Ollama. Each provider now has native SDK integration with structured output support via JSON schema.

New Providers

  • Anthropic: Claude 4.5 with structured outputs (beta), vision support
  • Google Gemini: 2.5/3.x with JSON schema, batch embeddings
  • OpenAI: Chat Completions + Responses API for GPT-5/o-series reasoning

Architecture Changes

  • Add packages/shared/inference/ with modular provider clients
  • Add InferenceClientFactory and EmbeddingClientFactory for provider selection
  • Separate inference and embedding providers (Anthropic lacks embeddings API)
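
The factory split described above can be sketched roughly as follows. The interfaces and inline stubs here are illustrative assumptions, not code from packages/shared/inference/factory.ts; the one grounded detail is that the embedding factory must exclude Anthropic, which has no embeddings API:

```typescript
// Illustrative sketch of the two factories described above.
interface InferenceClient {
  inferFromText(prompt: string): Promise<string>;
}

interface EmbeddingClient {
  embed(texts: string[]): Promise<number[][]>;
}

type Provider = "openai" | "anthropic" | "google" | "ollama";

class InferenceClientFactory {
  static build(provider: Provider | null): InferenceClient | null {
    // One client per provider; stubs stand in for the real SDK-backed clients.
    switch (provider) {
      case "openai":
        return { inferFromText: async (p) => `openai:${p}` };
      case "anthropic":
        return { inferFromText: async (p) => `anthropic:${p}` };
      case "google":
        return { inferFromText: async (p) => `google:${p}` };
      case "ollama":
        return { inferFromText: async (p) => `ollama:${p}` };
      default:
        return null; // no provider configured: inference is disabled
    }
  }
}

class EmbeddingClientFactory {
  static build(provider: Provider | null): EmbeddingClient | null {
    // Anthropic lacks an embeddings API, so it deliberately yields null here.
    if (provider === null || provider === "anthropic") return null;
    return { embed: async (texts) => texts.map(() => [0]) };
  }
}
```

Separating the two factories is what lets a deployment pair, say, Anthropic for inference with OpenAI or Ollama for embeddings.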

New Configuration

  • INFERENCE_PROVIDER - explicit provider selection (auto-detects if unset)
  • ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL
  • GEMINI_API_KEY, GEMINI_BASE_URL
  • EMBEDDING_PROVIDER - separate embedding configuration
  • OPENAI_USE_RESPONSES_API, OPENAI_REASONING_EFFORT for GPT-5
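
The "auto-detects if unset" behavior might look something like this sketch; the env var names come from the list above, but the detection order and the function name are assumptions:

```typescript
// Hypothetical sketch of provider auto-detection when INFERENCE_PROVIDER is unset.
// Detection order is an assumption, not taken from the PR.
function detectProvider(env: Record<string, string | undefined>): string | null {
  if (env.INFERENCE_PROVIDER) return env.INFERENCE_PROVIDER; // explicit setting wins
  if (env.OPENAI_API_KEY) return "openai";
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.GEMINI_API_KEY) return "google";
  if (env.OLLAMA_BASE_URL) return "ollama";
  return null; // nothing configured: inference features are skipped
}
```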

Breaking Changes

  • Remove INFERENCE_USE_MAX_COMPLETION_TOKENS (now auto-detected)
  • Remove INFERENCE_SUPPORTS_STRUCTURED_OUTPUT (use INFERENCE_OUTPUT_SCHEMA)
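
As a hedged illustration of what "auto-detected" could mean here: the flag can be derived from the model name. The prefix list below mirrors the gpt-5/o-series models mentioned elsewhere in this PR, but the exact rule is an assumption:

```typescript
// Sketch: newer OpenAI reasoning models take max_completion_tokens rather than
// max_tokens, so the removed env flag can be inferred from the model name.
// The prefix list is an assumption for illustration.
function usesMaxCompletionTokens(model: string): boolean {
  return ["gpt-5", "o1", "o3", "o4"].some((prefix) => model.startsWith(prefix));
}
```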

Other

  • Add ProviderIndicator component showing active provider/models in settings
  • Upgrade zod 3.24.2 -> 3.25.0 for Anthropic SDK compatibility
  • Add 106 unit tests and 19 live API integration tests
  • Add pnpm inference:live-test script for manual API testing

Test plan

  • All 106 unit tests pass
  • All 19 live API integration tests pass (OpenAI, Anthropic, Google)
  • Typecheck passes (22 packages)
  • Lint passes (19 packages)


coderabbitai bot commented Jan 3, 2026

Warning

Rate limit exceeded

@camjac251 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 30 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 47530b0 and 15fd85b.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (68)
  • apps/browser-extension/package.json
  • apps/mcp/package.json
  • apps/mobile/package.json
  • apps/web/components/settings/AISettings.tsx
  • apps/web/lib/clientConfig.tsx
  • apps/web/lib/i18n/locales/ar/translation.json
  • apps/web/lib/i18n/locales/cs/translation.json
  • apps/web/lib/i18n/locales/da/translation.json
  • apps/web/lib/i18n/locales/de/translation.json
  • apps/web/lib/i18n/locales/el/translation.json
  • apps/web/lib/i18n/locales/en/translation.json
  • apps/web/lib/i18n/locales/en_US/translation.json
  • apps/web/lib/i18n/locales/es/translation.json
  • apps/web/lib/i18n/locales/fa/translation.json
  • apps/web/lib/i18n/locales/fi/translation.json
  • apps/web/lib/i18n/locales/fr/translation.json
  • apps/web/lib/i18n/locales/ga/translation.json
  • apps/web/lib/i18n/locales/gl/translation.json
  • apps/web/lib/i18n/locales/hr/translation.json
  • apps/web/lib/i18n/locales/hu/translation.json
  • apps/web/lib/i18n/locales/it/translation.json
  • apps/web/lib/i18n/locales/ja/translation.json
  • apps/web/lib/i18n/locales/ko/translation.json
  • apps/web/lib/i18n/locales/nb_NO/translation.json
  • apps/web/lib/i18n/locales/nl/translation.json
  • apps/web/lib/i18n/locales/pl/translation.json
  • apps/web/lib/i18n/locales/pt/translation.json
  • apps/web/lib/i18n/locales/pt_BR/translation.json
  • apps/web/lib/i18n/locales/ru/translation.json
  • apps/web/lib/i18n/locales/sk/translation.json
  • apps/web/lib/i18n/locales/sl/translation.json
  • apps/web/lib/i18n/locales/sv/translation.json
  • apps/web/lib/i18n/locales/tr/translation.json
  • apps/web/lib/i18n/locales/uk/translation.json
  • apps/web/lib/i18n/locales/vi/translation.json
  • apps/web/lib/i18n/locales/zh/translation.json
  • apps/web/lib/i18n/locales/zhtw/translation.json
  • apps/web/package.json
  • apps/workers/package.json
  • docs/docs/03-configuration/01-environment-variables.md
  • docs/docs/03-configuration/02-different-ai-providers.md
  • package.json
  • packages/api/package.json
  • packages/benchmarks/package.json
  • packages/e2e_tests/package.json
  • packages/open-api/package.json
  • packages/shared/config.ts
  • packages/shared/index.ts
  • packages/shared/inference.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/factory.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/index.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/types.ts
  • packages/shared/package.json
  • packages/trpc/package.json
  • tools/compare-models/package.json
  • tools/compare-models/src/bookmarkProcessor.ts
  • tools/compare-models/src/config.ts
  • tools/compare-models/src/inferenceClient.ts

Walkthrough

Adds multi-provider AI inference: new provider clients, factories, types, config/env vars, clientConfig fields, provider-status UI and i18n strings, docs updates, tests, tooling changes, a zod version bump, and new SDK dependencies.

Changes

Cohort / File(s) Summary
Dependency updates
apps/browser-extension/package.json, apps/mcp/package.json, apps/mobile/package.json, apps/web/package.json, apps/workers/package.json, package.json, packages/api/package.json, packages/benchmarks/package.json, packages/e2e_tests/package.json, packages/open-api/package.json, packages/shared/package.json, packages/trpc/package.json, tools/compare-models/package.json
Bumped zod ^3.24.2 → ^3.25.0 across workspaces; packages/shared also adds @anthropic-ai/sdk and @google/genai.
Core inference modularization
packages/shared/inference.ts → removed; packages/shared/inference/types.ts, packages/shared/inference/index.ts
Removed monolithic old inference module and added typed public surface and index entry (types.ts, index.ts). Review for removed API surface and downstream breakages.
Provider implementations
packages/shared/inference/openai.ts, .../anthropic.ts, .../google.ts, .../ollama.ts
New provider-specific clients (OpenAI, Anthropic, Google Gemini, Ollama) implementing inference and embeddings, plus helpers for schema/output handling, streaming, and token accounting.
Factories & orchestration
packages/shared/inference/factory.ts, packages/shared/inference/index.ts, packages/shared/inference/live-test.ts
Added InferenceClientFactory and EmbeddingClientFactory, index exports, and a live-test runner that exercises providers. Check provider selection and secret handling.
Tests
packages/shared/inference/*.test.ts (openai.test.ts, anthropic.test.ts, google.test.ts, ollama.test.ts, factory.test.ts)
Comprehensive provider tests added covering text/image inference, structured outputs, embeddings, streaming, and factory behavior.
Config & env schema
packages/shared/config.ts, tools/compare-models/src/config.ts
Introduced INFERENCE_PROVIDER, EMBEDDING_PROVIDER, ANTHROPIC_, GEMINI_, OPENAI_USE_RESPONSES_API, OPENAI_REASONING_EFFORT; removed INFERENCE_USE_MAX_COMPLETION_TOKENS and INFERENCE_SUPPORTS_STRUCTURED_OUTPUT; added provider detection and embedding-provider resolution.
Client config & UI
apps/web/lib/clientConfig.tsx, apps/web/components/settings/AISettings.tsx
Extended clientConfig.inference with provider/textModel/imageModel/embedding provider fields; added and exported ProviderIndicator component and integrated it into AI settings UI.
Internationalization
apps/web/lib/i18n/locales/.../translation.json (many locales)
Added provider-related i18n keys: provider_status, provider_status_description, no_provider_configured, provider, text_model, image_model, embeddings across locales.
Docs
docs/docs/03-configuration/01-environment-variables.md, .../02-different-ai-providers.md
Rewrote configuration docs to cover multi-provider setup, model defaults, embeddings, Responses API, and Ollama guidance.
Tools / local compare-models
tools/compare-models/src/inferenceClient.ts, tools/compare-models/src/bookmarkProcessor.ts
Replaced external inference import with local InferenceClient implementation; removed INFERENCE_USE_MAX_COMPLETION_TOKENS from tool config. Review API/typing changes and callsites.
Workspace scripts & module boundary
package.json, packages/shared/index.ts
Added root script inference:live-test; minor preflight:fix tweak. Added export {}; to packages/shared/index.ts as module boundary.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes


Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The PR title 'refactor(inference): modular multi-provider AI architecture' clearly and concisely summarizes the main change: refactoring the inference module into a modular, multi-provider architecture.
  • Description check: ✅ Passed. The PR description is comprehensive and directly related to the changeset. It explains the summary, new providers, architecture changes, new configuration, breaking changes, and test results, all of which align with the files modified.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 16

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
apps/web/lib/i18n/locales/pt_BR/translation.json (1)

173-173: Critical: Duplicate JSON key "summarization".

The key "summarization" appears twice within the same "ai" object (lines 173 and 188). This violates JSON structure—duplicate keys cause undefined behavior, and most parsers will silently use the last occurrence, making line 173 inaccessible.

🔎 Proposed fix: Remove the duplicate at line 188
   "auto_summarization": "Resumo automático",
-  "summarization": "Resumo",
   "provider_status": "Status do provedor",

The key already exists at line 173, so line 188 should be removed entirely.

Also applies to: 188-188

apps/web/lib/i18n/locales/cs/translation.json (1)

105-105: Critical: Duplicate JSON key "summarization".

The key "summarization" appears twice in the settings.ai object (lines 105 and 120). JSON parsers typically use the last occurrence and silently discard the first, which will cause data loss. This is likely unintentional.

🔎 Recommended fix

Review the intended structure and remove the duplicate. If line 105 should remain as the general "Shrnutí" entry and line 120 was meant to be a different key, remove line 120:

       "auto_tagging_description": "Automaticky generovat štítky pro tvoje záložky pomocí umělý inteligence.",
       "camelCase": "camelCase",
       "auto_summarization": "Automatický shrnutí",
-      "summarization": "Shrnutí",
       "provider_status": "Stav poskytovatele",

Alternatively, if both entries serve different purposes, one should be renamed to a unique key.

Also applies to: 120-120

apps/web/components/settings/AISettings.tsx (1)

379-383: Add missing translation key to all locale files.

The no_preference key exists only in en/translation.json but is missing from 29 other locale files. This will cause non-English users to see the raw translation key settings.ai.no_preference instead of translated text. Add this key to all locale files under settings.ai:

"no_preference": "No preference"
docs/docs/03-configuration/01-environment-variables.md (1)

85-91: Update intro text to reflect all supported providers.

The intro states "Either OPENAI_API_KEY or OLLAMA_BASE_URL need to be set" but now Anthropic and Google are also supported. This should be updated to reflect all provider options.

🔎 Proposed fix
-Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic tagging to be enabled. Otherwise, automatic tagging will be skipped.
+One of `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, or `OLLAMA_BASE_URL` must be set for automatic tagging to be enabled. Otherwise, automatic tagging will be skipped.
♻️ Duplicate comments (4)
apps/browser-extension/package.json (1)

41-41: LGTM - Part of coordinated zod upgrade.

This change is consistent with the monorepo-wide zod upgrade to ^3.25.0 for Anthropic SDK compatibility. Verification requested in apps/mobile/package.json applies here as well.

apps/web/package.json (1)

104-104: LGTM - Part of coordinated zod upgrade.

This change is consistent with the monorepo-wide zod upgrade to ^3.25.0 for Anthropic SDK compatibility. Verification requested in apps/mobile/package.json applies here as well.

packages/trpc/package.json (1)

29-29: LGTM - Part of coordinated zod upgrade.

This change is consistent with the monorepo-wide zod upgrade to ^3.25.0 for Anthropic SDK compatibility. Verification requested in apps/mobile/package.json applies here as well.

packages/e2e_tests/package.json (1)

24-24: LGTM - Part of coordinated zod upgrade.

This change is consistent with the monorepo-wide zod upgrade to ^3.25.0 for Anthropic SDK compatibility. Verification requested in apps/mobile/package.json applies here as well.

🟡 Minor comments (11)
apps/web/lib/i18n/locales/cs/translation.json-119-119 (1)

119-119: Fix gender agreement in Czech translation.

"Automatický shrnutí" has incorrect gender agreement. The adjective "Automatický" (masculine) does not agree with the neuter noun "shrnutí".

🔎 Proposed fix
-      "auto_summarization": "Automatický shrnutí",
+      "auto_summarization": "Automatické shrnutí",
docs/docs/03-configuration/02-different-ai-providers.md-9-14 (1)

9-14: Add language identifiers to fenced code blocks.

All fenced code blocks should specify a language identifier for proper syntax highlighting and better readability. Since these are environment variable configuration examples, use bash or shell.

🔎 Example fix for the first few code blocks
-```
+```bash
 INFERENCE_PROVIDER=openai    # Use OpenAI
 INFERENCE_PROVIDER=anthropic # Use Anthropic Claude
 INFERENCE_PROVIDER=google    # Use Google Gemini
 INFERENCE_PROVIDER=ollama    # Use Ollama (local)

-```
+```bash
 INFERENCE_PROVIDER=openai # Optional, auto-detected if OPENAI_API_KEY is set
 OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Apply the same pattern to all other fenced code blocks in this file (lines 60, 74, 93, 103, 123, 135, 176, 192, 210, 225, 247, 265).

Based on static analysis hints.


Also applies to: 20-27, 60-68, 74-88, 93-99, 103-115, 123-129, 135-150, 176-186, 192-199, 210-213, 225-237, 247-257, 265-272

apps/web/lib/i18n/locales/pl/translation.json-201-209 (1)

201-209: Duplicate key image_tagging detected.

Line 202 duplicates the image_tagging key that already exists at line 187. This should be removed.

🔎 Proposed fix
       "auto_summarization": "Automatyczne podsumowywanie",
-      "image_tagging": "Tagowanie obrazów",
       "provider_status": "Status dostawcy",
apps/web/lib/i18n/locales/fi/translation.json-196-204 (1)

196-204: Duplicate key summarization detected.

Line 197 duplicates the summarization key that already exists at line 182. This should be removed to prevent the duplicate from overwriting the original value.

🔎 Proposed fix
       "auto_summarization": "Automaattinen tiivistys",
-      "summarization": "Yhteenvedon luonti",
       "provider_status": "Palveluntarjoajan tila",
apps/web/lib/i18n/locales/ko/translation.json-395-403 (1)

395-403: Duplicate key summarization detected.

Line 396 duplicates the summarization key that already exists at line 381. This should be removed.

🔎 Proposed fix
       "auto_summarization": "자동 요약",
-      "summarization": "요약",
       "provider_status": "제공자 상태",
apps/web/lib/i18n/locales/uk/translation.json-205-213 (1)

205-213: Duplicate key tagging_rules detected.

Line 206 duplicates the tagging_rules key that already exists at line 191. In JSON, duplicate keys are technically allowed but the second value overwrites the first, which appears unintentional here. This line should be removed.

🔎 Proposed fix
       "auto_summarization": "Автоматичне підсумовування",
-      "tagging_rules": "Правила тегів",
       "provider_status": "Статус постачальника",
apps/web/lib/i18n/locales/es/translation.json-139-147 (1)

139-147: Duplicate key image_tagging detected.

The key image_tagging is defined twice in this file - once at line 125 and again at line 140. This will cause one of the values to be silently overwritten when parsed.

🔎 Proposed fix

Remove the duplicate key at line 140:

       "camelCase": "camelCase",
       "auto_summarization": "Resumen automático",
-      "image_tagging": "Etiquetado de imágenes",
       "provider_status": "Estado del proveedor",

Committable suggestion skipped: line range outside the PR's diff.

apps/web/lib/i18n/locales/hr/translation.json-208-216 (1)

208-216: Duplicate key all_tagging detected.

The key all_tagging is defined twice in this file - once at line 194 and again at line 209. JSON parsers will only retain one value (typically the last one), which could cause unexpected behavior.

🔎 Proposed fix

Remove the duplicate key at line 209:

       "camelCase": "camelCase",
       "auto_summarization": "Automatsko sažimanje",
-      "all_tagging": "Sve oznake",
       "provider_status": "Status pružatelja",
apps/web/lib/i18n/locales/it/translation.json-193-201 (1)

193-201: Duplicate translation key summarization detected.

The key summarization appears twice in this file (line 179 and line 194). While both have the same value "Riassunto", this duplication should be removed for maintainability.

🔎 Proposed fix - remove the duplicate
       "camelCase": "camelCase",
       "auto_summarization": "Riassunto automatico",
-      "summarization": "Riassunto",
       "provider_status": "Stato del fornitore",

Committable suggestion skipped: line range outside the PR's diff.

apps/web/lib/i18n/locales/da/translation.json-162-170 (1)

162-170: Duplicate translation key image_tagging detected.

The key image_tagging appears twice in this file (line 148 and line 163). JSON parsers will silently use the last occurrence, but this is likely unintended and could cause confusion during maintenance.

🔎 Proposed fix - remove the duplicate
       "camelCase": "camelCase",
-      "auto_summarization": "Automatisk opsummering",
-      "image_tagging": "Billede-tagging",
+      "auto_summarization": "Automatisk opsummering",
       "provider_status": "Udbyderstatus",

Committable suggestion skipped: line range outside the PR's diff.

tools/compare-models/src/inferenceClient.ts-58-80 (1)

58-80: Missing error handling for empty choices array.

If chatCompletion.choices is empty, accessing choices[0] will return undefined, and choices[0].message.content will throw. Add a defensive check.

🔎 Proposed fix
   private async inferFromText(
     prompt: string,
     model: string,
     opts: InferenceOptions,
   ): Promise<InferenceResponse> {
     const chatCompletion = await this.client.chat.completions.create({
       messages: [{ role: "user", content: prompt }],
       model: model,
       response_format: opts.schema
         ? zodResponseFormat(opts.schema, "schema")
         : { type: "json_object" as const },
     });

-    const response = chatCompletion.choices[0].message.content;
+    const choice = chatCompletion.choices[0];
+    if (!choice) {
+      throw new Error("Got no choices from model response");
+    }
+    const response = choice.message.content;
     if (!response) {
       throw new Error("Got no message content from model");
     }
🧹 Nitpick comments (21)
packages/shared/index.ts (1)

1-3: Good practice: explicit path imports over barrel exports.

The pattern of importing modules via specific paths (e.g., @karakeep/shared/config) rather than re-exporting everything through an index file is a solid choice for tree-shaking, build performance, and dependency clarity. The export {} statement correctly ensures this file is treated as an ES module.

One optional consideration: if your package.json already defines an exports field mapping these subpaths, this index file may not be strictly necessary. You could verify whether removing it would affect the module resolution.

apps/web/lib/i18n/locales/cs/translation.json (1)

127-127: Consider AI/ML-specific translation for "embeddings".

"Vložení" (insertion) may not be the most idiomatic translation for "embeddings" in the AI/ML context, where it typically refers to vector representations. Consider using "embeddingy" (borrowed term commonly used in Czech AI/ML communities) or "vektorové reprezentace" (vector representations) for better domain clarity.

packages/shared/config.ts (1)

543-548: Consider documenting the read-only nature of exposed provider info.

The comment on line 543 mentions "Provider info for read-only display", which is good. Consider whether these fields could potentially leak sensitive information or if additional documentation is needed for consumers of this client config.

This is just a documentation/clarity suggestion—the implementation looks correct for displaying provider status in the UI (as seen in AISettings.tsx).

packages/shared/inference/factory.test.ts (1)

1-159: Comprehensive factory test coverage looks excellent!

The test suite thoroughly covers:

  • All four inference providers (OpenAI, Anthropic, Google, Ollama)
  • All three embedding providers (OpenAI, Google, Ollama)
  • Null and unknown provider edge cases
  • Proper verification that Anthropic returns null for embeddings (no embedding support)

The mocking strategy is sound and prevents real API client construction. The use of beforeEach with vi.clearAllMocks() ensures test isolation.

Optional: Consider helper for type assertions

The type assertions on lines 88, 97, 143, and 152 are necessary for testing runtime behavior but bypass TypeScript's safety. Consider extracting a test helper to make the intent clearer:

+function setInvalidProvider(config: any, type: 'inference' | 'embedding', value: string | null) {
+  config[type].provider = value;
+}
+
 it("should return null when provider is not configured", async () => {
   const { default: serverConfig } = await import("../config");
-  (serverConfig.inference as { provider: string | null }).provider = null;
+  setInvalidProvider(serverConfig, 'inference', null);
   
   const client = InferenceClientFactory.build();
   expect(client).toBeNull();
 });

This is purely optional as the current approach is acceptable for test code.

packages/shared/inference/openai.test.ts (2)

278-282: Config cleanup is incomplete - missing textModel restoration.

The afterEach block restores outputSchema but the beforeEach at line 273 also modifies textModel. While it may not affect other tests in this suite, it's good practice to restore all modified values for test isolation.

🔎 Proposed fix
   afterEach(async () => {
     // Restore original config values to prevent test pollution
     const { default: serverConfig } = await import("../config");
     serverConfig.inference.outputSchema = "structured";
+    serverConfig.inference.textModel = "gpt-4o-mini";
   });

330-334: Config cleanup is incomplete - missing textModel restoration.

Same issue as the JSON output test suite - textModel is modified in beforeEach (line 325) but not restored in afterEach.

🔎 Proposed fix
   afterEach(async () => {
     // Restore original config values to prevent test pollution
     const { default: serverConfig } = await import("../config");
     serverConfig.inference.outputSchema = "structured";
+    serverConfig.inference.textModel = "gpt-4o-mini";
   });
packages/shared/inference/ollama.test.ts (2)

251-277: Missing afterEach cleanup for config restoration.

The beforeEach modifies serverConfig.inference.outputSchema to "json" but there's no afterEach to restore it. This could cause test pollution if test execution order changes.

🔎 Proposed fix
   beforeEach(async () => {
     vi.clearAllMocks();

     // Set output schema to json (not structured)
     const { default: serverConfig } = await import("../config");
     serverConfig.inference.outputSchema = "json";

     client = new OllamaInferenceClient();
   });

+  afterEach(async () => {
+    const { default: serverConfig } = await import("../config");
+    serverConfig.inference.outputSchema = "structured";
+  });
+
   it("should use 'json' format string when outputSchema is json", async () => {

279-306: Missing afterEach cleanup for config restoration.

Same issue as the JSON output test suite - outputSchema is modified to "plain" but not restored.

🔎 Proposed fix
   beforeEach(async () => {
     vi.clearAllMocks();

     // Set output schema to plain
     const { default: serverConfig } = await import("../config");
     serverConfig.inference.outputSchema = "plain";

     client = new OllamaInferenceClient();
   });

+  afterEach(async () => {
+    const { default: serverConfig } = await import("../config");
+    serverConfig.inference.outputSchema = "structured";
+  });
+
   it("should not set format when outputSchema is plain", async () => {
packages/shared/inference/anthropic.test.ts (1)

309-342: Consider adding test for unsupported media type.

The supportsStructuredOutputs tests are thorough. Consider adding a test in the inferFromImage describe block that verifies the error thrown for unsupported image types (e.g., image/tiff).

it("should throw error for unsupported image type", async () => {
  await expect(
    client.inferFromImage("describe", "image/tiff", "imagedata", {})
  ).rejects.toThrow("Unsupported image type");
});
packages/shared/inference/openai.ts (2)

316-360: Consider defining a type for the Responses API request.

The any type with eslint-disable is pragmatic but reduces type safety. As the Responses API stabilizes, consider defining a proper interface.

interface ResponsesApiRequest {
  model: string;
  input: string | unknown[];
  text: ReturnType<typeof this.getResponsesTextFormat>;
  store: boolean;
  temperature: number;
  top_p: number;
  reasoning?: { effort: string };
  max_output_tokens?: number;
  previous_response_id?: string;
}

389-398: Silent JSON parsing failure may hide issues.

When JSON parsing fails, the code silently falls back to the raw outputText. This is reasonable for resilience, but consider logging a warning in development mode to help debug malformed responses.
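
A minimal sketch of the suggested fallback-with-warning; the function name and signature are illustrative, not the PR's API:

```typescript
// Sketch: keep the resilient fallback to raw text, but surface parse failures.
// parseModelOutput and the warnOnFailure flag are illustrative assumptions.
function parseModelOutput(outputText: string, warnOnFailure = true): unknown {
  try {
    return JSON.parse(outputText);
  } catch (e) {
    if (warnOnFailure) {
      // In real code this could be gated on a development-mode check.
      console.warn(`Model returned non-JSON output, using raw text: ${String(e)}`);
    }
    return outputText; // fall back to the raw text, as the current code does
  }
}
```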

packages/shared/inference/google.test.ts (1)

233-268: Config mutation in tests may cause flakiness if tests run in parallel.

The pattern of mutating serverConfig.inference.outputSchema in beforeEach/afterEach could cause issues if Vitest runs tests concurrently. Consider using vi.doMock with a fresh import for each test or explicitly setting --no-threads for this test file.

🔎 Alternative approach using isolated mocks
 describe("GoogleGeminiInferenceClient with plain output", () => {
   let client: GoogleGeminiInferenceClient;
 
-  beforeEach(async () => {
-    vi.clearAllMocks();
-
-    // Set output schema to plain text
-    const { default: serverConfig } = await import("../config");
-    serverConfig.inference.outputSchema = "plain";
-
-    client = new GoogleGeminiInferenceClient();
-  });
-
-  afterEach(async () => {
-    // Restore original config value to prevent test pollution
-    const { default: serverConfig } = await import("../config");
-    serverConfig.inference.outputSchema = "structured";
-  });
+  beforeEach(() => {
+    vi.clearAllMocks();
+    // Override the mock for this describe block
+    vi.doMock("../config", () => ({
+      default: {
+        inference: {
+          provider: "google",
+          geminiApiKey: "test-gemini-key",
+          textModel: "gemini-2.5-flash",
+          imageModel: "gemini-2.5-flash",
+          maxOutputTokens: 2048,
+          outputSchema: "plain",
+        },
+        embedding: {
+          provider: "google",
+          textModel: "text-embedding-004",
+        },
+      },
+    }));
+  });
packages/shared/inference/ollama.ts (1)

84-95: Consider using undefined instead of NaN for failed token count.

Setting totalTokens = NaN on error could cause unexpected behavior if downstream code performs arithmetic operations. The InferenceResponse type declares totalTokens: number | undefined, so undefined would be more consistent with the type contract.

🔎 Proposed fix
       // https://github.com/ollama/ollama-js/issues/72
-      totalTokens = NaN;
+      totalTokens = undefined;
       logger.warn(
         `Got an exception from ollama, will still attempt to deserialize the response we got so far: ${e}`,
       );

Note: This requires changing the return type handling since totalTokens is initialized as number.

packages/shared/inference/live-test.ts (2)

186-210: Source file inspection test is fragile and may break on refactoring.

This test reads the openai.ts source file to verify string patterns exist. This approach couples tests to implementation details and will break if the code is refactored (e.g., extracting to constants, changing string format).

Consider testing the actual behavior instead - either by calling the client with different model names or by exporting and testing a model detection utility function.

🔎 Alternative: test exported behavior

If model detection logic is important, export a utility function and test it directly:

// In openai.ts - export the detection logic
export function shouldUseResponsesApi(model: string): boolean {
  return model.startsWith("gpt-5") || 
         model.startsWith("o1") || 
         model.startsWith("o3") || 
         model.startsWith("o4");
}

// In live-test.ts - test the function
await runTest("openai", "responses-model-check", async () => {
  const { shouldUseResponsesApi } = await import("./openai.js");
  
  if (!shouldUseResponsesApi("gpt-5.2")) throw new Error("Should detect gpt-5.2");
  if (!shouldUseResponsesApi("o1-mini")) throw new Error("Should detect o1-mini");
  if (shouldUseResponsesApi("gpt-4o")) throw new Error("Should NOT detect gpt-4o");
  
  return "Model detection working correctly";
});

499-513: Minor: { schema: undefined } can be simplified.

Using {} is cleaner since schema is optional in Partial<InferenceOptions>.

🔎 Proposed simplification
   await runTest(`factory-${provider}`, "inferFromText-plain", async () => {
     const result = await client.inferFromText(
       "Summarize in one sentence: TypeScript is a typed superset of JavaScript.",
-      { schema: undefined },
+      {},
     );
docs/docs/03-configuration/01-environment-variables.md (1)

100-101: Clarify OPENAI_REASONING_EFFORT default behavior.

Line 101 states the default is low, but also mentions that gpt-5.1 defaults to none. This could be confusing. Consider clarifying whether this env var overrides model defaults or only applies when the model doesn't have its own default.

Consider rewording to:

Reasoning effort level for GPT-5 models when using Responses API. Options: none, minimal, low, medium, high, xhigh. Default is low. Note: Some models have their own defaults (gpt-5.1 uses none, gpt-5-pro only supports high).

packages/shared/inference/google.ts (4)

23-45: Consider validating outputSchema config value.

The function reads serverConfig.inference.outputSchema and branches on "plain" vs any other value. If an invalid value is configured, it silently defaults to JSON mode which may not be the intended behavior.

🔎 Suggested improvement
 function buildGenerationConfig(
   opts: InferenceOptions,
 ): Record<string, unknown> {
   const config: Record<string, unknown> = {
     maxOutputTokens: serverConfig.inference.maxOutputTokens,
   };

   // Configure response format based on outputSchema setting
-  if (serverConfig.inference.outputSchema === "plain") {
+  const outputSchema = serverConfig.inference.outputSchema;
+  if (outputSchema === "plain") {
     config.responseMimeType = "text/plain";
-  } else {
+  } else if (outputSchema === "json" || outputSchema === "structured") {
     config.responseMimeType = "application/json";

     // If a Zod schema is provided, convert it to JSON schema for structured output
     if (opts.schema) {
       config.responseJsonSchema = zodToJsonSchema(opts.schema, {
         $refStrategy: "none",
       });
     }
+  } else {
+    throw new Error(`Invalid outputSchema value: ${outputSchema}`);
   }

   return config;
 }

80-110: reasoningEffort, previousResponseId, and store are silently ignored.

The InferenceOptions interface includes OpenAI Responses API-specific fields (reasoningEffort, previousResponseId, store) that are silently ignored here. This is acceptable since these are provider-specific, but consider documenting this behavior or logging a warning if these options are passed to a provider that doesn't support them.


169-187: Sequential batch processing may be slow for large inputs.

The batching logic processes each batch sequentially with await. For very large input arrays, this could be slow. Consider using Promise.all with rate limiting for parallel processing, though the current sequential approach is safer and avoids rate limit issues.
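If bounded parallelism is ever wanted, a minimal helper could look like the sketch below (mapWithConcurrency is a hypothetical name, not part of the PR):

```typescript
// Run fn over items with at most `limit` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const worker = async () => {
    // `next++` is safe: JS is single-threaded between awaits.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Usage would be something like `await mapWithConcurrency(batches, 3, (b) => this.embedBatch(b))`, with the limit tuned to the provider's rate limits.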


189-198: Handle empty embeddings array defensively.

If the API returns an embedding with values as undefined, it becomes an empty array. This may cause downstream issues if consumers expect embeddings of consistent dimensions.

🔎 Suggested improvement
   private async embedBatch(inputs: string[]): Promise<EmbeddingResponse> {
     const result = await this.ai.models.embedContent({
       model: serverConfig.embedding.textModel,
       contents: inputs,
     });

-    const embeddings = (result.embeddings ?? []).map((e) => e.values ?? []);
+    const embeddings = (result.embeddings ?? []).map((e) => {
+      if (!e.values || e.values.length === 0) {
+        throw new Error("Received empty embedding values from Google API");
+      }
+      return e.values;
+    });

     return { embeddings };
   }
tools/compare-models/src/inferenceClient.ts (1)

5-12: Type duplication with packages/shared/inference/types.ts.

The InferenceOptions and InferenceResponse interfaces duplicate the definitions in packages/shared/inference/types.ts. Consider importing from the shared package to maintain consistency and avoid drift.

🔎 Suggested change
 import OpenAI from "openai";
 import { zodResponseFormat } from "openai/helpers/zod";
 import { z } from "zod";
+import type { InferenceOptions, InferenceResponse } from "@karakeep/shared/inference";

-export interface InferenceOptions {
-  schema: z.ZodSchema<any> | null;
-}
-
-export interface InferenceResponse {
-  response: string;
-  totalTokens: number | undefined;
-}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa7a81e and d52c0e7.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (68)
  • apps/browser-extension/package.json
  • apps/mcp/package.json
  • apps/mobile/package.json
  • apps/web/components/settings/AISettings.tsx
  • apps/web/lib/clientConfig.tsx
  • apps/web/lib/i18n/locales/ar/translation.json
  • apps/web/lib/i18n/locales/cs/translation.json
  • apps/web/lib/i18n/locales/da/translation.json
  • apps/web/lib/i18n/locales/de/translation.json
  • apps/web/lib/i18n/locales/el/translation.json
  • apps/web/lib/i18n/locales/en/translation.json
  • apps/web/lib/i18n/locales/en_US/translation.json
  • apps/web/lib/i18n/locales/es/translation.json
  • apps/web/lib/i18n/locales/fa/translation.json
  • apps/web/lib/i18n/locales/fi/translation.json
  • apps/web/lib/i18n/locales/fr/translation.json
  • apps/web/lib/i18n/locales/ga/translation.json
  • apps/web/lib/i18n/locales/gl/translation.json
  • apps/web/lib/i18n/locales/hr/translation.json
  • apps/web/lib/i18n/locales/hu/translation.json
  • apps/web/lib/i18n/locales/it/translation.json
  • apps/web/lib/i18n/locales/ja/translation.json
  • apps/web/lib/i18n/locales/ko/translation.json
  • apps/web/lib/i18n/locales/nb_NO/translation.json
  • apps/web/lib/i18n/locales/nl/translation.json
  • apps/web/lib/i18n/locales/pl/translation.json
  • apps/web/lib/i18n/locales/pt/translation.json
  • apps/web/lib/i18n/locales/pt_BR/translation.json
  • apps/web/lib/i18n/locales/ru/translation.json
  • apps/web/lib/i18n/locales/sk/translation.json
  • apps/web/lib/i18n/locales/sl/translation.json
  • apps/web/lib/i18n/locales/sv/translation.json
  • apps/web/lib/i18n/locales/tr/translation.json
  • apps/web/lib/i18n/locales/uk/translation.json
  • apps/web/lib/i18n/locales/vi/translation.json
  • apps/web/lib/i18n/locales/zh/translation.json
  • apps/web/lib/i18n/locales/zhtw/translation.json
  • apps/web/package.json
  • apps/workers/package.json
  • docs/docs/03-configuration/01-environment-variables.md
  • docs/docs/03-configuration/02-different-ai-providers.md
  • package.json
  • packages/api/package.json
  • packages/benchmarks/package.json
  • packages/e2e_tests/package.json
  • packages/open-api/package.json
  • packages/shared/config.ts
  • packages/shared/index.ts
  • packages/shared/inference.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/factory.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/index.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/types.ts
  • packages/shared/package.json
  • packages/trpc/package.json
  • tools/compare-models/package.json
  • tools/compare-models/src/bookmarkProcessor.ts
  • tools/compare-models/src/config.ts
  • tools/compare-models/src/inferenceClient.ts
💤 Files with no reviewable changes (2)
  • tools/compare-models/src/config.ts
  • packages/shared/inference.ts
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx,js,jsx,json,css,md}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier according to project standards

Files:

  • packages/api/package.json
  • apps/workers/package.json
  • packages/open-api/package.json
  • apps/web/lib/i18n/locales/ja/translation.json
  • packages/e2e_tests/package.json
  • apps/web/package.json
  • packages/benchmarks/package.json
  • apps/web/lib/i18n/locales/cs/translation.json
  • packages/shared/inference/factory.test.ts
  • apps/web/lib/i18n/locales/de/translation.json
  • apps/web/components/settings/AISettings.tsx
  • apps/web/lib/i18n/locales/hu/translation.json
  • apps/web/lib/i18n/locales/en/translation.json
  • apps/web/lib/i18n/locales/gl/translation.json
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/index.ts
  • apps/web/lib/i18n/locales/el/translation.json
  • packages/trpc/package.json
  • packages/shared/inference/openai.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/types.ts
  • apps/web/lib/clientConfig.tsx
  • packages/shared/inference/ollama.ts
  • apps/web/lib/i18n/locales/en_US/translation.json
  • packages/shared/inference/google.ts
  • apps/web/lib/i18n/locales/sk/translation.json
  • apps/mobile/package.json
  • apps/web/lib/i18n/locales/nb_NO/translation.json
  • apps/web/lib/i18n/locales/zhtw/translation.json
  • apps/web/lib/i18n/locales/sv/translation.json
  • apps/web/lib/i18n/locales/tr/translation.json
  • apps/web/lib/i18n/locales/fa/translation.json
  • apps/web/lib/i18n/locales/vi/translation.json
  • apps/web/lib/i18n/locales/ru/translation.json
  • apps/browser-extension/package.json
  • packages/shared/inference/ollama.test.ts
  • apps/web/lib/i18n/locales/sl/translation.json
  • apps/web/lib/i18n/locales/ar/translation.json
  • packages/shared/package.json
  • packages/shared/config.ts
  • apps/mcp/package.json
  • apps/web/lib/i18n/locales/uk/translation.json
  • apps/web/lib/i18n/locales/pl/translation.json
  • apps/web/lib/i18n/locales/zh/translation.json
  • tools/compare-models/package.json
  • apps/web/lib/i18n/locales/it/translation.json
  • apps/web/lib/i18n/locales/fi/translation.json
  • apps/web/lib/i18n/locales/hr/translation.json
  • docs/docs/03-configuration/02-different-ai-providers.md
  • apps/web/lib/i18n/locales/es/translation.json
  • tools/compare-models/src/inferenceClient.ts
  • apps/web/lib/i18n/locales/pt_BR/translation.json
  • packages/shared/inference/live-test.ts
  • apps/web/lib/i18n/locales/da/translation.json
  • apps/web/lib/i18n/locales/pt/translation.json
  • apps/web/lib/i18n/locales/ko/translation.json
  • package.json
  • tools/compare-models/src/bookmarkProcessor.ts
  • apps/web/lib/i18n/locales/ga/translation.json
  • apps/web/lib/i18n/locales/fr/translation.json
  • apps/web/lib/i18n/locales/nl/translation.json
  • docs/docs/03-configuration/01-environment-variables.md
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use TypeScript for type safety in all source files

Files:

  • packages/shared/inference/factory.test.ts
  • apps/web/components/settings/AISettings.tsx
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/index.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/types.ts
  • apps/web/lib/clientConfig.tsx
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/config.ts
  • tools/compare-models/src/inferenceClient.ts
  • packages/shared/inference/live-test.ts
  • tools/compare-models/src/bookmarkProcessor.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Lint code using oxlint and fix issues with pnpm lint:fix

Files:

  • packages/shared/inference/factory.test.ts
  • apps/web/components/settings/AISettings.tsx
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/index.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/types.ts
  • apps/web/lib/clientConfig.tsx
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/config.ts
  • tools/compare-models/src/inferenceClient.ts
  • packages/shared/inference/live-test.ts
  • tools/compare-models/src/bookmarkProcessor.ts
**/*.{test,spec}.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Vitest for writing and running tests

Files:

  • packages/shared/inference/factory.test.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/ollama.test.ts
packages/shared/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Organize shared code and types in the packages/shared directory for use across packages

Files:

  • packages/shared/inference/factory.test.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/index.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/types.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/config.ts
  • packages/shared/inference/live-test.ts
apps/web/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

apps/web/**/*.{ts,tsx}: Use Tailwind CSS for styling in the web application
Use Next.js for building the main web application

Files:

  • apps/web/components/settings/AISettings.tsx
  • apps/web/lib/clientConfig.tsx
🧠 Learnings (9)
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{ts,tsx,js,jsx,json,css,md} : Format code using Prettier according to project standards

Applied to files:

  • apps/workers/package.json
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/e2e_tests/**/*.{ts,tsx,test,spec} : Write end-to-end tests in the `packages/e2e_tests` directory

Applied to files:

  • packages/shared/inference/factory.test.ts
  • packages/shared/inference/openai.test.ts
📚 Learning: 2026-01-03T11:36:34.916Z
Learnt from: RobertRosca
Repo: karakeep-app/karakeep PR: 2339
File: packages/shared/config.ts:62-62
Timestamp: 2026-01-03T11:36:34.916Z
Learning: In packages/shared/config.ts, enforce OpenAI SDK version compatibility: service_tier values are limited to ["auto", "default", "flex"]. The "priority" tier requires OpenAI SDK >= v5.8.0. Add a guard or validation in config to prevent using priority tier unless the SDK is upgraded (v5.8.0+). Consider documenting this constraint and adding a unit test or lint rule to ensure only allowed service_tier values are used based on the installed SDK version.

Applied to files:

  • packages/shared/inference/factory.test.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/index.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/types.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/config.ts
  • packages/shared/inference/live-test.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/shared/**/*.{ts,tsx} : Organize shared code and types in the `packages/shared` directory for use across packages

Applied to files:

  • packages/shared/index.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/shared-server/**/*.{ts,tsx} : Place shared server-side logic in `packages/shared-server`

Applied to files:

  • packages/shared/index.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/shared-react/**/*.{ts,tsx} : Place shared React components and hooks in `packages/shared-react`

Applied to files:

  • packages/shared/index.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{test,spec}.{ts,tsx,js,jsx} : Use Vitest for writing and running tests

Applied to files:

  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/google.test.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{ts,tsx,js,jsx} : Lint code using oxlint and fix issues with `pnpm lint:fix`

Applied to files:

  • package.json
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Use Turborepo for managing the monorepo build system and tasks

Applied to files:

  • package.json
🧬 Code graph analysis (10)
packages/shared/inference/factory.test.ts (1)
packages/shared/inference/factory.ts (2)
  • InferenceClientFactory (17-41)
  • EmbeddingClientFactory (54-76)
apps/web/components/settings/AISettings.tsx (4)
apps/web/lib/i18n/client.ts (1)
  • useTranslation (33-33)
apps/web/lib/i18n/server.ts (1)
  • useTranslation (23-36)
packages/shared/config.ts (1)
  • clientConfig (524-552)
apps/web/lib/clientConfig.tsx (1)
  • useClientConfig (29-31)
packages/shared/inference/openai.test.ts (1)
packages/shared/inference/openai.ts (2)
  • OpenAIInferenceClient (136-405)
  • OpenAIEmbeddingClient (412-432)
packages/shared/inference/google.test.ts (1)
packages/shared/inference/google.ts (2)
  • GoogleGeminiInferenceClient (73-153)
  • GoogleEmbeddingClient (162-199)
packages/shared/inference/factory.ts (1)
packages/shared/inference/types.ts (2)
  • InferenceClient (26-37)
  • EmbeddingClient (39-41)
packages/shared/inference/types.ts (1)
tools/compare-models/src/inferenceClient.ts (3)
  • InferenceResponse (9-12)
  • InferenceOptions (5-7)
  • InferenceClient (14-131)
packages/shared/inference/ollama.ts (2)
packages/shared/inference/index.ts (7)
  • InferenceClient (19-19)
  • InferenceOptions (23-23)
  • InferenceResponse (21-21)
  • defaultInferenceOptions (26-26)
  • OllamaEmbeddingClient (35-35)
  • EmbeddingClient (20-20)
  • EmbeddingResponse (22-22)
packages/shared/inference/types.ts (6)
  • InferenceClient (26-37)
  • InferenceOptions (12-20)
  • InferenceResponse (3-6)
  • defaultInferenceOptions (22-24)
  • EmbeddingClient (39-41)
  • EmbeddingResponse (8-10)
packages/shared/inference/google.ts (2)
packages/shared/inference/index.ts (8)
  • InferenceOptions (23-23)
  • GoogleGeminiInferenceClient (34-34)
  • InferenceClient (19-19)
  • InferenceResponse (21-21)
  • defaultInferenceOptions (26-26)
  • GoogleEmbeddingClient (34-34)
  • EmbeddingClient (20-20)
  • EmbeddingResponse (22-22)
packages/shared/inference/types.ts (6)
  • InferenceOptions (12-20)
  • InferenceClient (26-37)
  • InferenceResponse (3-6)
  • defaultInferenceOptions (22-24)
  • EmbeddingClient (39-41)
  • EmbeddingResponse (8-10)
packages/shared/inference/ollama.test.ts (2)
packages/shared/inference/index.ts (2)
  • OllamaInferenceClient (35-35)
  • OllamaEmbeddingClient (35-35)
packages/shared/inference/ollama.ts (2)
  • OllamaInferenceClient (21-133)
  • OllamaEmbeddingClient (140-162)
tools/compare-models/src/inferenceClient.ts (1)
packages/shared/inference/types.ts (1)
  • InferenceResponse (3-6)
🪛 markdownlint-cli2 (0.18.1)
docs/docs/03-configuration/02-different-ai-providers.md

Lines 9, 20, 60, 74, 93, 103, 123, 135, 176, 192, 210, 225, 247, 265: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review


greptile-apps bot commented Jan 3, 2026

Greptile Summary

Refactored monolithic inference module into modular multi-provider architecture supporting OpenAI, Anthropic Claude, Google Gemini, and Ollama with native SDK integrations.

Key Changes:

  • Migrated from single packages/shared/inference.ts (385 lines) to modular packages/shared/inference/ directory with separate provider implementations
  • Introduced factory pattern with InferenceClientFactory and EmbeddingClientFactory for provider selection
  • Added explicit INFERENCE_PROVIDER and EMBEDDING_PROVIDER environment variables with auto-detection fallback
  • Implemented OpenAI Responses API support for GPT-5/o-series models with reasoning effort control
  • Added Anthropic Claude 4.5 structured outputs (beta) with model validation
  • Integrated Google Gemini with JSON schema support and automatic embedding batch processing
  • Updated Ollama client from generate to chat API with improved abort signal handling
  • Removed deprecated INFERENCE_USE_MAX_COMPLETION_TOKENS and INFERENCE_SUPPORTS_STRUCTURED_OUTPUT (auto-detected per provider)
  • Added ProviderIndicator UI component showing active providers and models in settings
  • Upgraded zod from 3.24.2 to 3.25.0 for Anthropic SDK compatibility
  • All 106 unit tests pass with comprehensive coverage of factory logic, provider implementations, and edge cases

Confidence Score: 5/5

  • This PR is safe to merge with high confidence
  • The refactoring is well-architected with clean separation of concerns, comprehensive test coverage (106 unit tests + 19 live API tests), proper error handling, and maintains backward compatibility through auto-detection logic. The code follows TypeScript best practices with exhaustiveness checking, defensive validation, and clear documentation.
  • No files require special attention

Important Files Changed

Filename Overview
packages/shared/inference/factory.ts Implements factory pattern with exhaustiveness checking for provider selection, properly separating inference and embedding clients
packages/shared/inference/openai.ts Comprehensive OpenAI implementation supporting both Chat Completions and Responses API with auto-detection for GPT-5/o-series models and reasoning effort
packages/shared/inference/anthropic.ts Anthropic Claude implementation with structured outputs beta support, proper model validation, and vision capabilities
packages/shared/inference/google.ts Google Gemini client with JSON schema support and automatic batching for embeddings (100 per batch)
packages/shared/inference/ollama.ts Migrated from generate to chat API with improved abort handling using addEventListener, maintains workaround for streaming error handling
packages/shared/config.ts Added multi-provider configuration with auto-detection logic, separate embedding provider, removed deprecated flags, added OpenAI Responses API options
apps/web/components/settings/AISettings.tsx Added ProviderIndicator component displaying active inference/embedding providers and models in a responsive grid layout

Sequence Diagram

sequenceDiagram
    participant App as Application Code
    participant Factory as InferenceClientFactory
    participant Config as serverConfig
    participant OpenAI as OpenAIInferenceClient
    participant Anthropic as AnthropicInferenceClient
    participant Google as GoogleGeminiInferenceClient
    participant Ollama as OllamaInferenceClient
    participant SDK as Provider SDK

    App->>Factory: InferenceClientFactory.build()
    Factory->>Config: Get inference.provider
    
    alt provider: openai
        Factory->>OpenAI: new OpenAIInferenceClient()
        OpenAI->>Config: Get OpenAI config
        OpenAI-->>Factory: client instance
    else provider: anthropic
        Factory->>Anthropic: new AnthropicInferenceClient()
        Anthropic->>Config: Get Anthropic config
        Anthropic-->>Factory: client instance
    else provider: google
        Factory->>Google: new GoogleGeminiInferenceClient()
        Google->>Config: Get Gemini config
        Google-->>Factory: client instance
    else provider: ollama
        Factory->>Ollama: new OllamaInferenceClient()
        Ollama->>Config: Get Ollama config
        Ollama-->>Factory: client instance
    else provider: null
        Factory-->>App: null
    end
    
    Factory-->>App: InferenceClient

    App->>OpenAI: inferFromText(prompt, options)
    
    alt shouldUseResponsesApi()
        OpenAI->>SDK: responses.create() [GPT-5/o-series]
        Note over OpenAI,SDK: Supports reasoning effort,<br/>conversation, structured outputs
    else
        OpenAI->>SDK: chat.completions.create()
        Note over OpenAI,SDK: Legacy Chat Completions API<br/>for GPT-4 and compatible models
    end
    
    SDK-->>OpenAI: API response
    OpenAI-->>App: InferenceResponse {response, totalTokens}


greptile-apps bot commented Jan 3, 2026

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@camjac251 camjac251 force-pushed the feat/multi-provider-inference branch from d52c0e7 to c0206dc Compare January 3, 2026 20:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

♻️ Duplicate comments (1)
packages/shared/inference/types.ts (1)

16-24: Verify reasoning_effort values match OpenAI API specification.

The reasoningEffort type includes values ("none", "minimal", "xhigh") that may not be supported by the standard OpenAI Responses API. According to previous verification, the documented values for o-series models are "low", "medium", and "high" (default: "medium").

While the JSDoc comments describe model-specific capabilities, the type itself permits values that may cause runtime errors if passed to models that don't support them. Consider either:

  1. Restricting the type to the standard API values: "low" | "medium" | "high"
  2. Adding runtime validation that checks the model and reasoning effort combination
  3. Separating the type into different unions for different model families
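One way to sketch option 2, runtime validation per model family — the allow-lists below are illustrative only and must be checked against OpenAI's documentation:

```typescript
type ReasoningEffort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";

// Illustrative per-family allow-lists — NOT authoritative values.
const EFFORT_BY_FAMILY: Record<string, ReasoningEffort[]> = {
  "o-series": ["low", "medium", "high"],
  "gpt-5": ["none", "minimal", "low", "medium", "high", "xhigh"],
};

function assertEffortSupported(family: string, effort: ReasoningEffort): void {
  const allowed = EFFORT_BY_FAMILY[family];
  if (!allowed || !allowed.includes(effort)) {
    throw new Error(
      `reasoning_effort "${effort}" is not supported for model family "${family}"`,
    );
  }
}
```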

Based on past review findings that identified API specification mismatches.

🧹 Nitpick comments (13)
tools/compare-models/src/inferenceClient.ts (3)

104-130: Consider edge cases in JSON extraction fallback logic.

The greedy regex /\{[\s\S]*\}/ on line 119 matches from the first { to the last } in the string, which works for single objects but could produce invalid JSON if the response contains multiple separate JSON objects or malformed content between braces.

Also, if all parsing strategies fail, the final JSON.parse(trimmedResponse) throws a generic SyntaxError. Wrapping this in a descriptive error would improve debuggability.

🔎 Optional improvement
-      return JSON.parse(trimmedResponse);
+      try {
+        return JSON.parse(trimmedResponse);
+      } catch (e) {
+        throw new Error(
+          `Failed to parse JSON from response: ${e instanceof Error ? e.message : String(e)}`,
+        );
+      }
     }
   }
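The greedy-match failure mode is easy to demonstrate with a standalone sketch (not code from the tool):

```typescript
const response = 'First: {"a": 1} and second: {"b": 2}';
// Greedy match spans from the first "{" to the last "}".
const match = response.match(/\{[\s\S]*\}/);

console.log(match![0]); // {"a": 1} and second: {"b": 2}

let failed = false;
try {
  JSON.parse(match![0]);
} catch {
  failed = true; // the captured span is not valid JSON
}
console.log(failed); // true
```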

133-133: Move import to the top of the file.

The config import is placed after the class definition rather than with other imports at the top of the file. This is unconventional and reduces readability.

🔎 Proposed fix

Move the import to the top of the file with other imports:

 import OpenAI from "openai";
 import { zodResponseFormat } from "openai/helpers/zod";
 import { z } from "zod";
+
+import { config } from "./config";

Then remove line 133.


144-149: Wrapper function signature limits functionality.

The inferTags wrapper only accepts client and prompt, but the underlying client.inferTags() method supports additional parameters (model, lang, customPrompts). If callers need these options, they must bypass the wrapper and call the class method directly.

If this is intentional to keep the API simple for the compare-models tool, this is fine. Otherwise, consider exposing these options.

docs/docs/03-configuration/02-different-ai-providers.md (6)

9-14: Add language identifier to code block.

The fenced code block should specify a language for proper syntax highlighting. Use bash or env:

🔎 Proposed fix
-```
+```bash
 INFERENCE_PROVIDER=openai    # Use OpenAI
 INFERENCE_PROVIDER=anthropic # Use Anthropic Claude
 INFERENCE_PROVIDER=google    # Use Google Gemini
 INFERENCE_PROVIDER=ollama    # Use Ollama (local)

Based on static analysis hints from markdownlint.

---

20-27: Add language identifier to code blocks.

Multiple code blocks throughout this section lack language specifiers. They should be marked as `bash` or `env` for proper syntax highlighting.



Example for this block:
🔎 Proposed fix
-```
+```bash
 INFERENCE_PROVIDER=openai  # Optional, auto-detected if OPENAI_API_KEY is set
 OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

This applies to code blocks at lines: 20, 60, 74, 93, 103, 123 (same issue).

Based on static analysis hints from markdownlint.


135-150: Add language identifiers to code blocks.

Code blocks at lines 135 and 176 need language specifiers (bash or env).

Based on static analysis hints from markdownlint.

Also applies to: 176-186


192-241: Add language identifiers to code blocks.

Code blocks at lines 192, 210, and 225 need language specifiers (bash or env).

Based on static analysis hints from markdownlint.


247-257: Add language identifier to code block.

🔎 Proposed fix

````diff
-```
+```bash
 INFERENCE_PROVIDER=ollama  # Optional, auto-detected if OLLAMA_BASE_URL is set
 OLLAMA_BASE_URL=http://ollama.mylab.com:11434
````

Based on static analysis hints from markdownlint.


265-272: Add language identifier to code block.

🔎 Proposed fix

````diff
-```
+```bash
 EMBEDDING_PROVIDER=openai   # Use OpenAI for embeddings
 EMBEDDING_PROVIDER=google   # Use Google for embeddings
 EMBEDDING_PROVIDER=ollama   # Use Ollama for embeddings
````

Based on static analysis hints from markdownlint.

packages/shared/inference/anthropic.ts (1)

67-71: Model matching with startsWith may be overly permissive.

The startsWith check allows any model string beginning with a supported model name to pass validation. This enables versioned model names (e.g., claude-sonnet-4-5-20250929-v2) but could also match unintended models.

This is likely intentional for flexibility with model versioning. Consider documenting this behavior in the function's JSDoc.
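
A minimal sketch of the documented behavior; `SUPPORTED_MODELS` is a hypothetical list, not the actual contents of `packages/shared/inference/anthropic.ts`:

```typescript
// Hypothetical supported-model prefixes; the real list lives in anthropic.ts.
const SUPPORTED_MODELS = [
  "claude-sonnet-4-5",
  "claude-haiku-4-5",
  "claude-opus-4-5",
  "claude-opus-4-1",
];

/**
 * Returns true when `model` begins with any supported model name.
 *
 * Prefix matching is intentional: it accepts dated or suffixed variants
 * such as "claude-sonnet-4-5-20250929", but it will also accept any other
 * string sharing a supported prefix, so callers should not rely on it for
 * strict validation.
 */
function supportsStructuredOutputs(model: string): boolean {
  return SUPPORTED_MODELS.some((supported) => model.startsWith(supported));
}

console.log(supportsStructuredOutputs("claude-sonnet-4-5-20250929")); // true
console.log(supportsStructuredOutputs("claude-3-opus-20240229")); // false
```

Spelling out the trade-off in the JSDoc, as above, would make the permissiveness explicit to future callers.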

packages/shared/inference/openai.ts (2)

58-61: Verify the necessity of custom headers.

The X-Title and HTTP-Referer headers are non-standard for OpenAI API calls. While they may serve tracking or identification purposes, OpenAI's API doesn't require them. Confirm these headers provide value and don't cause issues with OpenAI's request validation.
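
For context, `X-Title` and `HTTP-Referer` match OpenRouter's app-attribution convention. If they are meant for OpenAI-compatible proxies rather than OpenAI itself, one option is to gate them on the base URL. This is a hypothetical sketch (the title and referer values are made up, not taken from the PR):

```typescript
// Hypothetical guard: only send OpenRouter-style attribution headers when
// the client is NOT talking to api.openai.com directly.
function buildDefaultHeaders(baseURL: string): Record<string, string> {
  const isOfficialOpenAI = new URL(baseURL).hostname === "api.openai.com";
  if (isOfficialOpenAI) return {};
  return {
    "X-Title": "Karakeep", // assumed app name
    "HTTP-Referer": "https://karakeep.app", // assumed app URL
  };
}

console.log(Object.keys(buildDefaultHeaders("https://api.openai.com/v1")).length); // 0
```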


321-329: Consider using a more specific type instead of any.

While the eslint-disable comment acknowledges the issue, consider defining a proper interface for the Responses API request object to improve type safety. This would help catch configuration errors at compile time.

💡 Suggested improvement

```diff
+interface ResponsesApiRequest {
+  model: string;
+  input: string | unknown[];
+  text: ReturnType<typeof this.getResponsesTextFormat>;
+  store: boolean;
+  temperature: number;
+  top_p: number;
+  reasoning?: { effort: string };
+  max_output_tokens?: number;
+  previous_response_id?: string;
+}
+
 private buildResponsesRequest(
   model: string,
   input: string | unknown[],
   opts: InferenceOptions,
 ) {
-  // eslint-disable-next-line @typescript-eslint/no-explicit-any
-  const requestObj: any = {
+  const requestObj: ResponsesApiRequest = {
     model,
     input,
```
packages/shared/config.ts (1)

296-347: Document provider auto-detection priority order.

The auto-detection logic (lines 308-311) prioritizes providers in order: OpenAI → Anthropic → Google → Ollama. When multiple API keys are configured, the first matched provider will be selected. This could be unexpected if users have multiple credentials configured and don't set INFERENCE_PROVIDER explicitly.

Consider adding a log message or documentation to clarify this behavior and encourage explicit INFERENCE_PROVIDER configuration when multiple credentials exist.
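
The suggested behavior can be sketched as below. The `Env` field names match the documented environment variables, but the function shape and warning text are assumptions, not the actual `config.ts` implementation:

```typescript
type Provider = "openai" | "anthropic" | "google" | "ollama";

interface Env {
  INFERENCE_PROVIDER?: Provider;
  OPENAI_API_KEY?: string;
  ANTHROPIC_API_KEY?: string;
  GEMINI_API_KEY?: string;
  OLLAMA_BASE_URL?: string;
}

function detectProvider(env: Env, warn: (msg: string) => void): Provider | null {
  if (env.INFERENCE_PROVIDER) return env.INFERENCE_PROVIDER;

  // Priority order described in the review: OpenAI -> Anthropic -> Google -> Ollama.
  const candidates: [Provider, string | undefined][] = [
    ["openai", env.OPENAI_API_KEY],
    ["anthropic", env.ANTHROPIC_API_KEY],
    ["google", env.GEMINI_API_KEY],
    ["ollama", env.OLLAMA_BASE_URL],
  ];
  const configured = candidates.filter(([, value]) => Boolean(value));
  if (configured.length > 1) {
    warn(
      `Multiple provider credentials found; defaulting to "${configured[0][0]}". ` +
        "Set INFERENCE_PROVIDER to choose explicitly.",
    );
  }
  return configured.length > 0 ? configured[0][0] : null;
}

const warnings: string[] = [];
const picked = detectProvider(
  { OPENAI_API_KEY: "sk-1", ANTHROPIC_API_KEY: "sk-2" },
  (m) => warnings.push(m),
);
console.log(picked); // "openai", with one warning recorded
```

A one-time warning like this surfaces the ambiguity without changing the selection behavior.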

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d52c0e7 and c0206dc.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (68)
  • apps/browser-extension/package.json
  • apps/mcp/package.json
  • apps/mobile/package.json
  • apps/web/components/settings/AISettings.tsx
  • apps/web/lib/clientConfig.tsx
  • apps/web/lib/i18n/locales/ar/translation.json
  • apps/web/lib/i18n/locales/cs/translation.json
  • apps/web/lib/i18n/locales/da/translation.json
  • apps/web/lib/i18n/locales/de/translation.json
  • apps/web/lib/i18n/locales/el/translation.json
  • apps/web/lib/i18n/locales/en/translation.json
  • apps/web/lib/i18n/locales/en_US/translation.json
  • apps/web/lib/i18n/locales/es/translation.json
  • apps/web/lib/i18n/locales/fa/translation.json
  • apps/web/lib/i18n/locales/fi/translation.json
  • apps/web/lib/i18n/locales/fr/translation.json
  • apps/web/lib/i18n/locales/ga/translation.json
  • apps/web/lib/i18n/locales/gl/translation.json
  • apps/web/lib/i18n/locales/hr/translation.json
  • apps/web/lib/i18n/locales/hu/translation.json
  • apps/web/lib/i18n/locales/it/translation.json
  • apps/web/lib/i18n/locales/ja/translation.json
  • apps/web/lib/i18n/locales/ko/translation.json
  • apps/web/lib/i18n/locales/nb_NO/translation.json
  • apps/web/lib/i18n/locales/nl/translation.json
  • apps/web/lib/i18n/locales/pl/translation.json
  • apps/web/lib/i18n/locales/pt/translation.json
  • apps/web/lib/i18n/locales/pt_BR/translation.json
  • apps/web/lib/i18n/locales/ru/translation.json
  • apps/web/lib/i18n/locales/sk/translation.json
  • apps/web/lib/i18n/locales/sl/translation.json
  • apps/web/lib/i18n/locales/sv/translation.json
  • apps/web/lib/i18n/locales/tr/translation.json
  • apps/web/lib/i18n/locales/uk/translation.json
  • apps/web/lib/i18n/locales/vi/translation.json
  • apps/web/lib/i18n/locales/zh/translation.json
  • apps/web/lib/i18n/locales/zhtw/translation.json
  • apps/web/package.json
  • apps/workers/package.json
  • docs/docs/03-configuration/01-environment-variables.md
  • docs/docs/03-configuration/02-different-ai-providers.md
  • package.json
  • packages/api/package.json
  • packages/benchmarks/package.json
  • packages/e2e_tests/package.json
  • packages/open-api/package.json
  • packages/shared/config.ts
  • packages/shared/index.ts
  • packages/shared/inference.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/factory.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/index.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/types.ts
  • packages/shared/package.json
  • packages/trpc/package.json
  • tools/compare-models/package.json
  • tools/compare-models/src/bookmarkProcessor.ts
  • tools/compare-models/src/config.ts
  • tools/compare-models/src/inferenceClient.ts
💤 Files with no reviewable changes (2)
  • tools/compare-models/src/config.ts
  • packages/shared/inference.ts
🚧 Files skipped from review as they are similar to previous changes (41)
  • packages/shared/index.ts
  • apps/web/lib/i18n/locales/sl/translation.json
  • apps/mcp/package.json
  • apps/web/lib/i18n/locales/sk/translation.json
  • apps/web/lib/i18n/locales/nb_NO/translation.json
  • apps/web/lib/i18n/locales/ja/translation.json
  • packages/trpc/package.json
  • tools/compare-models/package.json
  • apps/web/lib/i18n/locales/ar/translation.json
  • apps/web/lib/i18n/locales/gl/translation.json
  • apps/web/lib/i18n/locales/sv/translation.json
  • packages/e2e_tests/package.json
  • apps/web/lib/i18n/locales/nl/translation.json
  • apps/browser-extension/package.json
  • packages/shared/inference/factory.test.ts
  • apps/web/lib/i18n/locales/it/translation.json
  • apps/web/lib/i18n/locales/hr/translation.json
  • apps/web/lib/i18n/locales/hu/translation.json
  • apps/mobile/package.json
  • apps/web/lib/i18n/locales/cs/translation.json
  • apps/web/lib/i18n/locales/fi/translation.json
  • apps/web/lib/i18n/locales/de/translation.json
  • apps/web/package.json
  • apps/web/lib/i18n/locales/fr/translation.json
  • apps/web/lib/i18n/locales/da/translation.json
  • apps/web/lib/i18n/locales/tr/translation.json
  • apps/web/lib/i18n/locales/ru/translation.json
  • packages/shared/package.json
  • apps/web/lib/i18n/locales/pl/translation.json
  • packages/shared/inference/factory.ts
  • packages/shared/inference/index.ts
  • apps/web/lib/clientConfig.tsx
  • apps/web/lib/i18n/locales/en/translation.json
  • apps/web/lib/i18n/locales/en_US/translation.json
  • packages/api/package.json
  • tools/compare-models/src/bookmarkProcessor.ts
  • package.json
  • apps/web/lib/i18n/locales/uk/translation.json
  • apps/web/lib/i18n/locales/ga/translation.json
  • packages/benchmarks/package.json
  • apps/web/lib/i18n/locales/zh/translation.json
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use TypeScript for type safety in all source files

Files:

  • packages/shared/inference/openai.test.ts
  • apps/web/components/settings/AISettings.tsx
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/inference/google.ts
  • packages/shared/config.ts
  • packages/shared/inference/types.ts
  • tools/compare-models/src/inferenceClient.ts
**/*.{ts,tsx,js,jsx,json,css,md}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier according to project standards

Files:

  • packages/shared/inference/openai.test.ts
  • apps/web/components/settings/AISettings.tsx
  • packages/shared/inference/openai.ts
  • docs/docs/03-configuration/02-different-ai-providers.md
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/anthropic.ts
  • apps/web/lib/i18n/locales/es/translation.json
  • packages/shared/inference/ollama.ts
  • apps/workers/package.json
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/live-test.ts
  • packages/open-api/package.json
  • docs/docs/03-configuration/01-environment-variables.md
  • packages/shared/inference/ollama.test.ts
  • apps/web/lib/i18n/locales/fa/translation.json
  • packages/shared/inference/google.ts
  • apps/web/lib/i18n/locales/pt_BR/translation.json
  • packages/shared/config.ts
  • packages/shared/inference/types.ts
  • apps/web/lib/i18n/locales/vi/translation.json
  • apps/web/lib/i18n/locales/el/translation.json
  • apps/web/lib/i18n/locales/zhtw/translation.json
  • tools/compare-models/src/inferenceClient.ts
  • apps/web/lib/i18n/locales/pt/translation.json
  • apps/web/lib/i18n/locales/ko/translation.json
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Lint code using oxlint and fix issues with pnpm lint:fix

Files:

  • packages/shared/inference/openai.test.ts
  • apps/web/components/settings/AISettings.tsx
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/inference/google.ts
  • packages/shared/config.ts
  • packages/shared/inference/types.ts
  • tools/compare-models/src/inferenceClient.ts
**/*.{test,spec}.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Vitest for writing and running tests

Files:

  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/ollama.test.ts
packages/shared/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Organize shared code and types in the packages/shared directory for use across packages

Files:

  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/inference/google.ts
  • packages/shared/config.ts
  • packages/shared/inference/types.ts
apps/web/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

apps/web/**/*.{ts,tsx}: Use Tailwind CSS for styling in the web application
Use Next.js for building the main web application

Files:

  • apps/web/components/settings/AISettings.tsx
🧠 Learnings (4)
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/e2e_tests/**/*.{ts,tsx,test,spec} : Write end-to-end tests in the `packages/e2e_tests` directory

Applied to files:

  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/live-test.ts
📚 Learning: 2026-01-03T11:36:34.916Z
Learnt from: RobertRosca
Repo: karakeep-app/karakeep PR: 2339
File: packages/shared/config.ts:62-62
Timestamp: 2026-01-03T11:36:34.916Z
Learning: In packages/shared/config.ts, enforce OpenAI SDK version compatibility: service_tier values are limited to ["auto", "default", "flex"]. The "priority" tier requires OpenAI SDK >= v5.8.0. Add a guard or validation in config to prevent using priority tier unless the SDK is upgraded (v5.8.0+). Consider documenting this constraint and adding a unit test or lint rule to ensure only allowed service_tier values are used based on the installed SDK version.

Applied to files:

  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/inference/google.ts
  • packages/shared/config.ts
  • packages/shared/inference/types.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{test,spec}.{ts,tsx,js,jsx} : Use Vitest for writing and running tests

Applied to files:

  • packages/shared/inference/google.test.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{ts,tsx,js,jsx,json,css,md} : Format code using Prettier according to project standards

Applied to files:

  • apps/workers/package.json
🧬 Code graph analysis (7)
packages/shared/inference/openai.ts (2)
packages/shared/inference/index.ts (6)
  • InferenceOptions (23-23)
  • InferenceResponse (21-21)
  • defaultInferenceOptions (26-26)
  • OpenAIEmbeddingClient (32-32)
  • EmbeddingClient (20-20)
  • EmbeddingResponse (22-22)
packages/shared/inference/types.ts (5)
  • InferenceOptions (12-27)
  • InferenceResponse (3-6)
  • defaultInferenceOptions (29-31)
  • EmbeddingClient (46-48)
  • EmbeddingResponse (8-10)
packages/shared/inference/anthropic.ts (2)
packages/shared/inference/index.ts (4)
  • InferenceOptions (23-23)
  • InferenceResponse (21-21)
  • InferenceClient (19-19)
  • defaultInferenceOptions (26-26)
packages/shared/inference/types.ts (4)
  • InferenceOptions (12-27)
  • InferenceResponse (3-6)
  • InferenceClient (33-44)
  • defaultInferenceOptions (29-31)
packages/shared/inference/ollama.ts (3)
packages/shared/inference/index.ts (8)
  • OllamaInferenceClient (35-35)
  • InferenceClient (19-19)
  • InferenceOptions (23-23)
  • InferenceResponse (21-21)
  • defaultInferenceOptions (26-26)
  • OllamaEmbeddingClient (35-35)
  • EmbeddingClient (20-20)
  • EmbeddingResponse (22-22)
packages/shared/inference/types.ts (6)
  • InferenceClient (33-44)
  • InferenceOptions (12-27)
  • InferenceResponse (3-6)
  • defaultInferenceOptions (29-31)
  • EmbeddingClient (46-48)
  • EmbeddingResponse (8-10)
packages/shared/customFetch.ts (1)
  • customFetch (24-24)
packages/shared/inference/anthropic.test.ts (1)
packages/shared/inference/anthropic.ts (2)
  • AnthropicInferenceClient (135-247)
  • supportsStructuredOutputs (67-71)
packages/shared/inference/live-test.ts (2)
packages/shared/inference/anthropic.ts (1)
  • supportsStructuredOutputs (67-71)
packages/shared/inference/factory.ts (1)
  • InferenceClientFactory (17-41)
packages/shared/inference/ollama.test.ts (1)
packages/shared/inference/ollama.ts (2)
  • OllamaInferenceClient (21-133)
  • OllamaEmbeddingClient (140-162)
packages/shared/inference/types.ts (2)
packages/shared/inference/index.ts (5)
  • InferenceResponse (21-21)
  • EmbeddingResponse (22-22)
  • InferenceOptions (23-23)
  • InferenceClient (19-19)
  • EmbeddingClient (20-20)
tools/compare-models/src/inferenceClient.ts (3)
  • InferenceResponse (9-12)
  • InferenceOptions (5-7)
  • InferenceClient (14-131)
🪛 markdownlint-cli2 (0.18.1)
docs/docs/03-configuration/02-different-ai-providers.md

9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


20-20: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


60-60: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


74-74: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


93-93: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


103-103: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


123-123: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


135-135: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


176-176: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


192-192: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


210-210: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


225-225: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


247-247: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


265-265: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/docs/03-configuration/01-environment-variables.md

96-96: Link text should be descriptive

(MD059, descriptive-link-text)

🔇 Additional comments (31)
packages/open-api/package.json (1)

10-10: LGTM! Zod version bump aligns with multi-provider architecture.

The upgrade from ^3.24.2 to ^3.25.0 is consistent with the PR's broader zod dependency update for Anthropic SDK compatibility. The existing @asteasolutions/zod-to-openapi@^7.2.0 dependency is compatible with this zod version.

apps/workers/package.json (1)

59-59: LGTM! Consistent zod version update.

The upgrade to ^3.25.0 maintains consistency with the repository-wide zod dependency update for Anthropic SDK compatibility.

apps/web/lib/i18n/locales/el/translation.json (1)

196-203: LGTM! AI provider translations added correctly.

The Greek translations for the new AI provider status and model configuration keys are well-structured and consistent with the multi-locale expansion across the repository. The past duplicate "summarization" key issue flagged in previous reviews has been resolved—the current code is valid JSON.

apps/web/lib/i18n/locales/zhtw/translation.json (1)

193-200: LGTM! AI provider translations added correctly.

The Traditional Chinese translations for the new AI provider status and model configuration keys are properly formatted and consistent with the multi-locale expansion. The past duplicate "summarization_prompt" key issue flagged in previous reviews has been resolved—the current code is valid JSON.

apps/web/lib/i18n/locales/es/translation.json (1)

139-146: LGTM! AI provider translations added correctly.

The Spanish translations for the new AI provider status and model configuration keys are well-structured and consistent with the multi-locale expansion across 31 locales mentioned in the PR objectives.

apps/web/lib/i18n/locales/ko/translation.json (1)

395-402: LGTM! Clean i18n expansion for AI provider status.

The Korean translations for the new AI provider status keys are well-structured and properly integrated into the existing translation file.

apps/web/lib/i18n/locales/pt/translation.json (1)

217-224: LGTM! AI provider status keys added correctly.

The Portuguese translations for the new AI provider configuration keys are properly formatted and integrate well with the existing translation structure.

apps/web/lib/i18n/locales/fa/translation.json (1)

247-254: LGTM! Farsi translations properly added.

The AI provider status translations are correctly formatted and consistent with the other locale files in this PR.

packages/shared/inference/types.ts (3)

1-10: LGTM! Clean response type definitions.

The InferenceResponse and EmbeddingResponse interfaces are well-defined and appropriate for the multi-provider architecture.


12-27: Well-structured options interface.

The InferenceOptions interface provides good flexibility with the schema field for structured outputs and proper support for request cancellation via abortSignal. The use of // eslint-disable-next-line for the generic schema type is appropriate here.

Note: See separate comment regarding reasoningEffort values.


29-48: LGTM! Clean and well-designed client interfaces.

The defaultInferenceOptions provides a sensible default, and the InferenceClient and EmbeddingClient interfaces establish a clear contract for provider implementations. The use of Partial<InferenceOptions> in the client methods provides good flexibility.

docs/docs/03-configuration/02-different-ai-providers.md (1)

70-88: Documentation is accurate regarding reasoning_effort values and model support.

The documentation correctly reflects the current OpenAI Responses API specification. The reasoning_effort values including "none", "minimal", and "xhigh" are indeed valid for certain GPT-5 models, as documented by OpenAI. The documentation appropriately describes model-dependent behavior where o-series models support the standard "low", "medium", "high" range, while newer GPT-5 variants expose additional options for reasoning control.

packages/shared/inference/openai.test.ts (2)

210-231: Consistent test isolation pattern.

Good use of beforeEach/afterEach to modify and restore config values, preventing test pollution between suites. This pattern is consistently applied across the Responses API, JSON output, and plain output test blocks.


1-398: Comprehensive test coverage for OpenAI clients.

The test suite thoroughly covers:

  • Chat Completions and Responses API paths
  • Structured, JSON, and plain output modes
  • Image inference with proper message formatting
  • Token counting and error handling
  • Abort signal propagation
  • Embedding generation

The mocking strategy correctly mirrors the OpenAI SDK structure.

apps/web/lib/i18n/locales/pt_BR/translation.json (1)

187-194: Localization strings for AI provider status added correctly.

The new Brazilian Portuguese translations for provider status UI are complete and match the keys used in AISettings.tsx. The translations appear grammatically correct.

apps/web/components/settings/AISettings.tsx (1)

117-171: Well-structured ProviderIndicator component.

The component:

  • Handles the no-provider state with a clear warning message
  • Uses consistent styling with the existing SettingsSection pattern
  • Properly maps provider identifiers to display names with fallback
  • Conditionally renders embeddings only when both provider and model are available

The responsive grid layout (sm:grid-cols-2 lg:grid-cols-4) ensures good display across screen sizes.

apps/web/lib/i18n/locales/vi/translation.json (1)

131-138: Vietnamese translations for AI provider status added correctly.

The new keys align with the UI component and other locale files. The translations appear appropriate for Vietnamese.

packages/shared/inference/anthropic.ts (1)

135-247: Clean implementation of the Anthropic inference client.

The implementation:

  • Properly validates model capabilities before making requests
  • Conditionally applies beta headers only when structured outputs are needed
  • Correctly handles both text and image inference
  • Properly propagates abort signals
  • Calculates total tokens from input + output

The separation of concerns (schema conversion, model validation, media type validation, response extraction) makes the code maintainable.

packages/shared/inference/ollama.test.ts (1)

66-207: Thorough test coverage for OllamaInferenceClient.

The test suite comprehensively covers:

  • Streaming response accumulation
  • Token counting across multiple chunks
  • Model and configuration parameter passing
  • JSON schema format when schema is provided
  • Image inference with base64 data
  • keep_alive setting propagation

The mock streaming generator (mockChatStream) is a clean approach for simulating Ollama's streaming API.

packages/shared/inference/anthropic.test.ts (3)

264-307: Good test isolation pattern with afterEach cleanup.

Unlike the Ollama tests, this test block correctly restores serverConfig.inference.outputSchema to "structured" in afterEach, preventing test pollution. This pattern should be applied consistently across all test files that modify shared config.


309-343: Comprehensive tests for supportsStructuredOutputs utility.

The tests cover:

  • All documented model variants (Sonnet 4.5, Haiku 4.5, Opus 4.5, Opus 4.1)
  • Older unsupported models (Claude 3.x, Claude 2.x)
  • Edge cases (empty string, non-Claude models, partial matches)

This provides confidence in the model validation logic.


1-261: Well-structured test suite for Anthropic inference client.

The tests thoroughly cover:

  • API call construction and parameter passing
  • Structured outputs beta header inclusion/exclusion
  • Text and image inference paths
  • Token accounting
  • Abort signal propagation
  • Different image media types
  • Error handling for missing text content

The mock structure correctly mirrors the Anthropic SDK's beta API shape.

packages/shared/inference/google.test.ts (1)

1-316: LGTM! Comprehensive test coverage.

The test suite is well-structured and covers all critical paths for both GoogleGeminiInferenceClient and GoogleEmbeddingClient:

  • Text and image inference with various configurations
  • JSON schema validation and plain text output modes
  • Token counting with and without usage metadata
  • Error handling for missing text content
  • Abort signal propagation
  • Batch embedding generation

The dynamic config modification for plain output tests (lines 236-250) is handled correctly with proper cleanup in afterEach.

packages/shared/inference/live-test.ts (1)

1-593: LGTM! Excellent integration test coverage.

This live test script provides comprehensive coverage across all providers:

  • Direct SDK tests for OpenAI, Anthropic, and Google
  • Factory-based InferenceClient tests matching actual app usage
  • Realistic prompts from the actual tagging implementation
  • Structured output validation with schema parsing
  • Image inference and embeddings testing
  • Proper conditional skipping when API keys are missing

The use of void $schema; (line 271) to avoid unused variable warnings is an acceptable pattern.

packages/shared/inference/google.ts (1)

1-202: LGTM! Clean implementation of Google Gemini integration.

The implementation is well-structured and correct:

  • Proper API key validation in createGoogleClient
  • Correct handling of structured vs plain output modes
  • Appropriate token counting from usageMetadata
  • Proper error handling for missing responses
  • Abort signal propagation to the underlying API calls

The embedding client correctly handles batch sizes and accumulates results. The sequential batch processing (lines 174-186) is acceptable, though it could be optimized with Promise.all() for parallel processing if needed in the future.

packages/shared/inference/openai.ts (3)

333-348: Excellent validation for o-series reasoning constraints.

The defensive check preventing o-series models from using "none" reasoning effort is well-implemented and prevents a common API error. The error message is clear and actionable.
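
A minimal sketch of such a guard, assuming the model-family check is prefix-based (the function name and regex here are illustrative, not the PR's actual code):

```typescript
// Hypothetical guard: o-series models (o1, o3, o4-mini, ...) reject a
// reasoning effort of "none", so fail fast with an actionable message.
function validateReasoningEffort(model: string, effort: string): void {
  const isOSeries = /^o\d/.test(model);
  if (isOSeries && effort === "none") {
    throw new Error(
      `Model "${model}" does not support reasoning effort "none"; ` +
        'use "low", "medium", or "high" instead.',
    );
  }
}

validateReasoningEffort("gpt-5", "none"); // fine: not an o-series model
validateReasoningEffort("o3-mini", "high"); // fine: valid effort
```

Failing at request-build time, as the implementation does, gives a clearer error than letting the API reject the call.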


412-431: LGTM: Clean embedding client implementation.

The embedding client correctly implements the interface and properly maps the OpenAI API response to the expected 2D array format.


143-158: No action needed. The InferenceOptions interface is complete and properly defines all properties referenced in the code (abortSignal, store, reasoningEffort, previousResponseId). The original concern was based on incomplete context provided during review; the actual interface in packages/shared/inference/types.ts fully defines all these properties with appropriate types. No type safety issue exists.

Likely an incorrect or invalid review comment.

packages/shared/config.ts (3)

349-368: LGTM: Correct handling of embedding provider selection.

The embedding provider logic correctly excludes Anthropic (which lacks an embeddings API) and provides sensible fallback behavior when EMBEDDING_PROVIDER is not explicitly set. The auto-detection aligns well with the inference provider selection while respecting provider capabilities.
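
The described selection logic reduces to something like the following sketch. The function name and capability list are assumptions for illustration; the real logic lives in `packages/shared/config.ts`:

```typescript
type Provider = "openai" | "anthropic" | "google" | "ollama";

// Anthropic has no embeddings API, so it is never a valid embedding provider.
const EMBEDDING_CAPABLE: Provider[] = ["openai", "google", "ollama"];

function pickEmbeddingProvider(
  explicit: Provider | undefined, // EMBEDDING_PROVIDER, if set
  inferenceProvider: Provider | undefined,
): Provider | null {
  if (explicit && EMBEDDING_CAPABLE.includes(explicit)) return explicit;
  // Fall back to the inference provider only when it can actually embed.
  if (inferenceProvider && EMBEDDING_CAPABLE.includes(inferenceProvider)) {
    return inferenceProvider;
  }
  return null;
}

console.log(pickEmbeddingProvider(undefined, "anthropic")); // null
```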


538-549: LGTM: Appropriate client-side config exposure.

The expanded client config correctly exposes provider information for read-only UI display (e.g., ProviderIndicator component). The fields align with the multi-provider architecture and support the UI requirements mentioned in the PR objectives.


87-88: Remove the recommendation to revert to gpt-4o-mini.

GPT-5 models were publicly released on August 7, 2025, and are available via the OpenAI API, making "gpt-5-mini" a valid default choice. The concern about unavailability is no longer applicable.

@camjac251 camjac251 force-pushed the feat/multi-provider-inference branch 2 times, most recently from 47530b0 to b1738bd on January 3, 2026 at 21:12
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
packages/shared/inference/ollama.ts (1)

84-95: LGTM! Error handling correctly uses undefined for unknown token counts.

The error handling properly sets totalTokens = undefined when an exception occurs, which aligns with the InferenceResponse.totalTokens: number | undefined type. The workaround for the Ollama streaming bug is well-documented with the GitHub issue reference.

docs/docs/03-configuration/01-environment-variables.md (1)

96-96: Use descriptive link text instead of "here".

The link text "here" is not descriptive and harms accessibility and SEO. Consider using descriptive text that indicates the destination.

🔎 Proposed fix

```diff
-| OPENAI_API_KEY                       | No       | Not set                | The OpenAI API key for inference and embeddings. More on that in [here](../integrations/openai).
+| OPENAI_API_KEY                       | No       | Not set                | The OpenAI API key for inference and embeddings. See [OpenAI integration guide](../integrations/openai) for details.
```
🧹 Nitpick comments (4)
packages/shared/inference/live-test.ts (1)

186-210: Consider testing behavior instead of source code inspection.

This test reads the source file to verify string patterns exist. This is fragile—if the code is refactored (e.g., patterns moved to constants or the logic restructured), the test breaks even if behavior is unchanged. Consider testing the actual shouldUseResponsesApi function behavior with various model names instead.

🔎 Suggested alternative approach

```typescript
// Test the exported function behavior instead of source inspection
await runTest("openai", "responses-model-check", async () => {
  const { shouldUseResponsesApi } = await import("./openai.js");

  // Verify GPT-5 models use Responses API (when enabled)
  // Note: This would need the config to enable openaiUseResponsesApi
  const modelsToCheck = ["gpt-5", "gpt-5.2", "o1-mini", "o3-mini", "o4-mini"];

  for (const model of modelsToCheck) {
    // Test that these models are recognized for Responses API routing
    // (actual behavior depends on config.openaiUseResponsesApi)
  }

  return "Model detection behavior verified";
});
```
packages/shared/inference/openai.ts (2)

136-141: Consider making openAI field private.

The openAI field is only used internally. Making it private would improve encapsulation and signal that it's not part of the public API.

🔎 Suggested change
 export class OpenAIInferenceClient implements InferenceClient {
-  openAI: OpenAI;
+  private openAI: OpenAI;

   constructor() {
     this.openAI = createOpenAIClient();
   }

412-431: Consider making openAI field private.

Same suggestion as for OpenAIInferenceClient - the field is only used internally.

🔎 Suggested change
 export class OpenAIEmbeddingClient implements EmbeddingClient {
-  openAI: OpenAI;
+  private openAI: OpenAI;

   constructor() {
     this.openAI = createOpenAIClient();
   }
tools/compare-models/src/inferenceClient.ts (1)

107-133: Consider preserving error context in fallback parsing logic.

The multi-strategy JSON extraction is robust and handles various response formats well. However, the empty catch blocks at lines 112, 119, and 128 discard potentially useful error information that could aid debugging when all strategies fail.

🔎 Optional enhancement to preserve error context
 private parseJsonFromResponse(response: string): unknown {
   const trimmedResponse = response.trim();
+  const errors: string[] = [];

   try {
     return JSON.parse(trimmedResponse);
-  } catch {
+  } catch (e) {
+    errors.push(`Direct parse: ${e instanceof Error ? e.message : String(e)}`);
     const jsonBlockRegex = /```(?:json)?\s*(\{[\s\S]*?\})\s*```/i;
     const match = trimmedResponse.match(jsonBlockRegex);

     if (match) {
       try {
         return JSON.parse(match[1]);
-      } catch {}
+      } catch (e) {
+        errors.push(`Markdown block: ${e instanceof Error ? e.message : String(e)}`);
+      }
     }

     const jsonObjectRegex = /\{[\s\S]*\}/;
     const objectMatch = trimmedResponse.match(jsonObjectRegex);

     if (objectMatch) {
       try {
         return JSON.parse(objectMatch[0]);
-      } catch {}
+      } catch (e) {
+        errors.push(`Object extraction: ${e instanceof Error ? e.message : String(e)}`);
+      }
     }

-    return JSON.parse(trimmedResponse);
+    try {
+      return JSON.parse(trimmedResponse);
+    } catch (e) {
+      throw new Error(
+        `Failed to parse JSON from response. Attempted strategies: ${errors.join("; ")}. Final error: ${e instanceof Error ? e.message : String(e)}`
+      );
+    }
   }
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0206dc and 47530b0.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (68)
  • apps/browser-extension/package.json
  • apps/mcp/package.json
  • apps/mobile/package.json
  • apps/web/components/settings/AISettings.tsx
  • apps/web/lib/clientConfig.tsx
  • apps/web/lib/i18n/locales/ar/translation.json
  • apps/web/lib/i18n/locales/cs/translation.json
  • apps/web/lib/i18n/locales/da/translation.json
  • apps/web/lib/i18n/locales/de/translation.json
  • apps/web/lib/i18n/locales/el/translation.json
  • apps/web/lib/i18n/locales/en/translation.json
  • apps/web/lib/i18n/locales/en_US/translation.json
  • apps/web/lib/i18n/locales/es/translation.json
  • apps/web/lib/i18n/locales/fa/translation.json
  • apps/web/lib/i18n/locales/fi/translation.json
  • apps/web/lib/i18n/locales/fr/translation.json
  • apps/web/lib/i18n/locales/ga/translation.json
  • apps/web/lib/i18n/locales/gl/translation.json
  • apps/web/lib/i18n/locales/hr/translation.json
  • apps/web/lib/i18n/locales/hu/translation.json
  • apps/web/lib/i18n/locales/it/translation.json
  • apps/web/lib/i18n/locales/ja/translation.json
  • apps/web/lib/i18n/locales/ko/translation.json
  • apps/web/lib/i18n/locales/nb_NO/translation.json
  • apps/web/lib/i18n/locales/nl/translation.json
  • apps/web/lib/i18n/locales/pl/translation.json
  • apps/web/lib/i18n/locales/pt/translation.json
  • apps/web/lib/i18n/locales/pt_BR/translation.json
  • apps/web/lib/i18n/locales/ru/translation.json
  • apps/web/lib/i18n/locales/sk/translation.json
  • apps/web/lib/i18n/locales/sl/translation.json
  • apps/web/lib/i18n/locales/sv/translation.json
  • apps/web/lib/i18n/locales/tr/translation.json
  • apps/web/lib/i18n/locales/uk/translation.json
  • apps/web/lib/i18n/locales/vi/translation.json
  • apps/web/lib/i18n/locales/zh/translation.json
  • apps/web/lib/i18n/locales/zhtw/translation.json
  • apps/web/package.json
  • apps/workers/package.json
  • docs/docs/03-configuration/01-environment-variables.md
  • docs/docs/03-configuration/02-different-ai-providers.md
  • package.json
  • packages/api/package.json
  • packages/benchmarks/package.json
  • packages/e2e_tests/package.json
  • packages/open-api/package.json
  • packages/shared/config.ts
  • packages/shared/index.ts
  • packages/shared/inference.ts
  • packages/shared/inference/anthropic.test.ts
  • packages/shared/inference/anthropic.ts
  • packages/shared/inference/factory.test.ts
  • packages/shared/inference/factory.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/index.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/ollama.test.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/types.ts
  • packages/shared/package.json
  • packages/trpc/package.json
  • tools/compare-models/package.json
  • tools/compare-models/src/bookmarkProcessor.ts
  • tools/compare-models/src/config.ts
  • tools/compare-models/src/inferenceClient.ts
💤 Files with no reviewable changes (2)
  • tools/compare-models/src/config.ts
  • packages/shared/inference.ts
🚧 Files skipped from review as they are similar to previous changes (36)
  • apps/web/lib/i18n/locales/fi/translation.json
  • apps/web/lib/i18n/locales/en/translation.json
  • packages/open-api/package.json
  • apps/web/lib/i18n/locales/uk/translation.json
  • apps/web/lib/i18n/locales/zh/translation.json
  • apps/web/lib/i18n/locales/sv/translation.json
  • packages/shared/inference/factory.test.ts
  • apps/web/lib/i18n/locales/ru/translation.json
  • apps/mobile/package.json
  • apps/web/lib/i18n/locales/ja/translation.json
  • packages/shared/inference/anthropic.ts
  • packages/shared/package.json
  • apps/web/lib/i18n/locales/pt/translation.json
  • apps/web/components/settings/AISettings.tsx
  • apps/web/lib/i18n/locales/da/translation.json
  • apps/web/lib/i18n/locales/nl/translation.json
  • apps/web/lib/i18n/locales/ar/translation.json
  • apps/web/lib/i18n/locales/tr/translation.json
  • packages/shared/inference/types.ts
  • packages/shared/inference/ollama.test.ts
  • apps/web/lib/i18n/locales/es/translation.json
  • packages/api/package.json
  • apps/web/lib/i18n/locales/pt_BR/translation.json
  • apps/web/lib/i18n/locales/de/translation.json
  • apps/web/lib/i18n/locales/vi/translation.json
  • apps/workers/package.json
  • apps/web/lib/i18n/locales/hr/translation.json
  • apps/web/lib/i18n/locales/hu/translation.json
  • apps/web/lib/i18n/locales/fa/translation.json
  • packages/shared/inference/anthropic.test.ts
  • package.json
  • apps/web/lib/i18n/locales/zhtw/translation.json
  • tools/compare-models/package.json
  • apps/browser-extension/package.json
  • apps/web/lib/i18n/locales/gl/translation.json
  • apps/web/lib/i18n/locales/en_US/translation.json
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx,js,jsx,json,css,md}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier according to project standards

Files:

  • apps/mcp/package.json
  • apps/web/package.json
  • packages/shared/inference/factory.ts
  • apps/web/lib/i18n/locales/it/translation.json
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/trpc/package.json
  • packages/shared/index.ts
  • apps/web/lib/i18n/locales/fr/translation.json
  • apps/web/lib/i18n/locales/ko/translation.json
  • packages/shared/inference/google.ts
  • docs/docs/03-configuration/01-environment-variables.md
  • apps/web/lib/i18n/locales/el/translation.json
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • docs/docs/03-configuration/02-different-ai-providers.md
  • apps/web/lib/i18n/locales/sl/translation.json
  • apps/web/lib/clientConfig.tsx
  • apps/web/lib/i18n/locales/pl/translation.json
  • apps/web/lib/i18n/locales/nb_NO/translation.json
  • packages/shared/config.ts
  • packages/shared/inference/live-test.ts
  • packages/benchmarks/package.json
  • packages/e2e_tests/package.json
  • apps/web/lib/i18n/locales/ga/translation.json
  • tools/compare-models/src/inferenceClient.ts
  • apps/web/lib/i18n/locales/cs/translation.json
  • packages/shared/inference/index.ts
  • apps/web/lib/i18n/locales/sk/translation.json
  • tools/compare-models/src/bookmarkProcessor.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use TypeScript for type safety in all source files

Files:

  • packages/shared/inference/factory.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • apps/web/lib/clientConfig.tsx
  • packages/shared/config.ts
  • packages/shared/inference/live-test.ts
  • tools/compare-models/src/inferenceClient.ts
  • packages/shared/inference/index.ts
  • tools/compare-models/src/bookmarkProcessor.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Lint code using oxlint and fix issues with pnpm lint:fix

Files:

  • packages/shared/inference/factory.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • apps/web/lib/clientConfig.tsx
  • packages/shared/config.ts
  • packages/shared/inference/live-test.ts
  • tools/compare-models/src/inferenceClient.ts
  • packages/shared/inference/index.ts
  • tools/compare-models/src/bookmarkProcessor.ts
packages/shared/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Organize shared code and types in the packages/shared directory for use across packages

Files:

  • packages/shared/inference/factory.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/config.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/index.ts
**/*.{test,spec}.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Vitest for writing and running tests

Files:

  • packages/shared/inference/openai.test.ts
  • packages/shared/inference/google.test.ts
apps/web/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

apps/web/**/*.{ts,tsx}: Use Tailwind CSS for styling in the web application
Use Next.js for building the main web application

Files:

  • apps/web/lib/clientConfig.tsx
🧠 Learnings (5)
📚 Learning: 2026-01-03T11:36:34.916Z
Learnt from: RobertRosca
Repo: karakeep-app/karakeep PR: 2339
File: packages/shared/config.ts:62-62
Timestamp: 2026-01-03T11:36:34.916Z
Learning: In packages/shared/config.ts, enforce OpenAI SDK version compatibility: service_tier values are limited to ["auto", "default", "flex"]. The "priority" tier requires OpenAI SDK >= v5.8.0. Add a guard or validation in config to prevent using priority tier unless the SDK is upgraded (v5.8.0+). Consider documenting this constraint and adding a unit test or lint rule to ensure only allowed service_tier values are used based on the installed SDK version.

Applied to files:

  • packages/shared/inference/factory.ts
  • packages/shared/inference/ollama.ts
  • packages/shared/inference/openai.test.ts
  • packages/shared/index.ts
  • packages/shared/inference/google.ts
  • packages/shared/inference/openai.ts
  • packages/shared/inference/google.test.ts
  • packages/shared/config.ts
  • packages/shared/inference/live-test.ts
  • packages/shared/inference/index.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/shared/**/*.{ts,tsx} : Organize shared code and types in the `packages/shared` directory for use across packages

Applied to files:

  • packages/shared/index.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/shared-server/**/*.{ts,tsx} : Place shared server-side logic in `packages/shared-server`

Applied to files:

  • packages/shared/index.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{test,spec}.{ts,tsx,js,jsx} : Use Vitest for writing and running tests

Applied to files:

  • packages/shared/inference/google.test.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/e2e_tests/**/*.{ts,tsx,test,spec} : Write end-to-end tests in the `packages/e2e_tests` directory

Applied to files:

  • packages/shared/inference/google.test.ts
  • packages/shared/inference/live-test.ts
🧬 Code graph analysis (7)
packages/shared/inference/factory.ts (2)
packages/shared/inference/types.ts (2)
  • InferenceClient (33-44)
  • EmbeddingClient (46-48)
packages/shared/inference/openai.ts (1)
  • OpenAIEmbeddingClient (412-432)
packages/shared/inference/ollama.ts (3)
packages/shared/inference/types.ts (6)
  • InferenceClient (33-44)
  • InferenceOptions (12-27)
  • InferenceResponse (3-6)
  • defaultInferenceOptions (29-31)
  • EmbeddingClient (46-48)
  • EmbeddingResponse (8-10)
tools/compare-models/src/inferenceClient.ts (3)
  • InferenceClient (14-134)
  • InferenceOptions (5-7)
  • InferenceResponse (9-12)
packages/shared/customFetch.ts (1)
  • customFetch (24-24)
packages/shared/inference/openai.test.ts (1)
packages/shared/inference/openai.ts (2)
  • OpenAIInferenceClient (136-405)
  • OpenAIEmbeddingClient (412-432)
packages/shared/inference/google.ts (1)
packages/shared/inference/types.ts (5)
  • InferenceOptions (12-27)
  • InferenceClient (33-44)
  • InferenceResponse (3-6)
  • EmbeddingClient (46-48)
  • EmbeddingResponse (8-10)
packages/shared/inference/google.test.ts (1)
packages/shared/inference/google.ts (2)
  • GoogleGeminiInferenceClient (73-153)
  • GoogleEmbeddingClient (162-199)
packages/shared/inference/live-test.ts (3)
packages/shared/inference/anthropic.ts (1)
  • supportsStructuredOutputs (67-71)
packages/shared/inference/factory.ts (1)
  • InferenceClientFactory (17-41)
packages/shared/inference/index.ts (1)
  • InferenceClientFactory (29-29)
tools/compare-models/src/inferenceClient.ts (2)
packages/shared/inference/types.ts (3)
  • InferenceOptions (12-27)
  • InferenceResponse (3-6)
  • InferenceClient (33-44)
tools/compare-models/src/config.ts (1)
  • config (29-29)
🪛 markdownlint-cli2 (0.18.1)
docs/docs/03-configuration/02-different-ai-providers.md

96-96: Link text should be descriptive

(MD059, descriptive-link-text)


148-148: Bare URL used

(MD034, no-bare-urls)


190-190: Link text should be descriptive

(MD059, descriptive-link-text)


205-205: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (41)
apps/web/lib/i18n/locales/sk/translation.json (1)

287-294: LGTM! Previous duplicate issue resolved, and new translations are appropriate.

The duplicate summarization_prompt key flagged in the previous review has been successfully removed. The new AI provider-related translation keys are properly formatted, and the Slovak translations are accurate and consistent with the multi-provider AI architecture additions described in the PR objectives.

apps/web/lib/i18n/locales/pl/translation.json (1)

201-208: LGTM! Polish translations for AI provider status are well-structured.

The new translation keys for the provider status UI are properly formatted with correct JSON syntax, appropriate Polish translations, and consistent formatting. The addition of the trailing comma on line 201 and the structure of the new entries (lines 202-208) follow proper JSON conventions.

The translations are contextually appropriate for AI settings and maintain consistency with the existing translation patterns in the file.

packages/e2e_tests/package.json (1)

24-24: LGTM: Zod dependency bump aligns with PR objectives.

The version bump from ^3.24.2 to ^3.25.0 is consistent with the repository-wide upgrade for Anthropic SDK compatibility, and test results confirm no breaking changes.

apps/web/package.json (1)

104-104: LGTM: Consistent dependency upgrade.

The zod version bump is consistent with the monorepo-wide update and supports the new multi-provider inference architecture.

packages/trpc/package.json (1)

29-29: LGTM: Dependency version aligned across monorepo.

The zod upgrade is part of the coordinated version bump across all packages for Anthropic SDK compatibility.

apps/mcp/package.json (1)

44-44: LGTM: Version bump supports new AI architecture.

The zod upgrade is consistent with the multi-provider inference changes and passed all validation checks.

packages/benchmarks/package.json (1)

22-22: LGTM: Consistent monorepo-wide dependency update.

The zod version bump completes the coordinated upgrade across all packages. Note that zod 3.25.0 introduced a significant architectural change (Zod 4 alongside Zod 3 via subpath exports), which has caused TypeScript compatibility issues in some projects. However, since the PR tests passed, this package appears unaffected by that change.

packages/shared/index.ts (1)

1-3: LGTM! Valid module boundary pattern.

The empty export with documentation comments correctly establishes this as a module boundary file, directing users to import from specific subpaths (e.g., @karakeep/shared/config). This is a standard pattern for packages with subpath exports.

apps/web/lib/i18n/locales/nb_NO/translation.json (1)

315-322: LGTM! Norwegian translation keys added correctly.

The new AI provider and model translation keys are properly formatted and integrate cleanly into the existing ai settings block. The JSON structure is valid with correct comma placement.

Note: The past review comment about a duplicate images_prompt key appears to have been resolved in this version.

apps/web/lib/i18n/locales/el/translation.json (1)

196-203: LGTM! Greek translation keys added correctly.

The new AI provider and model translation keys are properly integrated into the ai settings block with valid JSON structure. All keys use appropriate Greek translations.

Note: The past review comment about a duplicate summarization key appears to have been resolved. The current diff shows auto_summarization (line 196) and summarization (line 182) as distinct keys.

apps/web/lib/i18n/locales/ga/translation.json (1)

119-126: LGTM! Irish translation keys added correctly.

The new AI provider and model translation keys integrate properly into the ai settings block with valid JSON formatting. Irish translations are correctly provided for all new keys.

Note: The past review comment about a duplicate summarization key appears to have been resolved. The current diff shows auto_summarization (line 119) and summarization (line 105) as distinct keys.

apps/web/lib/i18n/locales/ko/translation.json (1)

395-402: LGTM! Korean translation keys added correctly.

The new AI provider and model translation keys are properly formatted and consistent with the other locale files in this PR. The JSON structure is valid with appropriate Korean translations for all new keys.

tools/compare-models/src/bookmarkProcessor.ts (1)

3-3: LGTM! Clean refactoring for multi-provider architecture.

The import change correctly shifts InferenceClient to a type-only import from the local module while preserving the inferTags runtime function. This aligns with the PR's goal of modularizing provider-specific clients.

apps/web/lib/i18n/locales/fr/translation.json (1)

193-200: LGTM! French translations correctly added.

The new provider status and model configuration keys are properly formatted and grammatically correct. These support the new ProviderIndicator UI component for displaying active AI providers.

apps/web/lib/i18n/locales/sl/translation.json (1)

120-127: LGTM! Slovenian translations correctly added, and past duplicate key issue resolved.

The new provider configuration keys are properly formatted. The previous review concern about a duplicate tagging_rule_description key at line 121 appears to have been addressed—line 121 now correctly contains provider_status.

apps/web/lib/clientConfig.tsx (1)

19-23: LGTM! Client config correctly extended for multi-provider support.

The new inference fields properly separate inference and embedding providers (critical since Anthropic lacks an embeddings API). Default values are appropriate: null for optional providers and empty strings for model names.

apps/web/lib/i18n/locales/it/translation.json (1)

193-200: LGTM! Italian translations correctly added.

The new provider status and model configuration keys are properly formatted and grammatically correct. These translations complete the i18n support for the multi-provider AI configuration UI.

apps/web/lib/i18n/locales/cs/translation.json (1)

119-127: LGTM!

The new Czech translation keys for AI provider status are well-structured and align with the multi-provider UI additions. The translations appear linguistically appropriate.

packages/shared/inference/openai.test.ts (4)

1-47: LGTM!

Well-structured test setup with comprehensive mocking of the OpenAI SDK and server config. The mock structure correctly mirrors the SDK's API surface (chat.completions, responses, embeddings).


57-207: Good test coverage for Chat Completions API paths.

Tests comprehensively validate:

  • Default API routing
  • Prompt and model configuration
  • Structured output with Zod schema
  • Error handling for missing content
  • Abort signal propagation
  • Image inference with proper content structure

210-261: Good Responses API test coverage with proper config isolation.

The beforeEach/afterEach pattern correctly mutates and restores config values to test GPT-5 model routing. The reasoning effort test validates the expected request structure.


365-397: LGTM!

Embedding client tests properly validate model configuration usage and response mapping from the SDK format to the expected 2D array structure.

packages/shared/inference/google.test.ts (3)

1-42: LGTM!

Well-structured mock setup for the Google Gen AI SDK. The Type enum mock correctly provides the constants needed for schema generation.


215-229: Good defensive test case.

Testing graceful handling of missing usageMetadata ensures the client won't fail when the API response lacks token count information.


270-315: LGTM!

Embedding tests properly validate:

  • Batch embedding generation
  • Single batch request composition
  • Empty embeddings array handling
packages/shared/inference/factory.ts (2)

17-41: LGTM!

Clean factory implementation with proper exhaustiveness checking. The never type pattern ensures compile-time errors if a new provider is added to the config type but not handled in the switch.
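The `never` exhaustiveness pattern referenced here can be illustrated with a minimal sketch. The provider names match the PR; the returned client bodies are placeholders, not the real factory code.

```typescript
type InferenceProvider = "openai" | "anthropic" | "google" | "ollama";

interface InferenceClient {
  name: string;
}

function buildClient(provider: InferenceProvider): InferenceClient {
  switch (provider) {
    case "openai":
      return { name: "OpenAIInferenceClient" };
    case "anthropic":
      return { name: "AnthropicInferenceClient" };
    case "google":
      return { name: "GoogleGeminiInferenceClient" };
    case "ollama":
      return { name: "OllamaInferenceClient" };
    default: {
      // If a new provider is added to the union but not handled above,
      // this assignment no longer type-checks, so the omission is caught
      // at compile time rather than at runtime.
      const exhaustive: never = provider;
      throw new Error(`Unhandled provider: ${String(exhaustive)}`);
    }
  }
}
```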


54-76: LGTM!

Good documentation noting that Anthropic doesn't provide embeddings. The factory correctly excludes Anthropic from the embedding provider options.

packages/shared/inference/index.ts (1)

1-35: LGTM!

Well-organized barrel file with clear documentation. The structure appropriately separates:

  • Types (for consumers who need type information)
  • Factories (recommended entry point)
  • Individual clients (for advanced usage/testing)
packages/shared/inference/live-test.ts (2)

72-115: LGTM!

Good test result tracking with status, duration, response preview, and error capture. The runTest helper provides consistent timing and logging across all provider tests.


471-553: Good integration test pattern.

Testing the actual InferenceClientFactory.build() path validates the full code path the app uses. The three test cases (plain text, tagging with schema, image tagging) cover the primary use cases.

packages/shared/inference/openai.ts (4)

17-39: LGTM!

Good defensive validation of image types with a clear error message listing supported formats.


65-98: LGTM!

Good model detection helpers with clear separation of concerns:

  • requiresMaxCompletionTokens: GPT-5 and o-series
  • isOSeriesModel: Only reasoning models (o1, o3, o4)
  • getTokenLimitParam: Returns appropriate parameter name
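A minimal sketch of what such helpers might look like; the exact model-name matching rules below are assumptions for illustration, not the PR's actual logic.

```typescript
// Hypothetical model-detection helpers; the prefixes are illustrative.
function isOSeriesModel(model: string): boolean {
  // Matches o1, o3, o4 families (e.g. "o1-mini", "o3"), but not "ollama".
  return /^o[134](-|$)/.test(model);
}

function requiresMaxCompletionTokens(model: string): boolean {
  return model.startsWith("gpt-5") || isOSeriesModel(model);
}

function getTokenLimitParam(model: string): "max_completion_tokens" | "max_tokens" {
  return requiresMaxCompletionTokens(model) ? "max_completion_tokens" : "max_tokens";
}
```

Keeping the three predicates separate, as the review notes, lets the Responses API routing and the token-limit parameter choice evolve independently.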

340-345: Good validation for o-series model constraints.

Clear error message when attempting to use unsupported "none" reasoning effort with o-series models.


389-398: Good defensive JSON parsing.

The try-catch fallback ensures the client doesn't fail if the model returns unexpected output format.

docs/docs/03-configuration/01-environment-variables.md (1)

95-110: LGTM! Multi-provider configuration is well-documented.

The documentation clearly explains:

  • Provider selection via INFERENCE_PROVIDER with auto-detection fallback
  • Provider-specific API keys and base URLs for OpenAI, Anthropic, Google Gemini, and Ollama
  • OpenAI-specific options for Responses API and reasoning effort
  • Separate embedding provider configuration with appropriate defaults

The descriptions align well with the implementation in packages/shared/config.ts.

docs/docs/03-configuration/02-different-ai-providers.md (1)

1-292: LGTM! Comprehensive multi-provider documentation.

This documentation provides excellent coverage of all four supported AI providers:

  • Clear provider selection guidance with auto-detection fallback
  • Detailed model families, pricing, and recommendations for each provider
  • Important warnings (e.g., Anthropic's lack of embeddings, Azure deployment names)
  • Structured output compatibility notes
  • Comprehensive embedding configuration section
  • Helpful reference tables for quick lookup

The documentation will significantly improve user experience when configuring multi-provider AI inference.

packages/shared/inference/google.ts (1)

1-202: LGTM! Google Gemini integration is well-implemented.

The implementation correctly:

  • Validates API key presence with clear error messages
  • Handles three output schema modes (plain, json, structured) with appropriate MIME types
  • Converts Zod schemas to JSON schema using $refStrategy: "none" for structured outputs
  • Calculates token usage from usageMetadata (promptTokenCount + candidatesTokenCount)
  • Implements proper batch processing for embeddings with a 100-item limit matching Google's API constraints
  • Follows the Google GenAI SDK patterns for both text and multimodal (image + text) inference

The code is clean, well-documented, and consistent with the other provider implementations in this PR.
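The batch-processing behavior described above can be sketched as follows. The batch size matches the 100-item limit noted in the review; `embedBatch` is a hypothetical stand-in for the actual SDK call, returning dummy one-dimensional vectors.

```typescript
const MAX_BATCH_SIZE = 100; // per-request embedding limit noted in the review

// Hypothetical stand-in for the Google GenAI batch embedding call.
async function embedBatch(texts: string[]): Promise<number[][]> {
  return texts.map((t) => [t.length]); // dummy embeddings for the sketch
}

// Split the input into chunks of at most MAX_BATCH_SIZE and concatenate
// the per-batch results in order.
async function embedAll(texts: string[]): Promise<number[][]> {
  const results: number[][] = [];
  for (let i = 0; i < texts.length; i += MAX_BATCH_SIZE) {
    const batch = texts.slice(i, i + MAX_BATCH_SIZE);
    results.push(...(await embedBatch(batch)));
  }
  return results;
}
```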

packages/shared/config.ts (2)

59-83: LGTM! Multi-provider environment variable schema is well-structured.

The schema additions are comprehensive:

  • Explicit INFERENCE_PROVIDER selection with four supported providers
  • Provider-specific configuration blocks (OpenAI, Anthropic, Google Gemini, Ollama)
  • Separate EMBEDDING_PROVIDER configuration for flexibility
  • OPENAI_SERVICE_TIER enum correctly limited to ["auto", "default", "flex"] per SDK compatibility

Based on learnings, the service_tier constraint is properly enforced at the schema level (excluding "priority" tier which requires OpenAI SDK >= v5.8.0).


296-370: LGTM! Multi-provider configuration logic is well-designed.

The implementation correctly handles:

Inference provider selection:

  • Respects explicit INFERENCE_PROVIDER when set
  • Auto-detects from available credentials in documented order (OpenAI → Anthropic → Google → Ollama)
  • Returns null when no provider is configured

Embedding provider selection:

  • Respects explicit EMBEDDING_PROVIDER when set
  • Intelligently falls back to inference provider when it supports embeddings (OpenAI, Google, Ollama)
  • Handles Anthropic's lack of embeddings API by auto-detecting alternative providers
  • Uses appropriate credential-based fallback order

The configuration structure cleanly separates provider-specific settings while maintaining backward compatibility. The clientConfig exposure provides necessary read-only information for UI components without leaking sensitive credentials.
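The selection order described above can be sketched as a pair of pure functions. The env shape is simplified and the `OLLAMA_BASE_URL` variable name is an assumption; the real logic lives in packages/shared/config.ts.

```typescript
type Provider = "openai" | "anthropic" | "google" | "ollama";

interface Env {
  INFERENCE_PROVIDER?: Provider;
  OPENAI_API_KEY?: string;
  ANTHROPIC_API_KEY?: string;
  GEMINI_API_KEY?: string;
  OLLAMA_BASE_URL?: string; // illustrative variable name
}

// Auto-detect in the documented order: OpenAI -> Anthropic -> Google -> Ollama.
function selectInferenceProvider(env: Env): Provider | null {
  if (env.INFERENCE_PROVIDER) return env.INFERENCE_PROVIDER;
  if (env.OPENAI_API_KEY) return "openai";
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.GEMINI_API_KEY) return "google";
  if (env.OLLAMA_BASE_URL) return "ollama";
  return null;
}

// Embeddings: reuse the inference provider when it supports embeddings,
// otherwise (Anthropic has no embeddings API) fall back by credentials.
function selectEmbeddingProvider(env: Env, inference: Provider | null): Provider | null {
  if (inference && inference !== "anthropic") return inference;
  if (env.OPENAI_API_KEY) return "openai";
  if (env.GEMINI_API_KEY) return "google";
  if (env.OLLAMA_BASE_URL) return "ollama";
  return null;
}
```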

tools/compare-models/src/inferenceClient.ts (2)

29-56: LGTM! Well-structured tagging method.

The method properly:

  • Defines a clear schema for validation
  • Uses safeParse with explicit error handling
  • Delegates to helper methods for separation of concerns
  • Provides sensible defaults for language and custom prompts

58-83: LGTM! Bounds check properly implemented.

The method correctly:

  • Validates the choices array is non-empty (lines 71-72) ✅ — this addresses the previous review comment
  • Checks for null/empty content with a clear error message
  • Uses appropriate response_format based on whether a schema is provided
  • Returns structured response with token usage

Replace monolithic inference module with provider-specific clients supporting
OpenAI, Anthropic Claude, Google Gemini, and Ollama. Each provider now has
native SDK integration with structured output support via JSON schema.

Architecture:
- Add packages/shared/inference/ with modular provider clients
- Add InferenceClientFactory and EmbeddingClientFactory for provider selection
- Separate inference and embedding providers (Anthropic lacks embeddings API)

New providers:
- OpenAI: Chat Completions + Responses API for GPT-5/o-series reasoning
- Google: Gemini 2.5/3.x with JSON schema, batch embeddings
- Anthropic: Claude 4.5 with structured outputs (beta), vision support

Configuration:
- Add INFERENCE_PROVIDER for explicit provider selection (auto-detects if unset)
- Add ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL
- Add GEMINI_API_KEY, GEMINI_BASE_URL
- Add EMBEDDING_PROVIDER for separate embedding configuration
- Add OPENAI_USE_RESPONSES_API, OPENAI_REASONING_EFFORT for GPT-5
- Add reasoning effort support for o-series models (o1, o3, o4)
- Auto-detect max_completion_tokens for GPT-5/o-series models
- Change default models to gpt-5-mini

Breaking changes:
- Remove INFERENCE_SUPPORTS_STRUCTURED_OUTPUT (use INFERENCE_OUTPUT_SCHEMA)
- Remove INFERENCE_USE_MAX_COMPLETION_TOKENS (now auto-detected)

Other:
- Add ProviderIndicator component showing active provider/models in settings
- Add @anthropic-ai/sdk and @google/genai dependencies
- Upgrade zod 3.24.2 -> 3.25.0 for Anthropic SDK compatibility
- Add 106 unit tests and 19 live API integration tests
- Rewrite AI provider documentation with model tables and examples
- Add translations for AI provider settings UI to all 31 locales
@camjac251 force-pushed the feat/multi-provider-inference branch from b1738bd to 15fd85b on January 3, 2026 at 21:15
@camjac251 (Author)

@MohamedBassem could this be reviewed?

@MohamedBassem (Collaborator)

Hey, thanks for taking the time to send a PR. However, I'd recommend reading our contribution guidelines here to avoid any wasted effort. It's important for us to align on what to work on, otherwise it might not be aligned with the direction of the product.

Now for this PR, it's honestly not clear to me what value it adds compared to the complexity it introduces. All major LLM providers already support the OpenAI API. So if anything, I'm trying to converge more on that rather than split into separate providers. If I'm to go with separate providers, it'll also probably be through Vercel's AI SDK. So unfortunately, I don't think I'm currently willing to accept the extra complexity introduced by this PR. I hate closing PRs that people have put effort into, that's why it's important for us to align first. Thanks again for taking the time.
