feat: add OpenAI Whisper API support (bring your own key) #129
CreatorGhost wants to merge 5 commits into amicalhq:main from
Conversation
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough
Adds OpenAI Whisper as a cloud transcription option: provider/model metadata, a new OpenAIWhisper transcription provider, API-key persistence/validation, onboarding/settings UI flows, tRPC endpoints and schemas, i18n strings, error mapping, service routing updates, utilities, and tests.
Sequence Diagram
sequenceDiagram
participant User
participant UI as Settings / Onboarding UI
participant TRPC as tRPC Router
participant Service as ModelService
participant API as OpenAI /v1/models
participant DB as Local Settings
User->>UI: Enter API key & click Connect
UI->>TRPC: validateOpenAIWhisperConnection(apiKey)
TRPC->>Service: validateOpenAIWhisperConnection(apiKey)
Service->>API: GET /v1/models (Authorization: Bearer)
alt API returns whisper-1
API-->>Service: 200 + model list (includes whisper-1)
Service-->>TRPC: { success: true }
TRPC-->>UI: success
UI->>TRPC: setOpenAIWhisperConfig(apiKey)
TRPC->>Service: setOpenAIWhisperConfig(apiKey)
Service->>DB: Persist modelProvidersConfig.openAIWhisper
UI-->>User: Show connected
else API error (401/429/other)
API-->>Service: HTTP error
Service-->>TRPC: { success: false, error }
TRPC-->>UI: error
UI-->>User: Show error message
end
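The validation path in the diagram above can be sketched as follows. This is a minimal sketch, not the PR's actual code: the helper name hasWhisperModel, the error strings, and the overall function shape are illustrative assumptions; only the GET /v1/models call with a Bearer header and the whisper-1 check come from the diagram.

```typescript
// Decide whether a /v1/models response unlocks Whisper.
// The { data: [{ id }] } shape follows OpenAI's documented list format.
function hasWhisperModel(models: { data: Array<{ id: string }> }): boolean {
  return models.data.some((m) => m.id === "whisper-1");
}

// Sketch of the tRPC-backed validation call (network call, hypothetical names).
async function validateOpenAIWhisperConnection(
  apiKey: string,
): Promise<{ success: boolean; error?: string }> {
  const res = await fetch("https://api.openai.com/v1/models", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    // 401/429/other are surfaced to the UI as an error result
    return { success: false, error: `HTTP ${res.status}` };
  }
  const body = (await res.json()) as { data: Array<{ id: string }> };
  return hasWhisperModel(body)
    ? { success: true }
    : { success: false, error: "whisper-1 not available for this key" };
}
```

Only on a success result does the UI proceed to setOpenAIWhisperConfig, matching the alt branches above.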
sequenceDiagram
participant Mic as Audio Frames
participant Provider as OpenAIWhisperProvider
participant VAD as VAD Logic
participant Encoder as WAV Encoder
participant API as OpenAI /v1/audio/transcriptions
Mic->>Provider: transcribe(frame, speechProb)
Provider->>VAD: update speech/silence counters
alt Silence threshold reached or buffer > 30s
Provider->>Provider: snapshot buffers
Provider->>VAD: extract speech segments
Provider->>Encoder: encode to 16-bit WAV buffer
Encoder-->>Provider: WAV Buffer
Provider->>API: POST /v1/audio/transcriptions (FormData + WAV + language/prompt)
alt Success
API-->>Provider: transcription text
Provider->>Provider: reset buffers
Provider-->>Mic: TranscriptionOutput{text}
else Failure
API-->>Provider: error (401/429/other)
Provider-->>Mic: AppError mapped to code (AUTH_REQUIRED, RATE_LIMIT_EXCEEDED, CLOUD_TRANSCRIPTION_FAILED)
end
else Not enough speech
Provider-->>Mic: ""
end
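The failure branch above maps HTTP errors onto app error codes. A minimal sketch of that mapping, assuming the ErrorCodes names shown in the diagram (the mapper function itself is hypothetical):

```typescript
// Error codes taken from the diagram above; the enum shape is assumed.
const ErrorCodes = {
  AUTH_REQUIRED: "AUTH_REQUIRED",
  RATE_LIMIT_EXCEEDED: "RATE_LIMIT_EXCEEDED",
  CLOUD_TRANSCRIPTION_FAILED: "CLOUD_TRANSCRIPTION_FAILED",
} as const;

type ErrorCode = (typeof ErrorCodes)[keyof typeof ErrorCodes];

// Hypothetical mapper: 401 -> auth, 429 -> rate limit,
// anything else -> generic cloud transcription failure.
function mapOpenAIStatusToErrorCode(status: number): ErrorCode {
  switch (status) {
    case 401:
      return ErrorCodes.AUTH_REQUIRED;
    case 429:
      return ErrorCodes.RATE_LIMIT_EXCEEDED;
    default:
      return ErrorCodes.CLOUD_TRANSCRIPTION_FAILED;
  }
}
```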
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches: 🧪 Generate unit tests (beta)
13cd9b2 to 443e0dc (Compare)
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/desktop/src/services/transcription-service.ts (1)
204-223: ⚠️ Potential issue | 🟡 Minor
isModelAvailable() doesn't recognize OpenAI Whisper as a cloud model.
The method checks model?.provider === "Amical Cloud" but not "OpenAI", so when OpenAI Whisper is selected, it falls through to checking downloaded local models instead of returning true immediately. This inconsistency with selectProvider() and initialize() could cause confusing behavior.
🛠️ Proposed fix
  public async isModelAvailable(): Promise<boolean> {
    try {
      // Check if selected model is a cloud model (doesn't need download)
      const selectedModelId = await this.modelService.getSelectedModel();
      if (selectedModelId) {
        const model = AVAILABLE_MODELS.find((m) => m.id === selectedModelId);
-       if (model?.provider === "Amical Cloud") {
+       if (model?.provider === "Amical Cloud" || model?.provider === "OpenAI") {
          return true;
        }
      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/services/transcription-service.ts` around lines 204 - 223, The isModelAvailable() logic currently only treats models with provider "Amical Cloud" as cloud models; update the provider check in the block that looks up AVAILABLE_MODELS (used after modelService.getSelectedModel()) to also treat OpenAI Whisper as a cloud model (e.g., include model.provider === "OpenAI" or the exact provider string used for OpenAI Whisper) so it returns true immediately when an OpenAI model is selected; ensure you change the condition in the isModelAvailable() function (and keep it consistent with selectProvider() and initialize() behavior).
🧹 Nitpick comments (1)
apps/desktop/src/constants/models.ts (1)
151-180: Separate provider identity from the display label.
provider: "OpenAI" is now part of the control flow in apps/desktop/src/services/transcription-service.ts:86-112 and apps/desktop/src/services/model-service.ts:893-929. Because this field is also presentational data, a branding/copy change here would silently break routing. Add a machine-stable providerType or providerInstanceId and keep provider purely for display.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/constants/models.ts` around lines 151 - 180, The model entry for "openai-whisper" uses the presentational field provider ("OpenAI") in control flow; add a stable machine identifier (e.g., providerType or providerInstanceId) to the model object and switch all control-flow checks in transcription-service (around the logic that inspects provider in apps/desktop/src/services/transcription-service.ts) and model-service (around the logic that inspects provider in apps/desktop/src/services/model-service.ts) to use that new machine-stable field instead of provider; keep provider as the display label and update any code that compares provider === "OpenAI" to compare providerType === "openai" (or similar) so branding changes won’t affect routing.
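The suggested split between display label and routing identity could look like this sketch. The ProviderType union, the ModelEntry interface, and isCloudModel are illustrative names, not the repo's actual definitions:

```typescript
// Hypothetical model metadata separating routing identity from display copy.
type ProviderType = "amical-cloud" | "openai" | "local";

interface ModelEntry {
  id: string;
  provider: string;           // display label only; safe to rebrand
  providerType: ProviderType; // stable key used for routing
}

const openAIWhisperModel: ModelEntry = {
  id: "openai-whisper",
  provider: "OpenAI",
  providerType: "openai",
};

// Routing now compares the stable field, not the presentational label.
function isCloudModel(model: ModelEntry): boolean {
  return model.providerType === "amical-cloud" || model.providerType === "openai";
}
```

With this shape, renaming provider to "OpenAI Whisper (BYOK)" in the UI would not affect selectProvider() or isModelAvailable().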
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/src/i18n/locales/en.json`:
- Around line 356-359: The "cloudTranscriptionFailed" i18n entry is
provider-specific ("OpenAI Whisper API request failed"); update its description
to be provider-agnostic (e.g., "Cloud transcription request failed" or similar)
so the notification correctly reflects any cloud backend—edit the
"cloudTranscriptionFailed" JSON object in apps/desktop/src/i18n/locales/en.json
and replace the hardcoded OpenAI Whisper wording in the "description" value with
a neutral message.
In `@apps/desktop/src/i18n/locales/es.json`:
- Around line 330-333: Translate the English strings under the JSON key
"cloudTranscriptionFailed" to Spanish: replace "title": "Cloud transcription
failed" with a Spanish title (e.g., "Error de transcripción en la nube") and
replace "description": "OpenAI Whisper API request failed" with a Spanish
description (e.g., "La solicitud a la API OpenAI Whisper falló") so both "title"
and "description" in the "cloudTranscriptionFailed" object are localized.
In `@apps/desktop/src/i18n/locales/ja.json`:
- Around line 330-333: The JSON entry "cloudTranscriptionFailed" currently has
English values for "title" and "description"; replace them with Japanese
translations to match the locale (update "title" to something like
"クラウド文字起こしに失敗しました" and "description" to "OpenAI Whisper APIへのリクエストが失敗しました"),
preserving the key names and JSON structure so other code referencing
cloudTranscriptionFailed.title/description continues to work.
In `@apps/desktop/src/i18n/locales/zh-TW.json`:
- Around line 330-333: Replace the English strings under the
"cloudTranscriptionFailed" key with Traditional Chinese: change "title" to
"雲端轉寫失敗" and "description" to "OpenAI Whisper API 請求失敗" so the zh-TW locale is
fully localized for that error entry.
In
`@apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts`:
- Around line 284-288: The thrown AppError in the OpenAI Whisper provider uses
the wrong code: replace ErrorCodes.LOCAL_TRANSCRIPTION_FAILED with
ErrorCodes.CLOUD_TRANSCRIPTION_FAILED in the catch/fallback that builds the
AppError (the throw that currently constructs `new AppError(...,
ErrorCodes.LOCAL_TRANSCRIPTION_FAILED)`), so the provider uses the same cloud
transcription error code as earlier; keep the existing message formatting
(`OpenAI Whisper transcription failed: ...`) and only change the ErrorCodes
symbol.
In `@apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx`:
- Around line 225-226: Replace all hardcoded English UI strings in the SpeechTab
React component with i18n translation calls using the existing useTranslation
hook (t). Specifically, update toast.success/toast.error usages (e.g., the
success message after API key validation and all error messages such as "Failed
to validate API key", "Failed to validate OpenAI API key", "Failed to save
OpenAI Whisper configuration", "OpenAI Whisper configuration removed", "Failed
to remove OpenAI Whisper configuration") and all visible labels/messages in the
component (e.g., "Please configure your OpenAI API key above first." and the
labels around lines 517–587) to use
t("settings.aiModels.speech.<descriptive_key>") keys; add corresponding keys to
the locale files and replace the hardcoded strings inside the SpeechTab
component (look for toast.success, toast.error, and literal JSX text in
SpeechTab) to call t(...) instead. Ensure pluralization/formatting uses t where
needed and keep key names consistent with the existing settings.aiModels.speech
namespace.
In
`@apps/desktop/src/renderer/onboarding/components/screens/ModelSelectionScreen.tsx`:
- Around line 61-75: The recommendation rendering logic treats recommendation as
binary Cloud vs Local and thus maps ModelType.OpenAIWhisper to the wrong label;
update the conditional in ModelSelectionScreen that computes the recommendation
label (the code/path that builds recommendationLabel or selects the recommended
title/subtitle) to explicitly handle ModelType.OpenAIWhisper (in addition to the
existing Cloud and Local branches) and return the correct localized keys or text
for OpenAI Whisper (use
t("onboarding.modelSelection.recommendation.openaiWhisper") or the appropriate
i18n keys you already use for OpenAI entries); ensure both the displayed
recommendation title and any subtitle/description selection use this new branch
so OpenAIWhisper shows the intended recommendation text.
In `@apps/desktop/src/renderer/onboarding/components/screens/ModelSetupModal.tsx`:
- Around line 255-266: The password Input in ModelSetupModal lacks an accessible
label so screen readers lose context; update the Input (where
value={openAIApiKey}, onChange={setOpenAIApiKey}, onKeyDown calling
handleOpenAIConnect and disabled={isLoading}) to include an accessible label —
either add a visible <label> associated with the input or add an
aria-label/aria-labelledby prop (e.g., aria-label="OpenAI API key") so assistive
tech can announce the field purpose while preserving existing handlers and
props.
In `@apps/desktop/src/services/onboarding-service.ts`:
- Around line 687-725: The onboarding flow currently calls
completeOnboarding(finalState) before ensuring a working speech model and
swallows failures from modelService.setSelectedModel, which can mark users as
completed without a valid model; move the entire model-selection block (the
calls to modelService.setSelectedModel, modelService.getSelectedModel, and
modelService.getDownloadedModels) to run before completeOnboarding(finalState)
and let errors propagate (or rethrow) instead of only logging so onboarding
completion is blocked on a successful model selection; ensure you keep the same
logic for cloud, openai-whisper, and local branches and handle the
no-downloaded-model case consistently before calling completeOnboarding.
In `@apps/desktop/src/services/settings-service.ts`:
- Around line 279-305: The setOpenAIWhisperConfig currently writes the raw
apiKey into modelProvidersConfig (via setModelProvidersConfig) which persists as
plaintext; change this to store the secret in a secure OS-backed store (Electron
safeStorage or keytar) and only write a non-secret reference/flag into
modelProvidersConfig. Update setOpenAIWhisperConfig to save config.apiKey.trim()
into the secure store (using a unique key name like "openAIWhisper_apiKey") and
write no plaintext key to modelProvidersConfig; update getOpenAIWhisperConfig to
read the placeholder from modelProvidersConfig and retrieve the actual apiKey
from the secure store (return undefined if not present); update
removeOpenAIWhisperConfig to delete the secret from the secure store and remove
the reference from modelProvidersConfig. Ensure you use the existing helpers
getModelProvidersConfig and setModelProvidersConfig when updating the non-secret
metadata.
In `@apps/desktop/src/types/onboarding.ts`:
- Line 41: The schema enum ModelType has a new value OpenAIWhisper
("openai-whisper") but AppSettingsData.onboarding fields selectedModelType and
modelRecommendation.suggested in your schema definitions were not updated and
the code that casts/filters modelRecommendation.suggested is dropping that
value; update the schema types to include "openai-whisper" in ModelType and
ensure AppSettingsData.onboarding.selectedModelType and
AppSettingsData.onboarding.modelRecommendation.suggested accept that string, and
modify the casting/filtering logic that coerces modelRecommendation.suggested so
it does not exclude "openai-whisper" (remove the hardcoded whitelist or add the
new value) so recommendations using the new model type are preserved.
In `@apps/desktop/tests/pipeline/openai-whisper-provider.test.ts`:
- Around line 59-69: Tests assert the raw return value from
provider.transcribe() as a string, but transcribe() returns a
TranscriptionOutput object ({ text: string, detectedLanguage? }), so update
assertions to read the text property; specifically, replace
expect(result).toBe("Hello world") with expect(result.text).toBe("Hello world")
for the two occurrences, replace expect(result).toBe("") with
expect(result.text).toBe("") for the three occurrences, and replace
expect(typeof result).toBe("string") with expect(typeof
result.text).toBe("string"); locate usages around the provider.transcribe()
calls in the test file and adjust assertions to reference result.text.
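The assertion change described above amounts to reading the text property of the returned object. A minimal sketch, where fakeTranscribe is a hypothetical stand-in for provider.transcribe() and the TranscriptionOutput shape is the one the review comment describes:

```typescript
// Assumed return shape of provider.transcribe(), per the review comment.
interface TranscriptionOutput {
  text: string;
  detectedLanguage?: string;
}

// Hypothetical stand-in so the assertion shape is visible without the provider.
function fakeTranscribe(hasSpeech: boolean): TranscriptionOutput {
  return hasSpeech ? { text: "Hello world" } : { text: "" };
}

const result = fakeTranscribe(true);
// Before: expect(result).toBe("Hello world")   — fails, result is an object
// After:  expect(result.text).toBe("Hello world")
```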
---
Outside diff comments:
In `@apps/desktop/src/services/transcription-service.ts`:
- Around line 204-223: The isModelAvailable() logic currently only treats models
with provider "Amical Cloud" as cloud models; update the provider check in the
block that looks up AVAILABLE_MODELS (used after
modelService.getSelectedModel()) to also treat OpenAI Whisper as a cloud model
(e.g., include model.provider === "OpenAI" or the exact provider string used for
OpenAI Whisper) so it returns true immediately when an OpenAI model is selected;
ensure you change the condition in the isModelAvailable() function (and keep it
consistent with selectProvider() and initialize() behavior).
---
Nitpick comments:
In `@apps/desktop/src/constants/models.ts`:
- Around line 151-180: The model entry for "openai-whisper" uses the
presentational field provider ("OpenAI") in control flow; add a stable machine
identifier (e.g., providerType or providerInstanceId) to the model object and
switch all control-flow checks in transcription-service (around the logic that
inspects provider in apps/desktop/src/services/transcription-service.ts) and
model-service (around the logic that inspects provider in
apps/desktop/src/services/model-service.ts) to use that new machine-stable field
instead of provider; keep provider as the display label and update any code that
compares provider === "OpenAI" to compare providerType === "openai" (or similar)
so branding changes won’t affect routing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d0bcccce-c21b-4dd1-8fd3-8885a81321a2
📒 Files selected for processing (22)
- apps/desktop/src/constants/models.ts
- apps/desktop/src/constants/provider-types.ts
- apps/desktop/src/db/schema.ts
- apps/desktop/src/i18n/locales/en.json
- apps/desktop/src/i18n/locales/es.json
- apps/desktop/src/i18n/locales/ja.json
- apps/desktop/src/i18n/locales/zh-TW.json
- apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts
- apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx
- apps/desktop/src/renderer/onboarding/components/screens/ModelSelectionScreen.tsx
- apps/desktop/src/renderer/onboarding/components/screens/ModelSetupModal.tsx
- apps/desktop/src/services/model-service.ts
- apps/desktop/src/services/onboarding-service.ts
- apps/desktop/src/services/settings-service.ts
- apps/desktop/src/services/transcription-service.ts
- apps/desktop/src/trpc/routers/models.ts
- apps/desktop/src/trpc/routers/settings.ts
- apps/desktop/src/types/error.ts
- apps/desktop/src/types/onboarding.ts
- apps/desktop/src/types/widget-notification.ts
- apps/desktop/src/utils/model-selection.ts
- apps/desktop/tests/pipeline/openai-whisper-provider.test.ts
async getOpenAIWhisperConfig(): Promise<{ apiKey: string } | undefined> {
  const config = await this.getModelProvidersConfig();
  return config?.openAIWhisper;
}

/**
 * Update OpenAI Whisper configuration
 */
async setOpenAIWhisperConfig(config: { apiKey: string }): Promise<void> {
  const currentConfig = await this.getModelProvidersConfig();
  await this.setModelProvidersConfig({
    ...currentConfig,
    openAIWhisper: {
      apiKey: config.apiKey.trim(),
    },
  });
}

/**
 * Remove OpenAI Whisper configuration
 */
async removeOpenAIWhisperConfig(): Promise<void> {
  const currentConfig = await this.getModelProvidersConfig();
  const updatedConfig = { ...currentConfig };
  delete updatedConfig.openAIWhisper;
  await this.setModelProvidersConfig(updatedConfig);
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== apps/desktop/src/db/app-settings.ts =="
fd -i 'app-settings.ts' apps/desktop/src/db -x sed -n '1,260p' {}
echo
echo "== apps/desktop/src/db/schema.ts =="
fd -i 'schema.ts' apps/desktop/src/db -x sed -n '1,220p' {}
echo
echo "== secret-handling references =="
rg -n -C2 'modelProvidersConfig|openAIWhisper|safeStorage|keytar|encrypt|decrypt|secret' apps/desktop/src

Repository: amicalhq/amical
Length of output: 29348
API keys must not be stored as plain text in the settings database.
setOpenAIWhisperConfig() writes the raw API key directly to modelProvidersConfig, which is persisted as plain JSON text in SQLite. The schema stores it as a simple apiKey: string field with no encryption layer. updateSettingsSection("modelProvidersConfig", ...) performs a shallow merge and writes the entire config object to the database's data text field without any encryption.
This means BYOK credentials are stored in cleartext in the SQLite settings database. Use Electron's safeStorage API or OS keychain for credential storage instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/desktop/src/services/settings-service.ts` around lines 279 - 305, The
setOpenAIWhisperConfig currently writes the raw apiKey into modelProvidersConfig
(via setModelProvidersConfig) which persists as plaintext; change this to store
the secret in a secure OS-backed store (Electron safeStorage or keytar) and only
write a non-secret reference/flag into modelProvidersConfig. Update
setOpenAIWhisperConfig to save config.apiKey.trim() into the secure store (using
a unique key name like "openAIWhisper_apiKey") and write no plaintext key to
modelProvidersConfig; update getOpenAIWhisperConfig to read the placeholder from
modelProvidersConfig and retrieve the actual apiKey from the secure store
(return undefined if not present); update removeOpenAIWhisperConfig to delete
the secret from the secure store and remove the reference from
modelProvidersConfig. Ensure you use the existing helpers
getModelProvidersConfig and setModelProvidersConfig when updating the non-secret
metadata.
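A sketch of the secure-store approach, with the encryptor injected so the app could back it with Electron's safeStorage. The StringEncryptor interface, the SecretStore class, and the key name are illustrative assumptions, not the PR's code:

```typescript
// Minimal interface matching the subset of Electron safeStorage used here
// (safeStorage.encryptString returns a Buffer, which is a Uint8Array).
interface StringEncryptor {
  encryptString(plain: string): Uint8Array;
  decryptString(cipher: Uint8Array): string;
}

// Hypothetical secret store keeping only ciphertext; the settings DB would
// then hold only a non-secret flag like { openAIWhisper: { configured: true } }.
class SecretStore {
  private secrets = new Map<string, Uint8Array>();
  constructor(private enc: StringEncryptor) {}

  set(key: string, value: string): void {
    this.secrets.set(key, this.enc.encryptString(value.trim()));
  }

  get(key: string): string | undefined {
    const cipher = this.secrets.get(key);
    return cipher ? this.enc.decryptString(cipher) : undefined;
  }

  delete(key: string): void {
    this.secrets.delete(key);
  }
}
```

In the real service, enc would be Electron's safeStorage (guarded by safeStorage.isEncryptionAvailable()), and the Map would be a file or keychain entry rather than in-memory state; getOpenAIWhisperConfig/removeOpenAIWhisperConfig would call get/delete with the chosen key name.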
Actionable comments posted: 1
♻️ Duplicate comments (4)
apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts (1)
284-288: ⚠️ Potential issue | 🟡 Minor
Wrong error code used for fallback errors.
The fallback error handling uses ErrorCodes.LOCAL_TRANSCRIPTION_FAILED, but this is a cloud provider. For consistency with the error handling above (line 264), it should use ErrorCodes.CLOUD_TRANSCRIPTION_FAILED.
🛠️ Proposed fix
  throw new AppError(
    `OpenAI Whisper transcription failed: ${error instanceof Error ? error.message : error}`,
-   ErrorCodes.LOCAL_TRANSCRIPTION_FAILED,
+   ErrorCodes.CLOUD_TRANSCRIPTION_FAILED,
  );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts` around lines 284 - 288, The thrown AppError in the OpenAI Whisper provider's fallback catch block is using ErrorCodes.LOCAL_TRANSCRIPTION_FAILED; change it to ErrorCodes.CLOUD_TRANSCRIPTION_FAILED to match the cloud provider error handling used earlier (see the earlier throw at line ~264) so the fallback error is categorized correctly for the OpenAIWhisper transcription path (the throw new AppError(...) in openai-whisper-provider.ts).
apps/desktop/src/renderer/onboarding/components/screens/ModelSetupModal.tsx (1)
255-266: ⚠️ Potential issue | 🟡 Minor
Label the API key field for accessibility.
The password input only has placeholder text, so assistive technology loses the field purpose once focus is inside it. Add an aria-label attribute.
♿ Proposed fix
  <Input
    type="password"
+   aria-label={t("onboarding.modelSetup.openai.title")}
    placeholder={t("onboarding.modelSetup.openai.placeholder")}
    value={openAIApiKey}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/renderer/onboarding/components/screens/ModelSetupModal.tsx` around lines 255 - 266, The password Input for the OpenAI key lacks an accessible label; update the Input used in ModelSetupModal (the component with value openAIApiKey, onChange setOpenAIApiKey, onKeyDown that calls handleOpenAIConnect, and disabled isLoading) to include an aria-label (e.g., aria-label="OpenAI API key") so screen readers retain the field purpose when focused; ensure the aria-label reflects the same user-facing purpose as the placeholder.
apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx (2)
517-587: ⚠️ Potential issue | 🟡 Minor
Hardcoded strings in OpenAI Whisper configuration section.
Multiple hardcoded English strings should use the i18n translation system (t()) for consistency:
- Line 522: "OpenAI Whisper"
- Lines 539-541: "Connected", "Not configured"
- Lines 544-546: description text
- Line 551: placeholder text
- Line 566: "Validating..."
- Line 569: "Connect"
- Line 578: "Remove"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx` around lines 517 - 587, Replace the hardcoded English strings in the OpenAI Whisper UI in SpeechTab.tsx with i18n lookups using t(...) and appropriate translation keys: update the label "OpenAI Whisper" (near the KeyRound/Label), the badge texts ("Connected" / "Not configured") that depend on openAIStatus, the description paragraph below the label, the Input placeholder ("Enter your OpenAI API key (sk-...)"), the button text "Validating..." (when openAIValidating is true), the "Connect" button text, and the "Remove" button text; keep the existing logic and props (openAIApiKey, openAIValidating, openAIValidationError, handleOpenAIConnect, handleOpenAIRemove) but replace literal strings with t('settings.ai.openaiWhisper.title'), t('settings.ai.openaiWhisper.connected'), etc., adding keys to your i18n files accordingly.
225-226: ⚠️ Potential issue | 🟡 Minor
Hardcoded English strings should use i18n.
Multiple UI strings are hardcoded in English instead of using the translation system (t()) like the rest of the component. This affects lines 225, 228, 230, 236, 248, 260, and 264.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx` around lines 225 - 226, The component SpeechTab contains several hardcoded English UI strings (e.g., "OpenAI API key validated successfully" and others around the API key validation and error flows) that should be replaced with i18n calls; update those occurrences to use the component's translation function (t('...')) with appropriate translation keys, ensure the t function (from useTranslation) is available in SpeechTab, and add corresponding keys to the translation resource files; specifically locate the hardcoded messages in SpeechTab (validation success, validation failure, connection/test prompts and error messages referenced around the validation logic) and replace each string literal with t('settings.speech.<descriptive_key>') or similar keys, keeping context placeholders if needed.
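The i18n migration both SpeechTab comments ask for boils down to replacing string literals with t() lookups. A minimal sketch with hypothetical key names; the real keys would follow the existing settings.aiModels.speech namespace and live in the locale JSON files:

```typescript
// Hypothetical locale entries; real ones would be added to en.json et al.
const en: Record<string, string> = {
  "settings.aiModels.speech.openaiWhisper.title": "OpenAI Whisper",
  "settings.aiModels.speech.openaiWhisper.connected": "Connected",
  "settings.aiModels.speech.openaiWhisper.notConfigured": "Not configured",
  "settings.aiModels.speech.openaiWhisper.connect": "Connect",
  "settings.aiModels.speech.openaiWhisper.validating": "Validating...",
  "settings.aiModels.speech.openaiWhisper.remove": "Remove",
};

// Tiny stand-in for the t() function returned by useTranslation();
// falling back to the key mirrors common i18n behavior for missing entries.
function t(key: string): string {
  return en[key] ?? key;
}

// Before: <Button>Connect</Button>
// After:  <Button>{t("settings.aiModels.speech.openaiWhisper.connect")}</Button>
```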
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx`:
- Around line 549-556: The Input element rendering the OpenAI API key (props:
type="password", value={openAIApiKey}, onChange={setOpenAIApiKey},
disabled={openAIStatus === "connected"}) is missing an accessible label; add an
aria-label attribute (e.g. aria-label="OpenAI API key" or similar descriptive
text) to that Input so screen readers can identify the field, ensuring the label
remains present when the input is disabled.
---
Duplicate comments:
In
`@apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts`:
- Around line 284-288: The thrown AppError in the OpenAI Whisper provider's
fallback catch block is using ErrorCodes.LOCAL_TRANSCRIPTION_FAILED; change it
to ErrorCodes.CLOUD_TRANSCRIPTION_FAILED to match the cloud provider error
handling used earlier (see the earlier throw at line ~264) so the fallback error
is categorized correctly for the OpenAIWhisper transcription path (the throw new
AppError(...) in openai-whisper-provider.ts).
In `@apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx`:
- Around line 517-587: Replace the hardcoded English strings in the OpenAI
Whisper UI in SpeechTab.tsx with i18n lookups using t(...) and appropriate
translation keys: update the label "OpenAI Whisper" (near the KeyRound/Label),
the badge texts ("Connected" / "Not configured") that depend on openAIStatus,
the description paragraph below the label, the Input placeholder ("Enter your
OpenAI API key (sk-...)"), the button text "Validating..." (when
openAIValidating is true), the "Connect" button text, and the "Remove" button
text; keep the existing logic and props (openAIApiKey, openAIValidating,
openAIValidationError, handleOpenAIConnect, handleOpenAIRemove) but replace
literal strings with t('settings.ai.openaiWhisper.title'),
t('settings.ai.openaiWhisper.connected'), etc., adding keys to your i18n files
accordingly.
- Around line 225-226: The component SpeechTab contains several hardcoded
English UI strings (e.g., "OpenAI API key validated successfully" and others
around the API key validation and error flows) that should be replaced with i18n
calls; update those occurrences to use the component's translation function
(t('...')) with appropriate translation keys, ensure the t function (from
useTranslation) is available in SpeechTab, and add corresponding keys to the
translation resource files; specifically locate the hardcoded messages in
SpeechTab (validation success, validation failure, connection/test prompts and
error messages referenced around the validation logic) and replace each string
literal with t('settings.speech.<descriptive_key>') or similar keys, keeping
context placeholders if needed.
In `@apps/desktop/src/renderer/onboarding/components/screens/ModelSetupModal.tsx`:
- Around line 255-266: The password Input for the OpenAI key lacks an accessible
label; update the Input used in ModelSetupModal (the component with value
openAIApiKey, onChange setOpenAIApiKey, onKeyDown that calls
handleOpenAIConnect, and disabled isLoading) to include an aria-label (e.g.,
aria-label="OpenAI API key") so screen readers retain the field purpose when
focused; ensure the aria-label reflects the same user-facing purpose as the
placeholder.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f8a2d269-fd46-4828-b41a-72699d1218c0
📒 Files selected for processing (22)
- apps/desktop/src/constants/models.ts
- apps/desktop/src/constants/provider-types.ts
- apps/desktop/src/db/schema.ts
- apps/desktop/src/i18n/locales/en.json
- apps/desktop/src/i18n/locales/es.json
- apps/desktop/src/i18n/locales/ja.json
- apps/desktop/src/i18n/locales/zh-TW.json
- apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts
- apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx
- apps/desktop/src/renderer/onboarding/components/screens/ModelSelectionScreen.tsx
- apps/desktop/src/renderer/onboarding/components/screens/ModelSetupModal.tsx
- apps/desktop/src/services/model-service.ts
- apps/desktop/src/services/onboarding-service.ts
- apps/desktop/src/services/settings-service.ts
- apps/desktop/src/services/transcription-service.ts
- apps/desktop/src/trpc/routers/models.ts
- apps/desktop/src/trpc/routers/settings.ts
- apps/desktop/src/types/error.ts
- apps/desktop/src/types/onboarding.ts
- apps/desktop/src/types/widget-notification.ts
- apps/desktop/src/utils/model-selection.ts
- apps/desktop/tests/pipeline/openai-whisper-provider.test.ts
✅ Files skipped from review due to trivial changes (6)
- apps/desktop/src/i18n/locales/es.json
- apps/desktop/src/types/error.ts
- apps/desktop/src/db/schema.ts
- apps/desktop/src/i18n/locales/ja.json
- apps/desktop/src/i18n/locales/en.json
- apps/desktop/src/constants/models.ts
🚧 Files skipped from review as they are similar to previous changes (10)
- apps/desktop/src/i18n/locales/zh-TW.json
- apps/desktop/src/types/onboarding.ts
- apps/desktop/src/renderer/onboarding/components/screens/ModelSelectionScreen.tsx
- apps/desktop/src/trpc/routers/models.ts
- apps/desktop/src/utils/model-selection.ts
- apps/desktop/src/trpc/routers/settings.ts
- apps/desktop/src/services/settings-service.ts
- apps/desktop/src/services/onboarding-service.ts
- apps/desktop/src/constants/provider-types.ts
- apps/desktop/tests/pipeline/openai-whisper-provider.test.ts
Actionable comments posted: 5
♻️ Duplicate comments (1)
apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx (1)
548-555: ⚠️ Potential issue | 🟡 Minor
Add a programmatic label to the API key field.
This input still has no accessible name. Placeholder text is not a stable label once the field is focused or autofilled, so please add aria-label or wire a Label htmlFor to the input.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx` around lines 548 - 555, The API key Input (the Input component controlling openAIApiKey via setOpenAIApiKey and disabled by openAIStatus) lacks an accessible name; add an accessible label by either giving the Input a stable id and wiring a Label with htmlFor to that id or by adding an aria-label (e.g., aria-label="OpenAI API key") to the Input; ensure the label text is localized via t(...) if using a visible Label and keep existing props (value/onChange/disabled) intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts`:
- Around line 163-170: The code currently calls this.reset() immediately after
aggregating frames in doTranscription(), which loses buffered audio if
subsequent VAD filtering or the network request throws; change the flow to only
clear buffers on the successful paths (e.g., after a successful transcription
response or when audio is empty), or capture and restore the buffers in the
catch block: keep the captured vadProbs and rawAudio (from aggregateFrames and
frameBufferSpeechProbabilities) alive, move or defer the this.reset() call until
after the network call completes successfully (or after VAD finds no speech),
and if an error is thrown restore the frame buffer and
frameBufferSpeechProbabilities so no audio is lost. Ensure to reference
doTranscription(), aggregateFrames(), reset(), and
frameBufferSpeechProbabilities when locating and changing the code.
- Around line 273-276: The current logger.transcription.info call logs verbatim
transcription via the preview field (variable text), which must be removed;
update the logger.transcription.info invocation in openai-whisper-provider (the
logger.transcription.info call that currently sends textLength and preview using
text) to stop emitting any plaintext transcription — keep only metadata (e.g.,
textLength, duration, model) or replace preview with a fixed placeholder like
"[REDACTED]" or a non-reversible hash if you need a referent; ensure the
variable text is not included in any log payload or passed through any logging
helper.
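A minimal sketch of the privacy-safe payload the comment asks for; the field names other than `textLength` and `preview` are illustrative assumptions, not the app's actual logger schema:

```typescript
// Build a log payload that carries only metadata about a transcription.
// The raw text never leaves this function except as a character count.
function transcriptionLogPayload(
  text: string,
  model: string,
  durationMs: number,
): { textLength: number; model: string; durationMs: number; preview: string } {
  return {
    textLength: text.length,
    model,
    durationMs,
    preview: "[REDACTED]", // fixed placeholder instead of verbatim text
  };
}
```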
In `@apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx`:
- Around line 215-224: When kicking off validation with
validateOpenAIWhisperMutation, capture the trimmed API key into a local variable
(e.g., const keyToValidate = openAIApiKey.trim()) before calling the mutation
and use that same keyToValidate in the onSuccess path when calling
setOpenAIWhisperConfigMutation.mutate instead of reading openAIApiKey from
state; apply the same fix to the other validate-and-save block referenced (lines
~548-555) so the persisted key is the exact value that was validated.
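The fix reduces to "validate and persist the same captured value". A framework-free sketch — `validate` and `save` stand in for the tRPC mutations, which is an assumption about their shape:

```typescript
// Capture the trimmed key once, then validate and save that exact value,
// rather than re-reading mutable UI state in the success callback.
async function validateAndSaveKey(
  rawInput: string,
  validate: (key: string) => Promise<boolean>,
  save: (key: string) => Promise<void>,
): Promise<boolean> {
  const keyToValidate = rawInput.trim(); // single capture
  if (!(await validate(keyToValidate))) {
    return false;
  }
  await save(keyToValidate); // persist exactly what was validated
  return true;
}
```

With TanStack Query the same effect comes from using the `variables` argument of `onSuccess` instead of reading component state.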
- Around line 212-213: The modelProvidersConfigQuery
(api.settings.getModelProvidersConfig.useQuery) is hydrating the raw persisted
API key into the renderer and then copying it into React state/Input; instead,
change the query to never return the secret to the client (either update the
backend endpoint to omit/return a masked flag or use the query's
selector/transform to replace the key with a masked value or boolean indicating
presence). Locate usages in SpeechTab.tsx (modelProvidersConfigQuery and the
effect that copies the key into React state/Input) and the other referenced
blocks (around lines you noted: 269-273 and 548-555) and ensure only a masked
indicator or boolean is passed to the renderer and that the Input displays a
masked value rather than the actual key.
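One way to do this, sketched with an assumed config shape: the backend (or a query `select` transform) maps the stored key to a presence flag before anything reaches the renderer.

```typescript
type ModelProvidersConfig = {
  openAIWhisper?: { apiKey?: string };
};

// Replace the secret with a boolean; the renderer only ever learns
// whether a key exists, never its value.
function maskProvidersConfig(config: ModelProvidersConfig): {
  openAIWhisper: { hasApiKey: boolean };
} {
  return {
    openAIWhisper: {
      hasApiKey: Boolean(config.openAIWhisper?.apiKey),
    },
  };
}
```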
In `@apps/desktop/src/services/transcription-service.ts`:
- Around line 121-127: The current cloud-availability check counts provider ===
"OpenAI" even without credentials; update the logic so OpenAI only counts when
getOpenAIWhisperConfig() returns a truthy apiKey. Specifically, modify the
isCloudModel/hasCloudOption computation in transcription-service.ts (the
constants isCloudModel, openAIConfig from getOpenAIWhisperConfig(), and
hasCloudOption) so that provider === "OpenAI" is treated as available only if
openAIConfig?.apiKey is present; ensure related callers like isModelAvailable()
and handleModelChange() will therefore see OpenAI as unavailable when no API key
exists.
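The guard amounts to one extra condition. A sketch under the comment's assumptions (`"OpenAI"` as the provider string, a config object with an optional `apiKey`):

```typescript
type OpenAIWhisperConfig = { apiKey?: string } | undefined;

// An OpenAI-backed cloud model is only usable when a key is stored;
// without one it must not count toward the available cloud options.
function isOpenAIWhisperAvailable(
  provider: string,
  openAIConfig: OpenAIWhisperConfig,
): boolean {
  if (provider !== "OpenAI") return false;
  return Boolean(openAIConfig?.apiKey);
}
```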
---
Duplicate comments:
In `@apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx`:
- Around line 548-555: The API key Input (the Input component controlling
openAIApiKey via setOpenAIApiKey and disabled by openAIStatus) lacks an
accessible name; add an accessible label by either giving the Input a stable id
and wiring a Label with htmlFor to that id or by adding an aria-label (e.g.,
aria-label="OpenAI API key") to the Input; ensure the label text is localized
via t(...) if using a visible Label and keep existing props
(value/onChange/disabled) intact.
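A minimal sketch of the suggested markup. The `id`, the i18n key, and the `disabled` condition are illustrative assumptions; only the `htmlFor`/`id` pairing is the point.

```tsx
{/* Wire a visible, localized label to the input via a stable id. */}
<Label htmlFor="openai-api-key">{t("settings.speech.openAIApiKeyLabel")}</Label>
<Input
  id="openai-api-key"
  type="password"
  value={openAIApiKey}
  onChange={(e) => setOpenAIApiKey(e.target.value)}
  disabled={openAIStatus === "connected"}
/>
```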
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2c310b84-6369-4e89-b8be-d7092bd2b559
📒 Files selected for processing (11)
- apps/desktop/src/i18n/locales/en.json
- apps/desktop/src/i18n/locales/es.json
- apps/desktop/src/i18n/locales/ja.json
- apps/desktop/src/i18n/locales/zh-TW.json
- apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts
- apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx
- apps/desktop/src/renderer/onboarding/components/screens/ModelSelectionScreen.tsx
- apps/desktop/src/renderer/onboarding/components/screens/ModelSetupModal.tsx
- apps/desktop/src/services/onboarding-service.ts
- apps/desktop/src/services/transcription-service.ts
- apps/desktop/tests/pipeline/openai-whisper-provider.test.ts
✅ Files skipped from review due to trivial changes (1)
- apps/desktop/src/i18n/locales/es.json
🚧 Files skipped from review as they are similar to previous changes (7)
- apps/desktop/src/i18n/locales/ja.json
- apps/desktop/src/i18n/locales/zh-TW.json
- apps/desktop/src/renderer/onboarding/components/screens/ModelSelectionScreen.tsx
- apps/desktop/src/i18n/locales/en.json
- apps/desktop/src/renderer/onboarding/components/screens/ModelSetupModal.tsx
- apps/desktop/tests/pipeline/openai-whisper-provider.test.ts
- apps/desktop/src/services/onboarding-service.ts
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts`:
- Around line 128-148: The duration calculation in shouldTranscribe()
incorrectly assumes each entry in frameBuffer has FRAME_SIZE samples, which
breaks after error recovery when frameBuffer is set to a single aggregated
buffer ([rawAudio]); update audioDurationMs computation to sum the actual sample
counts of entries in this.frameBuffer (e.g., for each buffer entry use its
length if it's a typed array or FRAME_SIZE if it's a frame placeholder) then
multiply by 1000 / SAMPLE_RATE, preserving the existing silenceDurationMs logic
using currentSilenceFrameCount; adjust the check against MIN_AUDIO_DURATION_MS
and the 30000ms guard accordingly so transcription triggers correctly after
recovery.
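The corrected computation can be sketched in a few lines; `SAMPLE_RATE` is assumed to be the 16 kHz rate Whisper-style pipelines typically use:

```typescript
const SAMPLE_RATE = 16000; // assumption: 16 kHz mono input

// Sum the real sample counts: after error recovery the buffer may hold
// one large aggregated Float32Array rather than fixed-size frames, so
// multiplying the entry count by FRAME_SIZE would undercount duration.
function audioDurationMs(frameBuffer: Float32Array[]): number {
  const totalSamples = frameBuffer.reduce((sum, buf) => sum + buf.length, 0);
  return (totalSamples * 1000) / SAMPLE_RATE;
}
```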
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3831afd4-8d21-45e3-89ac-1e8119223834
📒 Files selected for processing (3)
- apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts
- apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx
- apps/desktop/src/services/transcription-service.ts
✅ Files skipped from review due to trivial changes (1)
- apps/desktop/src/services/transcription-service.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/desktop/src/renderer/main/pages/settings/ai-models/tabs/SpeechTab.tsx
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts`:
- Around line 281-285: The catch block restores buffers inconsistently: it sets
this.frameBuffer = [rawAudio] but leaves this.frameBufferSpeechProbabilities =
vadProbs, breaking the 1:1 alignment used by transcribe() and
extractSpeechFromVad(). Fix by making the two arrays have matching lengths —
either (preferred) restore frameBuffer as the original per-frame chunks so
frameBuffer.length === vadProbs.length (reconstruct the same frame segmentation
used when building vadProbs and assign that array to this.frameBuffer), or
(alternate) collapse vadProbs to a single representative value (e.g., mean or
last probability) so this.frameBufferSpeechProbabilities = [repProb] to match
the single rawAudio entry; update the catch block where frameBuffer and
frameBufferSpeechProbabilities are reassigned to ensure alignment.
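The alternate fix named here (collapse to a mean) is a one-liner; the helper name is an invention for illustration:

```typescript
// Collapse per-frame VAD probabilities to one representative value so a
// single restored audio entry keeps the two arrays 1:1 aligned.
function collapseVadProbs(vadProbs: number[]): number[] {
  if (vadProbs.length === 0) return [0];
  const mean = vadProbs.reduce((sum, p) => sum + p, 0) / vadProbs.length;
  return [mean];
}
```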
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2112f78c-83fc-4bbd-82f1-37fa572ee5a1
📒 Files selected for processing (1)
apps/desktop/src/pipeline/providers/transcription/openai-whisper-provider.ts
Add OpenAI Whisper as a cloud transcription option, allowing users to use their own OpenAI API key for speech-to-text. This provides an alternative to Amical Cloud and local models.

Changes:
- New OpenAI Whisper transcription provider with VAD filtering, WAV encoding, and 60s request timeout
- Settings UI for API key configuration with connection validation
- Onboarding flow integration with dedicated setup modal
- Model selection persistence through app restart (fix race condition in onboarding completion and startup validation)
- Proper error handling with CLOUD_TRANSCRIPTION_FAILED error code
- i18n support for en, es, ja, zh-TW locales
- Unit tests for the provider

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Make cloudTranscriptionFailed i18n description provider-agnostic
- Translate cloudTranscriptionFailed to es, ja, zh-TW locales
- Fix wrong error code: LOCAL → CLOUD_TRANSCRIPTION_FAILED in catch
- Replace all hardcoded English strings in SpeechTab with i18n calls
- Add aria-label to OpenAI API key input in ModelSetupModal
- Handle OpenAIWhisper in recommendation label (ModelSelectionScreen)
- Move model selection before completeOnboarding to block on success
- Fix modelRecommendation.suggested cast to include "openai-whisper"
- Fix isModelAvailable to treat OpenAI as cloud model
- Update test assertions for TranscriptionOutput ({ text }) return type
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Defer buffer reset until after successful transcription; restore buffers on error so audio is not lost on transient failures
- Remove plaintext transcription preview from logs (privacy)
- Use mutation variables in onSuccess to persist the exact validated key
- Mask API key in renderer (don't hydrate raw secret into React state)
- Add aria-label to settings API key input for accessibility
- OpenAI Whisper only counts as available when API key is present (isModelAvailable and initialize cloud checks)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After error recovery, frameBuffer may contain a single aggregated Float32Array instead of fixed-size frames. Sum actual buffer lengths instead of assuming FRAME_SIZE per entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collapse vadProbs to a single mean probability to match the single aggregated rawAudio entry, maintaining the 1:1 alignment expected by transcribe() and extractSpeechFromVad().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from d387839 to ab3eb3a.
Summary
- `completeOnboardingFlow` before app relaunch
- `CLOUD_TRANSCRIPTION_FAILED` error code

Files changed (22 files)

- `openai-whisper-provider.ts` — Transcription provider with WAV encoding, VAD filtering, API error handling
- `openai-whisper-provider.test.ts` — Unit tests
- `model-service.ts` — Skip OAuth check for OpenAI Whisper on startup, add provider guard in auth listeners
- `onboarding-service.ts` — Set model in `completeOnboardingFlow` to fix race condition
- `transcription-service.ts` — Integrate OpenAI Whisper provider, suppress "no models" dialog when API key configured
- `settings-service.ts` — Add OpenAI Whisper config get/set methods
- `SpeechTab.tsx` — API key UI, hide download buttons for cloud model
- `ModelSelectionScreen.tsx` — Add OpenAI Whisper card to onboarding
- `ModelSetupModal.tsx` — API key input, validation, and save flow
- `settings.ts` (tRPC) — Zod schema, validation and config mutations
- `models.ts` (tRPC) — Validation query for OpenAI Whisper connection
- `models.ts` (constants) — OpenAI Whisper model definition
- `provider-types.ts` — `openAIWhisper` provider type
- `model-selection.ts` — Selection key helpers for OpenAI Whisper
- `error.ts`, `widget-notification.ts` — New error code
- `onboarding.ts` — `OpenAIWhisper` model type enum
- `schema.ts` — DB schema for OpenAI Whisper config

Test plan
🤖 Generated with Claude Code