refactor(inference): modular multi-provider AI architecture
Replace monolithic inference module with provider-specific clients supporting
OpenAI, Anthropic Claude, Google Gemini, and Ollama. Each provider now has
native SDK integration with structured output support via JSON schema.
Architecture:
- Add packages/shared/inference/ with modular provider clients
- Add InferenceClientFactory and EmbeddingClientFactory for provider selection
- Separate inference and embedding providers (Anthropic lacks an embeddings API)
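The factory-based provider selection described above might look roughly like this. The class name `InferenceClientFactory` and the env vars come from this commit; the method bodies and the `Provider`/`InferenceClient` shapes are assumptions for illustration:

```typescript
// Hypothetical sketch of the provider factory; internals are assumptions.
type Provider = "openai" | "anthropic" | "google" | "ollama";

interface InferenceClient {
  provider: Provider;
}

// Picks the provider from INFERENCE_PROVIDER when set, otherwise
// auto-detects based on which API keys are present in the environment.
class InferenceClientFactory {
  static create(env: Record<string, string | undefined>): InferenceClient {
    const explicit = env.INFERENCE_PROVIDER as Provider | undefined;
    const provider = explicit ?? InferenceClientFactory.detect(env);
    return { provider };
  }

  private static detect(env: Record<string, string | undefined>): Provider {
    if (env.OPENAI_API_KEY) return "openai";
    if (env.ANTHROPIC_API_KEY) return "anthropic";
    if (env.GEMINI_API_KEY) return "google";
    return "ollama"; // local fallback, no API key required
  }
}
```

With no `INFERENCE_PROVIDER` set and only `ANTHROPIC_API_KEY` present, `create()` resolves to the Anthropic client; an explicit `INFERENCE_PROVIDER` always wins.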
New providers:
- OpenAI: Chat Completions + Responses API for GPT-5/o-series reasoning
- Google: Gemini 2.5/3.x with JSON schema, batch embeddings
- Anthropic: Claude 4.5 with structured outputs (beta), vision support
Configuration:
- Add INFERENCE_PROVIDER for explicit provider selection (auto-detects if unset)
- Add ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL
- Add GEMINI_API_KEY, GEMINI_BASE_URL
- Add EMBEDDING_PROVIDER for separate embedding configuration
- Add OPENAI_USE_RESPONSES_API, OPENAI_REASONING_EFFORT for GPT-5
- Add reasoning effort support for o-series models (o1, o3, o4)
- Auto-detect max_completion_tokens for GPT-5/o-series models
- Change default models to gpt-5-mini
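The `max_completion_tokens` auto-detection mentioned above could be sketched like this. The model-name prefixes are assumptions based on the models this commit names (gpt-5, o1, o3, o4); the helper names are hypothetical:

```typescript
// Assumption: reasoning models are identified by model-name prefix.
function usesMaxCompletionTokens(model: string): boolean {
  return /^(gpt-5|o1|o3|o4)/.test(model);
}

// Builds the token-limit parameter for a chat request: GPT-5/o-series
// models take max_completion_tokens, older models take max_tokens.
function tokenParam(model: string, limit: number): Record<string, number> {
  return usesMaxCompletionTokens(model)
    ? { max_completion_tokens: limit }
    : { max_tokens: limit };
}
```

This is what replaces the old `INFERENCE_USE_MAX_COMPLETION_TOKENS` flag: the choice is derived from the model name instead of being configured by hand.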
Breaking changes:
- Remove INFERENCE_SUPPORTS_STRUCTURED_OUTPUT (use INFERENCE_OUTPUT_SCHEMA)
- Remove INFERENCE_USE_MAX_COMPLETION_TOKENS (now auto-detected)
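A before/after `.env` sketch for the breaking changes above. The variable names come from this commit; the example values are placeholders and assumptions:

```shell
# Before (removed in this commit):
# INFERENCE_SUPPORTS_STRUCTURED_OUTPUT=true
# INFERENCE_USE_MAX_COMPLETION_TOKENS=true

# After: INFERENCE_OUTPUT_SCHEMA replaces the boolean flag (value format
# is an assumption here), and the token parameter is auto-detected from
# the model name, so no flag is needed.
INFERENCE_PROVIDER=openai
INFERENCE_OUTPUT_SCHEMA=json_schema
```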
Other:
- Add ProviderIndicator component showing active provider/models in settings
- Add @anthropic-ai/sdk and @google/genai dependencies
- Upgrade zod 3.24.2 -> 3.25.0 for Anthropic SDK compatibility
- Add 106 unit tests and 19 live API integration tests
- Rewrite AI provider documentation with model tables and examples
- Add translations for AI provider settings UI to all 31 locales