feat: add LiteLLM as AI gateway vision-language model provider #2328
RheagalFire wants to merge 4 commits into
Conversation
Greptile Summary
This PR adds
Confidence Score: 4/5
Safe to merge after raising the litellm lower bound past the compromised 1.82.x builds; the new provider code itself is well-structured and additive.
The only blocking concern is in pyproject.toml: the declared range >=1.80,<1.87 formally allows litellm 1.82.7 and 1.82.8, which were confirmed malicious PyPI releases (supply-chain attack, March 2026). Those versions are yanked, so fresh installs are safe, but environments with cached wheels or corporate artifact proxies that captured the packages before quarantine could still resolve to them. Bumping the lower bound to >=1.83.0 closes this gap. Everything else (the provider implementation, MIME-type fix, lazy import, factory wiring, and test suite) looks correct and consistent with the existing codebase.
pyproject.toml: the litellm version range needs its lower bound updated to exclude compromised builds.
| Filename | Overview |
|---|---|
| pyproject.toml | Adds litellm optional-dependency group; the range >=1.80,<1.87 includes the supply-chain-compromised versions 1.82.7 and 1.82.8 — lower bound should be >=1.83.0. |
| dimos/models/vl/litellm.py | New LiteLLMVlModel provider following existing patterns; lazy-imports litellm, uses correct image/jpeg MIME type, and handles null responses gracefully. |
| dimos/models/vl/test_litellm.py | 27 unit tests with broad coverage; credential-forwarding tests mock _completion rather than litellm.completion so they cannot detect regressions inside _completion itself (already flagged in prior review thread). |
| dimos/models/vl/create.py | Adds "litellm" case to the factory with a lazy inline import; no issues. |
| dimos/models/vl/types.py | Adds "litellm" to the VlModelName Literal; straightforward and correct. |
Sequence Diagram
sequenceDiagram
participant Caller
participant LiteLLMVlModel
participant _completion
participant litellm
Caller->>LiteLLMVlModel: query(image, prompt)
LiteLLMVlModel->>LiteLLMVlModel: _prepare_image(image)
LiteLLMVlModel->>LiteLLMVlModel: image.to_base64() → JPEG data URI
LiteLLMVlModel->>_completion: model, messages, [response_format]
_completion->>_completion: inject api_key / api_base from config
_completion->>litellm: "completion(drop_params=True, **kwargs)"
litellm-->>_completion: ModelResponse
_completion-->>LiteLLMVlModel: ModelResponse
LiteLLMVlModel-->>Caller: choices[0].message.content or ""
Caller->>LiteLLMVlModel: query_batch(images, prompt)
LiteLLMVlModel->>_completion: model, messages (all images in one call)
_completion->>litellm: "completion(drop_params=True, **kwargs)"
litellm-->>_completion: ModelResponse
_completion-->>LiteLLMVlModel: single response_text
LiteLLMVlModel-->>Caller: "[response_text] * len(images)"
Reviews (2): Last reviewed commit: "[autofix.ci] apply automated fixes"
    def query_batch(
        self,
        images: list[Image],
        query: str,
        response_format: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> list[str]:
        """Query VLM with multiple images using a single API call."""
        if not images:
            return []

        content: list[dict[str, Any]] = [
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{self._prepare_image(img)[0].to_base64()}"
                },
            }
            for img in images
        ]
        content.append({"type": "text", "text": query})

        api_kwargs: dict[str, Any] = {
            "model": self.config.model_name,
            "messages": [{"role": "user", "content": content}],
        }
        if response_format:
            api_kwargs["response_format"] = response_format

        response = self._completion(**api_kwargs)
        response_text = response.choices[0].message.content or ""
        return [response_text] * len(images)
query_batch sends all images in one call regardless of provider support
query_batch packs every image into a single API message and returns [response_text] * len(images). This means (a) providers that don't support multi-image inputs (many Bedrock/Vertex models available through LiteLLM) will throw an exception for any multi-image call that would otherwise succeed per-image, and (b) callers expecting per-image responses always receive the same combined description repeated len(images) times, which is silently wrong data.
The base-class fallback (query() per image) is both safer and correct for the per-image contract. The QwenVlModel shares the same design, but LiteLLMVlModel targets a far broader provider surface where single-message multi-image is not universal.
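A minimal sketch of the per-image fallback described above, assuming only that the model exposes a `query(image, prompt)` method. `SupportsQuery`, `query_batch_per_image`, and `EchoModel` are hypothetical names for illustration, not dimos code, and the actual base-class fallback may differ in detail.

```python
from __future__ import annotations

from typing import Any, Protocol


class SupportsQuery(Protocol):
    """Anything with a single-image query method (assumed interface)."""

    def query(self, image: Any, prompt: str, **kwargs: Any) -> str: ...


def query_batch_per_image(
    model: SupportsQuery, images: list[Any], prompt: str, **kwargs: Any
) -> list[str]:
    """Issue one request per image so each result maps to exactly one input."""
    return [model.query(image, prompt, **kwargs) for image in images]


class EchoModel:
    """Hypothetical stand-in that records calls instead of hitting an API."""

    def __init__(self) -> None:
        self.calls = 0

    def query(self, image: Any, prompt: str, **kwargs: Any) -> str:
        self.calls += 1
        return f"{prompt}: {image}"


model = EchoModel()
answers = query_batch_per_image(model, ["cat.jpg", "dog.jpg"], "describe")
```

The trade-off is one request per image instead of one request total, in exchange for results that line up one-to-one with the inputs and no dependence on multi-image support in the provider.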
        if response_format:
            api_kwargs["response_format"] = response_format

        response = self._completion(**api_kwargs)
        return response.choices[0].message.content or ""
**kwargs from query() and query_batch() are silently dropped
Both methods accept **kwargs but never forward them to _completion. Any extra kwargs a caller passes (temperature, max_tokens, stream, provider-specific flags) are silently discarded. Adding api_kwargs.update(kwargs) before the _completion call preserves caller intent and matches LiteLLM's flexibility.
Suggested change:
    -        if response_format:
    -            api_kwargs["response_format"] = response_format
    -        response = self._completion(**api_kwargs)
    -        return response.choices[0].message.content or ""
    +        if response_format:
    +            api_kwargs["response_format"] = response_format
    +        api_kwargs.update(kwargs)
    +        response = self._completion(**api_kwargs)
    +        return response.choices[0].message.content or ""
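To make the effect of the suggested `api_kwargs.update(kwargs)` concrete, here is a small standalone sketch. `build_api_kwargs` is a hypothetical helper written for this illustration; the PR builds the dictionary inline inside `query()` and `query_batch()`.

```python
from __future__ import annotations

from typing import Any


def build_api_kwargs(
    model_name: str,
    messages: list[dict[str, Any]],
    response_format: dict[str, Any] | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Assemble the request arguments, keeping any extra caller kwargs."""
    api_kwargs: dict[str, Any] = {"model": model_name, "messages": messages}
    if response_format:
        api_kwargs["response_format"] = response_format
    # Merge caller-supplied options last so they are forwarded, not dropped.
    api_kwargs.update(kwargs)
    return api_kwargs


request = build_api_kwargs(
    "gpt-4o-mini",
    [{"role": "user", "content": "hi"}],
    temperature=0.2,
    max_tokens=64,
)
```

Because the merge happens last, a caller-supplied key such as `messages` would override the value built by the method. Whether that is desirable is a design choice the sketch does not settle.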
        model = LiteLLMVlModel(model_name="gpt-4o-mini", api_key="sk-test")
        with patch(_COMPLETION_PATH, return_value=_resp()) as mock:
            model.query(_img(), "hi")
        assert mock.call_args.kwargs.get("api_key") is None  # forwarded inside _completion

    def test_api_base_forwarded_when_set(self) -> None:
        model = LiteLLMVlModel(
            model_name="azure/gpt-4o",
            api_base="https://my-resource.openai.azure.com",
        )
        with patch(_COMPLETION_PATH, return_value=_resp()) as mock:
            model.query(_img(), "hi")
        assert mock.call_args.kwargs.get("api_base") is None  # forwarded inside _completion
Credential forwarding tests assert the wrong thing
Both test_api_key_forwarded_when_set and test_api_base_forwarded_when_set assert mock.call_args.kwargs.get("api_key") is None and get("api_base") is None. Since the mock targets _completion, the query() method's api_kwargs never include those fields (they are injected inside the real _completion). The assertion is trivially true regardless of whether credentials are configured — a regression in _completion's credential injection would pass these tests undetected.
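The point above can be illustrated by mocking one layer lower, at the call that `_completion` itself makes. The sketch below uses a hypothetical `GatewayModel` stand-in with an injected backend so it runs without litellm installed. In the real test suite the equivalent would presumably be patching `litellm.completion`, but the exact patch target depends on how the module imports it, which is an assumption here.

```python
from __future__ import annotations

from typing import Any, Callable
from unittest.mock import Mock


class GatewayModel:
    """Hypothetical stand-in mirroring the credential injection under review."""

    def __init__(
        self,
        backend: Callable[..., Any],
        api_key: str | None = None,
        api_base: str | None = None,
    ) -> None:
        self.backend = backend
        self.api_key = api_key
        self.api_base = api_base

    def _completion(self, **kwargs: Any) -> Any:
        # Credentials are injected here, below the level query() sees.
        if self.api_key:
            kwargs["api_key"] = self.api_key
        if self.api_base:
            kwargs["api_base"] = self.api_base
        return self.backend(drop_params=True, **kwargs)

    def query(self, prompt: str) -> str:
        messages = [{"role": "user", "content": prompt}]
        return self._completion(model="demo-model", messages=messages) or ""


# Mock the backend, not _completion, so the injection logic actually runs.
backend = Mock(return_value="ok")
model = GatewayModel(backend, api_key="sk-test")
result = model.query("hi")
forwarded = backend.call_args.kwargs
```

With the mock placed below `_completion`, removing the `api_key` injection would make the assertion on `forwarded["api_key"]` fail, which is the regression the current tests cannot detect.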
        "soundfile",
    ]

    litellm = [
Please also add litellm to all = [ below.
        if self.config.api_key:
            kwargs["api_key"] = self.config.api_key
Does LiteLLM work without an api_key? I assume not. So instead of conditionally adding api_key to kwargs, it should throw an error if api_key is missing.
    litellm = [
        "litellm>=1.80,<1.87",
    ]
Version range allows compromised litellm builds
The constraint litellm>=1.80,<1.87 includes versions 1.82.7 and 1.82.8, which were published to PyPI on March 24, 2026 as part of a confirmed supply-chain attack. Those builds embedded code that exfiltrated cloud credentials, SSH keys, and Kubernetes secrets at import time. PyPI quarantined the packages roughly 40 minutes after publication, so a fresh pip install will not resolve to them — but the constraint still formally permits those versions, meaning any environment that holds a local cache of those wheels (common in CI layer caches or corporate artifact proxies) could install them.
Raising the lower bound to >=1.83.0 (the first clean release after the incident) removes the compromised range entirely from the allowed set.
Suggested change:
    -    litellm = [
    -        "litellm>=1.80,<1.87",
    -    ]
    +    litellm = [
    +        "litellm>=1.83.0,<1.87",
    +    ]
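A quick way to see why the lower bound matters is to check the two flagged versions against both ranges. The sketch below uses a deliberately simplified comparison on dotted integer tuples; it is not a full PEP 440 implementation, and the helper names `parse` and `allowed` are made up for this example.

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.82.7' into (1, 82, 7); plain numeric releases only."""
    return tuple(int(part) for part in version.split("."))


def allowed(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper under simple tuple ordering."""
    return parse(lower) <= parse(version) < parse(upper)


# The two versions named in the review comment.
FLAGGED = ["1.82.7", "1.82.8"]

old_range = [allowed(v, "1.80", "1.87") for v in FLAGGED]
new_range = [allowed(v, "1.83.0", "1.87") for v in FLAGGED]
```

Under the original constraint both flagged versions fall inside the range; under the suggested constraint neither does, while 1.83.0 itself remains installable.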
Problem
dimos currently has OpenAI and Qwen (via DashScope) as cloud VLM providers. Users who want to use Anthropic Claude's vision, Google Gemini, Azure-hosted models, or AWS Bedrock need to write a new provider from scratch. LiteLLM is a Python SDK that provides a unified completion() interface to 100+ LLM providers, so a single new VlModel covers all of them.
Solution
Added LiteLLMVlModel extending VlModel, following the same pattern as OpenAIVlModel and QwenVlModel:
- dimos/models/vl/litellm.py -- LiteLLMVlModel with LiteLLMVlModelConfig (model_name, api_key, api_base). Calls litellm.completion() directly as an SDK with drop_params=True for cross-provider compatibility. Lazy-imports litellm so the base install is unaffected.
- dimos/models/vl/types.py -- added "litellm" to VlModelName literal
- dimos/models/vl/create.py -- added "litellm" case to the factory
- pyproject.toml -- added [project.optional-dependencies].litellm = ["litellm>=1.80,<1.87"]
- dimos/models/vl/test_litellm.py -- 27 unit tests across 8 categories
Key decisions:
- drop_params=True silently drops provider-unsupported kwargs (e.g. strict, seed) so the same config works across OpenAI, Anthropic, Gemini, etc.
- [...] ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.) directly.
- [...] "" instead of crashing.
- [...] pip install 'dimos[litellm]'), additive only, existing providers untouched.
Integration bug caught during deep-dive and fixed
MIME type mismatch: Image.to_base64() encodes images as JPEG, but the existing OpenAI and Qwen providers label the data URI as data:image/png;base64,.... OpenAI and Qwen are lenient about this mismatch, but Anthropic (via LiteLLM) strictly validates and rejects with 400 Bad Request: "The image was specified using the image/png media type, but the image appears to be a image/jpeg image". The LiteLLM provider uses the correct data:image/jpeg;base64,... label. The existing providers have the same latent bug but that's outside the scope of this PR.
How to Test
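One way to sanity-check the MIME-type fix described above is to derive the data URI label from the image bytes rather than hard-coding it. The sketch below sniffs the well-known JPEG and PNG file signatures; `sniff_mime` and `to_data_uri` are hypothetical helpers for illustration and are not part of dimos or LiteLLM.

```python
import base64

# Leading file signatures ("magic bytes") for the two formats in question.
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_mime(data: bytes) -> str:
    """Return the MIME type implied by the leading bytes of an image."""
    for magic, mime in _SIGNATURES.items():
        if data.startswith(magic):
            return mime
    raise ValueError("unrecognised image format")


def to_data_uri(data: bytes) -> str:
    """Build a data URI whose media type matches the actual encoding."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{sniff_mime(data)};base64,{encoded}"


# A truncated JPEG header is enough to exercise the label selection.
jpeg_header = b"\xff\xd8\xff\xe0" + b"\x00" * 4
uri = to_data_uri(jpeg_header)
```

A check of this kind would catch a label that disagrees with the encoded bytes before a strict provider rejects the request.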
Unit tests (27 pass):
Live E2E (Anthropic claude-sonnet-4-6 via Azure Foundry):
Usage examples:
Contributor License Agreement