feat: add LiteLLM as AI gateway vision-language model provider by RheagalFire · Pull Request #2328 · dimensionalOS/dimos

RheagalFire · 2026-06-02T17:00:24Z

Problem

dimos currently has OpenAI and Qwen (via DashScope) as cloud VLM providers. Users who want to use Anthropic Claude's vision, Google Gemini, Azure-hosted models, or AWS Bedrock need to write a new provider from scratch. LiteLLM is a Python SDK that provides a unified completion() interface to 100+ LLM providers, so a single new VlModel covers all of them.

Solution

Added LiteLLMVlModel extending VlModel, following the same pattern as OpenAIVlModel and QwenVlModel:

dimos/models/vl/litellm.py -- LiteLLMVlModel with LiteLLMVlModelConfig (model_name, api_key, api_base). Calls litellm.completion() directly as an SDK with drop_params=True for cross-provider compatibility. Lazy-imports litellm so the base install is unaffected.
dimos/models/vl/types.py -- added "litellm" to VlModelName literal
dimos/models/vl/create.py -- added "litellm" case to the factory
pyproject.toml -- added [project.optional-dependencies].litellm = ["litellm>=1.80,<1.87"]
dimos/models/vl/test_litellm.py -- 27 unit tests across 8 categories

Key decisions:

drop_params=True silently drops provider-unsupported kwargs (e.g. strict, seed) so the same config works across OpenAI, Anthropic, Gemini, etc.
Credentials forwarded only when explicitly set in config; when blank, litellm reads provider-specific env vars (ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.) directly.
Null/empty response content returns "" instead of crashing.
litellm is an optional extra (pip install 'dimos[litellm]'), additive only, existing providers untouched.
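The credential and null-response decisions above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `build_completion_kwargs` and `extract_text` are hypothetical helper names, and the sketch only assembles the keyword arguments that would be handed to `litellm.completion()` without calling it.

```python
from __future__ import annotations

from typing import Any


def build_completion_kwargs(
    model_name: str,
    messages: list[dict[str, Any]],
    api_key: str | None = None,
    api_base: str | None = None,
) -> dict[str, Any]:
    """Assemble kwargs for a completion call (hypothetical helper)."""
    kwargs: dict[str, Any] = {
        "model": model_name,
        "messages": messages,
        # Ask the SDK to drop kwargs the target provider does not support.
        "drop_params": True,
    }
    # Forward credentials only when explicitly configured; when they are
    # blank, the SDK is left to read provider-specific environment variables.
    if api_key:
        kwargs["api_key"] = api_key
    if api_base:
        kwargs["api_base"] = api_base
    return kwargs


def extract_text(content: str | None) -> str:
    """Map null or empty response content to an empty string."""
    return content or ""
```

Leaving blank credentials out of the call, rather than passing `None` explicitly, keeps the environment-variable lookup entirely in the SDK's hands.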

Integration bug caught during deep-dive and fixed

MIME type mismatch: Image.to_base64() encodes images as JPEG, but the existing OpenAI and Qwen providers label the data URI as data:image/png;base64,.... OpenAI and Qwen are lenient about this mismatch, but Anthropic (via LiteLLM) strictly validates and rejects with 400 Bad Request: "The image was specified using the image/png media type, but the image appears to be a image/jpeg image". The LiteLLM provider uses the correct data:image/jpeg;base64,... label. The existing providers have the same latent bug but that's outside the scope of this PR.
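One way to guard against this class of mismatch in general is to derive the MIME label from the encoded bytes instead of hard-coding it. The sketch below is an assumption-laden illustration, not what the PR does (the PR hard-codes `image/jpeg`, which matches `Image.to_base64()`): `sniff_mime` and `to_data_uri` are hypothetical names, and only the JPEG and PNG signatures are recognized.

```python
import base64

# Leading magic bytes of the two formats discussed above.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_mime(data: bytes) -> str:
    """Return the MIME type implied by the image's magic bytes."""
    if data.startswith(JPEG_MAGIC):
        return "image/jpeg"
    if data.startswith(PNG_MAGIC):
        return "image/png"
    raise ValueError("unrecognized image format")


def to_data_uri(data: bytes) -> str:
    """Build a data URI whose label matches the actual encoding."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{sniff_mime(data)};base64,{encoded}"
```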

How to Test

pip install 'dimos[litellm]'
pytest dimos/models/vl/test_litellm.py -v -o "addopts="

Unit tests (27 pass):

TestCoreDispatch::test_query_sends_correct_model_and_messages PASSED
TestCoreDispatch::test_query_sends_base64_image_url PASSED
TestCoreDispatch::test_response_format_forwarded PASSED
TestCoreDispatch::test_response_format_omitted_when_none PASSED
TestCredentials::test_api_key_forwarded_when_set PASSED
TestCredentials::test_api_base_forwarded_when_set PASSED
TestCredentials::test_no_api_key_no_env_lets_litellm_handle_it PASSED
TestBatch::test_batch_returns_per_image_response PASSED
TestBatch::test_batch_empty_input PASSED
TestBatch::test_batch_single_image PASSED
TestNullResponse::test_query_null_content_returns_empty_string PASSED
TestNullResponse::test_query_empty_string_content PASSED
TestNullResponse::test_batch_null_content_returns_empty_strings PASSED
TestExceptionPropagation::test_authentication_error_propagates PASSED
TestExceptionPropagation::test_not_found_error_propagates PASSED
TestExceptionPropagation::test_rate_limit_error_propagates PASSED
TestExceptionPropagation::test_generic_exception_not_swallowed PASSED
TestExceptionPropagation::test_batch_exception_propagates PASSED
TestNumpyInput::test_numpy_array_triggers_deprecation_warning PASSED
TestNumpyInput::test_numpy_array_still_works PASSED
TestDetections::test_query_detections_parses_json_response PASSED
TestDetections::test_query_detections_empty_response PASSED
TestDetections::test_query_detections_malformed_json PASSED
TestDetections::test_caption_uses_query PASSED
TestFactory::test_factory_creates_litellm_model PASSED
TestFactory::test_litellm_in_vlmodel_name_type PASSED
TestImportGuard::test_import_error_without_litellm PASSED
=================== 27 passed, 1 skipped in 0.64s ===================

Live E2E (Anthropic claude-sonnet-4-6 via Azure Foundry):

Querying model: claude-sonnet-4-6
Response: 'Black'

Testing query_detections...
Detections: 0 found

Testing caption...
Caption: 'The image shows a black silhouette of what appears to be a person
or figure, set against a dark background.'

SUCCESS - all live E2E tests passed

Usage examples:

from dimos.models.vl.litellm import LiteLLMVlModel
from dimos.msgs.sensor_msgs.Image import Image

# Use any provider via LiteLLM's model format: "provider/model-name"
model = LiteLLMVlModel(model_name="anthropic/claude-sonnet-4-20250514")
# export ANTHROPIC_API_KEY=...

image = Image.from_file("path/to/image.jpg").to_rgb()
response = model.query(image, "What do you see?")

# Google Gemini
model = LiteLLMVlModel(model_name="gemini/gemini-2.5-flash")
# export GEMINI_API_KEY=...

# Azure OpenAI
model = LiteLLMVlModel(model_name="azure/gpt-4o", api_base="https://my-resource.openai.azure.com")
# export AZURE_API_KEY=...

# Works with all VlModel methods: query_batch, query_detections, query_points, caption
detections = model.query_detections(image, "person")
caption = model.caption(image)

# Factory also works
from dimos.models.vl.create import create
model = create("litellm")

Contributor License Agreement

I have read and approved the CLA.


greptile-apps · 2026-06-02T17:04:00Z

Greptile Summary

This PR adds LiteLLMVlModel, a new VlModel provider that routes vision-language queries through LiteLLM's unified completion() interface, giving callers access to Anthropic Claude, Google Gemini, Azure OpenAI, AWS Bedrock, and 100+ other providers without writing new provider code. The new provider also labels its data URIs as image/jpeg, avoiding the JPEG/PNG MIME-type mismatch that remains in the existing providers.

dimos/models/vl/litellm.py — new provider with lazy litellm import, proper image/jpeg data-URI encoding, null-response guard, and deprecation warning for raw numpy input.
pyproject.toml — adds litellm optional-dependency group; the current range >=1.80,<1.87 formally permits versions 1.82.7 and 1.82.8, which were part of a confirmed March 2026 PyPI supply-chain attack and should be excluded by raising the lower bound to >=1.83.0.
dimos/models/vl/test_litellm.py — 27 unit tests covering dispatch, credentials, batching, null responses, error propagation, numpy input, detection parsing, factory, and import guard.

Confidence Score: 4/5

Safe to merge after raising the litellm lower bound past the compromised 1.82.x builds; the new provider code itself is well-structured and additive.

The only blocking concern is in pyproject.toml: the declared range >=1.80,<1.87 formally allows litellm 1.82.7 and 1.82.8, which were confirmed malicious PyPI releases (supply-chain attack, March 2026). Those versions are yanked, so fresh installs are safe, but environments with cached wheels or corporate artifact proxies that captured the packages before quarantine could still resolve to them. Bumping the lower bound to >=1.83.0 closes this gap. Everything else — the provider implementation, MIME-type fix, lazy import, factory wiring, and test suite — looks correct and consistent with the existing codebase.

pyproject.toml — the litellm version range needs its lower bound updated to exclude compromised builds.

Security Review

Supply-chain risk in pyproject.toml: The declared range litellm>=1.80,<1.87 includes versions 1.82.7 and 1.82.8, which contained malicious code that executed at import time and exfiltrated cloud credentials, SSH keys, and Kubernetes secrets (PyPI supply-chain incident, March 24, 2026). Those versions were yanked from PyPI within ~40 minutes, so a standard pip install will not resolve to them, but the range still formally permits them — environments using a corporate artifact proxy or a warmed pip cache that captured those packages before quarantine remain at risk. The lower bound should be raised to >=1.83.0.
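The range argument can be checked mechanically. The sketch below is a simplified illustration that compares plain dotted numeric versions only; it does not implement full PEP 440 semantics, and `parse_version` and `in_range` are hypothetical helper names. The version numbers are the ones cited in this review.

```python
def parse_version(version: str, width: int = 3) -> tuple[int, ...]:
    """Parse a dotted numeric version, zero-padding to `width` parts."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper (numeric versions only)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Versions flagged in the review as compromised.
FLAGGED = ["1.82.7", "1.82.8"]

current_allows = [v for v in FLAGGED if in_range(v, "1.80", "1.87")]
proposed_allows = [v for v in FLAGGED if in_range(v, "1.83.0", "1.87")]
```

Under these assumptions, `current_allows` contains both flagged versions while `proposed_allows` is empty.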

Important Files Changed

| Filename | Overview |
| --- | --- |
| pyproject.toml | Adds `litellm` optional-dependency group; the range `>=1.80,<1.87` includes the supply-chain-compromised versions 1.82.7 and 1.82.8, so the lower bound should be `>=1.83.0`. |
| dimos/models/vl/litellm.py | New `LiteLLMVlModel` provider following existing patterns; lazy-imports litellm, uses correct `image/jpeg` MIME type, and handles null responses gracefully. |
| dimos/models/vl/test_litellm.py | 27 unit tests with broad coverage; credential-forwarding tests mock `_completion` rather than `litellm.completion` so they cannot detect regressions inside `_completion` itself (already flagged in prior review thread). |
| dimos/models/vl/create.py | Adds `"litellm"` case to the factory with a lazy inline import; no issues. |
| dimos/models/vl/types.py | Adds `"litellm"` to the `VlModelName` Literal; straightforward and correct. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant LiteLLMVlModel
    participant _completion
    participant litellm

    Caller->>LiteLLMVlModel: query(image, prompt)
    LiteLLMVlModel->>LiteLLMVlModel: _prepare_image(image)
    LiteLLMVlModel->>LiteLLMVlModel: image.to_base64() → JPEG data URI
    LiteLLMVlModel->>_completion: model, messages, [response_format]
    _completion->>_completion: inject api_key / api_base from config
    _completion->>litellm: "completion(drop_params=True, **kwargs)"
    litellm-->>_completion: ModelResponse
    _completion-->>LiteLLMVlModel: ModelResponse
    LiteLLMVlModel-->>Caller: choices[0].message.content or ""

    Caller->>LiteLLMVlModel: query_batch(images, prompt)
    LiteLLMVlModel->>_completion: model, messages (all images in one call)
    _completion->>litellm: "completion(drop_params=True, **kwargs)"
    litellm-->>_completion: ModelResponse
    _completion-->>LiteLLMVlModel: single response_text
    LiteLLMVlModel-->>Caller: "[response_text] * len(images)"

Reviews (2): Last reviewed commit: "[autofix.ci] apply automated fixes"

greptile-apps · 2026-06-02T17:04:04Z

+    def query_batch(
+        self,
+        images: list[Image],
+        query: str,
+        response_format: dict[str, Any] | None = None,
+        **kwargs: Any,
+    ) -> list[str]:
+        """Query VLM with multiple images using a single API call."""
+        if not images:
+            return []
+
+        content: list[dict[str, Any]] = [
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": f"data:image/jpeg;base64,{self._prepare_image(img)[0].to_base64()}"
+                },
+            }
+            for img in images
+        ]
+        content.append({"type": "text", "text": query})
+
+        api_kwargs: dict[str, Any] = {
+            "model": self.config.model_name,
+            "messages": [{"role": "user", "content": content}],
+        }
+        if response_format:
+            api_kwargs["response_format"] = response_format
+
+        response = self._completion(**api_kwargs)
+        response_text = response.choices[0].message.content or ""
+        return [response_text] * len(images)


query_batch sends all images in one call regardless of provider support

query_batch packs every image into a single API message and returns [response_text] * len(images). This means (a) providers that don't support multi-image inputs (many Bedrock/Vertex models available through LiteLLM) will throw an exception for any multi-image call that would otherwise succeed per-image, and (b) callers expecting per-image responses always receive the same combined description repeated — silently wrong data.

The base-class fallback (query() per image) is both safer and correct for the per-image contract. The QwenVlModel shares the same design, but LiteLLMVlModel targets a far broader provider surface where single-message multi-image is not universal.
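The per-image contract described above can be sketched as below. This is an illustration under stated assumptions: `query_batch_per_image` is a hypothetical free function standing in for the base-class fallback, and `query_fn` stands in for a model's `query` method.

```python
from typing import Callable, Sequence, TypeVar

ImageT = TypeVar("ImageT")


def query_batch_per_image(
    query_fn: Callable[[ImageT, str], str],
    images: Sequence[ImageT],
    prompt: str,
) -> list[str]:
    """Issue one query per image so each result describes only that image."""
    return [query_fn(image, prompt) for image in images]


def fake_query(image: str, prompt: str) -> str:
    """Stub in place of a real VLM call, for demonstration only."""
    return f"{prompt} -> {image}"


results = query_batch_per_image(fake_query, ["img_a", "img_b"], "describe")
```

Each entry in `results` corresponds to exactly one input image, at the cost of one API call per image instead of a single combined call.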

greptile-apps · 2026-06-02T17:04:05Z

+        if response_format:
+            api_kwargs["response_format"] = response_format
+
+        response = self._completion(**api_kwargs)
+        return response.choices[0].message.content or ""


**kwargs from query() and query_batch() are silently dropped

Both methods accept **kwargs but never forward them to _completion. Any extra kwargs a caller passes (temperature, max_tokens, stream, provider-specific flags) are silently discarded. Adding api_kwargs.update(kwargs) before the _completion call preserves caller intent and matches LiteLLM's flexibility.

Suggested change

         if response_format:
             api_kwargs["response_format"] = response_format
 
+        api_kwargs.update(kwargs)
         response = self._completion(**api_kwargs)
         return response.choices[0].message.content or ""
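A small sketch of what merging caller kwargs this way implies. `merge_call_kwargs` is a hypothetical helper, not part of the PR; it mirrors the `dict.update` merge on a copy so the effect can be inspected without calling any provider.

```python
from typing import Any


def merge_call_kwargs(
    api_kwargs: dict[str, Any], caller_kwargs: dict[str, Any]
) -> dict[str, Any]:
    """Merge caller kwargs over the provider kwargs, as dict.update would."""
    merged = dict(api_kwargs)  # copy so the input dict is left untouched
    merged.update(caller_kwargs)
    return merged


base = {"model": "some-model", "messages": []}
merged = merge_call_kwargs(base, {"temperature": 0.2, "max_tokens": 64})
overridden = merge_call_kwargs(base, {"model": "other-model"})
```

Note that with this merge order a caller-supplied `model` or `messages` key would override the value built by the method, which may or may not be desirable.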

greptile-apps · 2026-06-02T17:04:06Z

+        model = LiteLLMVlModel(model_name="gpt-4o-mini", api_key="sk-test")
+        with patch(_COMPLETION_PATH, return_value=_resp()) as mock:
+            model.query(_img(), "hi")
+        assert mock.call_args.kwargs.get("api_key") is None  # forwarded inside _completion
+
+    def test_api_base_forwarded_when_set(self) -> None:
+        model = LiteLLMVlModel(
+            model_name="azure/gpt-4o",
+            api_base="https://my-resource.openai.azure.com",
+        )
+        with patch(_COMPLETION_PATH, return_value=_resp()) as mock:
+            model.query(_img(), "hi")
+        assert mock.call_args.kwargs.get("api_base") is None  # forwarded inside _completion


Credential forwarding tests assert the wrong thing

Both test_api_key_forwarded_when_set and test_api_base_forwarded_when_set assert mock.call_args.kwargs.get("api_key") is None and get("api_base") is None. Since the mock targets _completion, the query() method's api_kwargs never include those fields (they are injected inside the real _completion). The assertion is trivially true regardless of whether credentials are configured — a regression in _completion's credential injection would pass these tests undetected.
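A test that would catch such a regression needs to mock one layer lower, at the SDK call rather than at `_completion`. The sketch below illustrates the idea with a hypothetical stand-in class (`SketchModel`) that takes the backend function as a constructor argument; in the real test suite one would presumably patch `litellm.completion` instead, which is an assumption about how the module imports it.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional
from unittest.mock import MagicMock


class SketchModel:
    """Hypothetical stand-in for LiteLLMVlModel; not the PR's class."""

    def __init__(self, completion_fn: Callable[..., Any], api_key: Optional[str] = None):
        self._completion_fn = completion_fn
        self.api_key = api_key

    def _completion(self, **kwargs: Any) -> Any:
        # Credential injection happens here, below the query() layer.
        if self.api_key:
            kwargs["api_key"] = self.api_key
        return self._completion_fn(drop_params=True, **kwargs)

    def query(self, prompt: str) -> str:
        messages = [{"role": "user", "content": prompt}]
        response = self._completion(model="some-model", messages=messages)
        return response.choices[0].message.content or ""


def fake_response(text: Optional[str]) -> SimpleNamespace:
    """Build an object shaped like response.choices[0].message.content."""
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def backend_kwargs_for(api_key: Optional[str]) -> dict:
    """Run one query and return the kwargs the mocked backend received."""
    backend = MagicMock(return_value=fake_response("ok"))
    SketchModel(backend, api_key=api_key).query("hi")
    return backend.call_args.kwargs
```

Because the mock sits below `_completion`, asserting on `backend_kwargs_for("sk-test")["api_key"]` fails if the injection logic is removed, unlike the `is None` assertions quoted above.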

paul-nechifor · 2026-06-02T22:44:12Z

    "soundfile",
 ]

+litellm = [


Please also add litellm to all = [ below.

paul-nechifor · 2026-06-02T22:48:02Z

+        if self.config.api_key:
+            kwargs["api_key"] = self.config.api_key


Does LiteLLM work without an api_key? I assume not. So instead of conditionally adding api_key to kwargs, it should throw an error if api_key is missing.

greptile-apps · 2026-06-02T22:54:15Z

+litellm = [
+    "litellm>=1.80,<1.87",
+]
+


Version range allows compromised litellm builds

The constraint litellm>=1.80,<1.87 includes versions 1.82.7 and 1.82.8, which were published to PyPI on March 24, 2026 as part of a confirmed supply-chain attack. Those builds embedded code that exfiltrated cloud credentials, SSH keys, and Kubernetes secrets at import time. PyPI quarantined the packages roughly 40 minutes after publication, so a fresh pip install will not resolve to them — but the constraint still formally permits those versions, meaning any environment that holds a local cache of those wheels (common in CI layer caches or corporate artifact proxies) could install them.

Raising the lower bound to >=1.83.0 (the first clean release after the incident) removes the compromised range entirely from the allowed set.

Suggested change

 litellm = [
-    "litellm>=1.80,<1.87",
+    "litellm>=1.83.0,<1.87",
 ]

RheagalFire added 3 commits June 2, 2026 22:05

feat: add LiteLLM as AI gateway vision-language model provider

1fb914f

fix: remove non-standard env vars, handle null responses, comprehensive tests

d5a621f

fix: use image/jpeg mime type (to_base64 encodes JPEG, not PNG)

3d4830e

RheagalFire requested review from leshy, mustafab0, paul-nechifor and spomichter as code owners June 2, 2026 17:00


[autofix.ci] apply automated fixes

d5c2aa3

RheagalFire wants to merge 4 commits into dimensionalOS:main from RheagalFire:feat/add-litellm-provider
