Skip to content

feat: add LiteLLM as AI gateway vision-language model provider#2328

Open
RheagalFire wants to merge 4 commits into
dimensionalOS:mainfrom
RheagalFire:feat/add-litellm-provider
Open

feat: add LiteLLM as AI gateway vision-language model provider#2328
RheagalFire wants to merge 4 commits into
dimensionalOS:mainfrom
RheagalFire:feat/add-litellm-provider

Conversation

@RheagalFire
Copy link
Copy Markdown

Problem

dimos currently has OpenAI and Qwen (via DashScope) as cloud VLM providers. Users who want to use Anthropic Claude's vision, Google Gemini, Azure-hosted models, or AWS Bedrock need to write a new provider from scratch. LiteLLM is a Python SDK that provides a unified completion() interface to 100+ LLM providers, so a single new VlModel covers all of them.

Solution

Added LiteLLMVlModel extending VlModel, following the same pattern as OpenAIVlModel and QwenVlModel:

  • dimos/models/vl/litellm.py -- LiteLLMVlModel with LiteLLMVlModelConfig (model_name, api_key, api_base). Calls litellm.completion() directly as an SDK with drop_params=True for cross-provider compatibility. Lazy-imports litellm so the base install is unaffected.
  • dimos/models/vl/types.py -- added "litellm" to VlModelName literal
  • dimos/models/vl/create.py -- added "litellm" case to the factory
  • pyproject.toml -- added [project.optional-dependencies].litellm = ["litellm>=1.80,<1.87"]
  • dimos/models/vl/test_litellm.py -- 27 unit tests across 8 categories

Key decisions:

  • drop_params=True silently drops provider-unsupported kwargs (e.g. strict, seed) so the same config works across OpenAI, Anthropic, Gemini, etc.
  • Credentials forwarded only when explicitly set in config; when blank, litellm reads provider-specific env vars (ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.) directly.
  • Null/empty response content returns "" instead of crashing.
  • litellm is an optional extra (pip install 'dimos[litellm]'), additive only, existing providers untouched.

Integration bug caught during deep-dive and fixed

MIME type mismatch: Image.to_base64() encodes images as JPEG, but the existing OpenAI and Qwen providers label the data URI as data:image/png;base64,.... OpenAI and Qwen are lenient about this mismatch, but Anthropic (via LiteLLM) strictly validates and rejects with 400 Bad Request: "The image was specified using the image/png media type, but the image appears to be a image/jpeg image". The LiteLLM provider uses the correct data:image/jpeg;base64,... label. The existing providers have the same latent bug but that's outside the scope of this PR.

How to Test

pip install 'dimos[litellm]'
pytest dimos/models/vl/test_litellm.py -v -o "addopts="

Unit tests (27 pass):

TestCoreDispatch::test_query_sends_correct_model_and_messages PASSED
TestCoreDispatch::test_query_sends_base64_image_url PASSED
TestCoreDispatch::test_response_format_forwarded PASSED
TestCoreDispatch::test_response_format_omitted_when_none PASSED
TestCredentials::test_api_key_forwarded_when_set PASSED
TestCredentials::test_api_base_forwarded_when_set PASSED
TestCredentials::test_no_api_key_no_env_lets_litellm_handle_it PASSED
TestBatch::test_batch_returns_per_image_response PASSED
TestBatch::test_batch_empty_input PASSED
TestBatch::test_batch_single_image PASSED
TestNullResponse::test_query_null_content_returns_empty_string PASSED
TestNullResponse::test_query_empty_string_content PASSED
TestNullResponse::test_batch_null_content_returns_empty_strings PASSED
TestExceptionPropagation::test_authentication_error_propagates PASSED
TestExceptionPropagation::test_not_found_error_propagates PASSED
TestExceptionPropagation::test_rate_limit_error_propagates PASSED
TestExceptionPropagation::test_generic_exception_not_swallowed PASSED
TestExceptionPropagation::test_batch_exception_propagates PASSED
TestNumpyInput::test_numpy_array_triggers_deprecation_warning PASSED
TestNumpyInput::test_numpy_array_still_works PASSED
TestDetections::test_query_detections_parses_json_response PASSED
TestDetections::test_query_detections_empty_response PASSED
TestDetections::test_query_detections_malformed_json PASSED
TestDetections::test_caption_uses_query PASSED
TestFactory::test_factory_creates_litellm_model PASSED
TestFactory::test_litellm_in_vlmodel_name_type PASSED
TestImportGuard::test_import_error_without_litellm PASSED
=================== 27 passed, 1 skipped in 0.64s ===================

Live E2E (Anthropic claude-sonnet-4-6 via Azure Foundry):

Querying model: claude-sonnet-4-6
Response: 'Black'

Testing query_detections...
Detections: 0 found

Testing caption...
Caption: 'The image shows a black silhouette of what appears to be a person
or figure, set against a dark background.'

SUCCESS - all live E2E tests passed

Usage examples:

from dimos.models.vl.litellm import LiteLLMVlModel
from dimos.msgs.sensor_msgs.Image import Image

# Use any provider via LiteLLM's model format: "provider/model-name"
model = LiteLLMVlModel(model_name="anthropic/claude-sonnet-4-20250514")
# export ANTHROPIC_API_KEY=...

image = Image.from_file("path/to/image.jpg").to_rgb()
response = model.query(image, "What do you see?")

# Google Gemini
model = LiteLLMVlModel(model_name="gemini/gemini-2.5-flash")
# export GEMINI_API_KEY=...

# Azure OpenAI
model = LiteLLMVlModel(model_name="azure/gpt-4o", api_base="https://my-resource.openai.azure.com")
# export AZURE_API_KEY=...

# Works with all VlModel methods: query_batch, query_detections, query_points, caption
detections = model.query_detections(image, "person")
caption = model.caption(image)

# Factory also works
from dimos.models.vl.create import create
model = create("litellm")

Contributor License Agreement

  • I have read and approved the CLA.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR adds LiteLLMVlModel, a new VlModel provider that routes vision-language queries through LiteLLM's unified completion() interface, giving callers access to Anthropic Claude, Google Gemini, Azure OpenAI, AWS Bedrock, and 100+ other providers without writing new provider code. It also correctly fixes the JPEG/PNG MIME-type mismatch that affects the existing providers.

  • dimos/models/vl/litellm.py — new provider with lazy litellm import, proper image/jpeg data-URI encoding, null-response guard, and deprecation warning for raw numpy input.
  • pyproject.toml — adds litellm optional-dependency group; the current range >=1.80,<1.87 formally permits versions 1.82.7 and 1.82.8, which were part of a confirmed March 2026 PyPI supply-chain attack and should be excluded by raising the lower bound to >=1.83.0.
  • dimos/models/vl/test_litellm.py — 27 unit tests covering dispatch, credentials, batching, null responses, error propagation, numpy input, detection parsing, factory, and import guard.

Confidence Score: 4/5

Safe to merge after raising the litellm lower bound past the compromised 1.82.x builds; the new provider code itself is well-structured and additive.

The only blocking concern is in pyproject.toml: the declared range >=1.80,<1.87 formally allows litellm 1.82.7 and 1.82.8, which were confirmed malicious PyPI releases (supply-chain attack, March 2026). Those versions are yanked, so fresh installs are safe, but environments with cached wheels or corporate artifact proxies that captured the packages before quarantine could still resolve to them. Bumping the lower bound to >=1.83.0 closes this gap. Everything else — the provider implementation, MIME-type fix, lazy import, factory wiring, and test suite — looks correct and consistent with the existing codebase.

pyproject.toml — the litellm version range needs its lower bound updated to exclude compromised builds.

Security Review

  • Supply-chain risk in pyproject.toml: The declared range litellm>=1.80,<1.87 includes versions 1.82.7 and 1.82.8, which contained malicious code that executed at import time and exfiltrated cloud credentials, SSH keys, and Kubernetes secrets (PyPI supply-chain incident, March 24, 2026). Those versions were yanked from PyPI within ~40 minutes, so a standard pip install will not resolve to them, but the range still formally permits them — environments using a corporate artifact proxy or a warmed pip cache that captured those packages before quarantine remain at risk. The lower bound should be raised to >=1.83.0.

Important Files Changed

Filename Overview
pyproject.toml Adds litellm optional-dependency group; the range >=1.80,<1.87 includes the supply-chain-compromised versions 1.82.7 and 1.82.8 — lower bound should be >=1.83.0.
dimos/models/vl/litellm.py New LiteLLMVlModel provider following existing patterns; lazy-imports litellm, uses correct image/jpeg MIME type, and handles null responses gracefully.
dimos/models/vl/test_litellm.py 27 unit tests with broad coverage; credential-forwarding tests mock _completion rather than litellm.completion so they cannot detect regressions inside _completion itself (already flagged in prior review thread).
dimos/models/vl/create.py Adds "litellm" case to the factory with a lazy inline import; no issues.
dimos/models/vl/types.py Adds "litellm" to the VlModelName Literal; straightforward and correct.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant LiteLLMVlModel
    participant _completion
    participant litellm

    Caller->>LiteLLMVlModel: query(image, prompt)
    LiteLLMVlModel->>LiteLLMVlModel: _prepare_image(image)
    LiteLLMVlModel->>LiteLLMVlModel: image.to_base64() → JPEG data URI
    LiteLLMVlModel->>_completion: model, messages, [response_format]
    _completion->>_completion: inject api_key / api_base from config
    _completion->>litellm: "completion(drop_params=True, **kwargs)"
    litellm-->>_completion: ModelResponse
    _completion-->>LiteLLMVlModel: ModelResponse
    LiteLLMVlModel-->>Caller: choices[0].message.content or ""

    Caller->>LiteLLMVlModel: query_batch(images, prompt)
    LiteLLMVlModel->>_completion: model, messages (all images in one call)
    _completion->>litellm: "completion(drop_params=True, **kwargs)"
    litellm-->>_completion: ModelResponse
    _completion-->>LiteLLMVlModel: single response_text
    LiteLLMVlModel-->>Caller: "[response_text] * len(images)"
Loading

Reviews (2): Last reviewed commit: "[autofix.ci] apply automated fixes" | Re-trigger Greptile

Comment on lines +112 to +143
def query_batch(
self,
images: list[Image],
query: str,
response_format: dict[str, Any] | None = None,
**kwargs: Any,
) -> list[str]:
"""Query VLM with multiple images using a single API call."""
if not images:
return []

content: list[dict[str, Any]] = [
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{self._prepare_image(img)[0].to_base64()}"
},
}
for img in images
]
content.append({"type": "text", "text": query})

api_kwargs: dict[str, Any] = {
"model": self.config.model_name,
"messages": [{"role": "user", "content": content}],
}
if response_format:
api_kwargs["response_format"] = response_format

response = self._completion(**api_kwargs)
response_text = response.choices[0].message.content or ""
return [response_text] * len(images)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 query_batch sends all images in one call regardless of provider support

query_batch packs every image into a single API message and returns [response_text] * len(images). This means (a) providers that don't support multi-image inputs (many Bedrock/Vertex models available through LiteLLM) will throw an exception for any multi-image call that would otherwise succeed per-image, and (b) callers expecting per-image responses always receive the same combined description repeated — silently wrong data.

The base-class fallback (query() per image) is both safer and correct for the per-image contract. The QwenVlModel shares the same design, but LiteLLMVlModel targets a far broader provider surface where single-message multi-image is not universal.

Comment on lines +106 to +110
if response_format:
api_kwargs["response_format"] = response_format

response = self._completion(**api_kwargs)
return response.choices[0].message.content or ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 **kwargs from query() and query_batch() are silently dropped

Both methods accept **kwargs but never forward them to _completion. Any extra kwargs a caller passes (temperature, max_tokens, stream, provider-specific flags) are silently discarded. Adding api_kwargs.update(kwargs) before the _completion call preserves caller intent and matches LiteLLM's flexibility.

Suggested change
if response_format:
api_kwargs["response_format"] = response_format
response = self._completion(**api_kwargs)
return response.choices[0].message.content or ""
if response_format:
api_kwargs["response_format"] = response_format
api_kwargs.update(kwargs)
response = self._completion(**api_kwargs)
return response.choices[0].message.content or ""

Comment on lines +103 to +115
model = LiteLLMVlModel(model_name="gpt-4o-mini", api_key="sk-test")
with patch(_COMPLETION_PATH, return_value=_resp()) as mock:
model.query(_img(), "hi")
assert mock.call_args.kwargs.get("api_key") is None # forwarded inside _completion

def test_api_base_forwarded_when_set(self) -> None:
model = LiteLLMVlModel(
model_name="azure/gpt-4o",
api_base="https://my-resource.openai.azure.com",
)
with patch(_COMPLETION_PATH, return_value=_resp()) as mock:
model.query(_img(), "hi")
assert mock.call_args.kwargs.get("api_base") is None # forwarded inside _completion
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Credential forwarding tests assert the wrong thing

Both test_api_key_forwarded_when_set and test_api_base_forwarded_when_set assert mock.call_args.kwargs.get("api_key") is None and get("api_base") is None. Since the mock targets _completion, the query() method's api_kwargs never include those fields (they are injected inside the real _completion). The assertion is trivially true regardless of whether credentials are configured — a regression in _completion's credential injection would pass these tests undetected.

Comment thread pyproject.toml
"soundfile",
]

litellm = [
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add litellm to all = [ below.

Comment on lines +62 to +63
if self.config.api_key:
kwargs["api_key"] = self.config.api_key
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does LiteLLM work without an api_key? I assume not. So instead of conditionally adding api_key to kwargs, it should throw an error if api_key is missing.

Comment thread pyproject.toml
Comment on lines +222 to +225
litellm = [
"litellm>=1.80,<1.87",
]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 security Version range allows compromised litellm builds

The constraint litellm>=1.80,<1.87 includes versions 1.82.7 and 1.82.8, which were published to PyPI on March 24, 2026 as part of a confirmed supply-chain attack. Those builds embedded code that exfiltrated cloud credentials, SSH keys, and Kubernetes secrets at import time. PyPI quarantined the packages roughly 40 minutes after publication, so a fresh pip install will not resolve to them — but the constraint still formally permits those versions, meaning any environment that holds a local cache of those wheels (common in CI layer caches or corporate artifact proxies) could install them.

Raising the lower bound to >=1.83.0 (the first clean release after the incident) removes the compromised range entirely from the allowed set.

Suggested change
litellm = [
"litellm>=1.80,<1.87",
]
litellm = [
"litellm>=1.83.0,<1.87",
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants