Skip to content

Conversation

@mattf mattf commented Oct 3, 2025

What does this PR do?

  • implement get_api_key instead of relying on LiteLLMOpenAIMixin.get_api_key
  • remove use of LiteLLMOpenAIMixin
  • add default initialize/shutdown methods to OpenAIMixin (see the sketch after this list)
  • remove __init__s to allow proper pydantic construction
  • remove dead code from vllm adapter and associated / duplicate unit tests
  • update vllm adapter to use openaimixin for model registration
  • remove ModelRegistryHelper from fireworks & together adapters
  • remove Inference from nvidia adapter
  • complete type hints on embedding_model_metadata
  • allow extra fields on OpenAIMixin, for model_store, provider_id, etc
  • new recordings for ollama
  • enhance the list models error handling
  • update cerebras (remove cerebras-cloud-sdk) and anthropic (custom model listing) inference adapters
  • parametrized test_inference_client_caching
  • remove cerebras, databricks, fireworks, together from blanket mypy exclude
  • removed unnecessary litellm deps
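
Roughly, the mixin-related bullets translate into a class shaped like the sketch below. This is only an illustration of the intent described here, not the actual file contents; the method names and pydantic config are assumptions based on this description.

```python
# Minimal sketch only; paraphrases the bullets above and assumes pydantic v2.
from abc import abstractmethod

from pydantic import BaseModel, ConfigDict


class OpenAIMixin(BaseModel):
    # extra="allow" so runtime attributes such as model_store or provider_id
    # can still be attached after construction
    model_config = ConfigDict(extra="allow")

    @abstractmethod
    def get_api_key(self) -> str:
        """Each adapter implements its own key lookup instead of inheriting LiteLLMOpenAIMixin.get_api_key."""
        ...

    async def initialize(self) -> None:
        """Default no-op, so adapters no longer need a custom __init__/initialize."""

    async def shutdown(self) -> None:
        """Default no-op shutdown."""
```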

Test Plan

ci

@meta-cla meta-cla bot added the CLA Signed label Oct 3, 2025
@mattf mattf force-pushed the make-openaimix-pydantic branch 4 times, most recently from 241e77c to dc97160 on October 5, 2025 10:05

mattf commented Oct 6, 2025

this provides a foundation for #3517

@franciscojavierarceo franciscojavierarceo left a comment

this lgtm as you answered the open questions I had in the PR description, but i'll wait for someone else who has spent more time on inference to do another pass. also, it looks like a rebase is needed.

@leseb leseb left a comment

missing 2 files (vllm.py and openai_mixin.py)

adapter_type="anthropic",
provider_type="remote::anthropic",
pip_packages=["litellm"],
pip_packages=["litellm", "anthropic"],
Collaborator

Do we still need litellm here? I only see OpenAIMixin being used?

Collaborator Author

good catch. i'll remove all the others too.

Collaborator Author

done

from urllib.parse import urljoin

from llama_stack.apis.inference import ChatCompletionRequest
from llama_stack.providers.utils.inference.litellm_openai_mixin import (
Collaborator

can we remove litellm from RemoteProviderSpec in the registry?

Collaborator Author

done

)
api_token: SecretStr = Field(
-default=SecretStr(None),
+default=SecretStr(None), # type: ignore[arg-type]
Collaborator

Why not set default=SecretStr("") and avoid the type: ignore comment? (yes I know, that same discussion again)

Collaborator Author

i hope to handle this later. i removed the blanket exclude so we can see the specific instances.

)
api_key: SecretStr = Field(
-default=SecretStr(os.environ.get("CEREBRAS_API_KEY")),
+default=SecretStr(os.environ.get("CEREBRAS_API_KEY")), # type: ignore[arg-type]
Collaborator

Suggested change
-default=SecretStr(os.environ.get("CEREBRAS_API_KEY")), # type: ignore[arg-type]
+default_factory=lambda: SecretStr(os.getenv("CEREBRAS_API_KEY", ""))

And mypy will be happy
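
(Illustration of why the suggested form type-checks: os.getenv("CEREBRAS_API_KEY", "") is always a str, whereas os.environ.get(...) can be None, which SecretStr's constructor is not typed to accept. A self-contained example with a hypothetical config class:)

```python
import os

from pydantic import BaseModel, Field, SecretStr


class ExampleConfig(BaseModel):  # hypothetical class, for illustration only
    # os.getenv(..., "") always returns a str, so no "type: ignore" is needed;
    # default_factory also defers the env lookup until the model is instantiated.
    api_key: SecretStr = Field(default_factory=lambda: SecretStr(os.getenv("CEREBRAS_API_KEY", "")))
```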

Collaborator Author

i removed the blanket excludes. i want to avoid rolling more things into this PR.

'Pass Fireworks API Key in the header X-LlamaStack-Provider-Data as { "fireworks_api_key": <your api key>}'
)
return provider_data.fireworks_api_key
return self.config.api_key.get_secret_value() if self.config.api_key else None # type: ignore[return-value]
Collaborator

why do you need to type ignore comment?

Collaborator Author

i removed the blanket excludes to expose specific places that need improvement
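
(Aside: the return-value ignore above is presumably needed because get_api_key is annotated to return str while the fallback branch can produce None. A stripped-down reproduction of the pattern mypy flags, with illustrative names only:)

```python
class Example:  # illustrative only, not the actual fireworks adapter
    api_key: str | None = None

    def get_api_key(self) -> str:
        # mypy [return-value]: the expression can be None, but the annotation says str
        return self.api_key if self.api_key else None  # type: ignore[return-value]
```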

response.usage = OpenAIEmbeddingUsage(prompt_tokens=-1, total_tokens=-1)

-return response
+return response # type: ignore[no-any-return]
Collaborator

can you leave a comment about why we are ignoring?

Collaborator Author

i removed the blanket exclude


mattf commented Oct 6, 2025

> missing 2 files (vllm.py and openai_mixin.py)

@leseb what do you mean?


leseb commented Oct 6, 2025

> missing 2 files (vllm.py and openai_mixin.py)
>
> @leseb what do you mean?

Sorry, I typed too fast, I meant that I still haven't gone through those two files :)

@leseb leseb left a comment

HUGE! Thanks for this one!

A few questions:

  • can we remove llama_stack/providers/utils/inference/litellm_openai_mixin.py now (along with test files)?
  • can we remove litellm from pyproject after that too?


mattf commented Oct 6, 2025

> HUGE! Thanks for this one!
>
> A few questions:
>
> * can we remove llama_stack/providers/utils/inference/litellm_openai_mixin.py now (along with test files)?
> * can we remove litellm from pyproject after that too?

unfortunately we cannot. the watsonx adapter is being updated and will rely on LiteLLMOpenAIMixin because watsonx does not provide an openai-compat endpoint.

@mattf mattf merged commit d23ed26 into llamastack:main Oct 6, 2025
45 checks passed