Remove mamba-ssm package #22409

Closed
wants to merge 2 commits into from

Conversation

@tlrmchlsmth (Collaborator) commented Aug 6, 2025

Purpose

The packages mamba-ssm and causal-conv1d are not used by vLLM itself; they are only used in tests to compare against the Hugging Face Transformers baselines.

In particular, those packages are needed in order to run PLaMo2 at all, since the model definition for PLaMo2 lives at https://huggingface.co/pfnet/plamo-2-1b/blob/main/modeling_plamo.py rather than in Hugging Face Transformers.

This PR removes the packages to avoid pain points when upgrading PyTorch or CUDA.

Test Plan

Test Result

Signed-off-by: Tyler Michael Smith <[email protected]>
@tlrmchlsmth added the ready label Aug 6, 2025
@mergify bot added the documentation and ci/build labels Aug 6, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request removes the mamba-ssm package and its related causal-conv1d dependency, which were only used for testing the PLaMo2 model against its Hugging Face implementation. The changes correctly remove the dependencies from requirements/test.in, the Dockerfile, and documentation.

However, I've identified a critical issue in tests/models/language/generation/test_hybrid.py where the PLaMo2 model (pfnet/plamo-2-1b) is completely removed from the test suite, instead of just skipping the comparison with the Hugging Face baseline. I've provided a suggestion to fix this to ensure the model remains tested by vLLM.
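
To make that suggestion concrete, below is a minimal sketch of how the test could keep pfnet/plamo-2-1b exercised by vLLM while only skipping the Hugging Face comparison when the optional packages are missing. The vllm_runner / hf_runner fixtures, example_prompts, and generate_greedy helper are assumed to follow vLLM's existing test conventions; this is illustrative only, not the actual diff proposed in this PR.

import importlib.util
import pytest

# The HF baseline needs these optional packages; vLLM itself does not.
HF_BASELINE_DEPS_AVAILABLE = all(
    importlib.util.find_spec(pkg) is not None
    for pkg in ("mamba_ssm", "causal_conv1d")
)


@pytest.mark.parametrize("model", ["pfnet/plamo-2-1b"])
def test_plamo2_generation(vllm_runner, hf_runner, example_prompts, model):
    # Always run the model through vLLM so it stays covered by CI.
    with vllm_runner(model, max_model_len=1024) as vllm_model:
        vllm_outputs = vllm_model.generate_greedy(example_prompts, max_tokens=32)

    if HF_BASELINE_DEPS_AVAILABLE:
        # Compare against the Hugging Face implementation only when its
        # optional dependencies (mamba-ssm, causal-conv1d) are installed.
        with hf_runner(model, trust_remote_code=True) as hf_model:
            hf_outputs = hf_model.generate_greedy(example_prompts, max_tokens=32)
        assert hf_outputs == vllm_outputs

Keeping the vLLM run unconditional means the model remains tested even when the HF-baseline dependencies are absent from the test image.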

github-actions bot commented Aug 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Signed-off-by: Tyler Michael Smith <[email protected]>
@tlrmchlsmth mentioned this pull request Aug 6, 2025
@DarkLight1337 (Member) commented Aug 7, 2025

Some other tests still require mamba-ssm. It's just that causal-conv1d is not required.

@tlrmchlsmth (Collaborator, Author) replied:

> Some other tests still require mamba-ssm. It's just that causal-conv1d is not required.

I see, you're right - there are both numeric issues and I'm also seeing this:

[2025-08-07T01:50:59Z]         if self.use_fast_kernels:
[2025-08-07T01:50:59Z]             if not is_fast_path_available or "cuda" not in self.x_proj.weight.device.type:
[2025-08-07T01:50:59Z] >               raise ValueError(
[2025-08-07T01:50:59Z]                     "Fast Mamba kernels are not available. Make sure to they are installed and that the mamba module is on a CUDA device"
[2025-08-07T01:50:59Z]                 )
[2025-08-07T01:50:59Z] E               ValueError: Fast Mamba kernels are not available. Make sure to they are installed and that the mamba module is on a CUDA device
[2025-08-07T01:50:59Z] /usr/local/lib/python3.12/dist-packages/transformers/models/jamba/modeling_jamba.py:826: ValueError

@Alnusjaponica (Contributor) commented:

As mentioned earlier in #20047 (comment), some tests run pfnet/plamo-2-1b inference with the transformers implementation, which implicitly depends on both mamba-ssm and causal-conv1d. Sorry for the inconvenience.
Maybe we can avoid this dependency by hardcoding the inference results. Would that be acceptable for the unit tests?
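
For illustration, a hardcoded-baseline version of such a test might look roughly like the sketch below. The expected strings are placeholders to be recorded once from the transformers implementation of pfnet/plamo-2-1b, and the vllm_runner fixture and generate_greedy helper are assumed from vLLM's existing test conventions rather than taken from this PR.

import pytest

# Placeholder golden outputs, to be recorded offline from the HF implementation
# of pfnet/plamo-2-1b (the values below are stand-ins, not real completions).
EXPECTED_GREEDY_OUTPUTS = {
    "Hello, my name is": "<recorded HF completion goes here>",
}


@pytest.mark.parametrize("model", ["pfnet/plamo-2-1b"])
def test_plamo2_matches_recorded_hf_outputs(vllm_runner, model):
    prompts = list(EXPECTED_GREEDY_OUTPUTS)
    with vllm_runner(model, max_model_len=1024) as vllm_model:
        outputs = vllm_model.generate_greedy(prompts, max_tokens=16)

    for prompt, (_, generated_text) in zip(prompts, outputs):
        # Greedy decoding should be deterministic, so the generated text can be
        # compared directly against the recorded baseline string.
        assert generated_text == EXPECTED_GREEDY_OUTPUTS[prompt]

This removes the runtime dependency on mamba-ssm and causal-conv1d from the test image, at the cost of having to re-record the baselines whenever the model or tokenizer changes.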

@tdoublep (Member) commented Aug 8, 2025

Can't we just remove these as vLLM dependencies but install them inside the test image? This is what we are doing now for the causal_conv1d package.

Related topic: the hybrid tests are completely messed up right now due to a CUDA version mismatch issue. This would also be solved if we installed mamba_ssm inside the test container in the "right way" (--no-build-isolation).

@tdoublep (Member) commented Aug 8, 2025

This is what I propose as a different solution to this problem:
#22541

@DarkLight1337 (Member) commented:

Closing in favor of #22541

Labels
ci/build, documentation, ready
Projects
None yet
4 participants