fix: handle 3D KV tensors in prefix cache by Thump604 · Pull Request #286 · waybarrios/vllm-mlx

Thump604 · 2026-04-11T16:40:31Z

Restack of #144 onto current main.

What changed:

Validation:

PYTHONPATH=/Users/David/code/vllm-mlx-pr144-restack /opt/ai-runtime/venv-live/bin/python -m pytest /Users/David/code/vllm-mlx-pr144-restack/tests/test_paged_cache.py -q
PYTHONPATH=/Users/David/code/vllm-mlx-pr144-restack /opt/ai-runtime/venv-live/bin/python -m py_compile /Users/David/code/vllm-mlx-pr144-restack/vllm_mlx/prefix_cache.py /Users/David/code/vllm-mlx-pr144-restack/tests/test_paged_cache.py
PYTHONPATH=/Users/David/code/vllm-mlx-pr144-restack /opt/ai-runtime/venv-live/bin/python -m black --check --fast /Users/David/code/vllm-mlx-pr144-restack/vllm_mlx/prefix_cache.py /Users/David/code/vllm-mlx-pr144-restack/tests/test_paged_cache.py

Fixes #142. Supersedes #144.
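
For readers without the diff at hand, the rank problem named in the title can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: it assumes 4D KV tensors are laid out as (batch, heads, seq_len, head_dim), that 3D ones drop the batch axis, and that the sequence axis is therefore always second-to-last. The helper name `seq_axis_index` is hypothetical and does not appear in vllm_mlx/prefix_cache.py.

```python
def seq_axis_index(ndim: int, start: int, end: int) -> tuple:
    """Build an index tuple that selects [start:end] on the sequence axis.

    Assumption (not taken from the PR): the sequence axis is second-to-last,
    i.e. (batch, heads, seq, head_dim) for 4D and (heads, seq, head_dim) for 3D.
    """
    if ndim not in (3, 4):
        raise ValueError(f"expected a 3D or 4D KV tensor, got {ndim}D")
    # Keep every leading axis whole, slice the sequence axis, keep head_dim whole.
    leading = (slice(None),) * (ndim - 2)
    return leading + (slice(start, end), slice(None))


# The same call works for both ranks; only the number of leading axes differs.
index_4d = seq_axis_index(4, 0, 16)
index_3d = seq_axis_index(3, 0, 16)
```

A tensor would then be sliced as `tensor[seq_axis_index(tensor.ndim, 0, 16)]`. For array libraries that support Ellipsis indexing, `tensor[..., 0:16, :]` expresses the same thing without a helper.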

fix: handle 3D KV tensors in prefix cache

66acddb

Thump604 mentioned this pull request Apr 11, 2026

fix: handle 3D KV tensors in prefix cache for Qwen3.5 models #144

Closed
