
fix: handle 3D KV tensors in prefix cache #286

Open
Thump604 wants to merge 1 commit into waybarrios:main from Thump604:codex/pr144-restack


@Thump604
Collaborator

Restack of #144 onto current main.

What changed:

  • keep the existing 4D block-concat behavior intact
  • add the missing 3D KV path for Qwen3.5-style (heads, seq, dim) caches
  • store/use the correct sequence axis in block metadata
  • fix reconstructed offset accounting for 3D states
  • add a regression test that reconstructs a partial prefix from a 3D KV cache
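The axis handling described above can be sketched as follows. This is a minimal illustration, not the code in `vllm_mlx/prefix_cache.py`: it uses NumPy as a stand-in for MLX arrays, assumes the 4D layout is (batch, heads, seq, dim) alongside the 3D (heads, seq, dim) layout named in this PR, and the helper names `seq_axis`, `split_into_blocks`, and `reconstruct_prefix` are hypothetical.

```python
import numpy as np


def seq_axis(tensor: np.ndarray) -> int:
    """Return the sequence axis for a KV tensor (layouts are assumptions)."""
    if tensor.ndim == 4:  # assumed (batch, heads, seq, dim)
        return 2
    if tensor.ndim == 3:  # (heads, seq, dim), as in Qwen3.5-style caches
        return 1
    raise ValueError(f"unsupported KV tensor rank: {tensor.ndim}")


def split_into_blocks(tensor: np.ndarray, block_size: int) -> list:
    """Slice a KV tensor into fixed-size blocks along its sequence axis."""
    axis = seq_axis(tensor)
    blocks = []
    for start in range(0, tensor.shape[axis], block_size):
        index = [slice(None)] * tensor.ndim
        index[axis] = slice(start, start + block_size)
        blocks.append(tensor[tuple(index)])
    return blocks


def reconstruct_prefix(blocks: list, num_tokens: int):
    """Concatenate blocks on the sequence axis and trim to num_tokens.

    Returns the trimmed tensor and its offset (the number of cached tokens),
    read from the sequence axis rather than a hard-coded dimension.
    """
    axis = seq_axis(blocks[0])
    joined = np.concatenate(blocks, axis=axis)
    index = [slice(None)] * joined.ndim
    index[axis] = slice(0, num_tokens)
    prefix = joined[tuple(index)]
    return prefix, prefix.shape[axis]


# Hypothetical 3D cache: 2 heads, 10 tokens, head dim 4.
keys_3d = np.arange(2 * 10 * 4, dtype=np.float32).reshape(2, 10, 4)
blocks = split_into_blocks(keys_3d, block_size=4)

# Rebuild a partial prefix of 6 tokens from the first two blocks.
prefix, offset = reconstruct_prefix(blocks[:2], num_tokens=6)
```

Reading the offset from the sequence axis is the point of the sketch: indexing a 3D tensor with the 4D sequence axis would report the head dimension instead of the token count.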

Validation:

  • PYTHONPATH=/Users/David/code/vllm-mlx-pr144-restack /opt/ai-runtime/venv-live/bin/python -m pytest /Users/David/code/vllm-mlx-pr144-restack/tests/test_paged_cache.py -q
  • PYTHONPATH=/Users/David/code/vllm-mlx-pr144-restack /opt/ai-runtime/venv-live/bin/python -m py_compile /Users/David/code/vllm-mlx-pr144-restack/vllm_mlx/prefix_cache.py /Users/David/code/vllm-mlx-pr144-restack/tests/test_paged_cache.py
  • PYTHONPATH=/Users/David/code/vllm-mlx-pr144-restack /opt/ai-runtime/venv-live/bin/python -m black --check --fast /Users/David/code/vllm-mlx-pr144-restack/vllm_mlx/prefix_cache.py /Users/David/code/vllm-mlx-pr144-restack/tests/test_paged_cache.py

Fixes #142. Supersedes #144.

Development

Successfully merging this pull request may close these issues.

Prefix cache tensor slicing fails for Qwen3.5 MoE models (3D KV cache)