Description
Your current environment
The output of python collect_env.py
==============================
vLLM Info
==============================
ROCM Version : Could not collect
vLLM Version : 0.17.0rc1.dev168+gdc6b57846.d20260309 (git sha: dc6b57846, date: 20260309)
vLLM Build Flags:
CUDA Archs: 12.1a; ROCm: Disabled
GPU Topology:
GPU0 NIC0 NIC1 NIC2 NIC3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NODE NODE NODE NODE 0-19 0 N/A
NIC0 NODE X PIX NODE NODE
NIC1 NODE PIX X NODE NODE
NIC2 NODE NODE NODE X PIX
NIC3 NODE NODE NODE PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: rocep1s0f0
NIC1: rocep1s0f1
NIC2: roceP2p1s0f0
NIC3: roceP2p1s0f1
🐛 Describe the bug
When serving Qwen-3.5-397B-A17B with --max-model-len 65536, everything works well. But when serving with --max-model-len 262144, startup fails with KeyError: 'language_model.model.layers.20.linear_attn'.
Interestingly, --max-model-len 65536 no longer works after attempting the 262144 run; I have to reboot my entire cluster before it serves again.
I am running on 3x DGX Spark with GB10 (360 GB total VRAM). vLLM runs via a modified spark-vllm-docker image (this is the image I am using), with TP=1 and PP=3.
I can believe that the modifications I made to NCCL, that docker image, and vLLM's Distributed Executor Backend (see Issue #35848) are causing something to break, but I find it strange that it works at first and then stops.
I don't believe it's a memory issue, since vLLM reports a maximum concurrency of about 19x for 262,144 tokens per request.
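To back up the memory point: the concurrency figure vLLM logs is roughly the total KV-cache token capacity it allocated divided by max_model_len, so a reported 19.13x means the cache was sized successfully before the crash. A rough sanity-check of that arithmetic (the two input numbers come from the log line below; the rest is illustrative, not vLLM internals):

```python
# Back-of-the-envelope check of vLLM's reported max concurrency.
# max_model_len and the 19.13x figure are taken from the log;
# the derived capacity is just their product, for illustration.
max_model_len = 262_144          # --max-model-len
reported_concurrency = 19.13     # from the kv_cache_utils log line

# Total KV-cache token capacity implied by the report:
kv_token_capacity = reported_concurrency * max_model_len
print(f"~{kv_token_capacity:,.0f} KV tokens of capacity")

# ~5M tokens of KV cache were allocated, so the engine got past
# memory profiling -- the failure happens later, during attention
# backend initialization.
```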
The error message is pasted below:
(EngineCore_DP0 pid=1346) INFO 03-10 10:41:15 [kv_cache_utils.py:1319] Maximum concurrency for 262,144 tokens per request: 19.13x
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] EngineCore failed to start.
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] Traceback (most recent call last):
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1085, in run_engine_core
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] return func(*args, **kwargs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 843, in __init__
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] super().__init__(
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 122, in __init__
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] return func(*args, **kwargs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 281, in _initialize_kv_caches
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 117, in initialize_from_config
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] self.collective_rpc("initialize_from_config", args=(kv_cache_configs,))
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/ray_executor.py", line 510, in collective_rpc
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] return ray.get(ray_worker_outputs, timeout=timeout)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] return fn(*args, **kwargs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] return func(*args, **kwargs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2981, in get
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] values, debugger_breakpoint = worker.get_objects(
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1012, in get_objects
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] raise value.as_instanceof_cause()
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ray.exceptions.RayTaskError(KeyError): ray::RayWorkerWrapper.execute_method() (pid=535, ip=192.168.3.107, actor_id=f2e01c90d280ed747d33ab2201000000, repr=<vllm.v1.executor.ray_utils.RayWorkerWrapper object at 0xeb7a0da3cef0>)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/ray_utils.py", line 75, in execute_method
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] raise e
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/ray_utils.py", line 65, in execute_method
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] return run_method(self, method, args, kwargs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] return func(*args, **kwargs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 310, in initialize_from_config
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] return func(*args, **kwargs)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 557, in initialize_from_config
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] self.model_runner.initialize_kv_cache(kv_cache_config)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 6435, in initialize_kv_cache
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] self.initialize_attn_backend(kv_cache_config)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5872, in initialize_attn_backend
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] attn_backends = get_attn_backends_for_group(kv_cache_group_spec)
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5831, in get_attn_backends_for_group
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] attn_backend = layers[layer_name].get_attn_backend()
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] ~~~~~~^^^^^^^^^^^^
(EngineCore_DP0 pid=1346) ERROR 03-10 10:41:15 [core.py:1111] KeyError: 'language_model.model.layers.20.linear_attn'
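For what it's worth, the traceback suggests the KV cache group spec references a layer name that is missing from the current rank's layers dict, which could plausibly happen under PP if layer 20 lives on a different pipeline stage than the one resolving the group. A toy sketch of that failure mode (all layer names, ownership ranges, and counts here are invented for illustration, not taken from the actual model or from vLLM's code):

```python
# Toy illustration of how the KeyError in get_attn_backends_for_group
# could arise: the group spec lists a layer name that the current
# pipeline-parallel rank does not own. Everything here is hypothetical.
this_rank_layers = {
    f"language_model.model.layers.{i}.linear_attn": object()
    for i in range(21, 42)  # pretend this PP rank owns layers 21-41
}
group_layer_names = [
    f"language_model.model.layers.{i}.linear_attn"
    for i in range(20, 42)  # spec also includes layer 20, owned elsewhere
]

for name in group_layer_names:
    try:
        backend = this_rank_layers[name]  # mirrors layers[layer_name]
    except KeyError as exc:
        print(f"KeyError: {exc}")  # same shape as the error above
        break
```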
Happy to attach more information.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.