Commit e10ebf0
Fix Prefix Finetuning for Group Query Attention (GQA) (#825)
Resolves: #819
## Problem
PrefixTuning currently fails on modern architectures that use
**Grouped-Query Attention (GQA)** (e.g., Llama 3.1), raising shape
mismatch errors in the attention forward pass.
Issues:
- The layer assumes `num_attention_heads` equals the number of KV heads, which does not hold for GQA.
- It computes the per-head dimension by dividing `hidden_size` by the KV head count, which only works for standard MHA (illustrated below).
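As a concrete illustration, here is the arithmetic with Llama-3.1-8B-style config values (hidden_size 4096, 32 attention heads, 8 KV heads); the variable names are for illustration only and do not come from the library:

```python
# Illustrative values matching a Llama-3.1-8B-style config (assumed for this example).
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8   # GQA: fewer KV heads than query heads

# The attention layer stores its KV states with this per-head dimension:
expected_head_dim = hidden_size // num_attention_heads   # 128

# Dividing by the KV head count only matches under MHA, where both head counts coincide:
buggy_head_dim = hidden_size // num_key_value_heads      # 512

assert buggy_head_dim != expected_head_dim  # prefix tensors no longer line up with the KV states
```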
## Solution
This PR updates `PrefixTuningLayer.add_adapter` to properly support GQA
and similar mechanisms:
- **Correct head count:** Prefer `config.num_key_value_heads` when
available, falling back to `num_attention_heads`.
- **Robust per-head dim:**
- Use `config.d_kv` if defined (e.g., T5).
- Else compute as `hidden_size // num_attention_heads`.
This ensures the prefix tensors align with the model's internal KV states across **MHA,
GQA, and MQA**, fixing Llama 3.1 while preserving compatibility with
existing models; the resolution logic is sketched below.
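A minimal sketch of the head-count and per-head-dimension resolution described above. This is not the exact code from the PR: the function name is made up, and the attribute names follow common Hugging Face config conventions.

```python
def resolve_prefix_kv_shape(config):
    """Return (n_kv_heads, head_dim) to use for the prefix key/value tensors."""
    # Prefer the dedicated KV head count (GQA/MQA); fall back to the query head count (MHA).
    n_kv_heads = getattr(config, "num_key_value_heads", None) or config.num_attention_heads

    # Per-head dimension: use d_kv when the config defines it (e.g., T5);
    # otherwise derive it from the *query* head count, not the KV head count.
    d_kv = getattr(config, "d_kv", None)
    head_dim = d_kv if d_kv is not None else config.hidden_size // config.num_attention_heads

    return n_kv_heads, head_dim
```

With the Llama-3.1-8B-style values from the earlier illustration, this yields `(8, 128)`, matching the per-head shape of the model's KV states.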
1 file changed: +7 −2 lines changed