Description
```python
layer.output_adapters = BottleneckLayer("output_adapter", is_layer_hooked=True)
ln_2_get_fn = lambda: multigetattr(layer, model.adapter_interface.layer_ln_2, None)
layer_output_proj.register_forward_hook(partial(hook_fn, layer.output_adapters, ln_2_get_fn))
```
This code causes `layer.output_adapters` on `cuda:n` to always point to the `layer.output_adapters` on `cuda:0` during multi-GPU training with the default distributed settings of the Hugging Face Trainer, even though the model itself is properly distributed across the GPUs. I suspect the cause is `partial`: it binds `layer.output_adapters` at hook-registration time, so every replica's hook keeps referencing the original module on `cuda:0`. As a workaround, I tried saving variables like `layer.xxx` and `layer` in the hook's context so that it can run on multiple GPUs.
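For illustration, here is a minimal, self-contained sketch of the mechanism I suspect (the `Block`, `hook_fn`, and `adapter` names are mine, not from the library): a forward hook bound with `functools.partial` keeps a hard reference to the module that existed at registration time, so when `nn.DataParallel` replicates the model, the hook on the `cuda:1` replica still calls into the `cuda:0` copy.

```python
from functools import partial

import torch
import torch.nn as nn


def hook_fn(bound_module, module, args, output):
    # `bound_module` was frozen into the hook by partial at registration
    # time; `module` is the replica the hook actually fires on.
    bound_dev = next(bound_module.parameters()).device
    print(f"bound module on {bound_dev}, activation on {output.device}")


class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        self.adapter = nn.Linear(8, 8)  # stand-in for output_adapters

    def forward(self, x):
        return self.proj(x)


model = Block().cuda()
# The partial pins model.adapter (living on cuda:0) into the hook closure.
model.proj.register_forward_hook(partial(hook_fn, model.adapter))

dp = nn.DataParallel(model, device_ids=[0, 1])
dp(torch.randn(4, 8, device="cuda:0"))
# Expected print from the cuda:1 replica:
# "bound module on cuda:0, activation on cuda:1"
```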
While debugging, variables such as `residual` and the hidden states are shown to be on `cuda:1`, but `layer` is shown to be on `cuda:0`. I also printed the address of the `layer` variable on both GPUs: the address on `cuda:1` is identical to the one on `cuda:0`, i.e. the hooks on both replicas reference the same Python object. Since my GPU cannot handle models like Qwen, and it is not easy to provide data for my own model, could you please test whether this problem occurs in multi-GPU training? Thank you! I followed the adapters-for-any-transformer process.
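For reference, this is the direction my workaround attempt took, shown only as a rough sketch: the adapter is registered as a child of the hooked projection and resolved from the `module` argument inside the hook, so each replica picks up its own copy. All names here are illustrative, and the plain `adapter(output)` call is a placeholder for the real `BottleneckLayer` forward, which takes more arguments (e.g. the residual and the layer norm).

```python
import torch.nn as nn


def replica_local_hook(module, args, output):
    # `module` is always the replica the hook fires on, so its
    # `output_adapters` child already lives on the correct device.
    adapter = module.output_adapters
    return adapter(output)  # placeholder for the real BottleneckLayer call


def install_adapter(layer_output_proj: nn.Module, adapter: nn.Module) -> None:
    # Registering the adapter as a submodule of the hooked projection lets
    # nn.DataParallel replicate and move it together with the projection,
    # instead of freezing a cuda:0 reference into a partial.
    layer_output_proj.add_module("output_adapters", adapter)
    layer_output_proj.register_forward_hook(replica_local_hook)
```

I am not sure whether this plays well with the `ln_2_get_fn` lookup in the real code, which is why I would appreciate a test on your side.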