Description
Outline & Motivation
The BackboneFinetuning callback exposes a train_bn parameter intended to control whether BatchNorm layers are trainable during backbone finetuning. However, the current implementation only applies this parameter during the unfreezing phase, not during the initial frozen phase.
In `freeze_before_training()`, the callback calls:

```python
self.freeze(pl_module.backbone)
```

which uses the default `train_bn=True`. As a result, BatchNorm layers remain trainable during the frozen stage, regardless of the `train_bn` value passed to the callback.
This leads to counter-intuitive behavior when `train_bn=False`:
- It does not freeze BN during the frozen phase.
- It freezes BN only once the backbone is unfrozen.
So the meaning of the parameter becomes:
“Train BN while the backbone is frozen, and optionally freeze it once the backbone is unfrozen.”
This is not what the parameter name suggests, and is rarely the intended finetuning strategy.
| Phase | `train_bn=True` | `train_bn=False` |
|---|---|---|
| Frozen phase | Backbone: frozen, BN: trainable | Backbone: frozen, BN: trainable |
| After unfreeze | Backbone: trainable, BN: trainable | Backbone: trainable, BN: frozen |
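The table above can be reproduced with a minimal, torch-free stand-in. The `Param`, `Module`, and `freeze` names below are simplified stand-ins for `nn.Parameter`, `nn.Module`, and `BaseFinetuning.freeze`, not Lightning's real implementation; the sketch only mirrors the relevant logic, namely that `freeze()` skips BN modules when its own `train_bn` argument is `True`:

```python
class Param:
    def __init__(self):
        self.requires_grad = True

class Module:
    """Tiny stand-in for nn.Module: one parameter plus a BN flag."""
    def __init__(self, is_bn=False):
        self.param = Param()
        self.is_bn = is_bn

def freeze(modules, train_bn=True):
    """Mimics BaseFinetuning.freeze: freeze params, but skip BN if train_bn."""
    for m in modules:
        if m.is_bn and train_bn:
            continue  # BN left trainable
        m.param.requires_grad = False

# Current behavior: freeze_before_training calls freeze(backbone) with the
# default train_bn=True, so the callback's train_bn=False is never applied
# during the frozen phase.
backbone = [Module(), Module(is_bn=True)]
freeze(backbone)
print(backbone[1].param.requires_grad)  # True: BN stays trainable

# Forwarding the callback's train_bn=False (as this issue proposes for the
# frozen phase) would freeze BN as well:
backbone2 = [Module(), Module(is_bn=True)]
freeze(backbone2, train_bn=False)
print(backbone2[1].param.requires_grad)  # False: BN frozen
```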
Pitch
To keep current behavior available while making BN handling explicit and predictable:
- Deprecate the existing `train_bn` parameter.
- Introduce two new parameters:
  - `train_bn_frozen_phase`: controls whether BatchNorm layers are trainable while the backbone is frozen.
  - `train_bn_unfrozen_phase`: controls whether BatchNorm layers are trainable after the backbone is unfrozen.
- Set the default values to match the current behavior: `train_bn_frozen_phase=True`, `train_bn_unfrozen_phase=True`.
- Keep the old `train_bn` parameter for one deprecation cycle, mapping it internally to `train_bn_unfrozen_phase=train_bn`.
- Emit a deprecation warning when `train_bn` is used, directing users to the new parameters.
- Remove `train_bn` in a future major release once the transition period is over.
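A rough sketch of the deprecation mapping, using only the stdlib `warnings` module. `BackboneFinetuningSketch` and its `__init__` signature are illustrative only, not Lightning's actual class; the two new parameter names are the ones proposed above:

```python
import warnings

class BackboneFinetuningSketch:
    """Illustrative constructor only; not the real BackboneFinetuning."""

    def __init__(self, train_bn=None,
                 train_bn_frozen_phase=True,
                 train_bn_unfrozen_phase=True):
        if train_bn is not None:
            warnings.warn(
                "`train_bn` is deprecated; use `train_bn_frozen_phase` and "
                "`train_bn_unfrozen_phase` instead.",
                DeprecationWarning,
            )
            # Map the old parameter onto its historical meaning: it only
            # ever affected the unfrozen phase.
            train_bn_unfrozen_phase = train_bn
        self.train_bn_frozen_phase = train_bn_frozen_phase
        self.train_bn_unfrozen_phase = train_bn_unfrozen_phase

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cb = BackboneFinetuningSketch(train_bn=False)

print(cb.train_bn_frozen_phase)    # True  (default, matches current behavior)
print(cb.train_bn_unfrozen_phase)  # False (mapped from train_bn)
print(len(caught))                 # 1 deprecation warning emitted
```

With the defaults left untouched, a user who never passes `train_bn` sees no warning and no behavior change.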
Happy to discuss any other directions / improvements you have in mind.
Additional context
I’m happy to open a PR if the direction makes sense.