Description
System Info
In the release of transformers v4.56.0, PR #39509 introduced a refactor of the public `get_decoder` method, which previously existed on individual models, by moving it to the `PreTrainedModel` class.
Unfortunately this introduced a significant behavior change: `*ForCausalLM` models no longer have `get_decoder()` return the underlying base model. For example, a `MistralForCausalLM` model named `model` returns `None` when `model.get_decoder()` is called.
The reason this occurs is obvious when looking at the offending PR:
```python
def get_decoder(self):
    """
    Best-effort lookup of the *decoder* module.

    Order of attempts (covers ~85 % of current usages):
    1. `self.decoder`
    2. `self.model` (many wrappers store the decoder here)
    3. `self.model.get_decoder()` (nested wrappers)
    4. fallback: raise for the few exotic models that need a bespoke rule
    """
    if hasattr(self, "decoder"):
        return self.decoder
    if hasattr(self, "model"):
        inner = self.model
        if hasattr(inner, "get_decoder"):
            return inner.get_decoder()
        return inner
    return None
```
In these cases the `if hasattr(self, "model"):` block is entered, and the underlying model always has a `get_decoder` method, because it is a `PreTrainedModel`, as all transformers models are. So the nested branch is always taken, and at this point we are inside the decoder itself, calling its `get_decoder` method. The decoder has no `decoder` or `model` attribute, so the function returns `None`, which is then passed back to the parent caller.
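The faulty call chain can be reproduced without transformers at all. The sketch below uses hypothetical stand-in classes (`FakeDecoder` and `FakeCausalLM` are my names, not library ones) that copy the lookup logic verbatim:

```python
# Minimal sketch (plain classes, no transformers dependency) illustrating
# why the lookup returns None for a *ForCausalLM-style wrapper.

class FakeDecoder:
    """Stands in for the inner decoder (e.g. MistralModel): it has
    neither a `.decoder` nor a `.model` attribute, only the shared
    get_decoder() logic from the PR."""

    def get_decoder(self):
        if hasattr(self, "decoder"):
            return self.decoder
        if hasattr(self, "model"):
            inner = self.model
            if hasattr(inner, "get_decoder"):
                return inner.get_decoder()
            return inner
        return None


class FakeCausalLM(FakeDecoder):
    """Stands in for e.g. MistralForCausalLM: stores the decoder in `.model`."""

    def __init__(self):
        self.model = FakeDecoder()


lm = FakeCausalLM()
# Step 2 matches (`self.model` exists), step 3 recurses into the inner
# decoder, which has no `.decoder` or `.model`, so the whole call
# returns None instead of the expected lm.model.
print(lm.get_decoder())  # → None
```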
There are a couple of ways this could be fixed, but I don't know what their current impact would be on other parts of the code. I may open a PR, but I am quite busy at the moment. @molbap @ArthurZucker since you were the authors and reviewers here, do you mind taking another look at this?
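One possible direction, sketched here with hypothetical stand-in classes and not validated against the full model zoo, is to treat a nested lookup that yields `None` as "the inner module is itself the decoder" and fall back to returning it:

```python
# Sketch of one possible fix (assumption: returning the inner module when
# the nested lookup finds nothing is acceptable for all wrapper models).

class PatchedLookup:
    def get_decoder(self):
        if hasattr(self, "decoder"):
            return self.decoder
        if hasattr(self, "model"):
            inner = self.model
            if hasattr(inner, "get_decoder"):
                nested = inner.get_decoder()
                if nested is not None:
                    return nested
            # Fall back: many *ForCausalLM wrappers store the decoder here.
            return inner
        return None


class InnerModel(PatchedLookup):      # stands in for e.g. MistralModel
    pass


class CausalLM(PatchedLookup):        # stands in for e.g. MistralForCausalLM
    def __init__(self):
        self.model = InnerModel()


lm = CausalLM()
print(lm.get_decoder() is lm.model)  # → True, restoring the pre-v4.56.0 behavior
```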
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Call `get_decoder()` on, say, a `MistralForCausalLM` model.
Expected behavior
The underlying `model` attribute should be returned for `*ForCausalLM` models, not `None`, as these models are decoder-only models by transformers convention.