
get_decoder feature regression in 4.56.0 #40815

@KyleMylonakisProtopia

Description


System Info

In the release of transformers v4.56.0, PR #39509 introduced a refactor of the public get_decoder method, which previously existed on individual models, by moving it to the PreTrainedModel class.

Unfortunately, this introduced a significant behavior change: *ForCausalLM models no longer return the underlying base model from get_decoder().

For example, a MistralForCausalLM model named model returns None when model.get_decoder() is called.

The reason this occurs is obvious when looking at the offending PR:

def get_decoder(self):
    """
    Best-effort lookup of the *decoder* module.
    Order of attempts (covers ~85 % of current usages):
    1. `self.decoder`
    2. `self.model`                       (many wrappers store the decoder here)
    3. `self.model.get_decoder()`         (nested wrappers)
    4. fallback: raise for the few exotic models that need a bespoke rule
    """
    if hasattr(self, "decoder"):
        return self.decoder

    if hasattr(self, "model"):
        inner = self.model
        if hasattr(inner, "get_decoder"):
            return inner.get_decoder()
        return inner

    return None

For these models, the if hasattr(self, "model") branch is always entered, and the underlying model has a get_decoder method because it is a PreTrainedModel, as all transformers models are. The call therefore recurses into the decoder itself. The decoder has neither a decoder nor a model attribute, so its get_decoder() returns None, which is then passed back to the parent caller.
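The failure mode can be reproduced with a minimal stand-in for the class structure (plain classes here, not the actual transformers models):

```python
class FakeDecoder:
    """Stands in for the bare decoder model: no `decoder` or `model` attribute."""

    def get_decoder(self):
        # Same lookup logic as the snippet above, inherited by all models.
        if hasattr(self, "decoder"):
            return self.decoder
        if hasattr(self, "model"):
            inner = self.model
            if hasattr(inner, "get_decoder"):
                return inner.get_decoder()
            return inner
        return None


class FakeForCausalLM(FakeDecoder):
    """Stands in for MistralForCausalLM: wraps the decoder in `self.model`."""

    def __init__(self):
        self.model = FakeDecoder()


lm = FakeForCausalLM()
# `lm.model` exists and has `get_decoder`, so the call recurses into the
# inner model, which has neither `decoder` nor `model`, so it returns None.
print(lm.get_decoder())  # None
```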

There are a couple of ways this could be fixed, but I don't know what their current impact would be on other parts of the code. I may open a PR, but I am quite busy at the moment. @molbap @ArthurZucker since you were the authors and reviewers here, do you mind taking another look at this?

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Call get_decoder() on, say, a MistralForCausalLM model.

Expected behavior

The underlying model attribute should be returned for *ForCausalLM models, not None, since these models are decoder-only by transformers convention.
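One possible fix, sketched here with hypothetical stand-in classes rather than a tested patch against transformers, is to fall back to the inner module itself when the nested lookup comes back empty:

```python
class DecoderLookupMixin:
    """Hypothetical patched get_decoder (a sketch, not the actual fix)."""

    def get_decoder(self):
        if hasattr(self, "decoder"):
            return self.decoder
        if hasattr(self, "model"):
            inner = self.model
            if hasattr(inner, "get_decoder"):
                nested = inner.get_decoder()
                # Proposed change: if the nested lookup found nothing,
                # the inner module itself is the decoder.
                if nested is not None:
                    return nested
            return inner
        return None


class FakeDecoder(DecoderLookupMixin):
    """Bare decoder: no `decoder` or `model` attribute."""


class FakeForCausalLM(DecoderLookupMixin):
    """Wrapper storing the decoder in `self.model`, like MistralForCausalLM."""

    def __init__(self):
        self.model = FakeDecoder()


lm = FakeForCausalLM()
print(lm.get_decoder() is lm.model)  # True
```

Whether this interacts badly with genuinely nested wrappers (where the inner get_decoder() legitimately returns None) is exactly the kind of impact the maintainers would need to assess.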
