Why are register tokens disabled during pre-training but enabled during fine-tuning? #17

@einnullnull

Hi, and thank you for open-sourcing this excellent project!
I noticed that `num_registers = 4` is used only during fine-tuning, while pre-training runs without register tokens. Was this setting shown to be necessary (that is, does removing the registers cause a clear performance drop), or was it chosen mainly for convenience?
Have you run an ablation without registers in the fine-tuning stage?
