Hi, and thank you for open-sourcing this excellent project!
I noticed that `num_registers = 4` is used only during fine-tuning. Was this setting shown to be necessary (i.e., does removing registers cause a clear performance drop), or was it chosen mainly for convenience?
Have you run an ablation without registers in the fine-tuning stage?