Hi,
I noticed that freeze_vision_tower is set to False in the current implementation. In most multimodal training frameworks (like LLaVA), the vision tower is frozen by default so that its pre-trained visual weights stay intact.
Is this an intentional design choice for this project? I'd appreciate it if you could clarify the rationale behind training the vision tower here. Thanks!
@zli12321 @wyu97