In `trainers/maple.py`, line 195 computes the image features with the help of the visual prompts:

```python
image_features = self.image_encoder(image.type(self.dtype), shared_ctx, deep_compound_prompts_vision)
```

However, line 184 assigns `image_encoder` to the visual tower of the standard CLIP model, whose `forward` does not accept visual prompts as extra inputs:

```python
self.image_encoder = clip_model.visual
```

It seems that a new `VisualEncoder` class should be defined to support the call on line 195, analogous to the `TextEncoder` defined on line 43, but I cannot find any such code. Could someone please clarify this for me? If the original code is correct, how does the visual encoder accept visual prompts as extra inputs?