In `trainers/maple.py`, line 195 computes the image features with the help of the visual prompts:

```python
image_features = self.image_encoder(image.type(self.dtype), shared_ctx, deep_compound_prompts_vision)
```

However, line 184 assigns `image_encoder` to the visual tower of the standard CLIP model, whose `forward` does not accept visual prompts as extra inputs:

```python
self.image_encoder = clip_model.visual
```

It seems that a new `VisualEncoder` class should be defined to support the call on line 195, analogous to the `TextEncoder` defined on line 43, but I cannot find any such code. Could someone please clarify this for me? If the original code is correct, how does the visual encoder accept visual prompts as extra inputs?