
Question about visual prompt #90

@nightrain-vampire


In trainers/maple.py, line 195, the code computes image features using the visual prompts:

image_features = self.image_encoder(image.type(self.dtype), shared_ctx, deep_compound_prompts_vision)

However, on line 184, image_encoder is just the visual part of the standard CLIP model, and its forward pass does not accept visual prompts as extra inputs:

self.image_encoder = clip_model.visual

It seems that a new VisualEncoder should be defined to support the call on line 195, just like the TextEncoder defined on line 43, but I cannot find any such code. Could someone please clear up my confusion? If the original code is correct, how does the visual encoder accept visual prompts as extra inputs?
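To make the concern concrete, here is a minimal sketch in plain Python (no PyTorch). The classes `StockVisualEncoder` and `PromptAwareVisualEncoder` and the helpers `encode` and `accepts_prompts` are hypothetical stand-ins, not code from the repository. The sketch only illustrates the reasoning: an encoder whose forward takes a single image argument would fail on a three-argument call like the one on line 195, so the object assigned on line 184 presumably has a different forward signature than assumed.

```python
class StockVisualEncoder:
    """Hypothetical stand-in: forward takes only the image."""

    def __call__(self, image):
        return f"features({image})"


class PromptAwareVisualEncoder:
    """Hypothetical stand-in: forward also accepts visual prompts."""

    def __call__(self, image, shared_ctx, deep_prompts):
        # A real encoder would presumably combine the prompts with the image
        # tokens; this stand-in only records what was passed in.
        return f"features({image}, ctx={shared_ctx}, deep={len(deep_prompts)})"


def encode(encoder, image, shared_ctx, deep_prompts):
    # Mirrors the three-argument call pattern from line 195.
    return encoder(image, shared_ctx, deep_prompts)


def accepts_prompts(encoder):
    """Return True if the encoder can be called with the extra prompt inputs."""
    try:
        encode(encoder, "img", "ctx", ["p1", "p2"])
        return True
    except TypeError:
        # Raised when the call signature does not accept the extra arguments.
        return False


if __name__ == "__main__":
    print(accepts_prompts(StockVisualEncoder()))
    print(accepts_prompts(PromptAwareVisualEncoder()))
```

If the call on line 195 runs without a TypeError, then the module behind `clip_model.visual` must already accept the two extra arguments, which suggests checking where that class is defined rather than looking for a separate wrapper in trainers/maple.py.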
