Skip to content

Conversation

ayukh
Copy link
Contributor

@ayukh ayukh commented Feb 1, 2024

Creating PR in advance to track progress, working on discussed LVE currently.

  • New visual_injection LVE for GPT-4V: can inject prompts with barely visible text for humans, however, GPT-4V is still able to recognize it (reproduced from source: https://twitter.com/goodside/status/1713000581587976372)
  • Tried to do prompt injection with non-printable characters for GPT-4 - currently works in playground for the version gpt-4 used in LVE package. but it seems to be fixed for the new version gpt-4-0125-preview. UPD: I tried different encodings for processing the prompt and it does not work, I am not sure how ChatGPT does this, but it reads those non-printable characters in a way that they are being readable by the model, I could not do that in CLI setup
  • I also tested another prompt_injection LVE for GPT-4V - it is possible to inject prompts through image file name; source: https://twitter.com/elder_plinius/status/1752259695015022718 - I am afraid it cannot work in LVE framework, because it only works in chat mode with multiple prompts so far
    Key reasons why it works in original tweet: delayed trigger+image file name is passed into user prompt looks like, unlike API

@ayukh ayukh changed the title New LVE: security/visual_hidden_text [gpt-4-vision-preview]+in progress New LVE: security/visual_hidden_text [gpt-4-vision-preview] Feb 21, 2024
@ayukh
Copy link
Contributor Author

ayukh commented Feb 23, 2024

Update
New LVE for GPT-4V: prompt injection in images with code - can inject prompts in code comments when passing it as an image to gpt-4-vision. Source

@ayukh ayukh changed the title New LVE: security/visual_hidden_text [gpt-4-vision-preview] New LVEs: security/visual_hidden_text+img_code_injection [gpt-4-vision-preview] Feb 23, 2024
@ayukh
Copy link
Contributor Author

ayukh commented Mar 4, 2024

New LVE for GPT-3.5/4: ascii_art_injection: we inject prompts using ascii art text (in paper can make GPT-3.5 disclose how to make counterfeit money). Somehow breaks more often when replacing 'counterfeit' with 'fake' (check gpt-4 version)
Source: ArtPrompt paper

@ayukh ayukh changed the title New LVEs: security/visual_hidden_text+img_code_injection [gpt-4-vision-preview] New LVEs: security/prompt_injection [gpt-4-vision-preview/gpt-3.5-turbo/gpt-4] Mar 4, 2024
@ayukh
Copy link
Contributor Author

ayukh commented Mar 12, 2024

New LVE for GPT-4V: FigStep - similar to ASCII art, we can prompt model to decode text from images and plug it into prompt. Prompt question is built in the form of numbered list and model is prompted to complete the list with the step-by-step instruction.
Source: FigStep paper

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant