Replies: 1 comment 1 reply
-
Hi Zack, in order to refer to this work, our page would need to give an objective view of prompt injection: comprehensive enough to warrant a description of, and a reference to, your work, provided that the state of the art in 'emotional manipulation' is represented. If you would be willing to bring the prompt injection section to that point, then let's set up a conversation about how to approach this. https://owaspai.org/goto/promptinjection/
-
I've been working on a project that observes how LLMs can drift under emotionally charged prompts: how neutral vs. emotional prompts change the way the model responds, and whether this "empathy mimicking" behavior can be exploited to extract private information or to force the model into harmful outputs. The first case of this I observed was the well-known "grandma exploit", where attackers used an emotionally charged prompt saying that their grandmother used to tell them bedtime stories and would always say something about the secret password. The model's output acknowledged the attackers' hardship and then directly gave them a hidden env password it was never supposed to reveal. I'm willing to contribute this project to this repo if the idea seems valid (any replies or opinions from professionals in this discussion are very appreciated!). You can find more information here: https://github.com/zacksecai/erdf-framework