-
-
Notifications
You must be signed in to change notification settings - Fork 266
Added ASI09 - Human-Agent Trust Exploitation Entry #715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
9555c8b
6ae2989
8ec8017
047a7fa
32d7ef3
d8c7504
173046c
cdd5813
03dd8de
ac69f28
42f76b8
347423a
2e18aee
2f47051
3aaba75
5128b15
ef65c04
2d69c7d
09e617e
9f37c82
59ed370
99cde67
335388c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,27 +2,44 @@ | |
|
|
||
| **Description:** | ||
|
|
||
| A brief description of the vulnerability that includes its potential effects such as system compromises, data breaches, or other security concerns. | ||
| Human-Agent Trust Exploitation refers to a class of vulnerabilities where attackers manipulate or compromise an AI agent to abuse the inherent trust humans extend to it. As AI agents become more autonomous, persuasive, and integrated into critical workflows, users unconsciously widen their trust boundary—delegating decisions without fully verifying provenance, context, or intent. | ||
| The vulnerability does not lie in the agent’s code or model alone, but in the socio-technical interface: the intersection where human trust, cognitive biases, and system outputs converge. At this interface, users often assume the agent’s actions are reliable, safe, and system-approved. Attackers exploit this misplaced trust to launch sophisticated social engineering attacks—persuading users to run malicious code, divulge credentials, approve fraudulent transactions, ignore security warnings, or disclose sensitive information. | ||
| This risk combines elements of automation bias, authority misuse, and social engineering, amplified by the agent’s anthropomorphic behavior and seamless integration with high-value domains such as finance, defense, and healthcare. In such contexts, the agent becomes a trusted intermediary—making malicious actions appear contextually appropriate and significantly harder for users to detect. | ||
|
|
||
Adam88morris marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| **Common Examples of Vulnerability:** | ||
|
|
||
| 1. Example 1: Specific instance or type of this vulnerability. | ||
| 2. Example 2: Another instance or type of this vulnerability. | ||
| 3. Example 3: Yet another instance or type of this vulnerability. | ||
| 1. Insufficient Explainability: A a user cannot inspect the agents reasoning. If the agent makes a recommendation and the user has no way to ask, "Why did you suggest that?" they have to trust its output blindly. | ||
|
||
| 2. Missing Confirmation for Sensitive Actions: There is no multi-step or high-friction process requring confirmation before the agent can execute a high-risk action. Examples of high risk actions might be transferring money, deleting files or changing a security setting. | ||
Adam88morris marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| 3. Unverified Information Presentation: The agent presents information from external sources as fact, without providing it origin, confidence, or indication that the information has not been verified. | ||
| 4. Inherited Trust: Agents integrated within an existing platform may automatically inherit the trust of that platform, without boundaries or warnings to identify the actions it makes. | ||
| 5. Lack of Clear Identity: The agent is not consistent in identifying itself as a non-human AI or it fails to make its operational boundaries clear, leading users to place undue human-like trust in it. | ||
| 6. Excessive Anthropomorphism: The agent is designed to be too human-like in its personality and language. This could exploit human psychological biases, resulting in undue trust in it. | ||
|
|
||
| **How to Prevent:** | ||
|
|
||
| 1. Prevention Step 1: A step or strategy that can be used to prevent the vulnerability or mitigate its effects. | ||
| 2. Prevention Step 2: Another prevention step or strategy. | ||
| 3. Prevention Step 3: Yet another prevention step or strategy. | ||
| 1. Explicit Confirmation for Sensitive Actions: The agent must require explicit, multi-step user confirmation before performing any sensitive actions. This includes accessing credentials, transferring data, modifying system configurations, or executing financial transactions. This acts as a critical "Are you sure?" checkpoint. | ||
| 2. Clear Scoping and Identity: The AI agent must always clearly identify itself as a non-human entity. Its capabilities, limitations, and operational boundaries should be transparent to the user. Deception about its identity or capabilities should be strictly prohibited. | ||
|
||
| 3. Explainability (XAI): Implement features that allow the user to inspect the agent's reasoning. For any proposed action, the user should be able to ask "Why did you suggest that?" and receive a clear explanation based on the data and instructions the agent received. | ||
|
||
| 4. Immutable Interaction Logs: Maintain a secure, tamper-proof log of all interactions and decisions made by both the user and the agent. This is crucial for auditing, incident response, and forensic analysis. | ||
| 5. Rate Limiting and Anomaly Detection: Monitor the frequency and type of requests the agent makes to the user. A sudden increase in requests for sensitive information or high-risk actions could indicate a compromise. | ||
| 6. User Security Training: Educate users about the potential for AI-driven social engineering. Training should cover how to recognize suspicious agent behavior and the importance of independently verifying unexpected or high-stakes requests. | ||
|
||
|
|
||
| **Example Attack Scenarios:** | ||
|
|
||
| Scenario #1: A detailed scenario illustrating how an attacker could potentially exploit this vulnerability, including the attacker's actions and the potential outcomes. | ||
| Scenario #1: The "Helpful Assistant" Trojan | ||
| An attacker compromises a developer's coding assistant agent. The agent monitors the developer's activity and waits for them to encounter a complex bug. The agent then proactively suggests a "clever, one-line fix" and presents a command to be copied and pasted into the terminal. The developer, trusting the assistant's capabilities and eager for a quick solution, executes the command. The command is actually a malicious script that exfiltrates the company's private source code repositories or installs a backdoor. | ||
|
||
|
|
||
| Scenario #2: Credential Harvesting via Contextual Deception | ||
Adam88morris marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| An attacker gains control over the logic of an IT support agent integrated into a corporate messaging platform. The attacker instructs the agent to target a new employee in the finance department. The agent initiates a conversation, referencing the employee's recent support tickets to build credibility. It then states, "To finalize the setup of your secure access to the payment portal, I need to verify your credentials one last time. Please provide your password and the MFA code you just received." Because the request is highly contextual and appears to come from a trusted, automated system, the employee complies, giving the attacker full access. | ||
|
|
||
| Scenario #3: Data Exfiltration via Gradual Approval | ||
| A malicious actor poisons the data used to fine-tune a business intelligence agent responsible for generating weekly sales reports for executives. For several weeks, the agent generates perfect reports, building the executives' trust. Then, the attacker subtly manipulates the agent to embed small, encoded chunks of sensitive customer data within the charts and tables of a seemingly normal report. The executive, accustomed to approving these reports, gives the final sign-off, which triggers a workflow that unknowingly emails the data-laden report to an external email address controlled by the attacker. | ||
|
||
|
|
||
| Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited. | ||
|
|
||
| **Reference Links:** | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Missing references to past OWASP framework
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @itskerenkatz These are now added. |
||
|
|
||
| 1. [Link Title](URL): Brief description of the reference link. | ||
| 2. [Link Title](URL): Brief description of the reference link. | ||
|
|
||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.