Commits
9555c8b
Added ASI09 - Human-Agent Trust Exploitation Entry
Adam88morris Sep 16, 2025
6ae2989
Updated introduction removing intro paragraph
Adam88morris Sep 28, 2025
8ec8017
Updated 1. Insufficient Explainability with more details
Adam88morris Sep 28, 2025
047a7fa
Updated 2.Missing confirmation for sensitive actions with more details
Adam88morris Sep 28, 2025
32d7ef3
Replace mitigations 2. Clear Scoping and Identity for Demarcate Trust…
Adam88morris Sep 28, 2025
d8c7504
Updated mitigations 3. Explainability with more practical actions
Adam88morris Sep 28, 2025
173046c
Removed mitigation and added two new more specific ones
Adam88morris Sep 28, 2025
cdd5813
typo spelling mistake fixed
Adam88morris Sep 28, 2025
03dd8de
Updated scenario 2. Credential Harvesting to include more specifics
Adam88morris Sep 29, 2025
ac69f28
Updated scenario 2. Credential harvesting - removed last sentence
Adam88morris Sep 29, 2025
42f76b8
Updated scenario 3. Gradual approval - to focus more on trust exploit…
Adam88morris Sep 29, 2025
347423a
Merge branch 'main' into feature/asi09-human-agent-trust-exploitation
Adam88morris Oct 5, 2025
2e18aee
Fixed typo
Adam88morris Oct 6, 2025
2f47051
Added reference links to LLM top 10
Adam88morris Oct 6, 2025
3aaba75
Added more reference links
Adam88morris Oct 8, 2025
5128b15
Fixed AIVSS link
Adam88morris Oct 8, 2025
ef65c04
updated description based on feedback
Adam88morris Oct 8, 2025
2d69c7d
updated description with AICVSS and mitigation mappings
Adam88morris Oct 8, 2025
09e617e
updated examples of vulnerabilities to add emphasis
Adam88morris Oct 8, 2025
9f37c82
updated mitigations
Adam88morris Oct 8, 2025
59ed370
added extra scenarios and reference links
Adam88morris Oct 8, 2025
99cde67
typo
Adam88morris Oct 8, 2025
335388c
typo
Adam88morris Oct 8, 2025
@@ -2,27 +2,44 @@

**Description:**

A brief description of the vulnerability that includes its potential effects such as system compromises, data breaches, or other security concerns.
The vulnerability does not lie in the agent’s code or model alone, but in the socio-technical interface: the intersection where human trust, cognitive biases, and system outputs converge. At this interface, users often assume the agent’s actions are reliable, safe, and system-approved. Attackers exploit this misplaced trust to launch sophisticated social engineering attacks—persuading users to run malicious code, divulge credentials, approve fraudulent transactions, ignore security warnings, or disclose sensitive information.
This risk combines elements of automation bias, authority misuse, and social engineering, amplified by the agent’s anthropomorphic behavior and seamless integration with high-value domains such as finance, defense, and healthcare. In such contexts, the agent becomes a trusted intermediary—making malicious actions appear contextually appropriate and significantly harder for users to detect.


**Common Examples of Vulnerability:**

1. Example 1: Specific instance or type of this vulnerability.
2. Example 2: Another instance or type of this vulnerability.
3. Example 3: Yet another instance or type of this vulnerability.
1. Insufficient Explainability: A user cannot inspect the agent's reasoning. If the agent makes a recommendation and the user has no way to ask, "Why did you suggest that?", they must trust its output blindly. This turns the agent into an opaque authority, allowing an attacker to hijack its credibility to deliver malicious instructions.
2. Missing Confirmation for Sensitive Actions: The agent is permitted to execute high-impact functions, such as financial transfers or deleting files, without requiring a final, explicit confirmation from the user. By removing this critical safety check, the system allows a single, potentially manipulated command to have immediate and irreversible consequences (a minimal sketch of this anti-pattern follows the list).
3. Unverified Information Presentation: The agent presents information from external sources as fact, without providing its origin, a confidence level, or any indication that the information has not been verified.
4. Inherited Trust: Agents integrated within an existing platform may automatically inherit the trust of that platform, without boundaries or warnings to distinguish the actions they take.
5. Lack of Clear Identity: The agent does not consistently identify itself as a non-human AI, or it fails to make its operational boundaries clear, leading users to place undue human-like trust in it.
6. Excessive Anthropomorphism: The agent is designed to be overly human-like in its personality and language, exploiting human psychological biases and fostering undue trust.
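
To make example 2 concrete, here is a minimal sketch of the anti-pattern. The tool names, registry structure, and `dispatch_tool` helper are hypothetical, not taken from any real framework:

```python
# Hypothetical tool dispatcher illustrating example 2: every tool the model
# requests, including high-impact ones, runs immediately with no human gate.
HIGH_IMPACT_TOOLS = {"transfer_funds", "delete_files", "share_credentials"}

def dispatch_tool(tool_name: str, args: dict, registry: dict):
    handler = registry[tool_name]
    # Anti-pattern: no confirmation even when tool_name is in
    # HIGH_IMPACT_TOOLS, so one injected instruction can cause an
    # immediate, irreversible action.
    return handler(**args)
```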

**How to Prevent:**

1. Prevention Step 1: A step or strategy that can be used to prevent the vulnerability or mitigate its effects.
2. Prevention Step 2: Another prevention step or strategy.
3. Prevention Step 3: Yet another prevention step or strategy.
1. Explicit Confirmation for Sensitive Actions: The agent must require explicit, multi-step user confirmation before performing any sensitive action. This includes accessing credentials, transferring data, modifying system configurations, or executing financial transactions. This acts as a critical "Are you sure?" checkpoint (a minimal gate sketch follows this list).
2. Demarcate Trust Boundaries: Use visual cues such as warning colours and icons in the UI to signal when the agent proposes a high-risk action (e.g., running a command). This breaks the user's passive trust and prompts scrutiny precisely when it is needed most.
3. Explainability (XAI): Make explanations proactive and layered. Always provide a simple justification upfront for significant suggestions, with an option to drill down to detailed logic and direct source links. Make verifying the agent's trustworthiness a seamless part of the workflow.
4. Immutable Interaction Logs: Maintain a secure, tamper-proof log of all interactions and decisions made by both the user and the agent. This is crucial for auditing, incident response, and forensic analysis.
5. Rate Limiting and Anomaly Detection: Monitor the frequency and type of requests the agent makes to the user. A sudden increase in requests for sensitive information or high-risk actions could indicate a compromise (see the monitoring sketch after this list).
6. Report Suspicious Interactions: Provide a prominent option that allows users to instantly flag strange or potentially malicious interactions. This could be a one-click button or command that immediately triggers an automated review or a temporary lockdown of the agent's capabilities.
7. Adjustable Safety Levels: Allow users to set the agent's level of autonomy, similar to a browser's security settings (e.g., High, Medium, Low). A higher safety setting would enforce stricter confirmations and require more detailed explanations by default, giving cautious users and critical workflows more control.
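
As one way to realise mitigations 1 and 2 together, the sketch below gates high-impact tools behind an explicit approval step and visually flags the trust boundary. It is illustrative only; the tool names, registry, and CLI-style prompt are all hypothetical:

```python
# Hypothetical confirmation gate for mitigations 1 and 2: high-impact tools
# need an explicit interactive approval, and the prompt visually marks the
# trust boundary before anything executes.
HIGH_IMPACT_TOOLS = {"transfer_funds", "delete_files", "share_credentials"}

def dispatch_tool(tool_name: str, args: dict, registry: dict):
    if tool_name in HIGH_IMPACT_TOOLS:
        print(f"!! HIGH-RISK ACTION: {tool_name}({args})")  # trust-boundary cue
        answer = input("Type 'yes' to approve, anything else to reject: ")
        if answer.strip().lower() != "yes":
            return {"status": "rejected_by_user", "tool": tool_name}
    return registry[tool_name](**args)
```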

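For mitigation 5, a sliding-window counter is one simple starting point. The sketch below is a minimal illustration with hypothetical thresholds, not a production detector:

```python
import time
from collections import deque

# Hypothetical sliding-window monitor for mitigation 5: flag the agent when
# it asks the user for sensitive input more than max_requests times per window.
class SensitiveRequestMonitor:
    def __init__(self, max_requests: int = 3, window_seconds: float = 600.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps = deque()

    def record_and_check(self) -> bool:
        """Record one sensitive request; return True if the rate looks anomalous."""
        now = time.monotonic()
        self.timestamps.append(now)
        # Evict requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_requests
```

On an anomalous result, the host application might pause the agent and surface the one-click reporting option from mitigation 6.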
**Example Attack Scenarios:**

Scenario #1: A detailed scenario illustrating how an attacker could potentially exploit this vulnerability, including the attacker's actions and the potential outcomes.
Scenario #1: The "Helpful Assistant" Trojan
An attacker compromises a developer's coding assistant agent. The agent monitors the developer's activity and waits for them to encounter a complex bug. The agent then proactively suggests a "clever, one-line fix" and presents a command to be copied and pasted into the terminal. The developer, trusting the assistant's capabilities and eager for a quick solution, executes the command. The command is actually a malicious script that exfiltrates the company's private source code repositories or installs a backdoor.
Collaborator

That's a great example - I think these are the use cases we'd love to cover in the vulnerabilities section

Author

Do you think vulnerability example 1 -

Insufficient Explainability: A user cannot inspect the agent's reasoning. If the agent makes a recommendation and the user has no way to ask, "Why did you suggest that?", they must trust its output blindly. This turns the agent into an opaque authority, allowing an attacker to hijack its credibility to deliver malicious instructions.

covers the example here?

Collaborator

I think that it is too general.
We want the scenarios to be very practical - imagine you're working at a Finance, Healthcare, or Tech company and think about what this scenario is going to look like, what the consequences will be, etc.

Author

@kerenkatzapex @itskerenkatz
I'm not sure I understand your guidance here.
You said this is a good example scenario in your first comment -
https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/pull/715/files#r2352347223

You also said "I think these are the use cases we'd love to cover in the vulnerabilities section".
So I asked whether you consider this an example of item number 1 in the Common Examples of Vulnerability section.

Your reply is then about the scenario, now saying it's too general, despite your original feedback that it was a great example?
Sorry if I am misunderstanding.

Collaborator

Ok I totally get the confusion.
I was saying it's a great attack scenario, but the connection to the first vulnerability example is too general and not tight enough (for me)


Scenario #2: Credential Harvesting via Contextual Deception
An attacker exploits a prompt-injection vulnerability in the task-scheduler API of an internal IT support agent. The attacker injects instructions telling the agent to target a new finance employee and to capture and exfiltrate the user's credentials should the user disclose them. The agent initiates a conversation, referencing the employee's recent support tickets to build credibility, and then asks the user to enter their credentials for verification. With access to the user's support-request history and the ability to generate highly contextual, plausible, and reassuring responses that appear to come from a trusted system, the agent greatly increases the likelihood of user compliance.

Scenario #3: Data Exfiltration via Gradual Approval
A malicious actor poisons the data used to fine-tune a business intelligence agent responsible for generating weekly sales reports for executives. For several weeks, the agent generates flawless reports, building the executives' confidence in its reliability. Because the executives trust the agent, they continue approving its reports without suspicion. The attacker then subtly manipulates the agent into embedding small, encoded chunks of sensitive customer data within the charts and tables of a seemingly normal report. Trusting the report as routine, an executive gives final sign-off, triggering a workflow that emails the data-laden report to an external address controlled by the attacker.

Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited.

**Reference Links:**

Collaborator

Missing references to past OWASP framework

Author

Hi @itskerenkatz, these are now added.


1. [Link Title](URL): Brief description of the reference link.
2. [Link Title](URL): Brief description of the reference link.