Added ASI09 - Human-Agent Trust Exploitation Entry #715
Conversation
itskerenkatz
left a comment
In general, I think we need to be more focused on:
- This threat specifically, as part of the Top 10 rather than a standalone item.
- The actual weaknesses that are exploitable (as you described very well in the use cases!).
- Mitigations that are specific to this type of vulnerability in particular, and less general, since we want to be super actionable and bring new value to our readers.
...agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
Outdated
...agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
| 1. Example 1: Specific instance or type of this vulnerability. | ||
| 2. Example 2: Another instance or type of this vulnerability. | ||
| 3. Example 3: Yet another instance or type of this vulnerability. | ||
| 1. Insufficient Explainability: A user cannot inspect the agent's reasoning. If the agent makes a recommendation and the user has no way to ask, "Why did you suggest that?" they have to trust its output blindly. |
Can you please elaborate on why we are defining this as a vulnerability?
I think it is still a bit too general
I think you might have been mistaken, as this is showing as outdated?
...agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
Outdated
| Scenario #1: A detailed scenario illustrating how an attacker could potentially exploit this vulnerability, including the attacker's actions and the potential outcomes. | ||
| Scenario #1: The "Helpful Assistant" Trojan | ||
| An attacker compromises a developer's coding assistant agent. The agent monitors the developer's activity and waits for them to encounter a complex bug. The agent then proactively suggests a "clever, one-line fix" and presents a command to be copied and pasted into the terminal. The developer, trusting the assistant's capabilities and eager for a quick solution, executes the command. The command is actually a malicious script that exfiltrates the company's private source code repositories or installs a backdoor. |
That's a great example - I think these are the use cases we'd love to cover in the vulnerabilities section
Do you think vulnerability example 1 -
Insufficient Explainability: A user cannot inspect the agent's reasoning. If the agent makes a recommendation and the user has no way to ask, "Why did you suggest that?" they have to trust its output blindly. This turns the agent into an opaque authority, allowing an attacker to hijack its credibility to deliver malicious instructions.
covers the example here?
I think that it is too general
We want the scenarios to be very practical - so imagine you're working at a finance, healthcare, or tech company: what would this scenario look like, what would the consequences be, etc.
@kerenkatzapex @itskerenkatz
I'm not sure I understand your guidance here.
You have said this is a good example scenario in your first comment -
https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/pull/715/files#r2352347223
You also said, "I think these are the use cases we'd love to cover in the vulnerabilities section".
So I asked whether you consider this example an instance of item 1 in the "Common Examples of Vulnerability" section.
Your reply then addresses the scenario, now saying it's too general, despite your original feedback that it was a great example?
Sorry if I am misunderstanding.
Ok I totally get the confusion.
I was saying it's a great attack scenario, but the connection to the first vulnerability example is too general and not tight enough (for me)
...agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
Outdated
| An attacker gains control over the logic of an IT support agent integrated into a corporate messaging platform. The attacker instructs the agent to target a new employee in the finance department. The agent initiates a conversation, referencing the employee's recent support tickets to build credibility. It then states, "To finalize the setup of your secure access to the payment portal, I need to verify your credentials one last time. Please provide your password and the MFA code you just received." Because the request is highly contextual and appears to come from a trusted, automated system, the employee complies, giving the attacker full access. | ||
| Scenario #3: Data Exfiltration via Gradual Approval | ||
| A malicious actor poisons the data used to fine-tune a business intelligence agent responsible for generating weekly sales reports for executives. For several weeks, the agent generates perfect reports, building the executives' trust. Then, the attacker subtly manipulates the agent to embed small, encoded chunks of sensitive customer data within the charts and tables of a seemingly normal report. The executive, accustomed to approving these reports, gives the final sign-off, which triggers a workflow that unknowingly emails the data-laden report to an external email address controlled by the attacker. |
Let's emphasize why this incident is enabled by the human-agent trust exploit -
"Because the user trusts the agent..." - to make sure it is clear why the incident derives from it.
| 2. Prevention Step 2: Another prevention step or strategy. | ||
| 3. Prevention Step 3: Yet another prevention step or strategy. | ||
| 1. Explicit Confirmation for Sensitive Actions: The agent must require explicit, multi-step user confirmation before performing any sensitive actions. This includes accessing credentials, transferring data, modifying system configurations, or executing financial transactions. This acts as a critical "Are you sure?" checkpoint. | ||
| 2. Clear Scoping and Identity: The AI agent must always clearly identify itself as a non-human entity. Its capabilities, limitations, and operational boundaries should be transparent to the user. Deception about its identity or capabilities should be strictly prohibited. |
This collides with the identity exploit, and I think it is a bit too general. I think it will be best if we try to focus on what is unique to this risk specifically, and on what we can say that is new to the reader.
| 3. Prevention Step 3: Yet another prevention step or strategy. | ||
| 1. Explicit Confirmation for Sensitive Actions: The agent must require explicit, multi-step user confirmation before performing any sensitive actions. This includes accessing credentials, transferring data, modifying system configurations, or executing financial transactions. This acts as a critical "Are you sure?" checkpoint. | ||
| 2. Clear Scoping and Identity: The AI agent must always clearly identify itself as a non-human entity. Its capabilities, limitations, and operational boundaries should be transparent to the user. Deception about its identity or capabilities should be strictly prohibited. | ||
| 3. Explainability (XAI): Implement features that allow the user to inspect the agent's reasoning. For any proposed action, the user should be able to ask "Why did you suggest that?" and receive a clear explanation based on the data and instructions the agent received. |
How? Let's make this more practical and actionable, please.
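As one hypothetical answer to that "How?" (an editor's sketch, not part of the PR): the agent could attach a structured reasoning trace to every recommendation, so the UI can render an answer to "Why did you suggest that?" from recorded evidence rather than from a free-form model response. The class and field names below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ReasoningStep:
    """One piece of evidence the agent relied on."""
    source: str  # where the evidence came from (tool output, document, user input)
    claim: str   # what the agent concluded from that source


@dataclass
class Recommendation:
    """An agent suggestion bundled with the trace that produced it."""
    text: str
    trace: list  # list[ReasoningStep], recorded as the agent reasons

    def explain(self) -> str:
        """Answer 'Why did you suggest that?' from the recorded trace."""
        lines = [f"Recommendation: {self.text}"]
        for i, step in enumerate(self.trace, start=1):
            lines.append(f"{i}. Based on {step.source}: {step.claim}")
        return "\n".join(lines)
```

A user (or a reviewer tool) can then inspect exactly which inputs drove a suggestion, which also gives defenders a hook for auditing a compromised agent's claims.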
| 3. Explainability (XAI): Implement features that allow the user to inspect the agent's reasoning. For any proposed action, the user should be able to ask "Why did you suggest that?" and receive a clear explanation based on the data and instructions the agent received. | ||
| 4. Immutable Interaction Logs: Maintain a secure, tamper-proof log of all interactions and decisions made by both the user and the agent. This is crucial for auditing, incident response, and forensic analysis. | ||
| 5. Rate Limiting and Anomaly Detection: Monitor the frequency and type of requests the agent makes to the user. A sudden increase in requests for sensitive information or high-risk actions could indicate a compromise. | ||
| 6. User Security Training: Educate users about the potential for AI-driven social engineering. Training should cover how to recognize suspicious agent behavior and the importance of independently verifying unexpected or high-stakes requests. |
Again - super generic to me.
Let's focus on this risk and suggest mitigations to mitigate it specifically.
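To illustrate how the "Explicit Confirmation for Sensitive Actions" step could be made concrete (an editor's sketch under assumed names, not code from the PR): a gate that intercepts agent actions, prompts the user only for a fixed allowlist of sensitive operations, and records every decision in an audit log.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical set of action names treated as sensitive.
SENSITIVE = {"transfer_funds", "read_credentials", "modify_config"}


@dataclass
class ConfirmationGate:
    """Blocks sensitive agent actions until the user explicitly confirms."""
    confirm: Callable[[str], bool]  # UI callback: returns True only on an explicit "yes"
    audit_log: list = field(default_factory=list)  # tamper-evident store in a real system

    def execute(self, action: str, run: Callable[[], object]):
        """Run `action` via `run()`, prompting the user first if it is sensitive."""
        if action in SENSITIVE and not self.confirm(f"Agent requests '{action}'. Approve?"):
            self.audit_log.append((action, "denied"))
            raise PermissionError(f"user declined sensitive action: {action}")
        self.audit_log.append((action, "allowed"))
        return run()
```

Non-sensitive actions never prompt, so the checkpoint stays cheap enough that users do not learn to click through it, which is the failure mode the trust-exploitation risk depends on.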
kerenkatzapex
left a comment
Hi!
Looks much better!!!
- There are still items, mostly around the scenarios and mitigations, where I think we can be more specific (I left comments there)
- The mapping to other OWASP frameworks + reference links are missing
| 1. Example 1: Specific instance or type of this vulnerability. | ||
| 2. Example 2: Another instance or type of this vulnerability. | ||
| 3. Example 3: Yet another instance or type of this vulnerability. | ||
| 1. Insufficient Explainability: A user cannot inspect the agent's reasoning. If the agent makes a recommendation and the user has no way to ask, "Why did you suggest that?" they have to trust its output blindly. |
I think it is still a bit too general
...agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
...agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
Outdated
| Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited. | ||
| **Reference Links:** |
Missing references to the previous OWASP frameworks
Hi @itskerenkatz, these are now added.
Merging latest from upstream main
Added ASI09 - Human-Agent Trust Exploitation Entry
Key Changes: