@tradeqvest tradeqvest commented Sep 22, 2025

Fix parallel tool call limit enforcement

Problem

The tool_calls_limit in UsageLimits was not properly enforced for parallel tool execution. When multiple tools were called in parallel, all tools would start executing before the limit was checked, allowing the limit to be exceeded before raising UsageLimitExceeded.

For example, if tool_calls_limit=6 and the model returned 8 parallel tool calls, all 8 tools would start executing before the error was raised.

Solution

This PR modifies the parallel tool execution logic in _agent_graph.py to enforce the limit before starting tool tasks:

  1. Pre-execution limit check: Before creating async tasks for parallel tools, we now check how many tool calls remain within the limit
  2. Limited execution: Only start tasks for the allowed number of tool calls
  3. Error after completion: Raise UsageLimitExceeded once the allowed tools complete, if more calls were requested than the limit permits
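The actual change lives in _agent_graph.py; the three steps above can be sketched standalone roughly as follows (the function name run_parallel_tool_calls and the stand-in run_tool are illustrative, not pydantic-ai's real API):

```python
import asyncio


class UsageLimitExceeded(Exception):
    """Raised when a usage limit (here: tool_calls_limit) is exceeded."""


async def run_tool(name: str) -> str:
    # Stand-in for real tool execution.
    await asyncio.sleep(0)
    return f'{name}:ok'


async def run_parallel_tool_calls(calls: list[str], tool_calls_made: int, tool_calls_limit: int) -> list[str]:
    # 1. Pre-execution limit check: how many calls still fit within the limit?
    remaining = max(tool_calls_limit - tool_calls_made, 0)
    allowed, rejected = calls[:remaining], calls[remaining:]
    # 2. Limited execution: only start tasks for the allowed calls.
    results = await asyncio.gather(*(run_tool(c) for c in allowed))
    # 3. Error after completion: raise if the model requested more than the limit.
    if rejected:
        raise UsageLimitExceeded(
            f'{len(rejected)} tool call(s) would exceed the tool_calls_limit of {tool_calls_limit}'
        )
    return results
```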

@tradeqvest tradeqvest marked this pull request as ready for review September 22, 2025 13:10
DouweM commented Sep 26, 2025

@tradeqvest Since we'll raise an error regardless, is it worth executing tool calls up to the limit, instead of just raising immediately?

I was thinking we could do something like this:

usage = ctx.state.usage
if ctx.deps.usage_limits.count_tokens_before_request:
    # Copy to avoid modifying the original usage object with the counted usage
    usage = deepcopy(usage)
    counted_usage = await ctx.deps.model.count_tokens(message_history, model_settings, model_request_parameters)
    usage.incr(counted_usage)
ctx.deps.usage_limits.check_before_request(usage)

Where we optimistically increment a copied version of the usage, and check the usage limit against that.

tradeqvest commented Sep 26, 2025

@tradeqvest Since we'll raise an error regardless, is it worth executing tool calls up to the limit, instead of just raising immediately?

I was thinking we could do something like this:

usage = ctx.state.usage
if ctx.deps.usage_limits.count_tokens_before_request:
    # Copy to avoid modifying the original usage object with the counted usage
    usage = deepcopy(usage)
    counted_usage = await ctx.deps.model.count_tokens(message_history, model_settings, model_request_parameters)
    usage.incr(counted_usage)
ctx.deps.usage_limits.check_before_request(usage)

Where we optimistically increment a copied version of the usage, and check the usage limit against that.

@DouweM It would definitely be simpler, but I was thinking that the tool output produced up to the UsageLimit violation could still be valuable: it could be captured and processed further. Let me know what you think.

DouweM commented Sep 29, 2025

@tradeqvest I think it'd be misleading if those results never get sent back to the model to use, and the user will think their action failed even though some tools (with side effects) may have in fact been executed. If we had a way to, instead of failing hard, tell the model "this call was not executed because you hit the limit" for the calls over the limit, executing the earlier ones makes sense, but until we have such a mode I'd rather not run the tools at all.
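The "soft" mode described here does not exist in the library yet; if it did, the core of it might look something like this hypothetical helper, which executes calls up to the limit and synthesizes "not executed" results for the rest instead of raising (all names here are invented for illustration):

```python
def partition_tool_calls(calls: list[str], remaining: int) -> tuple[list[str], list[dict]]:
    """Split requested calls into ones to execute and synthetic 'not executed' results.

    The skipped entries would be sent back to the model as tool results, so it
    knows those calls were rejected due to the limit rather than silently dropped.
    """
    allowed = calls[:max(remaining, 0)]
    skipped = [
        {'tool': name, 'content': 'Tool call not executed: tool_calls_limit reached.'}
        for name in calls[max(remaining, 0):]
    ]
    return allowed, skipped
```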
