Remove old tool outputs from previous responses to save tokens & context? #1050
Replies: 2 comments
-
|
To give a specific example, I'm following this official smolagents tutorial: https://huggingface.co/learn/agents-course/en/unit2/smolagents/code_agents#lets-see-some-examples And I'm on this example task:
The model is searching once, then thinking "Hmm, this is all party music, I should search for more appropriate formal music" and searching again. And at this point, the old search results for party music is taking up ~8k tokens of context, when we can be reasonably sure it can be removed, since now the agent is performing a new search. You might argue the results of the old search can still be useful for completing the final response ("I also found some party music if you want to lighten the vibe...") but I think the dev should decide if they want to make that tradeoff or not. |
Beta Was this translation helpful? Give feedback.
-
|
This is exactly the problem I've been solving! Here's a pattern that achieves ~80% token reduction: State-Based Context PruningInstead of keeping all tool outputs in the message history, extract and store relevant information in structured state: state = {
"search_results": {}, # Only keep what's needed
"facts_extracted": [],
"current_focus": None
}
def process_search_result(tool_output, state):
# Extract only relevant facts
facts = extract_key_facts(tool_output) # Much smaller than raw output
state["facts_extracted"].extend(facts)
# Don't keep the full 8k token output
state["search_results"][query] = {
"summary": summarize(tool_output), # ~100 tokens vs 8000
"key_facts": facts
}Implementation ApproachOption 1: Manual pruning via message filter def prune_old_tool_outputs(messages, keep_last_n=2):
tool_outputs = [m for m in messages if m.role == "tool"]
if len(tool_outputs) > keep_last_n:
# Remove old tool outputs
for old in tool_outputs[:-keep_last_n]:
messages.remove(old)
return messagesOption 2: Summarize instead of remove # Replace full output with summary
message.content = f"[Previous search: {len(results)} results about {query}]"The Key InsightTool outputs should feed into structured state, not accumulate in context. The agent reads state, not history. More on this pattern: https://github.com/KeepALifeUS/autonomous-agents |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi,
I'm trying to understand if there's a simple way to remove old tool outputs (or perhaps truncate them to some max size) in future requests, after the output has already been used.
I guess it's not always possible to say "the agent is done using the output". But in many flows, the code author might know for sure that a tool's message will never be relevant except for the first time it is emitted. So tokens can be saved (and context can be shortened) in those cases by pruning the old tool calls.
Is there already some flag for this? If not, what's the recommended pattern for hooking into the transcript and invoking some pruning behavior?
Beta Was this translation helpful? Give feedback.
All reactions