Remove old tool outputs from previous responses to save tokens & context? #1050

andysalerno · 2025-03-22T00:59:31Z

andysalerno
Mar 22, 2025

Hi,

I'm trying to understand if there's a simple way to remove old tool outputs (or perhaps truncate them to some max size) in future requests, after the output has already been used.

I guess it's not always possible to say "the agent is done using the output". But in many flows, the code author might know for sure that a tool's message will never be relevant except for the first time it is emitted. So tokens can be saved (and context can be shortened) in those cases by pruning the old tool calls.

Is there already some flag for this? If not, what's the recommended pattern for hooking into the transcript and invoking some pruning behavior?

andysalerno · 2025-03-22T01:02:53Z

andysalerno
Mar 22, 2025
Author

To give a specific example, I'm following this official smolagents tutorial: https://huggingface.co/learn/agents-course/en/unit2/smolagents/code_agents#lets-see-some-examples

And I'm on this example task:

Search for the best music recommendations for a party at the Wayne's mansion

The model is searching once, then thinking "Hmm, this is all party music, I should search for more appropriate formal music" and searching again.

And at this point, the old search results for party music is taking up ~8k tokens of context, when we can be reasonably sure it can be removed, since now the agent is performing a new search. You might argue the results of the old search can still be useful for completing the final response ("I also found some party music if you want to lighten the vibe...") but I think the dev should decide if they want to make that tradeoff or not.

0 replies

KeepALifeUS · 2026-02-12T20:46:21Z

KeepALifeUS
Feb 12, 2026

This is exactly the problem I've been solving! Here's a pattern that achieves ~80% token reduction:

State-Based Context Pruning

Instead of keeping all tool outputs in the message history, extract and store relevant information in structured state:

state = {
    "search_results": {},  # Only keep what's needed
    "facts_extracted": [],
    "current_focus": None
}

def process_search_result(tool_output, state):
    # Extract only relevant facts
    facts = extract_key_facts(tool_output)  # Much smaller than raw output
    state["facts_extracted"].extend(facts)
    
    # Don't keep the full 8k token output
    state["search_results"][query] = {
        "summary": summarize(tool_output),  # ~100 tokens vs 8000
        "key_facts": facts
    }

Implementation Approach

Option 1: Manual pruning via message filter

def prune_old_tool_outputs(messages, keep_last_n=2):
    tool_outputs = [m for m in messages if m.role == "tool"]
    if len(tool_outputs) > keep_last_n:
        # Remove old tool outputs
        for old in tool_outputs[:-keep_last_n]:
            messages.remove(old)
    return messages

Option 2: Summarize instead of remove

# Replace full output with summary
message.content = f"[Previous search: {len(results)} results about {query}]"

The Key Insight

Tool outputs should feed into structured state, not accumulate in context. The agent reads state, not history.

More on this pattern: https://github.com/KeepALifeUS/autonomous-agents

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove old tool outputs from previous responses to save tokens & context? #1050

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Remove old tool outputs from previous responses to save tokens & context? #1050

Uh oh!

andysalerno Mar 22, 2025

Replies: 2 comments

Uh oh!

andysalerno Mar 22, 2025 Author

Uh oh!

KeepALifeUS Feb 12, 2026

State-Based Context Pruning

Implementation Approach

The Key Insight

andysalerno
Mar 22, 2025

andysalerno
Mar 22, 2025
Author

KeepALifeUS
Feb 12, 2026