fix: include tool and trace state in evaluation cache keys #2561

Open
aerosta wants to merge 1 commit into confident-ai:main from aerosta:fix/cache-key-include-execution-state

aerosta commented Mar 19, 2026

Summary

The evaluation cache key originally only considered the text fields on LLMTestCase: input, actual_output, expected_output, context, and retrieval_context.

For tool-based and trace-based evaluation, that was incomplete. Metrics such as ToolCorrectnessMetric and trace-level metrics also depend on execution-state fields like tools_called, expected_tools, MCP call data, and _trace_dict. Two test cases with the same text but different tool calls or trace structure could therefore produce the same cache key and reuse stale metric results.

This change moves cache key construction into CachedTestCase.create_cache_key() and includes the execution-state fields that affect scoring.
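
To illustrate the collision described above, here is a minimal sketch. `SketchTestCase`, `text_only_key`, and `full_key` are hypothetical names invented for this example; they are not the deepeval classes or the code in this PR, and MCP fields are left out for brevity. The sketch assumes a key built by hashing a JSON dump of selected fields.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SketchTestCase:
    """Hypothetical stand-in for LLMTestCase; not the deepeval class."""
    input: str
    actual_output: str
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    retrieval_context: Optional[List[str]] = None
    tools_called: List[Dict[str, Any]] = field(default_factory=list)
    expected_tools: List[Dict[str, Any]] = field(default_factory=list)
    trace_dict: Optional[Dict[str, Any]] = None  # stands in for _trace_dict


TEXT_FIELDS = ("input", "actual_output", "expected_output",
               "context", "retrieval_context")
STATE_FIELDS = ("tools_called", "expected_tools", "trace_dict")


def _key(case: SketchTestCase, fields: Tuple[str, ...]) -> str:
    # Serialize only the selected fields, with sorted dict keys, then hash.
    payload = {name: getattr(case, name) for name in fields}
    blob = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def text_only_key(case: SketchTestCase) -> str:
    """Key built from text fields only (the behaviour described as incomplete)."""
    return _key(case, TEXT_FIELDS)


def full_key(case: SketchTestCase) -> str:
    """Key that also covers execution-state fields."""
    return _key(case, TEXT_FIELDS + STATE_FIELDS)


# Same text, different tool calls.
case_a = SketchTestCase(
    input="Weather in Paris?",
    actual_output="Sunny",
    tools_called=[{"name": "get_weather", "input_parameters": {"city": "Paris"}}],
)
case_b = SketchTestCase(
    input="Weather in Paris?",
    actual_output="Sunny",
    tools_called=[{"name": "web_search", "input_parameters": {"q": "Paris weather"}}],
)

collides = text_only_key(case_a) == text_only_key(case_b)
distinct = full_key(case_a) != full_key(case_b)
```

In this sketch `collides` and `distinct` are both true: the text-only key cannot tell the two cases apart, while the extended key can.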

Changes

  • add CachedTestCase.create_cache_key() in deepeval/test_run/cache.py
  • replace inline cache-key construction in get_cached_test_case() and cache_test_case()
  • include tool, MCP, and trace fields in the cache key
  • normalize nested values before serialization so key generation stays deterministic
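
As a rough illustration of the normalization step in the last bullet, the sketch below reduces nested values to JSON-friendly types with a stable ordering before serializing. `normalize`, `serialize`, and `SketchToolCall` are hypothetical names for this example and may differ from what the PR implements.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any, Dict


@dataclass
class SketchToolCall:
    """Hypothetical tool-call record; not the deepeval ToolCall class."""
    name: str
    input_parameters: Dict[str, Any]


def normalize(value: Any) -> Any:
    """Reduce a nested value to JSON-friendly types with a stable ordering."""
    if is_dataclass(value) and not isinstance(value, type):
        value = asdict(value)  # dataclass instance -> plain dict (recursive)
    if isinstance(value, dict):
        # Sort by stringified key so insertion order does not affect the key.
        items = sorted(value.items(), key=lambda kv: str(kv[0]))
        return {str(k): normalize(v) for k, v in items}
    if isinstance(value, (set, frozenset)):
        # Sets are unordered; sort members by their serialized form.
        members = [normalize(v) for v in value]
        return sorted(members, key=lambda v: json.dumps(v, sort_keys=True))
    if isinstance(value, (list, tuple)):
        # Sequence order is meaningful (e.g. tool call order), so keep it.
        return [normalize(v) for v in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    # Falling back to repr() could embed memory addresses and break determinism.
    raise TypeError(f"cannot normalize value of type {type(value).__name__}")


def serialize(value: Any) -> str:
    """Serialize a normalized value compactly with sorted keys."""
    return json.dumps(normalize(value), sort_keys=True, separators=(",", ":"))


first = {"b": 1, "a": {"z": [1, 2], "y": {3, 1, 2}}}
second = {"a": {"y": {2, 3, 1}, "z": [1, 2]}, "b": 1}
same_text = serialize(first) == serialize(second)

call = SketchToolCall(name="get_weather", input_parameters={"city": "Paris"})
call_text = serialize(call)
```

Here `first` and `second` differ only in dict and set ordering, so they serialize to the same string, while reordering a list still changes the result.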

Context

Tool-calling and trace-level metrics evaluate fields such as tools_called, expected_tools, and _trace_dict. Because those fields were not part of the cache key, changing only the execution state between runs could still return cached scores from a previous run.

This updates the cache key format, so existing cache entries will miss on the first run after upgrade. That is intentional: a one-time re-evaluation is safer than serving stale results.
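
The one-time miss can be pictured with a toy in-memory cache. Everything in this sketch (`SketchCache`, `old_format_key`, `new_format_key`, `run_metric`) is hypothetical and only models the behaviour described above; it is not how deepeval stores its cache.

```python
import json
from typing import Dict, List


def old_format_key(text: str) -> str:
    # Assumed pre-upgrade format: text only.
    return json.dumps({"input": text}, sort_keys=True)


def new_format_key(text: str, tools: List[str]) -> str:
    # Assumed post-upgrade format: text plus execution state.
    return json.dumps({"input": text, "tools_called": tools}, sort_keys=True)


def run_metric(text: str, tools: List[str]) -> float:
    # Placeholder for an expensive metric evaluation.
    return 1.0 if tools else 0.0


class SketchCache:
    """Toy cache that counts how often the metric is actually run."""

    def __init__(self) -> None:
        self.entries: Dict[str, float] = {}
        self.evaluations = 0

    def score(self, text: str, tools: List[str]) -> float:
        key = new_format_key(text, tools)
        if key not in self.entries:
            # Miss: evaluate once and store under the new-format key.
            self.evaluations += 1
            self.entries[key] = run_metric(text, tools)
        return self.entries[key]


cache = SketchCache()
cache.entries[old_format_key("hi")] = 0.5  # entry written before the upgrade

first_run = cache.score("hi", ["search"])   # old entry is not found; re-evaluates
second_run = cache.score("hi", ["search"])  # served from the new-format entry
```

The old entry is never matched, so the first lookup re-evaluates; the second lookup hits the freshly written entry and `cache.evaluations` stays at 1.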

Tests

  • add tests/test_core/test_run/test_cache_keys.py
  • cover tool differentiation, trace differentiation, identity, and text-only cases
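
For readers without the diff at hand, the four cases might look roughly like the pytest-style sketch below. `sketch_cache_key` is a hypothetical helper defined inline so the example is self-contained; the actual tests exercise `CachedTestCase.create_cache_key()` and may be structured differently.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def sketch_cache_key(
    input: str,
    actual_output: str,
    tools_called: Optional[List[Dict[str, Any]]] = None,
    trace: Optional[Dict[str, Any]] = None,
) -> str:
    """Hypothetical key function: hash of a sorted-key JSON dump."""
    payload = {
        "input": input,
        "actual_output": actual_output,
        "tools_called": tools_called or [],
        "trace": trace,
    }
    blob = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def test_different_tools_produce_different_keys():
    a = sketch_cache_key("q", "a", tools_called=[{"name": "search"}])
    b = sketch_cache_key("q", "a", tools_called=[{"name": "calculator"}])
    assert a != b


def test_different_traces_produce_different_keys():
    a = sketch_cache_key("q", "a", trace={"spans": [{"name": "retrieve"}]})
    b = sketch_cache_key("q", "a", trace={"spans": [{"name": "generate"}]})
    assert a != b


def test_identical_cases_produce_identical_keys():
    tools = [{"name": "search", "input_parameters": {"q": "x"}}]
    assert sketch_cache_key("q", "a", tools) == sketch_cache_key("q", "a", list(tools))


def test_text_only_cases_still_get_stable_distinct_keys():
    assert sketch_cache_key("q", "a") == sketch_cache_key("q", "a")
    assert sketch_cache_key("q", "a") != sketch_cache_key("q", "b")
```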


vercel bot commented Mar 19, 2026

@aerosta is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.
