MAF ↔ AgentEval.Memory Concept Mapping

This document explains how AgentEval.Memory evaluators work with the pipeline architecture of Microsoft Agent Framework (MAF) 1.3.0, and why the evaluators need no code changes to do so.

Architecture Overlap

MAF 1.3.0 and AgentEval.Memory both deal with conversation history and memory, but at different abstraction levels:

- MAF manages memory inside the agent pipeline: ChatHistoryProvider, AIContextProvider, CompactionStrategy
- AgentEval.Memory evaluates memory quality from outside: it sends prompts, resets sessions, and measures retention

The two systems are complementary, not competing.

Concept Mapping Table

| AgentEval.Memory Concept | MAF 1.3.0 Equivalent | Relationship |
|---|---|---|
| `ISessionResettableAgent.ResetSessionAsync()` | `agent.CreateSessionAsync()` (new session) | Same effect. `MAFAgentAdapter.ResetSessionAsync()` calls `CreateSessionAsync()` internally. New session = fresh conversation history. |
| `IHistoryInjectableAgent.InjectConversationHistory()` | `ChatHistoryProvider.ProvideChatHistoryAsync()` | Different purpose. AgentEval injects synthetic test data to skip LLM setup calls. MAF providers manage real conversation history. `MAFAgentAdapter` implements both. |
| `ChatClientAgentAdapter._conversationHistory` | `InMemoryChatHistoryProvider` | Reimplementation. Both maintain `List<ChatMessage>`. `ChatClientAgentAdapter` does this outside MAF's pipeline (for raw `IChatClient` wrapping). |
| `LLMPersistentMemoryAgent` (Sample G5) | `AIContextProvider` subclass | Same pattern, different implementation. Manual memory management vs. pipeline-integrated. See Sample G6 for the MAF-native approach. |
| `ReducerEvaluator` | `CompactionStrategy` (experimental) | Complementary. `ReducerEvaluator` measures compression quality. `CompactionStrategy` performs compression. Different layers: one evaluates, the other executes. |
| `CrossSessionEvaluator` | `AgentSession` lifecycle | Compatible. Evaluator calls `ResetSessionAsync()` → adapter creates new session → `ChatHistoryProvider` loses history → `AIContextProvider` retains long-term memory. Correctly tests persistent memory. |
| `ReachBackEvaluator` | `InMemoryChatHistoryProvider` + reducers | Compatible. Noise turns fill context window → reducers may drop early turns → evaluator measures what the agent still recalls. |
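
The reach-back row can be illustrated with a small model. This is a minimal sketch rather than MAF or AgentEval code: `WindowedHistory` and `ReachBackProbe` are hypothetical names, and the "reducer" is assumed to be a plain sliding window that keeps only the most recent messages. Real reducers may summarize or filter instead of dropping.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sliding-window history (not a MAF type):
// keeps only the most recent maxMessages messages.
public sealed class WindowedHistory
{
    private readonly int _maxMessages;
    private readonly Queue<string> _messages = new Queue<string>();

    public WindowedHistory(int maxMessages) => _maxMessages = maxMessages;

    public void Add(string message)
    {
        _messages.Enqueue(message);
        // Drop the oldest messages once the window is full.
        while (_messages.Count > _maxMessages) _messages.Dequeue();
    }

    public bool Contains(string text) => _messages.Any(m => m.Contains(text));
}

public static class ReachBackProbe
{
    // Plants one fact, appends noiseTurns unrelated messages, and reports
    // whether the fact is still inside the window afterwards.
    public static bool FactSurvives(int windowSize, int noiseTurns)
    {
        var history = new WindowedHistory(windowSize);
        history.Add("My badge number is 7741.");
        for (var i = 0; i < noiseTurns; i++) history.Add($"noise turn {i}");
        return history.Contains("7741");
    }
}
```

With a window of 10 messages, `FactSurvives(10, 9)` is true (the fact is the oldest message still kept) and `FactSurvives(10, 10)` is false (one more noise turn pushes it out). A reach-back evaluation probes this kind of boundary from the outside, through the agent's answers rather than by inspecting the history.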

Key Insight: Why It Works Without Changes

AgentEval.Memory evaluators operate at the IEvaluableAgent abstraction level:

CrossSessionEvaluator
    ↓ calls
IEvaluableAgent.InvokeAsync(prompt)
ISessionResettableAgent.ResetSessionAsync()
    ↓ which is
MAFAgentAdapter
    ↓ delegates to
ChatClientAgent.RunAsync(messages, session)
    ↓ which triggers  
ChatHistoryProvider → AIContextProviders → IChatClient → LLM

The evaluators don't need to know about ChatHistoryProvider, AIContextProvider, or CompactionStrategy. They test behavior (does the agent recall facts?), not mechanism (how does it store them?).
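
To make the behavior-versus-mechanism point concrete, here is a minimal sketch under stated assumptions. `IDemoAgent` is a hypothetical stand-in for the agent surface an evaluator sees; the real `IEvaluableAgent` and `ISessionResettableAgent` signatures may differ. The two agents, the `remember key=value` prompt convention, and `CrossSessionCheck` are likewise invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the agent surface seen by an evaluator.
public interface IDemoAgent
{
    Task<string> InvokeAsync(string prompt);
    Task ResetSessionAsync();
}

// Toy protocol shared by both agents: "remember key=value" stores a fact,
// any other prompt is treated as a lookup by key.
public static class DemoProtocol
{
    private const string Prefix = "remember ";

    public static string Handle(Dictionary<string, string> facts, string prompt)
    {
        if (prompt.StartsWith(Prefix, StringComparison.Ordinal))
        {
            var pair = prompt.Substring(Prefix.Length);
            var eq = pair.IndexOf('=');
            if (eq < 0) return "malformed";
            facts[pair.Substring(0, eq)] = pair.Substring(eq + 1);
            return "stored";
        }
        return facts.TryGetValue(prompt, out var value) ? value : "unknown";
    }
}

// Mechanism A: facts live outside the session and survive a reset.
public sealed class PersistentDemoAgent : IDemoAgent
{
    private readonly Dictionary<string, string> _facts = new Dictionary<string, string>();

    public Task<string> InvokeAsync(string prompt) =>
        Task.FromResult(DemoProtocol.Handle(_facts, prompt));

    public Task ResetSessionAsync() => Task.CompletedTask; // nothing session-scoped to clear
}

// Mechanism B: facts are session-scoped and are cleared by a reset.
public sealed class SessionOnlyDemoAgent : IDemoAgent
{
    private readonly Dictionary<string, string> _sessionFacts = new Dictionary<string, string>();

    public Task<string> InvokeAsync(string prompt) =>
        Task.FromResult(DemoProtocol.Handle(_sessionFacts, prompt));

    public Task ResetSessionAsync()
    {
        _sessionFacts.Clear();
        return Task.CompletedTask;
    }
}

public static class CrossSessionCheck
{
    // Plants a fact, resets the session, then asks for the fact back.
    // Only the interface is used; the storage mechanism stays invisible.
    public static async Task<bool> RecallsAfterResetAsync(IDemoAgent agent)
    {
        await agent.InvokeAsync("remember city=Lisbon");
        await agent.ResetSessionAsync();
        var answer = await agent.InvokeAsync("city");
        return answer.Contains("Lisbon");
    }
}
```

The same check returns true for `PersistentDemoAgent` and false for `SessionOnlyDemoAgent` without any change to the check itself, which is the property the evaluators rely on.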

Session Lifecycle

┌────────────────────────────────────────────────┐
│  CrossSessionEvaluator / ReachBackEvaluator    │
│  Calls: InvokeAsync(), ResetSessionAsync()     │
└──────────────────┬─────────────────────────────┘
                   │
┌──────────────────▼─────────────────────────────┐
│  MAFAgentAdapter                               │
│  ResetSessionAsync() → CreateSessionAsync()    │
│  InvokeAsync() → agent.RunAsync(msg, session)  │
└──────────────────┬─────────────────────────────┘
                   │
┌──────────────────▼─────────────────────────────┐
│  MAF Agent Pipeline                            │
│  ChatHistoryProvider  → session-scoped history │
│  AIContextProvider    → persistent memory      │
│  CompactionStrategy   → context window mgmt    │
│  IChatClient          → LLM API call           │
└────────────────────────────────────────────────┘

On ResetSessionAsync():

- ChatHistoryProvider state is lost (new session, empty history) ✅
- AIContextProvider state persists (lives outside the session) ✅
- This correctly models "conversation context" vs. "long-term memory"
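
The two scopes can be modeled in a few lines. This is a hedged sketch, not MAF code: `DemoSession` and `DemoAgentWithMemory` are hypothetical types, with the session's list standing in for chat history and the agent-level set standing in for whatever an AIContextProvider persists.

```csharp
using System.Collections.Generic;

// Hypothetical model: a session owns its own history.
public sealed class DemoSession
{
    public List<string> History { get; } = new List<string>();
}

// Hypothetical model: long-term memory lives on the agent, outside any session.
public sealed class DemoAgentWithMemory
{
    private readonly HashSet<string> _longTermMemory = new HashSet<string>();
    private DemoSession _session = new DemoSession();

    public void Tell(string fact)
    {
        _session.History.Add(fact); // session-scoped, like chat history
        _longTermMemory.Add(fact);  // persistent, like a context provider's store
    }

    // Mirrors the reset described above: swap in a fresh, empty session.
    public void ResetSession() => _session = new DemoSession();

    public bool InHistory(string fact) => _session.History.Contains(fact);

    public bool InLongTermMemory(string fact) => _longTermMemory.Contains(fact);
}
```

After `Tell("likes tea")` followed by `ResetSession()`, `InHistory("likes tea")` is false while `InLongTermMemory("likes tea")` is still true. A cross-session evaluation only passes when the agent answers from the second kind of store.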

When to Use Which Adapter

| Scenario | Adapter | Why |
|---|---|---|
| MAF `ChatClientAgent` with pipeline features | `MAFAgentAdapter` | Gets `AIContextProvider`, `ChatHistoryProvider`, session management |
| Raw `IChatClient` without MAF pipeline | `ChatClientAgentAdapter` | Manages its own conversation history |
| Any `IChatClient` (quick setup) | `.AsEvaluableAgent()` | Extension method, wraps in `ChatClientAgentAdapter` |

Samples

| Sample | Description |
|---|---|
| Sample A6 (Session Lifecycle) | Shows `CreateSessionAsync` → multi-turn → `ResetSessionAsync` → isolation |
| Sample G5 (Cross-Session Memory) | Manual memory: `LLMPersistentMemoryAgent` with `_longTermMemory` dict |
| Sample G6 (AIContextProvider Memory) | MAF-native: `PersistentMemoryProvider : AIContextProvider` in pipeline |

Current Limitations

Session Observability

AgentSession.StateBag is not exposed to evaluators. AgentEval.Memory evaluates observable behavior — whether the agent recalls facts across turns or sessions — rather than inspecting what an AIContextProvider stored internally.

CompactionStrategy Configuration

ReducerEvaluator measures the effects of history reduction through agent responses, but it does not configure a MAF CompactionStrategy or report internal pre/post-compaction message counts. Agent configuration remains the responsibility of the host application; the evaluator measures the resulting behavior.
