## Motivation
Current behavior: each conversation turn is stored as a separate memory. This is optimal for fine-grained fact retrieval ("what did the user say about X?") but disadvantages session-level benchmarks like LongMemEval, which measure whether the correct session appears in the top-K results.
LongMemEval benchmark comparison:
- MCP Memory Service (turn-level, zero LLM): 80.4% R@5
- MemPalace (session-level, zero LLM): 96.6% R@5
The gap is largest for temporal-reasoning (72% vs 96%) and multi-session questions (71% vs 98%), where session context provides strong retrieval signal.
## Proposed Design
Add `memory_type="session"` with session-scoped ingestion:

- **New MCP tool** — `store_session` accepts a list of turns, concatenates them with speaker labels, and stores them as one Memory
- **Session tag convention** — `session_id` stored as a tag for dedup and grouping
- **Benchmark mode flag** — `--ingestion-mode session|turn|both` in `benchmark_longmemeval.py`
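To make the ingestion path concrete, here is a minimal sketch of how `store_session` could concatenate turns into a single Memory. The `Memory` dataclass and the `session:` tag prefix are illustrative stand-ins, not the project's actual model:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Memory:
    """Illustrative stand-in for the project's Memory model."""
    content: str
    memory_type: str = "note"
    tags: List[str] = field(default_factory=list)

def store_session(session_id: str, turns: List[Dict[str, str]]) -> Memory:
    """Concatenate a session's turns with speaker labels into one Memory.

    Each turn is assumed to look like {"role": "user", "content": "..."}.
    The session_id tag enables dedup and grouping at query time.
    """
    body = "\n".join(f'{t["role"].capitalize()}: {t["content"]}' for t in turns)
    return Memory(
        content=body,
        memory_type="session",
        tags=[f"session:{session_id}"],  # tag convention assumed for dedup
    )
```

Storing the labeled transcript as one entry is what gives session-level retrieval its signal: the embedding covers the whole conversation rather than isolated statements.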
## Trade-offs

| | Turn-level (current) | Session-level (proposed) |
| --- | --- | --- |
| Retrieval granularity | Individual statements | Full conversation context |
| LongMemEval R@5 | 80.4% | ~95%+ (estimated) |
| Memory count per session | 10–20 entries | 1 entry |
| Search space | Large (hundreds of turns) | Small (dozens of sessions) |
| Use case | "What did user say about X?" | "What happened in the session about Y?" |
Both strategies have merit — this would add session-level as an option, not replace turn-level storage.
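Since both strategies would coexist, the benchmark flag should default to the current turn-level behavior. A possible `argparse` wiring for `benchmark_longmemeval.py` (the function name here is hypothetical):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative CLI wiring for the proposed --ingestion-mode flag."""
    parser = argparse.ArgumentParser(prog="benchmark_longmemeval.py")
    parser.add_argument(
        "--ingestion-mode",
        choices=["session", "turn", "both"],
        default="turn",  # current behavior remains the default
        help="Store each turn separately, one Memory per session, or both.",
    )
    return parser
```

With `both`, the benchmark could report R@5 for each strategy side by side in a single run.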
## Acceptance Criteria

- `store_session` MCP tool or HTTP endpoint
- `--ingestion-mode session` flag in `benchmark_longmemeval.py`

## Effort estimate
~2–3 days (minor Memory model extension, one new tool, benchmark update, tests)
Contributions welcome — see CONTRIBUTING.md.