Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
132 changes: 132 additions & 0 deletions docs/superpowers/plans/2026-05-09-drop-quality-score-columns.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
# Drop importance_score and confidence columns

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to resume this plan from the checkpoint. Check the "Status as of last commit" section first to see what's already done.

**Goal:** Remove the two broken-by-default `memory` columns (`importance_score` always 1.0; `confidence` always 0.5 for observations) and every code path that reads, writes, or ranks by them. Lore's recall stack already does cosine-similarity + FTS hybrid scoring; importance/confidence multipliers are no-ops on the current data and only confuse the UI.

**Architecture:** Schema migration drops columns; persistence layer stops carrying the fields; service / MCP / HTTP / CLI / export layers stop accepting and emitting them; `importance.py` was deleted and replaced by `decay.py` with just the still-needed pure functions. `min_confidence` parameters that are actually min-score thresholds get renamed to `min_score`. Recall scoring is now `cosine × time_decay × tier_weight × graph_boost` (no importance multiplier; was always a no-op since every row had identical 1.0).

**Out of scope (separate "confidence" concepts — DO NOT TOUCH):**
- `recommend/engine.py` / `recommend/types.py` — RecommendationConfidence (`0.6 * magnitude + 0.4 * agreement`)
- `freshness/detector.py` — staleness confidence per commit-count threshold
- `services/graph/`, `graph_extraction.py`, graph relationships — these have their own per-relationship `confidence` and `weight` fields on a *different* table (entities/mentions/relationships)
- `classify/` — classification confidence per axis

---

## Status as of last commit (2af9ead)

**Done:**
- ✅ Phase 1: schema migrations (`migrations/025_drop_quality_score_columns.sql` + SQLite version)
- ✅ Phase 2: dataclass field removals (NewMemory, StoredMemory, MemoryPatch, ExportedMemory in `persistence/types.py`)
- ✅ Phase 3a: SQLite persistence (INSERT, SELECT, recall scoring, record_memory_access, bump_access_counts, recommendation candidate ordering, search_memories_text, _row_to_memory, _row_to_exported, _MEMORY_COLS, GraphStats.avg_importance)
- ✅ Phase 3b: PostgreSQL persistence (matching shape changes — RETURNING clauses, recall scoring, FTS branch, recall_by_entities, list_memories_without_mentions, list_temporal_buckets, search_memories_text, record_memory_access, list_candidate_memories_for_recommendation, AVG aggregates)
- ✅ Phase 3c: protocol.py (bump_access_counts docstring; import_extracted_memory and upsert_memory_with_embedding signatures lose the confidence parameter)
- ✅ Phase 4: `src/lore/importance.py` deleted; new `src/lore/decay.py` with `decay_factor` + `resolve_half_life`; `lore.py` updated to use `_memory_decay` helper instead of `time_adjusted_importance`; upvote/downvote no longer recompute importance; `cleanup_expired` uses decay-only thresholding; `recalculate_importance` method removed; `tests/test_importance_scoring.py` deleted
- ✅ Phase 5a: 3 services (`services/observations.py`, `services/memories.py`, `services/lessons.py`) — drop `confidence` parameter, drop `min_confidence` (renamed to `min_score`), strip from response dicts and bulk-upsert path

**Remaining (Phase 5b onward):**

### Phase 5b — services + adjacent
- `src/lore/services/conversations.py` — drop confidence pass-through to memory creation
- `src/lore/services/snapshots.py` — drop importance_score from response/dataclass mappings
- `src/lore/services/retrieve.py` — line ~524 has `float(memory.importance_score) if ... else 0.5` to drop
- `src/lore/services/graph/graph.py`, `services/graph/review.py` — sweep for memory.importance_score / memory.confidence references; LEAVE per-relationship graph confidence/weight alone
- `src/lore/services/graph_extraction.py` — sweep for memory.importance_score / memory.confidence (NOT the graph extraction's own LLM-emitted confidence per fact, that's a different concept)
- `src/lore/consolidation.py` — drop importance_score from consolidated NewMemory; replace `max(memories, key=lambda m: m.importance_score)` with `max(..., key=lambda m: m.created_at)`
- `src/lore/conversation/extractor.py` — drop `confidence=candidate.get("confidence", 0.8)` from create-memory call (the LLM still emits per-fact confidence, we just stop persisting it on the memory column)
- `src/lore/extract/extractor.py`, `extract/prompts.py`, `extract/resolver.py` — drop confidence from any NewMemory constructors
- `src/lore/ingest/dedup.py` — line 68 `min_confidence=0.0` → rename to `min_score=0.0`

### Phase 6 — MCP server
- `src/lore/mcp/server.py::remember()` — drop `confidence: float = 1.0` parameter from signature; drop from the `lore.remember(...)` call
- Sweep response dicts for confidence / importance_score

### Phase 7 — HTTP routes & response models
- `src/lore/server/models.py` — drop confidence/importance_score from any pydantic MemoryResponse / MemoryRow
- `src/lore/server/routes/recent.py` — `RecentMemoryItem.importance_score` field gone; drop from `_to_item()` and the markdown formatter
- `src/lore/server/routes/retention.py` — drop `min_importance_score` query parameter; drop importance_score from response model. Replace with min_age_days/tier or just remove.
- `src/lore/server/routes/lessons.py` — drop confidence from create body; rename `min_confidence` → `min_score`; drop confidence + importance_score from response model
- `src/lore/server/routes/memories.py` — drop both fields from MemoryResponse; replace any min_confidence/min_importance_score with min_score
- `src/lore/server/routes/observations.py` — drop fields from response (they're in `meta`, but verify)
- `src/lore/server/routes/temporal.py` — drop importance_score from temporal response models
- `src/lore/server/routes/review.py` — sweep for the dropped columns (leave graph confidence alone)
- `src/lore/server/routes/recommendations.py` — sweep; leave RecommendationConfidence alone, only touch references to memory column
- `src/lore/server/routes/graph/memories.py`, `graph/models.py`, `graph/stats.py` — drop avg_importance from GraphStats response; leave per-relationship graph confidence

### Phase 8 — top-level + CLI + export + UI source
- `src/lore/lore.py::remember()` — drop `confidence` parameter from public API; drop `0 <= confidence <= 1.0` guard
- `src/lore/async_lore.py` — same surgery
- `src/lore/types.py` — drop `confidence: float = 1.0` and `importance_score: float = 1.0` from any top-level Memory / ScoredMemory dataclasses; drop avg_importance / avg_confidence from MemoryStats
- `src/lore/temporal.py:181` — drop the `importance: {mem.importance_score:.2f}` formatter line
- `src/lore/recent.py`, `src/lore/retention.py` — drop importance_score from data structures; drop importance threshold parameters
- `src/lore/store/http.py` — drop importance_score / confidence from the request payloads and response parsers
- `src/lore/cli/__init__.py:575` — drop `--min-importance` argument
- `src/lore/cli/commands/manage.py` — drop importance-recompute commands
- `src/lore/cli/commands/misc.py` — sweep references
- `src/lore/cli/commands/remember.py` — drop `--confidence` flag
- `src/lore/export/markdown.py:137` — drop `"confidence": mem.confidence,`
- `src/lore/export/serializers.py` — drop both fields from JSON serializer
- `src/lore/ui/src/panels/detail.js` — drop the rows that render "Importance: 100%" and "Confidence: 50%"
- `src/lore/ui/src/state.js` — sweep
- `src/lore/ui/dist/app.js` — leave (build artifact); note in PR that UI build needs re-run

### Phase 9 — tests update + add regression
~30 test files reference these fields. Sweep:
```bash
grep -rln "importance_score\|memory\.confidence\|min_confidence\|min_importance" tests/ | grep -v __pycache__
```
For each: remove `importance_score=`/`confidence=` from fixture constructors, drop assertions on those fields, rename `min_confidence` → `min_score` parameters.

Add `tests/persistence/test_quality_columns_dropped.py` regression test:
```python
import pytest
pytest.importorskip("aiosqlite")
pytest.importorskip("sqlite_vec")
from lore.persistence.sqlite import SqliteStore

@pytest.mark.asyncio
async def test_memories_table_has_no_quality_score_columns(tmp_path):
db = tmp_path / "lore.db"
store = await SqliteStore.create(f"sqlite:///{db}", run_migrations=True)
try:
async with store._pool.acquire() as conn:
cur = await conn.execute("PRAGMA table_info(memories)")
cols = [row[1] async for row in cur]
assert "importance_score" not in cols
assert "confidence" not in cols
finally:
await store.close()


def test_importance_module_does_not_exist():
with pytest.raises(ImportError):
from lore import importance # noqa: F401
```

### Phase 10 — verify, lint, push, PR

```bash
PYTHONPATH=src python3 -m pytest tests/ -x -q --ignore=tests/integration
ruff check src/ tests/
git push
gh pr ready <PR#> # if originally opened as draft
```

---

## Helpful invariants for the next executor

- **`m.confidence` is OK to keep** in queries against `entity_mentions` table (graph relationship confidence — different column, different concept). Verify by checking the FROM clause: if it's `FROM entity_mentions` or has a `JOIN entity_mentions m ON ...` with `m.` prefix, that's graph.
- The `_row_to_mention` function in both sqlite.py and postgres.py keeps its `confidence=row["confidence"]` line — that's the graph mention.
- `recommend/engine.py` `_compute_confidence` is a separate computation (signal magnitude × agreement). Don't touch.
- `freshness/detector.py` `confidence` is a per-staleness-status threshold. Don't touch.
- `classify/` axis confidence is unrelated. Don't touch.

## Done-state self-check

- `grep -rn "importance_score\|memory\.confidence" src/ | grep -v "graph\|recommend\|freshness\|classify"` returns zero hits
- `python3 -c "from lore import importance"` raises ImportError
- `pytest tests/ --ignore=tests/integration` is fully green
- `ruff check src/ tests/` is clean
- migration runs cleanly on a fresh DB
86 changes: 86 additions & 0 deletions migrations/025_drop_quality_score_columns.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
-- Migration 025: drop importance_score and confidence columns from memories.
--
-- Both columns carried mechanical defaults that no caller ever overrode:
-- * importance_score = 1.0 (schema default; INSERT statements never
-- populated it before PR #295038e; the recall-side multiplier was a
-- no-op since every row had identical 1.0).
-- * confidence = 0.5 hardcoded for observations (services/observations.py),
-- uncalibrated user-supplied number for lessons; never used in ranking,
-- only echoed back in API responses.
--
-- Lore's recall stack already does cosine-similarity + ts_rank/FTS hybrid
-- scoring with optional fts_weight per profile. The dropped columns added
-- noise (UI rendered a flat "Importance 100% / Confidence 50%" on every
-- memory) without changing any actual ranking on existing data.
--
-- Industry consensus across mem0 / Letta / LangMem / Zep-graphiti / Cognee
-- is to skip per-memory quality scores entirely.
--
-- Out of scope (DO NOT confuse with these): graph-table per-relationship
-- ``confidence`` and ``weight`` columns live in entities/mentions/
-- relationships and are unaffected.

-- Indexes first — keeping them after a column drop would leave dangling
-- references in some DB engines.
DROP INDEX IF EXISTS idx_lessons_importance;
DROP INDEX IF EXISTS idx_memories_importance;
DROP INDEX IF EXISTS idx_memories_confidence;

-- The legacy ``lessons`` view (created in 009_rename_to_memories.sql)
-- references both columns; CASCADE drops it together with its rewrite
-- rules so the ALTER TABLE below succeeds. We recreate the view at the
-- end without the dropped columns.
DROP VIEW IF EXISTS lessons CASCADE;

-- Drop the on-access trigger that recomputed importance on read. Names
-- vary — try the variants we've used historically.
DROP TRIGGER IF EXISTS on_update_importance ON memories;
DROP TRIGGER IF EXISTS memories_recompute_importance_on_access ON memories;
DROP FUNCTION IF EXISTS update_importance_score() CASCADE;
DROP FUNCTION IF EXISTS memories_recompute_importance() CASCADE;

ALTER TABLE memories DROP COLUMN IF EXISTS importance_score;
ALTER TABLE memories DROP COLUMN IF EXISTS confidence;

-- Recreate the read/write lessons compatibility view (minus the dropped
-- columns) so any legacy /v1/lessons callers that still query it keep
-- working. Mirrors the original CREATE VIEW + rules in migration 009.
CREATE OR REPLACE VIEW lessons AS
SELECT id, org_id,
content AS problem,
context AS resolution,
NULL::text AS context,
tags, source, project, embedding,
created_at, updated_at, expires_at,
upvotes, downvotes, meta,
access_count, last_accessed_at,
reputation_score, quality_signals
FROM memories;

CREATE OR REPLACE RULE lessons_insert AS ON INSERT TO lessons
DO INSTEAD
INSERT INTO memories (id, org_id, content, context, tags, source, project,
embedding, created_at, updated_at, expires_at, upvotes, downvotes, meta)
VALUES (NEW.id, NEW.org_id, NEW.problem, NEW.resolution, NEW.tags,
NEW.source, NEW.project, NEW.embedding, NEW.created_at, NEW.updated_at,
NEW.expires_at, NEW.upvotes, NEW.downvotes, NEW.meta);

CREATE OR REPLACE RULE lessons_update AS ON UPDATE TO lessons
DO INSTEAD
UPDATE memories SET
content = NEW.problem,
context = NEW.resolution,
tags = NEW.tags,
source = NEW.source,
project = NEW.project,
embedding = NEW.embedding,
updated_at = NEW.updated_at,
expires_at = NEW.expires_at,
upvotes = NEW.upvotes,
downvotes = NEW.downvotes,
meta = NEW.meta
WHERE id = OLD.id;

CREATE OR REPLACE RULE lessons_delete AS ON DELETE TO lessons
DO INSTEAD
DELETE FROM memories WHERE id = OLD.id;
34 changes: 34 additions & 0 deletions migrations_sqlite/025_drop_quality_score_columns.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
-- Migration 025 (SQLite): drop importance_score and confidence columns.
-- See migrations/025_drop_quality_score_columns.sql for rationale.
--
-- The memories table got its current name in 009_rename_to_memories.sql
-- (formerly ``lessons``). The migration runner only applies once per file,
-- so we target ``memories`` directly.
--
-- SQLite 3.35.0+ (Mar 2021) supports native ``ALTER TABLE DROP COLUMN``
-- but refuses to drop a column referenced by indexes or views. Drop the
-- ``lessons`` compatibility view (recreated below without the dropped
-- columns) and the per-column indexes before the ALTERs.

DROP VIEW IF EXISTS lessons;
DROP INDEX IF EXISTS idx_lessons_importance;
DROP INDEX IF EXISTS idx_memories_importance;
DROP INDEX IF EXISTS idx_memories_confidence;

ALTER TABLE memories DROP COLUMN importance_score;
ALTER TABLE memories DROP COLUMN confidence;

-- Recreate the read-only lessons view minus the dropped columns. Keeps the
-- legacy ``problem``/``resolution``/``context`` shape for any old callers
-- that still hit the view.
CREATE VIEW IF NOT EXISTS lessons AS
SELECT id, org_id,
content AS problem,
context AS resolution,
NULL AS context,
tags, source, project,
created_at, updated_at, expires_at,
upvotes, downvotes, meta,
access_count, last_accessed_at,
reputation_score, quality_signals
FROM memories;
23 changes: 3 additions & 20 deletions src/lore/async_lore.py
Original file line number Diff line number Diff line change
Expand Up @@ -398,7 +398,6 @@ async def remember(
source: Optional[str] = None,
embedding: Optional[Sequence[float]] = None,
context: Optional[str] = None,
confidence: float = 0.5,
meta: Optional[dict[str, Any]] = None,
) -> StoredMemory:
"""Store a memory. Returns the persisted ``StoredMemory``.
Expand All @@ -415,7 +414,6 @@ async def remember(
embedding=vec,
context=context,
tags=tuple(tags or ()),
confidence=confidence,
source=source,
project=project,
meta=meta or {},
Expand Down Expand Up @@ -852,31 +850,17 @@ async def enrich_memories(
)

async def cleanup_expired(
self, importance_threshold: Optional[float] = None # noqa: ARG002 - parity
self, decay_threshold: Optional[float] = None # noqa: ARG002 - parity
) -> int:
"""Purge expired memories (TTL-based). Returns rowcount.

``importance_threshold`` is accepted for parity with the sync
``decay_threshold`` is accepted for parity with the sync
``Lore`` API but is currently ignored — the async layer doesn't
track importance_score outside of recall scoring. Phase 4C will
re-introduce importance-based pruning if it proves needed.
do decay-based pruning at this level.
"""
store = self._require_store()
return await store.expire_memories()

async def recalculate_importance(
self, project: Optional[str] = None # noqa: ARG002 - parity
) -> int:
"""Recompute importance for every memory in scope.

The async Store recomputes importance on every read/recall/vote
already, so this is currently a no-op that returns 0. Kept for
symmetry with the sync class so callers can switch implementations
without code changes.
"""
self._require_store()
return 0

# ── Phase 4B: classify + as_prompt ──────────────────────────────────

async def classify(self, text: str) -> "Classification":
Expand Down Expand Up @@ -947,7 +931,6 @@ async def as_prompt(
updated_at=h.updated_at.isoformat() if h.updated_at else "",
ttl=None,
expires_at=h.expires_at.isoformat() if h.expires_at else None,
confidence=h.confidence,
)
score = getattr(h, "score", 0.0)
results.append(RecallResult(memory=mem, score=score, verbatim=verbatim))
Expand Down
9 changes: 1 addition & 8 deletions src/lore/cli/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,6 @@ def build_parser() -> argparse.ArgumentParser:
p.add_argument("--context", default=None, help="Additional context for the memory")
p.add_argument("--ttl", type=int, default=None, help="Time-to-live in seconds")
p.add_argument("--source", default=None)
p.add_argument("--confidence", type=float, default=1.0)
p.add_argument("--project", default=None, help="Project namespace")
p.add_argument("--metadata", default=None, help="JSON metadata (e.g. '{\"key\": \"val\"}')")

Expand Down Expand Up @@ -112,10 +111,6 @@ def build_parser() -> argparse.ArgumentParser:
help="Filter by memory tier",
)
p.add_argument("--limit", type=int, default=None)
p.add_argument(
"--sort", type=str, choices=["created", "importance"],
default="created", help="Sort order (default: created)",
)

# stats
sub.add_parser("stats", help="Show memory statistics")
Expand Down Expand Up @@ -569,11 +564,9 @@ def build_parser() -> argparse.ArgumentParser:
p_restore.add_argument("--input", "-i", required=True, dest="input_file", help="Path to backup JSON file")

# retention
p_ret = sub.add_parser("retention", help="Apply retention policy to remove old low-importance memories")
p_ret = sub.add_parser("retention", help="Apply retention policy to remove old memories")
p_ret.add_argument("--max-age-days", type=int, default=90, dest="max_age_days",
help="Delete memories older than N days (default: 90)")
p_ret.add_argument("--min-importance", type=float, default=0.3, dest="min_importance",
help="Only delete memories below this importance score (default: 0.3)")
p_ret.add_argument("--archive", action="store_true", default=False,
help="Export affected memories to JSON before deleting")
p_ret.add_argument("--dry-run", action="store_true", default=False, dest="dry_run",
Expand Down
Loading
Loading