
feat(meta-rules): weight graduation by applicability#231

Open
Gradata wants to merge 1 commit into main from feat/meta-rule-applicability-score

Gradata (Owner) commented May 29, 2026

Summary

  • add applicability-weighted meta-rule final scoring with GRADATA_APPLICABILITY_WEIGHT defaulting to 0.5
  • persist/load applicability_observed_count and final_score for meta rules
  • add migration 006 for applicability columns
  • add tests proving niche rules score below broad rules at equal confidence and legacy missing data preserves confidence-only behavior

Test Plan

  • python3 -m pytest Gradata/tests/test_meta_rules.py -q

Closes: ba83b6dd-8301-48cb-bb7d-e6b313346460
Paperclip: GRA-1296


coderabbitai Bot commented May 29, 2026


📝 Walkthrough
  • Add applicability-weighted meta-rule scoring: combines confidence and applicability_observed_count via new GRADATA_APPLICABILITY_WEIGHT env var (defaults to 0.5)
  • New MetaRule fields: applicability_observed_count (int | None) and final_score (float | None) for tracking observed applicability and final weighted scores
  • Database migration 006 adds applicability_observed_count column to meta_rules table with backward-compatible schema detection
  • New public API methods: applicability_norm(), applicability_score(), meta_rule_final_score(), meta_rule_applies_to_context(), count_meta_rule_applicability()
  • Meta-rule ranking updated to use weighted final_score instead of raw confidence; niche rules now score below broad rules at equal confidence
  • SQLite persistence layer extended to save/load new applicability_observed_count field with schema-aware migration
  • Backward-compatible: when applicability data is missing, scoring preserves legacy confidence-only behavior
  • Test coverage added for applicability weighting behavior and backward compatibility scenarios
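The bullets above name count_meta_rule_applicability() and meta_rule_applies_to_context() but do not show their logic. The following is only a rough sketch of how an observed count could be derived from recent session contexts. The tag-based matching, the argument shapes, and the 20-session window are assumptions made for illustration and are not taken from the PR diff.

```python
from typing import Iterable, Sequence


def applies_to_context(
    applies_when: Iterable[str],
    never_when: Iterable[str],
    context_tags: Iterable[str],
) -> bool:
    """Hypothetical matcher: a rule applies unless a never_when tag is present.

    An empty applies_when is treated as "applies everywhere" (an assumption).
    """
    tags = set(context_tags)
    if tags & set(never_when):
        return False
    wanted = set(applies_when)
    return not wanted or bool(tags & wanted)


def count_applicability(
    applies_when: Iterable[str],
    never_when: Iterable[str],
    session_contexts: Sequence[Iterable[str]],
    window_sessions: int = 20,  # assumed window size
) -> int:
    """Count how many of the most recent sessions the rule would apply to."""
    if window_sessions <= 0:
        return 0
    applies = list(applies_when)
    never = list(never_when)
    recent = session_contexts[-window_sessions:]
    return sum(1 for tags in recent if applies_to_context(applies, never, tags))
```

Under these assumptions a rule with no applies_when restriction matches every session in the window, while a rule tied to one tag matches only the sessions that carry it, which is the broad versus niche distinction the PR describes.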

Walkthrough

This PR introduces applicability-weighted scoring for meta-rules. The MetaRule dataclass gains applicability_observed_count and final_score fields. New utilities compute normalized applicability and combine it with confidence via weighted scoring. Storage layer, SQLite migration, and discovery/ranking pipelines are updated to persist, detect, and use applicability-based scores.
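The walkthrough does not state the scoring formula, so the block below is a minimal sketch of one way confidence and a normalized applicability count could be combined. Only GRADATA_APPLICABILITY_WEIGHT, its 0.5 default, and the confidence-only fallback come from the PR description. The multiplicative blend, the clamping, the 20-session window, and the function signatures are assumptions.

```python
import os
from typing import Optional

DEFAULT_WEIGHT = 0.5  # default stated in the PR summary


def applicability_weight() -> float:
    """Read GRADATA_APPLICABILITY_WEIGHT; clamping to [0, 1] is an assumption."""
    raw = os.environ.get("GRADATA_APPLICABILITY_WEIGHT")
    if raw is None:
        return DEFAULT_WEIGHT
    try:
        return min(1.0, max(0.0, float(raw)))
    except ValueError:
        return DEFAULT_WEIGHT


def applicability_norm(observed_count: int, window_sessions: int = 20) -> float:
    """Normalize an observed count to [0, 1] over an assumed session window."""
    if window_sessions <= 0:
        return 0.0
    return min(1.0, max(0, observed_count) / window_sessions)


def applicability_score(
    confidence: float,
    observed_count: Optional[int],
    window_sessions: int = 20,
    weight: Optional[float] = None,
) -> float:
    """Blend confidence with applicability (assumed formula).

    Missing applicability data returns confidence unchanged, matching the
    backward-compatible behavior described in the PR.
    """
    if observed_count is None:
        return confidence
    w = applicability_weight() if weight is None else weight
    norm = applicability_norm(observed_count, window_sessions)
    return confidence * ((1.0 - w) + w * norm)
```

With this blend, a rule seen in every session of the window keeps its full confidence, and a rule seen in few sessions is discounted by at most the weight, so a niche rule scores below a broad rule at equal confidence.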

Changes

Applicability-weighted meta-rule scoring

| Layer | File(s) | Summary |
| --- | --- | --- |
| MetaRule data model and applicability scoring utilities | src/gradata/enhancements/meta_rules.py | Adds applicability_observed_count and final_score fields to MetaRule. Introduces applicability_norm() to normalize observed counts over a window, applicability_score() to weight confidence with applicability via environment-controlled weighting, meta_rule_final_score() to resolve effective rank score, and helpers to detect if a meta-rule applies to a context and count applicability matches across session contexts. |
| Storage layer: schema, persistence, and loading | src/gradata/enhancements/meta_rules_storage.py | Extends meta_rules table DDL with applicability_observed_count column. Updates save_meta_rules() to persist the field in INSERT statements. Extends load_meta_rules() to probe schema and retrieve the optional column, mapping values into reconstructed MetaRule objects. Adds migration SQL constant for backward-compatible schema updates. |
| SQLite migration: detection, application, and tracking | src/gradata/_migrations/006_meta_rule_applicability.py, src/gradata/_migrations/__init__.py | Introduces numbered migration 006_meta_rule_applicability with plan() to detect schema state, up() to conditionally add the column, and _main() CLI runner for validation, pragma configuration, migration tracking, and application. Registers migration in inline list. |
| Discovery and ranking: applicability-aware sorting and weighting | src/gradata/enhancements/meta_rules.py | discover_meta_rules() accepts optional applicability_contexts and applicability_window_sessions, conditionally populates final_score, and sorts by meta_rule_final_score(). rank_meta_rules_by_context() weights meta_rule_final_score() by context multiplier. refresh_meta_rules() parses applicability inputs, wraps metas with applicability scoring when data present, and sorts by final score. |
| Test coverage: persistence, ranking, and backward compatibility | tests/test_meta_rules.py | Extends SQLite round-trip test to persist and assert applicability_observed_count. Adds test verifying broader applicability ranks higher at equal confidence, and test validating fallback to confidence-only scoring when applicability data absent. |
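The migration row above describes a plan() step that detects schema state and an up() step that adds the column only when needed. A minimal sketch of that detect-then-alter pattern with the standard sqlite3 module might look like the following. The return shapes and function bodies are assumptions, not the contents of 006_meta_rule_applicability.py.

```python
import sqlite3

TABLE = "meta_rules"
COLUMN = "applicability_observed_count"


def plan(conn: sqlite3.Connection) -> dict:
    """Report schema state (hypothetical return shape).

    PRAGMA table_info yields one row per column; index 1 is the column name.
    A missing table yields no rows.
    """
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({TABLE})")]
    return {
        "table_exists": bool(cols),
        "needs_column": bool(cols) and COLUMN not in cols,
    }


def up(conn: sqlite3.Connection) -> bool:
    """Add the column only when plan() reports it missing.

    Returns True if the schema was altered, so re-running is a no-op.
    """
    if not plan(conn)["needs_column"]:
        return False
    conn.execute(f"ALTER TABLE {TABLE} ADD COLUMN {COLUMN} INTEGER")
    conn.commit()
    return True
```

Probing before altering keeps the migration idempotent: SQLite raises an error on a duplicate ADD COLUMN, so the check avoids relying on exception handling for the already-migrated case.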

Possibly related PRs

  • Gradata/gradata#102: Both PRs modify meta_rules table schema and migration logic; this PR's applicability_observed_count column work builds on the schema-migration patterns established in #102.

Suggested labels

feature

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(meta-rules): weight graduation by applicability' accurately summarizes the main change: adding applicability-weighted scoring to meta-rules as demonstrated throughout the changeset. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing a clear summary of applicability-weighted scoring, persistence, migration, and testing, all present in the modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.71%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

Loading rules from local config...
[00.17][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal



@coderabbitai coderabbitai Bot added the feature label May 29, 2026

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
Gradata/tests/test_meta_rules.py (1)

162-200: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a round-trip assertion for final_score once persisted.

This test now covers applicability_observed_count; asserting final_score as well would guard the storage contract that ranking depends on against regressions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/tests/test_meta_rules.py` around lines 162 - 200, The
test_sqlite_roundtrip currently asserts persisted fields like
applicability_observed_count but misses verifying MetaRule.final_score; update
the test to include a round-trip assertion that after calling
save_meta_rules(db_path, metas) and load_meta_rules(db_path) the loaded MetaRule
objects have the expected final_score values (and maintain sort order by
final_score if that's the intended ranking). Locate the MetaRule instances
created in test_sqlite_roundtrip and add assertions comparing
loaded[0].final_score and loaded[1].final_score to the expected computed scores
(or at least that loaded[0].final_score >= loaded[1].final_score) so storage and
ranking behavior for final_score is validated after save/load via
save_meta_rules and load_meta_rules.
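As a rough illustration of the assertion suggested above, the following self-contained sketch round-trips final_score through an in-memory SQLite table. MetaRuleStub, save_stub_rules, and load_stub_rules are hypothetical stand-ins with a reduced column set. They are not the project's MetaRule, save_meta_rules, or load_meta_rules, and the ordering by final_score is an assumption about the intended ranking.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaRuleStub:
    """Hypothetical stand-in carrying only the fields under test."""
    id: str
    confidence: float
    applicability_observed_count: Optional[int] = None
    final_score: Optional[float] = None


def save_stub_rules(conn: sqlite3.Connection, metas: list) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta_rules ("
        "id TEXT PRIMARY KEY, confidence REAL, "
        "applicability_observed_count INTEGER, final_score REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO meta_rules VALUES (?, ?, ?, ?)",
        [
            (m.id, m.confidence, m.applicability_observed_count, m.final_score)
            for m in metas
        ],
    )
    conn.commit()


def load_stub_rules(conn: sqlite3.Connection) -> list:
    # Order by final_score, falling back to confidence for legacy rows.
    rows = conn.execute(
        "SELECT id, confidence, applicability_observed_count, final_score "
        "FROM meta_rules ORDER BY COALESCE(final_score, confidence) DESC"
    )
    return [MetaRuleStub(*row) for row in rows]


def test_final_score_roundtrip() -> None:
    conn = sqlite3.connect(":memory:")
    metas = [
        MetaRuleStub("niche", 0.8, 2, 0.44),
        MetaRuleStub("broad", 0.8, 20, 0.8),
    ]
    save_stub_rules(conn, metas)
    loaded = load_stub_rules(conn)
    assert [m.id for m in loaded] == ["broad", "niche"]
    assert loaded[0].final_score == 0.8
    assert loaded[1].final_score == 0.44
    assert loaded[0].final_score >= loaded[1].final_score
```

In the real test the same two assertions on loaded[0].final_score and loaded[1].final_score would sit next to the existing applicability_observed_count checks.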
Gradata/src/gradata/enhancements/meta_rules_storage.py (1)

40-55: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Persist/load MetaRule.final_score as part of the schema update.

MetaRule now carries final_score, but this file stores and loads only applicability_observed_count, so the computed rank signal is lost on a save/load round trip (Line 124+ insert, Line 191+ select). That contradicts the PR summary, which says both applicability_observed_count and final_score are persisted and loaded.

Suggested patch (schema + save + load)
 _CREATE_TABLE_SQL = """
 CREATE TABLE IF NOT EXISTS meta_rules (
@@
     context_weights TEXT,
     applicability_observed_count INTEGER,
+    final_score REAL,
     tenant_id TEXT,
     visibility TEXT DEFAULT 'private'
 );
 """
@@
 _ADD_APPLICABILITY_OBSERVED_COUNT_SQL = (
     "ALTER TABLE meta_rules ADD COLUMN applicability_observed_count INTEGER"
 )
+_ADD_FINAL_SCORE_SQL = "ALTER TABLE meta_rules ADD COLUMN final_score REAL"
@@
         for stmt in (
@@
             _ADD_APPLICABILITY_OBSERVED_COUNT_SQL,
+            _ADD_FINAL_SCORE_SQL,
             "ALTER TABLE meta_rules ADD COLUMN tenant_id TEXT",
@@
                    scope, examples, context_weights, applies_when, never_when,
-                    transfer_scope, source, applicability_observed_count,
+                    transfer_scope, source, applicability_observed_count, final_score,
                     tenant_id, visibility)
-                   VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'private')""",
+                   VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'private')""",
                 (
@@
                     meta.source,
                     meta.applicability_observed_count,
+                    meta.final_score,
                     _tid,
                 ),
             )
@@
         applicability_expr = (
             "applicability_observed_count"
             if "applicability_observed_count" in existing_cols
             else "NULL AS applicability_observed_count"
         )
+        final_score_expr = "final_score" if "final_score" in existing_cols else "NULL AS final_score"
@@
-                      transfer_scope, {source_expr}, {applicability_expr}
+                      transfer_scope, {source_expr}, {applicability_expr}, {final_score_expr}
@@
-                    applicability_observed_count=row[14],
+                    applicability_observed_count=row[14],
+                    final_score=row[15],
                 )

Also applies to: 124-130, 184-195, 222-223

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/meta_rules_storage.py` around lines 40 - 55,
The schema and persistence code in meta_rules_storage.py omitted
MetaRule.final_score so it isn't round-tripped; update the _CREATE_TABLE_SQL to
add a final_score REAL column, update the save/insert logic (the block that
inserts into meta_rules—refer to the INSERT code around where
applicability_observed_count is written) to include final_score in the INSERT
columns and values, and update the load/select logic (the code that SELECTs rows
and constructs MetaRule objects) to read final_score from the result set and
assign it to MetaRule.final_score so the computed rank persists across
migrations and reloads.
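The load side of the suggested patch relies on probing the table's columns and substituting NULL for any that are missing. A minimal sketch of that pattern follows, reduced to four columns for brevity. The helper name load_scores and the reduced SELECT list are assumptions for illustration only.

```python
import sqlite3


def load_scores(conn: sqlite3.Connection) -> list:
    """Select optional columns when present, NULL placeholders otherwise."""
    existing_cols = {
        row[1] for row in conn.execute("PRAGMA table_info(meta_rules)")
    }
    applicability_expr = (
        "applicability_observed_count"
        if "applicability_observed_count" in existing_cols
        else "NULL AS applicability_observed_count"
    )
    final_score_expr = (
        "final_score" if "final_score" in existing_cols else "NULL AS final_score"
    )
    sql = (
        f"SELECT id, confidence, {applicability_expr}, {final_score_expr} "
        "FROM meta_rules ORDER BY id"
    )
    return conn.execute(sql).fetchall()
```

Because the row shape is the same for both schemas, the code that rebuilds MetaRule objects can index the row positionally and receive None for columns a legacy database lacks.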
Gradata/src/gradata/enhancements/meta_rules.py (1)

470-471: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale return-contract wording in discover_meta_rules docstring.

Line 470 still states sorting by confidence, but Line 519 now sorts by meta_rule_final_score. This can mislead callers reading API docs.

Also applies to: 519-519

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 470 - 471,
Update the docstring for discover_meta_rules to reflect the current return
contract: replace the outdated "sorted by confidence" wording with a precise
statement that results are sorted by meta_rule_final_score (and return an empty
list when no cluster meets min_group_size); mention meta_rule_final_score and
min_group_size in the docstring so callers know the sorting key and the
empty-list behavior.
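For readers checking the docstring against the behavior, the sort contract described in this comment can be sketched as below. MetaRuleStub and effective_score are hypothetical stand-ins. The fallback from final_score to confidence is an assumption based on the PR's stated backward compatibility, not a copy of meta_rule_final_score().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaRuleStub:
    """Hypothetical stand-in for MetaRule with only the fields used here."""
    id: str
    confidence: float
    final_score: Optional[float] = None


def effective_score(meta: MetaRuleStub) -> float:
    """Use final_score when set, otherwise fall back to raw confidence."""
    return meta.confidence if meta.final_score is None else meta.final_score


def sort_by_final_score(metas: list) -> list:
    """Return rules ordered from highest to lowest effective score."""
    return sorted(metas, key=effective_score, reverse=True)
```

A legacy rule with no final_score is ranked by its confidence, so it can land between a broad rule and a niche rule that share a higher raw confidence.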

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 781bd30e-1f14-43c5-9c3b-8f646d2cbb9e

📥 Commits

Reviewing files that changed from the base of the PR and between a197bff and 4afde4e.

📒 Files selected for processing (5)
  • Gradata/src/gradata/_migrations/006_meta_rule_applicability.py
  • Gradata/src/gradata/_migrations/__init__.py
  • Gradata/src/gradata/enhancements/meta_rules.py
  • Gradata/src/gradata/enhancements/meta_rules_storage.py
  • Gradata/tests/test_meta_rules.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: pytest (py3.12)
  • GitHub Check: pytest (py3.11)
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.11
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/_migrations/__init__.py
  • Gradata/src/gradata/_migrations/006_meta_rule_applicability.py
  • Gradata/src/gradata/enhancements/meta_rules_storage.py
  • Gradata/src/gradata/enhancements/meta_rules.py
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_meta_rules.py
🔇 Additional comments (4)
Gradata/src/gradata/enhancements/meta_rules.py (1)

35-35: LGTM!

Also applies to: 85-91, 100-197, 503-518, 766-767, 807-830

Gradata/src/gradata/_migrations/006_meta_rule_applicability.py (1)

1-85: LGTM!

Gradata/src/gradata/_migrations/__init__.py (1)

98-98: LGTM!

Gradata/tests/test_meta_rules.py (1)

32-39: LGTM!

Also applies to: 178-179, 199-199, 205-255
