Fix: Document deletion now properly updates the stale knowledge filter source count#1825
Fix: Document deletion now properly updates the stale knowledge filter source count#1825Vchen7629 wants to merge 8 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughBackend now computes an optional per-filter ChangesActive Source Count Computation and Display
Sequence Diagram(s)sequenceDiagram
participant Client
participant Service as search_knowledge_filters
participant AdminOpenSearch as AdminOpenSearch
participant Frontend as KnowledgeFilterList
Client->>Service: request filters
Service->>AdminOpenSearch: aggregation search for filenames
AdminOpenSearch-->>Service: aggregation buckets (existing filenames)
Service-->>Client: filters with active_source_count
Client->>Frontend: render filters
Frontend-->>Frontend: display active_source_count (or fallback)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/utils/opensearch_queries.py (1)
36-50: 💤 Low valueConsider validating or documenting the empty list precondition.
The
termsquery will produce an invalid OpenSearch request iffilenamesis empty. While the caller inknowledge_filter_service.py(line 136) checksif all_filenamesbefore calling this function, the utility should either validate the input or document the precondition in the docstring.🛡️ Proposed validation
def build_existing_filenames_agg_body(filenames: list[str]) -> dict: """ build a search body for checking which of the given filenames currently have indexed chunks Args: - filenames: Filenames to check for existance + filenames: Filenames to check for existence (must be non-empty) Returns: A dict containing the complete OpenSearch search body """ + if not filenames: + raise ValueError("filenames list must not be empty") return {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/opensearch_queries.py` around lines 36 - 50, The function build_existing_filenames_agg_body should validate that the filenames list is not empty (the OpenSearch terms query is invalid for empty lists); add an explicit check at the start of build_existing_filenames_agg_body to raise a ValueError (or similar) with a clear message like "filenames must be a non-empty list" when filenames is empty, and update the function docstring to state the non-empty precondition so callers (e.g., knowledge_filter_service invoking build_existing_filenames_agg_body) are clear about the requirement.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/services/knowledge_filter_service.py`:
- Around line 117-118: Replace the stdlib logging import with get_logger from
utils.logging_config: remove "import logging", add "from utils.logging_config
import get_logger" at the top, create a module logger via "logger =
get_logger(__name__)", and change all uses of the stdlib logging object in this
file (e.g., any calls around the knowledge filter functions such as
logging.info, logging.error, logging.exception at/near where the file currently
references logging) to use logger.info / logger.error / logger.exception
respectively so the module uses get_logger consistently.
In `@src/utils/opensearch_queries.py`:
- Line 41: Fix the typo in the docstring for the parameter "filenames" in
src/utils/opensearch_queries.py by changing "existance" to "existence" so the
docstring reads "filenames: Filenames to check for existence"; locate the
docstring that documents the filenames parameter (near the function or method
that accepts filenames) and update the word only.
In `@tests/unit/test_knowledge_filter_service.py`:
- Around line 160-173: The loop that computes active_source_count in the search
path is parsing each filter's query_data inside a single try/except, so one
malformed JSON aborts the whole loop; in the method that builds results for
search_knowledge_filters (the loop that processes each knowledge filter in
KnowledgeFilterService), move the JSON parsing of filter.query_data into an
inner per-filter try/except so a JSONDecodeError only skips computing
active_source_count for that specific filter (log/debug the parse error and
continue), leaving other valid filters to still get active_source_count; keep
the malformed filter in results but without active_source_count.
---
Nitpick comments:
In `@src/utils/opensearch_queries.py`:
- Around line 36-50: The function build_existing_filenames_agg_body should
validate that the filenames list is not empty (the OpenSearch terms query is
invalid for empty lists); add an explicit check at the start of
build_existing_filenames_agg_body to raise a ValueError (or similar) with a
clear message like "filenames must be a non-empty list" when filenames is empty,
and update the function docstring to state the non-empty precondition so callers
(e.g., knowledge_filter_service invoking build_existing_filenames_agg_body) are
clear about the requirement.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a4de7517-ee78-498a-8fb3-261df7f19c04
📒 Files selected for processing (6)
frontend/app/api/queries/useGetFiltersSearchQuery.tsfrontend/components/knowledge-filter-list.tsxfrontend/components/knowledge-filter-panel.tsxsrc/services/knowledge_filter_service.pysrc/utils/opensearch_queries.pytests/unit/test_knowledge_filter_service.py
|
This still requires the user to refresh the page for the sources count badge in knowledge-filter-list.tsx for the count to be up to date |
Summary
Fixes #1254. Knowledge filter containing specific documents used to show stale source counts even though the document was deleted and would show inflated counts if more sources are added even after the page was refreshed. This is because the source count was computed from query_data.filters.data_sources, a saved snapshot of filenames that's never pruned when a document is deleted.
Demo
screen-capture.3.webm
Changes
search_knowledge_filtersnow computesactive_source_countper filter by checking which of itsdata_sourcesfilenames still have indexed documents (one batched OpenSearch query via the admin client, so the count is consistent for every viewer of a shared filter)build_existing_filenames_agg_bodyhelper inopensearch_queries.pyknowledge-filter-list.tsxbadge now usesactive_source_count, falling back to the old.lengthif absentknowledge-filter-panel.tsxsource dropdown now filters its selected sources against the currently-active sources before display, so its count matches toosearch_knowledge_filtersSummary by CodeRabbit
New Features
Bug Fixes / Reliability
Tests