Fix: Document deletion now properly updates the stale knowledge filter source count#1825

Open
Vchen7629 wants to merge 8 commits into
langflow-ai:mainfrom
Vchen7629:stale-knowledge-filter-source-count-fix

Conversation

@Vchen7629 Vchen7629 commented Jun 10, 2026


Summary

Fixes #1254. A knowledge filter that references specific documents used to show a stale source count after one of those documents was deleted, and an inflated count when more sources were added later, even after the page was refreshed. This happened because the source count was computed from query_data.filters.data_sources, a saved snapshot of filenames that is never pruned when a document is deleted.
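
To make the cause concrete, here is a minimal sketch of how a count taken from the saved snapshot drifts away from the index. The filenames and the `indexed_filenames` set are made up for illustration; only the `filters.data_sources` shape comes from the description above.

```python
import json

# Hypothetical saved filter: the snapshot still lists three filenames.
query_data = json.dumps({"filters": {"data_sources": ["a.pdf", "b.pdf", "c.pdf"]}})

# Hypothetical index state after b.pdf was deleted.
indexed_filenames = {"a.pdf", "c.pdf"}

saved_sources = json.loads(query_data)["filters"]["data_sources"]

# Old behaviour: count whatever the snapshot says.
stale_count = len(saved_sources)

# Intended behaviour: count only filenames that still have indexed documents.
active_count = len(set(saved_sources) & indexed_filenames)

print(stale_count, active_count)  # prints "3 2"
```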

Demo

screen-capture.3.webm

Changes

  • search_knowledge_filters now computes active_source_count per filter by checking which of its data_sources filenames still have indexed documents (one batched OpenSearch query via the admin client, so the count is consistent for every viewer of a shared filter)
  • Added build_existing_filenames_agg_body helper in opensearch_queries.py
  • knowledge-filter-list.tsx badge now uses active_source_count, falling back to the old .length if absent
  • knowledge-filter-panel.tsx source dropdown now filters its selected sources against the currently-active sources before display, so its count matches too
  • Added unit tests for search_knowledge_filters
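
For readers unfamiliar with this kind of query, the following is a rough sketch of what an "existing filenames" search body could look like. It is not the PR's actual helper: the function name, the `filename` field, and the `existing_filenames` aggregation name are assumptions, and the field is assumed to be mapped as a keyword.

```python
def sketch_existing_filenames_agg_body(filenames: list[str]) -> dict:
    """Hypothetical stand-in for the helper described above.

    Assumes `filenames` is non-empty and that the index stores the
    filename in a keyword field called "filename" (an assumption).
    """
    unique = sorted(set(filenames))
    return {
        # No hits are needed, only the aggregation buckets.
        "size": 0,
        # Restrict the search to chunks belonging to the candidate files.
        "query": {"terms": {"filename": unique}},
        "aggs": {
            "existing_filenames": {
                # One bucket per filename that still has indexed chunks.
                "terms": {"field": "filename", "size": len(unique)}
            }
        },
    }


body = sketch_existing_filenames_agg_body(["a.pdf", "b.pdf", "a.pdf"])
```

Sizing the aggregation to the number of candidate filenames keeps every possible bucket in the response, so a missing bucket can be read as "no indexed documents" rather than "truncated".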

Summary by CodeRabbit

  • New Features

    • Knowledge filters now show an active source count indicating how many referenced data sources have indexed documents.
    • Filter panels preserve and validate selected sources, preventing selection of unavailable sources.
  • Bug Fixes / Reliability

    • Counting of active sources is resilient to malformed filter data and backend issues; failures log warnings without breaking results.
  • Tests

    • Added tests covering deleted sources, malformed filter data, and duplicate source handling.

@github-actions github-actions Bot added community frontend 🟨 Issues related to the UI/UX backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) tests and removed community labels Jun 10, 2026
coderabbitai Bot commented Jun 10, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: df19f020-52a4-47f9-9a2d-cc9c0bb6d154

📥 Commits

Reviewing files that changed from the base of the PR and between ef7da2d and e643636.

📒 Files selected for processing (3)
  • src/services/knowledge_filter_service.py
  • src/utils/opensearch_queries.py
  • tests/unit/test_knowledge_filter_service.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/utils/opensearch_queries.py
  • tests/unit/test_knowledge_filter_service.py

Walkthrough

Backend now computes an optional per-filter active_source_count by aggregating existing filenames in OpenSearch; the API type includes this field, frontend components display and sanitize selection using it, and tests cover deleted sources, malformed query_data, and filename deduplication.

Changes

Active Source Count Computation and Display

| Layer / File(s) | Summary |
| --- | --- |
| API Type Contract and Backend Utility: frontend/app/api/queries/useGetFiltersSearchQuery.ts, src/utils/opensearch_queries.py | KnowledgeFilter now has optional active_source_count?: number. Adds build_existing_filenames_agg_body(filenames) to produce an OpenSearch aggregation body for checking which filenames have indexed chunks. |
| Service Active Count Computation: src/services/knowledge_filter_service.py | search_knowledge_filters() parses each filter's query_data for filters.data_sources, skips wildcard/empty cases, queries the admin OpenSearch aggregation to determine which filenames exist, and attaches active_source_count to each filter; errors are logged and do not prevent returning results. |
| Frontend Display and Form Validation: frontend/components/knowledge-filter-list.tsx, frontend/components/knowledge-filter-panel.tsx | KnowledgeFilterList prefers filter.active_source_count when showing source counts and falls back to parsed data_sources length. KnowledgeFilterPanel preserves wildcard ("*") selections and sanitizes non-wildcard selections against available sourceOptions. |
| Test Helpers and Coverage: tests/unit/test_knowledge_filter_service.py | Introduces shared test helpers and new tests verifying active_source_count behavior for deleted sources (0), malformed query_data (omit count, no failure), and filename deduplication across filters. |
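
The service behaviour summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/services/knowledge_filter_service.py: `extract_sources`, `attach_active_source_counts`, and the injected `fetch_existing` callable are hypothetical names, and the stdlib logger stands in for whatever logger the project uses.

```python
import json
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)  # stand-in for the project's logger


def extract_sources(query_data: str) -> Optional[list[str]]:
    """Return concrete filenames, or None for wildcard, empty, or malformed data."""
    try:
        sources = json.loads(query_data)["filters"]["data_sources"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None  # malformed data only affects this one filter
    if not isinstance(sources, list) or not sources or "*" in sources:
        return None
    return [s for s in sources if isinstance(s, str)]


def attach_active_source_counts(
    filters: list[dict],
    fetch_existing: Callable[[list[str]], set[str]],
) -> list[dict]:
    """Attach active_source_count using a single batched existence lookup."""
    per_filter = [extract_sources(f.get("query_data", "")) for f in filters]
    # Deduplicate filenames across all filters before the lookup.
    all_filenames = sorted({name for s in per_filter if s for name in s})
    if not all_filenames:
        return filters
    try:
        existing = fetch_existing(all_filenames)
    except Exception:
        # A backend failure is logged and the filters are returned unchanged.
        logger.warning("active source lookup failed", exc_info=True)
        return filters
    for f, sources in zip(filters, per_filter):
        if sources:
            f["active_source_count"] = len(set(sources) & existing)
    return filters


def _qd(sources: list[str]) -> str:
    return json.dumps({"filters": {"data_sources": sources}})


filters = [
    {"name": "deleted", "query_data": _qd(["gone.pdf"])},
    {"name": "broken", "query_data": "{not json"},
    {"name": "mixed", "query_data": _qd(["a.pdf", "gone.pdf"])},
    {"name": "all", "query_data": _qd(["*"])},
]
calls: list[list[str]] = []


def fake_fetch(names: list[str]) -> set[str]:
    calls.append(names)
    return {"a.pdf"}  # pretend only a.pdf is still indexed


result = attach_active_source_counts(filters, fake_fetch)
```

In this sketch the filter whose only source was deleted gets a count of 0, the malformed and wildcard filters are returned without a count, and the lookup is called once with the deduplicated filenames.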

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Service as search_knowledge_filters
  participant AdminOpenSearch as AdminOpenSearch
  participant Frontend as KnowledgeFilterList
  Client->>Service: request filters
  Service->>AdminOpenSearch: aggregation search for filenames
  AdminOpenSearch-->>Service: aggregation buckets (existing filenames)
  Service-->>Client: filters with active_source_count
  Client->>Frontend: render filters
  Frontend-->>Frontend: display active_source_count (or fallback)
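
The "aggregation buckets (existing filenames)" step in the diagram can be read off a terms aggregation response as sketched below. The aggregation name `existing_filenames` and the helper name are assumptions; the `aggregations` / `buckets` / `key` / `doc_count` layout is the standard shape of a terms aggregation response.

```python
def existing_filenames_from_response(
    response: dict, agg_name: str = "existing_filenames"
) -> set[str]:
    """Collect bucket keys from a terms aggregation response.

    `agg_name` is a hypothetical aggregation name; a missing
    aggregation is treated as "nothing exists".
    """
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return {b["key"] for b in buckets if b.get("doc_count", 0) > 0}


# Hand-written sample response, not captured from a real cluster.
sample_response = {
    "aggregations": {
        "existing_filenames": {
            "buckets": [
                {"key": "a.pdf", "doc_count": 12},
                {"key": "c.pdf", "doc_count": 3},
            ]
        }
    }
}

existing = existing_filenames_from_response(sample_response)
```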

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

bug

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main fix: knowledge filter source counts now properly reflect actual indexed documents after deletions, which is the core problem addressed. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #1254 are met: backend now computes active_source_count from indexed documents, frontend displays counts using this value, and stale counts are resolved. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing issue #1254: backend aggregation logic, frontend display updates, and corresponding tests. No unrelated modifications detected. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
src/utils/opensearch_queries.py (1)

36-50: 💤 Low value

Consider validating or documenting the empty list precondition.

The terms query will produce an invalid OpenSearch request if filenames is empty. While the caller in knowledge_filter_service.py (line 136) checks if all_filenames before calling this function, the utility should either validate the input or document the precondition in the docstring.

🛡️ Proposed validation
 def build_existing_filenames_agg_body(filenames: list[str]) -> dict:
     """
     build a search body for checking which of the given filenames currently have indexed chunks
 
     Args:
-        filenames: Filenames to check for existance
+        filenames: Filenames to check for existence (must be non-empty)
 
     Returns:
         A dict containing the complete OpenSearch search body
     """
+    if not filenames:
+        raise ValueError("filenames list must not be empty")
     return {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/opensearch_queries.py` around lines 36 - 50, The function
build_existing_filenames_agg_body should validate that the filenames list is not
empty (the OpenSearch terms query is invalid for empty lists); add an explicit
check at the start of build_existing_filenames_agg_body to raise a ValueError
(or similar) with a clear message like "filenames must be a non-empty list" when
filenames is empty, and update the function docstring to state the non-empty
precondition so callers (e.g., knowledge_filter_service invoking
build_existing_filenames_agg_body) are clear about the requirement.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/knowledge_filter_service.py`:
- Around line 117-118: Replace the stdlib logging import with get_logger from
utils.logging_config: remove "import logging", add "from utils.logging_config
import get_logger" at the top, create a module logger via "logger =
get_logger(__name__)", and change all uses of the stdlib logging object in this
file (e.g., any calls around the knowledge filter functions such as
logging.info, logging.error, logging.exception at/near where the file currently
references logging) to use logger.info / logger.error / logger.exception
respectively so the module uses get_logger consistently.

In `@src/utils/opensearch_queries.py`:
- Line 41: Fix the typo in the docstring for the parameter "filenames" in
src/utils/opensearch_queries.py by changing "existance" to "existence" so the
docstring reads "filenames: Filenames to check for existence"; locate the
docstring that documents the filenames parameter (near the function or method
that accepts filenames) and update the word only.

In `@tests/unit/test_knowledge_filter_service.py`:
- Around line 160-173: The loop that computes active_source_count in the search
path is parsing each filter's query_data inside a single try/except, so one
malformed JSON aborts the whole loop; in the method that builds results for
search_knowledge_filters (the loop that processes each knowledge filter in
KnowledgeFilterService), move the JSON parsing of filter.query_data into an
inner per-filter try/except so a JSONDecodeError only skips computing
active_source_count for that specific filter (log/debug the parse error and
continue), leaving other valid filters to still get active_source_count; keep
the malformed filter in results but without active_source_count.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a4de7517-ee78-498a-8fb3-261df7f19c04

📥 Commits

Reviewing files that changed from the base of the PR and between 8530ab0 and ef7da2d.

📒 Files selected for processing (6)
  • frontend/app/api/queries/useGetFiltersSearchQuery.ts
  • frontend/components/knowledge-filter-list.tsx
  • frontend/components/knowledge-filter-panel.tsx
  • src/services/knowledge_filter_service.py
  • src/utils/opensearch_queries.py
  • tests/unit/test_knowledge_filter_service.py

Comment thread src/services/knowledge_filter_service.py Outdated
Comment thread src/utils/opensearch_queries.py Outdated
Comment thread tests/unit/test_knowledge_filter_service.py Outdated
@Vchen7629

Author

This still requires the user to refresh the page before the source count badge in knowledge-filter-list.tsx is up to date.


Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) frontend 🟨 Issues related to the UI/UX tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Knowledge filter shows 1 source even when filter contains no documents

1 participant