test: add event filtering and non-code file tests for realtime updater #485

Merged
vitali87 merged 7 commits into main from
test/realtime-updater-coverage
Mar 24, 2026
Conversation

@vitali87
Owner

Summary

Follow-up to PR #405 (realtime updater fixes by @bhargavchippada). Adds 13 tests covering the bug fixes and new functionality.

Tests added (13 total)

Event filtering (6):

  • MODIFIED, CREATED, DELETED events trigger graph updates
  • "opened", "closed_no_write", "access" events are ignored

Non-code file handling (4):

  • .md and .json files create File nodes
  • Deleting non-code file removes File node
  • Non-code files don't create Module nodes

Mixed sequences (2):

  • Rapid create/modify/delete on same file
  • Multiple files changed simultaneously

Query isolation (1):

  • DELETE_FILE only targets specific path
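The event-filtering behaviour listed above can be sketched roughly as follows. This is only an illustration, not the project's handler: `RELEVANT_EVENT_TYPES` and `should_process` are hypothetical names, and the event-type strings are the ones quoted in this summary.

```python
# Hypothetical sketch: only content-changing events should reach the graph.
RELEVANT_EVENT_TYPES = frozenset({"modified", "created", "deleted"})


def should_process(event_type: str) -> bool:
    """Return True when an event type should trigger a graph update."""
    return event_type.lower() in RELEVANT_EVENT_TYPES


if __name__ == "__main__":
    for name in ("modified", "created", "deleted", "opened", "closed_no_write", "access"):
        verdict = "process" if should_process(name) else "ignore"
        print(f"{name}: {verdict}")
```

An allow-list like this means any event type added later is ignored by default, which is the safer failure mode for a graph that is expensive to rebuild.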

Test plan

  • 13 tests pass
  • Lint and format clean

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the test coverage for the realtime updater component. It introduces a suite of new tests to validate the updater's behavior across various file system events, ensuring that only relevant changes trigger graph updates and that non-code files are processed correctly without generating unnecessary module nodes. The new tests also cover complex scenarios involving rapid file changes and the precision of delete operations, contributing to a more robust and reliable codebase graph.

Highlights

  • Event Filtering Validation: Ensured that MODIFIED, CREATED, and DELETED file system events correctly trigger graph updates, while opened, closed_no_write, and access events are appropriately ignored by the realtime updater.
  • Non-Code File Processing: Verified that non-code files like Markdown and JSON are correctly handled, creating File nodes but not Module nodes, and that their deletion properly removes corresponding File nodes.
  • Mixed Event Sequence Handling: Added tests to confirm the updater's robustness when handling rapid sequences of create/modify/delete operations on the same file and when multiple files are changed simultaneously.
  • Delete Query Specificity: Confirmed that the DELETE_FILE Cypher query specifically targets the intended file path, preventing unintended deletions.
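As a rough illustration of the non-code file behaviour described above, the sketch below routes a file to one of two mocked processors. The `route_file` function and the `CODE_SUFFIXES` set are hypothetical, and the argument shapes passed to `process_file` and `process_generic_file` are assumptions modelled on the assertions quoted later in this thread; the real handler may do more (for example, it may also register code files as File nodes).

```python
from pathlib import Path
from unittest.mock import MagicMock

# Assumption: an illustrative set of suffixes treated as parseable code.
CODE_SUFFIXES = frozenset({".py"})


def route_file(factory, repo_root: Path, path: Path) -> None:
    """Toy router: parse code files, register everything else as a plain file."""
    relative = path.relative_to(repo_root).as_posix()
    if path.suffix.lower() in CODE_SUFFIXES:
        # Code path: would eventually yield a Module node.
        factory.definition_processor.process_file(path, relative)
    else:
        # Non-code path: File node only, no Module node.
        factory.structure_processor.process_generic_file(path, relative)


factory = MagicMock()
root = Path("/repo")
readme = root / "readme.md"
route_file(factory, root, readme)

factory.structure_processor.process_generic_file.assert_called_once_with(readme, "readme.md")
factory.definition_processor.process_file.assert_not_called()
```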

@vitali87
Owner Author

@greptile

@vitali87
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a comprehensive suite of tests for the real-time updater, covering event filtering, non-code file handling, and various event sequences. The tests are well-structured and cover the intended functionality thoroughly. My feedback focuses on making some of the assertions more specific to improve test robustness and clarity, ensuring that not just the number of database operations is correct, but also that the right queries are being executed with the correct parameters.

f = temp_repo / "app.py"
f.write_text("x = 1", encoding="utf-8")
handler.dispatch(FileModifiedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3

medium

The current assertion only checks the total number of write operations. To make this test more robust and descriptive, it's better to assert which specific Cypher queries were executed and with what parameters. This ensures that not only the count is correct, but also that the right operations are being performed on the correct file.

Suggested change
- assert mock_updater.ingestor.execute_write.call_count == 3
+ calls = mock_updater.ingestor.execute_write.call_args_list
+ assert len(calls) == 3
+ assert calls[0].args[0] == cs.CYPHER_DELETE_MODULE
+ assert calls[0].args[1] == {cs.KEY_PATH: "app.py"}
+ assert calls[1].args[0] == cs.CYPHER_DELETE_FILE
+ assert calls[1].args[1] == {cs.KEY_PATH: "app.py"}
+ assert calls[2].args[0] == cs.CYPHER_DELETE_CALLS
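For readers less familiar with `unittest.mock`, here is a minimal standalone sketch of the `call_args_list` inspection suggested above. The query strings and the `"path"` key are placeholders standing in for the project's `cs.CYPHER_*` constants and `cs.KEY_PATH`, whose real values are not shown in this thread.

```python
from unittest.mock import MagicMock

# Placeholder values; the real constants live in the project's `cs` module.
DELETE_MODULE = "DELETE_MODULE"
DELETE_FILE = "DELETE_FILE"
DELETE_CALLS = "DELETE_CALLS"
KEY_PATH = "path"

ingestor = MagicMock()
# Simulate the three writes a dispatch is expected to issue.
ingestor.execute_write(DELETE_MODULE, {KEY_PATH: "app.py"})
ingestor.execute_write(DELETE_FILE, {KEY_PATH: "app.py"})
ingestor.execute_write(DELETE_CALLS)

calls = ingestor.execute_write.call_args_list
# Each entry is a `call` object; `.args` holds the positional arguments.
assert [c.args[0] for c in calls] == [DELETE_MODULE, DELETE_FILE, DELETE_CALLS]
assert calls[0].args[1] == {KEY_PATH: "app.py"}
assert calls[1].args[1] == {KEY_PATH: "app.py"}
```

Whether the third write takes parameters is not stated here, so the sketch passes none and asserts nothing about them.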

f = temp_repo / "new.py"
f.write_text("y = 2", encoding="utf-8")
handler.dispatch(FileCreatedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3

medium

Similar to the modified event test, asserting the specific queries and their parameters will make this test more robust. It verifies that the correct cleanup and reprocessing steps are being triggered for a file creation event.

Suggested change
- assert mock_updater.ingestor.execute_write.call_count == 3
+ calls = mock_updater.ingestor.execute_write.call_args_list
+ assert len(calls) == 3
+ assert calls[0].args[0] == cs.CYPHER_DELETE_MODULE
+ assert calls[0].args[1] == {cs.KEY_PATH: "new.py"}
+ assert calls[1].args[0] == cs.CYPHER_DELETE_FILE
+ assert calls[1].args[1] == {cs.KEY_PATH: "new.py"}
+ assert calls[2].args[0] == cs.CYPHER_DELETE_CALLS

) -> None:
f = temp_repo / "gone.py"
handler.dispatch(FileDeletedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3

medium

To improve the test's specificity, it's better to assert the exact queries being called upon file deletion. This confirms that the correct DELETE operations are performed against the graph for the specified file.

Suggested change
- assert mock_updater.ingestor.execute_write.call_count == 3
+ calls = mock_updater.ingestor.execute_write.call_args_list
+ assert len(calls) == 3
+ assert calls[0].args[0] == cs.CYPHER_DELETE_MODULE
+ assert calls[0].args[1] == {cs.KEY_PATH: "gone.py"}
+ assert calls[1].args[0] == cs.CYPHER_DELETE_FILE
+ assert calls[1].args[1] == {cs.KEY_PATH: "gone.py"}
+ assert calls[2].args[0] == cs.CYPHER_DELETE_CALLS

# (H) After delete, no re-parse or file node creation
mock_updater.factory.definition_processor.process_file.assert_not_called()
mock_updater.factory.structure_processor.process_generic_file.assert_not_called()
assert mock_updater.ingestor.execute_write.call_count == 3

medium

The assertion for the final delete event can be made more specific. By checking the exact queries and parameters, we can be more confident that the deletion part of this sequence behaves as expected, cleaning up the correct entities from the graph.

Suggested change
- assert mock_updater.ingestor.execute_write.call_count == 3
+ calls = mock_updater.ingestor.execute_write.call_args_list
+ assert len(calls) == 3
+ assert calls[0].args[0] == cs.CYPHER_DELETE_MODULE
+ assert calls[0].args[1] == {cs.KEY_PATH: "ephemeral.py"}
+ assert calls[1].args[0] == cs.CYPHER_DELETE_FILE
+ assert calls[1].args[1] == {cs.KEY_PATH: "ephemeral.py"}
+ assert calls[2].args[0] == cs.CYPHER_DELETE_CALLS

handler.dispatch(FileModifiedEvent(str(f1)))
handler.dispatch(FileModifiedEvent(str(f2)))

assert mock_updater.ingestor.execute_write.call_count == 6

medium

Instead of just checking the total number of write calls, it would be more robust to verify the count of each type of query. This ensures that the correct set of operations is performed for each file modification, even when events happen close together. Using collections.Counter can make this check clean and effective.

Suggested change
- assert mock_updater.ingestor.execute_write.call_count == 6
+ from collections import Counter
+ queries_called = [
+     c.args[0] for c in mock_updater.ingestor.execute_write.call_args_list
+ ]
+ counts = Counter(queries_called)
+ assert len(queries_called) == 6
+ assert counts[cs.CYPHER_DELETE_MODULE] == 2
+ assert counts[cs.CYPHER_DELETE_FILE] == 2
+ assert counts[cs.CYPHER_DELETE_CALLS] == 2
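A self-contained version of the `Counter` idea, using placeholder query strings rather than the project's constants, might look like the sketch below. It assumes each modified file produces the same three writes, which is what the count of 6 for two files implies.

```python
from collections import Counter
from unittest.mock import MagicMock

ingestor = MagicMock()
# Simulate two modified files, each triggering the same three writes.
for path in ("a.py", "b.py"):
    for query in ("DELETE_MODULE", "DELETE_FILE", "DELETE_CALLS"):
        ingestor.execute_write(query, {"path": path})

# Tally writes per query; the check is independent of interleaving order.
counts = Counter(c.args[0] for c in ingestor.execute_write.call_args_list)
assert counts == {"DELETE_MODULE": 2, "DELETE_FILE": 2, "DELETE_CALLS": 2}
```

Because a `Counter` compares equal to a plain dict with the same tallies, a single equality check covers both the total and the per-query breakdown.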

@greptile-apps
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR adds 13 tests that cover the event-filtering and non-code-file bug fixes from PR #405, and extends the orphan-node pruning logic to be project-scoped so that a multi-project graph database is not corrupted by false deletions.

Production changes

  • CYPHER_ALL_FILE_PATHS and CYPHER_ALL_FOLDER_PATHS now return absolute_path in addition to path, enabling the prune loop to confirm a node belongs to the current repo before deleting it.
  • CYPHER_ALL_MODULE_PATHS is renamed CYPHER_ALL_MODULE_PATHS_INTERNAL and gains a WHERE m.is_external IS NULL OR m.is_external = false guard, limiting pruning to modules ingested from local source code.
  • _prune_orphan_nodes now filters candidate orphans by absolute_path prefix (File/Folder) or qualified_name prefix (Module) before falling through to the path-existence check, preventing cross-project false deletions.

Test changes

  • test_graph_updater_pruning.py is rewritten to provide project-scoped fixture data that matches the new pruning signatures; a new test_prune_skips_other_projects test explicitly verifies that nodes from other projects are never deleted.
  • test_realtime_event_filtering.py is a new file with 13 tests covering MODIFIED / CREATED / DELETED event processing, ignored event types ("opened", "closed_no_write", "access"), non-code file handling (.md, .json), rapid multi-event sequences, and Cypher query isolation.

Notable design point: nodes stored without absolute_path (ingested before this change) bypass the absolute_path guard and remain subject to path-existence-only pruning. This is a known trade-off for backwards compatibility but means a first run after upgrade could still misidentify same-relative-path nodes from other projects as orphans.
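The project-scoped filter and its backwards-compatibility gap can be sketched as below. This is a simplified model, not the code in `graph_updater.py`: `find_orphans` is a hypothetical function, rows are plain dicts, and the prefix rules (`absolute_path` under the repo root, `qualified_name` starting with the project name plus a dot) are assumptions based on the description above.

```python
import os
import tempfile
from pathlib import Path


def find_orphans(rows: list[dict], repo_path: Path, project_name: str) -> list[str]:
    """Return paths of rows that belong to this project but no longer exist on disk."""
    repo_abs = str(repo_path.resolve())
    project_prefix = project_name + "."
    orphans: list[str] = []
    for row in rows:
        path = row.get("path")
        if not isinstance(path, str) or not path:
            continue  # malformed row
        abs_path = row.get("absolute_path")
        if isinstance(abs_path, str):
            in_repo = abs_path == repo_abs or abs_path.startswith(repo_abs + os.sep)
            if not in_repo:
                continue  # node belongs to another repository
        qualified_name = row.get("qualified_name")
        if isinstance(qualified_name, str) and qualified_name:
            same_project = qualified_name == project_name or qualified_name.startswith(project_prefix)
            if not same_project:
                continue  # module belongs to another project
        if not (repo_path / path).exists():
            orphans.append(path)
    return orphans


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "proj"
    repo.mkdir()
    (repo / "kept.py").write_text("x = 1", encoding="utf-8")
    repo_abs = str(repo.resolve())
    rows = [
        {"path": "kept.py", "absolute_path": os.path.join(repo_abs, "kept.py")},
        {"path": "gone.py", "absolute_path": os.path.join(repo_abs, "gone.py")},
        {"path": "gone.py", "absolute_path": os.path.join(tmp, "other", "gone.py")},
        {"path": "legacy.py"},  # ingested before absolute_path was stored
    ]
    result = find_orphans(rows, repo, "proj")
    print(result)
```

In this toy run the row from the other repository is skipped, while the `legacy.py` row, which has no `absolute_path`, is still judged by path existence alone. That is the trade-off the note above describes.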

Confidence Score: 4/5

  • Safe to merge; the production bugfix is well-covered and the one remaining comment is a non-blocking style improvement to test assertions.
  • The project-scoped pruning change is correct, tested, and addresses a real multi-project data-loss bug. Prior review threads already identified the two most important gaps (missing flush_all assertions on MODIFIED and DELETED test paths). The only new issue flagged here is the use of raw call-count assertions in TestEventFiltering instead of query-identity checks — a style improvement, not a correctness bug. All 13 new tests pass, lint is clean, and the refactored pruning tests align with the new fixture data.
  • Pay close attention to codebase_rag/tests/test_realtime_event_filtering.py (count-only assertions) and codebase_rag/graph_updater.py (backwards-compat note on nodes without absolute_path).

Important Files Changed

| Filename | Overview |
| --- | --- |
| codebase_rag/constants.py | Added absolute_path to File/Folder queries and renamed CYPHER_ALL_MODULE_PATHS → CYPHER_ALL_MODULE_PATHS_INTERNAL with an is_external guard; a clean, minimal, backward-compatible change. |
| codebase_rag/graph_updater.py | Project-scoped pruning logic added to _prune_orphan_nodes: File/Folder nodes are guarded by absolute_path prefix, Module nodes by qualified_name prefix. Nodes missing those fields fall through to path-existence check. Logic is correct but nodes without absolute_path (from older ingestion) remain subject to cross-repo path collisions. |
| codebase_rag/tests/test_graph_updater_pruning.py | Rewritten to use the new project-scoped pruning signatures. Tests for cross-project skip (test_prune_skips_other_projects) and multi-type orphan deletion are new and valuable. Explicit single-type File and Folder orphan tests were removed; coverage lives only in the combined test_prune_multiple_orphans_across_types. |
| codebase_rag/tests/test_realtime_event_filtering.py | 13 new tests covering event filtering, non-code file handling, mixed sequences, and query isolation. Uses autouse monkeypatch to bypass the QueryProtocol runtime check. Some TestEventFiltering tests rely on raw execute_write.call_count == 3 without verifying which 3 calls were made, making them slightly fragile. |
| realtime_updater.py | Cosmetic-only change: multi-line execute_write call condensed to a single line. No behavioral change. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_prune_orphan_nodes called] --> B[Build project_prefix & repo_abs]
    B --> C{For each prune_spec\nFile / Module / Folder}
    C --> D[fetch_all rows from graph]
    D --> E{path is valid string?}
    E -- No --> F[skip row]
    E -- Yes --> G{abs_path present\nand is string?}
    G -- Yes, not this repo --> F
    G -- No / same repo --> H{qualified_name present\nnon-empty?}
    H -- Yes, different project --> F
    H -- No / same project --> I{path exists on disk?}
    I -- Yes --> F
    I -- No --> J[add to orphans list]
    J --> K[execute_write DELETE query\nfor each orphan]
    K --> C

Comments Outside Diff (1)

  1. codebase_rag/tests/test_realtime_event_filtering.py, lines 701-724

    P2 Count-only assertions obscure which writes occurred

    TestEventFiltering tests assert execute_write.call_count == 3 but never verify which three queries were issued. Compare this to TestCypherDeleteFileQuery.test_delete_file_only_targets_specific_path, which explicitly inspects each call.

    The count-based style has two problems:

    1. A regression that swaps one query for another leaves the count at 3 and passes silently.
    2. When the count changes (e.g. a new step is added to dispatch), the failure message assert 4 == 3 gives no indication of what changed.

    test_deleted_event_is_processed is the clearest example — it already checks that two factories are not called, so checking the three specific writes (CYPHER_DELETE_MODULE, CYPHER_DELETE_FILE, CYPHER_DELETE_CALLS) would be consistent and complete:

    write_queries = [c.args[0] for c in mock_updater.ingestor.execute_write.call_args_list]
    assert write_queries == [cs.CYPHER_DELETE_MODULE, cs.CYPHER_DELETE_FILE, cs.CYPHER_DELETE_CALLS]

    The same pattern applies to test_modified_event_is_processed (line 701) and test_multiple_files_changed (line 825).
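To make the first problem concrete, the sketch below simulates a regression in which one query is swapped for another. `buggy_dispatch` and the query strings are hypothetical stand-ins; the point is only that the count-based check still passes while the identity check fails.

```python
from unittest.mock import MagicMock

EXPECTED = ["DELETE_MODULE", "DELETE_FILE", "DELETE_CALLS"]


def buggy_dispatch(ingestor) -> None:
    """Simulated regression: DELETE_FILE is issued twice, DELETE_CALLS never."""
    ingestor.execute_write("DELETE_MODULE")
    ingestor.execute_write("DELETE_FILE")
    ingestor.execute_write("DELETE_FILE")  # should have been DELETE_CALLS


ingestor = MagicMock()
buggy_dispatch(ingestor)

# Count-only check: still satisfied despite the swapped query.
count_check_passes = ingestor.execute_write.call_count == 3

# Identity check: compares the ordered list of queries and catches the swap.
write_queries = [c.args[0] for c in ingestor.execute_write.call_args_list]
identity_check_passes = write_queries == EXPECTED

print(count_check_passes, identity_check_passes)
```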

Reviews (3): Last reviewed commit: "fix: remove unused CYPHER_ALL_MODULE_PAT..."

Comment on lines +47 to +53
def test_modified_event_is_processed(
self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path
) -> None:
f = temp_repo / "app.py"
f.write_text("x = 1", encoding="utf-8")
handler.dispatch(FileModifiedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3

P2 Missing flush_all assertion for MODIFIED event

The MODIFIED event path in realtime_updater.dispatch() calls self.updater.ingestor.flush_all() at Step 5, just like the CREATED path. However, unlike test_created_event_is_processed (line 62) and test_multiple_files_changed (line 183), this test never asserts that flush_all was actually invoked. This means a regression that skips the flush for MODIFIED events would pass this test undetected.

Suggested change
  def test_modified_event_is_processed(
      self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path
  ) -> None:
      f = temp_repo / "app.py"
      f.write_text("x = 1", encoding="utf-8")
      handler.dispatch(FileModifiedEvent(str(f)))
      assert mock_updater.ingestor.execute_write.call_count == 3
+     mock_updater.ingestor.flush_all.assert_called_once()
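The shape of the `flush_all` check can be tried in isolation with a toy handler. `ToyHandler` below is a hypothetical stand-in that models only the "three writes, then flush" sequence described in this comment; it is not the project's `CodeChangeEventHandler`, and the query strings are placeholders.

```python
from unittest.mock import MagicMock


class ToyHandler:
    """Stand-in handler: only the write-then-flush shape is modelled."""

    def __init__(self, ingestor) -> None:
        self.ingestor = ingestor

    def dispatch(self, event_type: str, path: str) -> None:
        if event_type not in {"modified", "created", "deleted"}:
            return  # ignored event types never touch the ingestor
        for query in ("DELETE_MODULE", "DELETE_FILE", "DELETE_CALLS"):
            self.ingestor.execute_write(query, {"path": path})
        self.ingestor.flush_all()


# Every relevant event type should end with exactly one flush.
for event_type in ("modified", "created", "deleted"):
    ingestor = MagicMock()
    ToyHandler(ingestor).dispatch(event_type, "app.py")
    assert ingestor.execute_write.call_count == 3
    ingestor.flush_all.assert_called_once()

# An ignored event type should neither write nor flush.
ignored = MagicMock()
ToyHandler(ignored).dispatch("opened", "app.py")
ignored.flush_all.assert_not_called()
```

Looping over the three event types keeps the flush assertion uniform, so a regression that drops the flush for one type cannot hide behind coverage of another.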

Comment on lines +64 to +71
def test_deleted_event_is_processed(
self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path
) -> None:
f = temp_repo / "gone.py"
handler.dispatch(FileDeletedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3
mock_updater.factory.definition_processor.process_file.assert_not_called()
mock_updater.factory.structure_processor.process_generic_file.assert_not_called()

P2 Missing flush_all assertion for DELETED event

realtime_updater.dispatch() calls flush_all unconditionally at Step 5 for all three relevant event types (MODIFIED, CREATED, DELETED). While test_rapid_create_modify_delete (line 169) happens to cover this transitively, no standalone DELETE test verifies it. A regression dropping the flush_all call for DELETE events would slip past this test.

Suggested change
  def test_deleted_event_is_processed(
      self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path
  ) -> None:
      f = temp_repo / "gone.py"
      handler.dispatch(FileDeletedEvent(str(f)))
      assert mock_updater.ingestor.execute_write.call_count == 3
+     mock_updater.ingestor.flush_all.assert_called_once()
      mock_updater.factory.definition_processor.process_file.assert_not_called()
      mock_updater.factory.structure_processor.process_generic_file.assert_not_called()

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a comprehensive suite of tests for the realtime updater's event filtering and non-code file handling. The tests are well-structured and cover a good range of scenarios. My feedback focuses on improving the consistency and completeness of assertions across the new test file to make the test suite even more robust.

f = temp_repo / "app.py"
f.write_text("x = 1", encoding="utf-8")
handler.dispatch(FileModifiedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3

medium

For consistency with other tests and to ensure the full event handling flow is tested, please also assert that flush_all is called. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.

Suggested change
  assert mock_updater.ingestor.execute_write.call_count == 3
+ mock_updater.ingestor.flush_all.assert_called_once()
References
  1. When multiple valid refactoring patterns exist (e.g., using a flag vs. early continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.

handler.dispatch(FileDeletedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3
mock_updater.factory.definition_processor.process_file.assert_not_called()
mock_updater.factory.structure_processor.process_generic_file.assert_not_called()

medium

To ensure the test is comprehensive and consistent with other deletion tests, please also assert that flush_all is called after the event is dispatched. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.

Suggested change
  mock_updater.factory.structure_processor.process_generic_file.assert_not_called()
+ mock_updater.ingestor.flush_all.assert_called_once()
References
  1. When multiple valid refactoring patterns exist (e.g., using a flag vs. early continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.

Comment on lines +109 to +111
mock_updater.factory.structure_processor.process_generic_file.assert_called_once_with(
f, "readme.md"
)

medium

This test is a bit minimal. For better coverage and consistency with other created event tests (like test_created_event_is_processed), please also assert the number of database writes and that flush_all is called. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.

        mock_updater.factory.structure_processor.process_generic_file.assert_called_once_with(
            f, "readme.md"
        )
        assert mock_updater.ingestor.execute_write.call_count == 3
        mock_updater.ingestor.flush_all.assert_called_once()
References
  1. When multiple valid refactoring patterns exist (e.g., using a flag vs. early continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.

Comment on lines +119 to +121
mock_updater.factory.structure_processor.process_generic_file.assert_called_once_with(
f, "config.json"
)

medium

Similar to the markdown test, this test could be more comprehensive. Please add assertions for the number of database writes and the call to flush_all to ensure the full event handling process is verified. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.

        mock_updater.factory.structure_processor.process_generic_file.assert_called_once_with(
            f, "config.json"
        )
        assert mock_updater.ingestor.execute_write.call_count == 3
        mock_updater.ingestor.flush_all.assert_called_once()
References
  1. When multiple valid refactoring patterns exist (e.g., using a flag vs. early continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.

Comment on lines +128 to +137
delete_file_calls = [
c
for c in mock_updater.ingestor.execute_write.call_args_list
if c.args[0] == cs.CYPHER_DELETE_FILE
]
assert len(delete_file_calls) == 1
assert delete_file_calls[0].args[1] == {
cs.KEY_PATH: "notes.md",
}
mock_updater.factory.structure_processor.process_generic_file.assert_not_called()

medium

This test's assertions are not as comprehensive as other deletion tests in this file (e.g., test_delete_file_only_targets_specific_path). To improve consistency and test coverage, please verify all expected database operations for a file deletion event. This includes checking for CYPHER_DELETE_MODULE and CYPHER_DELETE_FILE calls with the correct path, the total number of writes, and the call to flush_all. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.

        assert mock_updater.ingestor.execute_write.call_count == 3
        mock_updater.ingestor.flush_all.assert_called_once()

        delete_file_calls = [
            c
            for c in mock_updater.ingestor.execute_write.call_args_list
            if c.args[0] == cs.CYPHER_DELETE_FILE
        ]
        assert len(delete_file_calls) == 1
        assert delete_file_calls[0].args[1] == {
            cs.KEY_PATH: "notes.md",
        }

        delete_module_calls = [
            c
            for c in mock_updater.ingestor.execute_write.call_args_list
            if c.args[0] == cs.CYPHER_DELETE_MODULE
        ]
        assert len(delete_module_calls) == 1
        assert delete_module_calls[0].args[1] == {cs.KEY_PATH: "notes.md"}

        mock_updater.factory.structure_processor.process_generic_file.assert_not_called()
References
  1. When multiple valid refactoring patterns exist (e.g., using a flag vs. early continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.
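The repeated list comprehensions in the snippet above could be folded into a small helper. `calls_for` is a hypothetical name, and the query strings and `"path"` key are placeholders for the project's constants; this is one possible way to keep per-query assertions short, not something the PR itself contains.

```python
from unittest.mock import MagicMock


def calls_for(mock_write: MagicMock, query: str) -> list:
    """Return the recorded calls whose first positional argument is `query`."""
    return [c for c in mock_write.call_args_list if c.args and c.args[0] == query]


ingestor = MagicMock()
for query in ("DELETE_MODULE", "DELETE_FILE", "DELETE_CALLS"):
    ingestor.execute_write(query, {"path": "notes.md"})

delete_file_calls = calls_for(ingestor.execute_write, "DELETE_FILE")
assert len(delete_file_calls) == 1
assert delete_file_calls[0].args[1] == {"path": "notes.md"}

# A query that was never issued yields an empty list rather than an error.
assert calls_for(ingestor.execute_write, "CREATE_FILE") == []
```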

@codecov-commenter

codecov-commenter commented Mar 23, 2026

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 99.13420% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| codebase_rag/tests/test_graph_updater_pruning.py | 97.80% | 2 Missing ⚠️ |
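The two percentages in this report are consistent with each other under simple rounding. As an illustrative check, the sketch below searches for line totals that reproduce a reported percentage given the number of missing lines. The helper `infer_total_lines` is hypothetical, and the result assumes Codecov rounds (rather than truncates) the displayed value.

```python
def infer_total_lines(missing, reported_pct, decimals, max_total=10_000):
    """Return every line total for which `missing` uncovered lines
    yield `reported_pct` when rounded to `decimals` places."""
    matches = []
    for total in range(missing + 1, max_total + 1):
        pct = 100 * (total - missing) / total
        if round(pct, decimals) == round(reported_pct, decimals):
            matches.append(total)
    return matches


patch_totals = infer_total_lines(missing=2, reported_pct=99.13420, decimals=5)
file_totals = infer_total_lines(missing=2, reported_pct=97.80, decimals=2)

print(patch_totals)  # [231]
print(file_totals)   # [91]
```

Under those assumptions, the patch would contain 231 coverable lines in total, 91 of them in `test_graph_updater_pruning.py`, with both missing lines in that one file.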


@vitali87 vitali87 force-pushed the test/realtime-updater-coverage branch from 9b9c079 to aa354ac on March 23, 2026 19:56
@vitali87
Owner Author

@greptile

@vitali87
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a comprehensive set of tests for the realtime updater, covering event filtering, non-code file handling, and various event sequences. The changes also refine the orphan node pruning logic to be project-specific by checking qualified names.

My main feedback is regarding the updated pruning logic in graph_updater.py. The change to only prune Module nodes has removed the capability to clean up orphan Folder nodes, which could lead to stale data in the graph. I've left a suggestion to reintroduce folder pruning with the necessary adjustments.

Comment on lines 499 to +518
         prune_specs: list[tuple[str, str, str]] = [
-            (cs.CYPHER_ALL_FILE_PATHS, cs.CYPHER_DELETE_FILE, "File"),
-            (cs.CYPHER_ALL_MODULE_PATHS, cs.CYPHER_DELETE_MODULE, "Module"),
-            (cs.CYPHER_ALL_FOLDER_PATHS, cs.CYPHER_DELETE_FOLDER, "Folder"),
+            (
+                cs.CYPHER_ALL_MODULE_PATHS_INTERNAL,
+                cs.CYPHER_DELETE_MODULE,
+                "Module",
+            ),
         ]

         for query_all, delete_query, label in prune_specs:
             rows = self.ingestor.fetch_all(query_all)
-            orphans = [
-                r["path"]
-                for r in rows
-                if r.get("path")
-                and not (self.repo_path / r["path"]).exists()
-            ]
+            orphans = []
+            for r in rows:
+                path = r.get("path")
+                qn = r.get("qualified_name", "")
+                if not isinstance(path, str) or not path:
+                    continue
+                if isinstance(qn, str) and qn and not qn.startswith(project_prefix):
+                    continue
+                if not (self.repo_path / path).exists():
+                    orphans.append(path)
Contributor

high

This change removes the pruning logic for orphan File and Folder nodes, restricting it to only Module nodes. While the hash-cache mechanism handles deleted files, orphan Folder nodes (e.g., from a deleted empty directory) will no longer be removed from the graph. This can lead to stale data accumulating over time.

Consider reintroducing pruning for Folder nodes. The new orphan detection logic, which checks for a project-prefixed qualified_name, is specific to modules, so you'll need to adapt it for folders.

        prune_specs: list[tuple[str, str, str]] = [
            (
                cs.CYPHER_ALL_MODULE_PATHS_INTERNAL,
                cs.CYPHER_DELETE_MODULE,
                "Module",
            ),
            (
                cs.CYPHER_ALL_FOLDER_PATHS,
                cs.CYPHER_DELETE_FOLDER,
                "Folder",
            ),
        ]

        for query_all, delete_query, label in prune_specs:
            rows = self.ingestor.fetch_all(query_all)
            orphans = []
            for r in rows:
                path = r.get("path")
                if not isinstance(path, str) or not path:
                    continue

                # The project prefix check is only applicable to Modules
                if label == "Module":
                    qn = r.get("qualified_name", "")
                    if isinstance(qn, str) and qn and not qn.startswith(project_prefix):
                        continue

                if not (self.repo_path / path).exists():
                    orphans.append(path)
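The behaviour of the suggested loop can be checked outside the updater. The sketch below lifts the logic into a standalone function and runs it against a temporary directory; the function name `find_orphans`, the `"proj."` prefix, and the sample rows are all hypothetical, chosen only to exercise each branch.

```python
import tempfile
from pathlib import Path


def find_orphans(rows, repo_path, project_prefix, label):
    """Return paths from `rows` that no longer exist under `repo_path`."""
    orphans = []
    for r in rows:
        path = r.get("path")
        if not isinstance(path, str) or not path:
            continue
        # Only Module rows are filtered by qualified-name prefix.
        if label == "Module":
            qn = r.get("qualified_name", "")
            if isinstance(qn, str) and qn and not qn.startswith(project_prefix):
                continue
        if not (repo_path / path).exists():
            orphans.append(path)
    return orphans


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "kept.py").write_text("x = 1\n")

    module_rows = [
        {"path": "kept.py", "qualified_name": "proj.kept"},    # still on disk
        {"path": "gone.py", "qualified_name": "proj.gone"},    # deleted, same project
        {"path": "lib.py", "qualified_name": "other.lib"},     # belongs to another project
        {"path": None, "qualified_name": "proj.broken"},       # malformed row
    ]
    folder_rows = [{"path": "old_dir"}, {"path": ""}]

    module_orphans = find_orphans(module_rows, repo, "proj.", "Module")
    folder_orphans = find_orphans(folder_rows, repo, "proj.", "Folder")

print(module_orphans)  # ['gone.py']
print(folder_orphans)  # ['old_dir']
```

Note that in this form Folder rows are not scoped to a project at all, so a graph shared by several projects would still need some other way to tell whose folders are whose.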

@vitali87
Owner Author

@greptile

@vitali87
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive set of tests for the realtime updater and refines the orphan node pruning logic. The changes to make pruning project-specific are a solid improvement, and the new tests are well-structured, covering various event filtering scenarios, non-code file handling, and edge cases. I have one minor suggestion to remove an unused constant. Overall, this is a valuable contribution that enhances the robustness and test coverage of the codebase.

Comment on lines +889 to +891
CYPHER_ALL_MODULE_PATHS = (
    "MATCH (m:Module) RETURN m.path AS path, m.qualified_name AS qualified_name"
)
Contributor

medium

This constant CYPHER_ALL_MODULE_PATHS appears to be unused after its usage in graph_updater.py was replaced with CYPHER_ALL_MODULE_PATHS_INTERNAL. To improve maintainability, it would be best to remove this dead code.
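Before deleting the constant, it is worth confirming that nothing else references it. A plain substring search is misleading here because `CYPHER_ALL_MODULE_PATHS` is a prefix of `CYPHER_ALL_MODULE_PATHS_INTERNAL`. The following is a rough sketch of a whole-word search; the helper `find_references` and the two sample files are hypothetical and do not reflect the repository's actual layout.

```python
import re
import tempfile
from pathlib import Path


def find_references(root, name):
    """Return (file name, line number) pairs where `name` appears as a whole word."""
    # \b prevents a match inside a longer identifier such as NAME_INTERNAL,
    # because "_" counts as a word character.
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    for py_file in sorted(root.rglob("*.py")):
        for lineno, line in enumerate(py_file.read_text().splitlines(), start=1):
            if pattern.search(line):
                hits.append((py_file.name, lineno))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "constants.py").write_text(
        'CYPHER_ALL_MODULE_PATHS = "query one"\n'
        'CYPHER_ALL_MODULE_PATHS_INTERNAL = "query two"\n'
    )
    (root / "graph_updater.py").write_text(
        "import constants as cs\n"
        "query = cs.CYPHER_ALL_MODULE_PATHS_INTERNAL\n"
    )

    plain_hits = find_references(root, "CYPHER_ALL_MODULE_PATHS")
    internal_hits = find_references(root, "CYPHER_ALL_MODULE_PATHS_INTERNAL")

print(plain_hits)     # [('constants.py', 1)]
print(internal_hits)  # [('constants.py', 2), ('graph_updater.py', 2)]
```

A constant whose only hit is its own definition is a candidate for removal; dynamic lookups such as `getattr` would not be caught by a text search like this one.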

@vitali87
Owner Author

@greptile

@sonarqubecloud

@vitali87 vitali87 merged commit 993017b into main Mar 24, 2026
17 checks passed

Labels

None yet

Projects

Status: Done


2 participants