test: add event filtering and non-code file tests for realtime updater#485
test: add event filtering and non-code file tests for realtime updater#485
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the test coverage for the realtime updater component. It introduces a suite of new tests to validate the updater's behavior across various file system events, ensuring that only relevant changes trigger graph updates and that non-code files are processed correctly without generating unnecessary module nodes. The new tests also cover complex scenarios involving rapid file changes and the precision of delete operations, contributing to a more robust and reliable codebase graph. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
@greptile |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request adds a comprehensive suite of tests for the real-time updater, covering event filtering, non-code file handling, and various event sequences. The tests are well-structured and cover the intended functionality thoroughly. My feedback focuses on making some of the assertions more specific to improve test robustness and clarity, ensuring that not just the number of database operations is correct, but also that the right queries are being executed with the correct parameters.
| f = temp_repo / "app.py" | ||
| f.write_text("x = 1", encoding="utf-8") | ||
| handler.dispatch(FileModifiedEvent(str(f))) | ||
| assert mock_updater.ingestor.execute_write.call_count == 3 |
There was a problem hiding this comment.
The current assertion only checks the total number of write operations. To make this test more robust and descriptive, it's better to assert which specific Cypher queries were executed and with what parameters. This ensures that not only the count is correct, but also that the right operations are being performed on the correct file.
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| calls = mock_updater.ingestor.execute_write.call_args_list | |
| assert len(calls) == 3 | |
| assert calls[0].args[0] == cs.CYPHER_DELETE_MODULE | |
| assert calls[0].args[1] == {cs.KEY_PATH: "app.py"} | |
| assert calls[1].args[0] == cs.CYPHER_DELETE_FILE | |
| assert calls[1].args[1] == {cs.KEY_PATH: "app.py"} | |
| assert calls[2].args[0] == cs.CYPHER_DELETE_CALLS |
| f = temp_repo / "new.py" | ||
| f.write_text("y = 2", encoding="utf-8") | ||
| handler.dispatch(FileCreatedEvent(str(f))) | ||
| assert mock_updater.ingestor.execute_write.call_count == 3 |
There was a problem hiding this comment.
Similar to the modified event test, asserting the specific queries and their parameters will make this test more robust. It verifies that the correct cleanup and reprocessing steps are being triggered for a file creation event.
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| calls = mock_updater.ingestor.execute_write.call_args_list | |
| assert len(calls) == 3 | |
| assert calls[0].args[0] == cs.CYPHER_DELETE_MODULE | |
| assert calls[0].args[1] == {cs.KEY_PATH: "new.py"} | |
| assert calls[1].args[0] == cs.CYPHER_DELETE_FILE | |
| assert calls[1].args[1] == {cs.KEY_PATH: "new.py"} | |
| assert calls[2].args[0] == cs.CYPHER_DELETE_CALLS |
| ) -> None: | ||
| f = temp_repo / "gone.py" | ||
| handler.dispatch(FileDeletedEvent(str(f))) | ||
| assert mock_updater.ingestor.execute_write.call_count == 3 |
There was a problem hiding this comment.
To improve the test's specificity, it's better to assert the exact queries being called upon file deletion. This confirms that the correct DELETE operations are performed against the graph for the specified file.
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| calls = mock_updater.ingestor.execute_write.call_args_list | |
| assert len(calls) == 3 | |
| assert calls[0].args[0] == cs.CYPHER_DELETE_MODULE | |
| assert calls[0].args[1] == {cs.KEY_PATH: "gone.py"} | |
| assert calls[1].args[0] == cs.CYPHER_DELETE_FILE | |
| assert calls[1].args[1] == {cs.KEY_PATH: "gone.py"} | |
| assert calls[2].args[0] == cs.CYPHER_DELETE_CALLS |
| # (H) After delete, no re-parse or file node creation | ||
| mock_updater.factory.definition_processor.process_file.assert_not_called() | ||
| mock_updater.factory.structure_processor.process_generic_file.assert_not_called() | ||
| assert mock_updater.ingestor.execute_write.call_count == 3 |
There was a problem hiding this comment.
The assertion for the final delete event can be made more specific. By checking the exact queries and parameters, we can be more confident that the deletion part of this sequence behaves as expected, cleaning up the correct entities from the graph.
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| calls = mock_updater.ingestor.execute_write.call_args_list | |
| assert len(calls) == 3 | |
| assert calls[0].args[0] == cs.CYPHER_DELETE_MODULE | |
| assert calls[0].args[1] == {cs.KEY_PATH: "ephemeral.py"} | |
| assert calls[1].args[0] == cs.CYPHER_DELETE_FILE | |
| assert calls[1].args[1] == {cs.KEY_PATH: "ephemeral.py"} | |
| assert calls[2].args[0] == cs.CYPHER_DELETE_CALLS |
| handler.dispatch(FileModifiedEvent(str(f1))) | ||
| handler.dispatch(FileModifiedEvent(str(f2))) | ||
|
|
||
| assert mock_updater.ingestor.execute_write.call_count == 6 |
There was a problem hiding this comment.
Instead of just checking the total number of write calls, it would be more robust to verify the count of each type of query. This ensures that the correct set of operations is performed for each file modification, even when events happen close together. Using collections.Counter can make this check clean and effective.
| assert mock_updater.ingestor.execute_write.call_count == 6 | |
| from collections import Counter | |
| queries_called = [ | |
| c.args[0] for c in mock_updater.ingestor.execute_write.call_args_list | |
| ] | |
| counts = Counter(queries_called) | |
| assert len(queries_called) == 6 | |
| assert counts[cs.CYPHER_DELETE_MODULE] == 2 | |
| assert counts[cs.CYPHER_DELETE_FILE] == 2 | |
| assert counts[cs.CYPHER_DELETE_CALLS] == 2 |
Greptile SummaryThis PR adds 13 tests that cover the event-filtering and non-code-file bug-fixes from PR #405, and extends the orphan-node pruning logic to be project-scoped so that a multi-project graph database is not corrupted by false deletions. Production changes
Test changes
Notable design point: nodes stored without Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[_prune_orphan_nodes called] --> B[Build project_prefix & repo_abs]
B --> C{For each prune_spec\nFile / Module / Folder}
C --> D[fetch_all rows from graph]
D --> E{path is valid string?}
E -- No --> F[skip row]
E -- Yes --> G{abs_path present\nand is string?}
G -- Yes, not this repo --> F
G -- No / same repo --> H{qualified_name present\nnon-empty?}
H -- Yes, different project --> F
H -- No / same project --> I{path exists on disk?}
I -- Yes --> F
I -- No --> J[add to orphans list]
J --> K[execute_write DELETE query\nfor each orphan]
K --> C
|
| def test_modified_event_is_processed( | ||
| self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path | ||
| ) -> None: | ||
| f = temp_repo / "app.py" | ||
| f.write_text("x = 1", encoding="utf-8") | ||
| handler.dispatch(FileModifiedEvent(str(f))) | ||
| assert mock_updater.ingestor.execute_write.call_count == 3 |
There was a problem hiding this comment.
Missing
flush_all assertion for MODIFIED event
The MODIFIED event path in realtime_updater.dispatch() calls self.updater.ingestor.flush_all() at Step 5, just like the CREATED path. However, unlike test_created_event_is_processed (line 62) and test_multiple_files_changed (line 183), this test never asserts that flush_all was actually invoked. This means a regression that skips the flush for MODIFIED events would pass this test undetected.
| def test_modified_event_is_processed( | |
| self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path | |
| ) -> None: | |
| f = temp_repo / "app.py" | |
| f.write_text("x = 1", encoding="utf-8") | |
| handler.dispatch(FileModifiedEvent(str(f))) | |
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| def test_modified_event_is_processed( | |
| self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path | |
| ) -> None: | |
| f = temp_repo / "app.py" | |
| f.write_text("x = 1", encoding="utf-8") | |
| handler.dispatch(FileModifiedEvent(str(f))) | |
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| mock_updater.ingestor.flush_all.assert_called_once() |
Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/tests/test_realtime_event_filtering.py
Line: 47-53
Comment:
**Missing `flush_all` assertion for MODIFIED event**
The MODIFIED event path in `realtime_updater.dispatch()` calls `self.updater.ingestor.flush_all()` at Step 5, just like the CREATED path. However, unlike `test_created_event_is_processed` (line 62) and `test_multiple_files_changed` (line 183), this test never asserts that `flush_all` was actually invoked. This means a regression that skips the flush for MODIFIED events would pass this test undetected.
```suggestion
def test_modified_event_is_processed(
self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path
) -> None:
f = temp_repo / "app.py"
f.write_text("x = 1", encoding="utf-8")
handler.dispatch(FileModifiedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3
mock_updater.ingestor.flush_all.assert_called_once()
```
How can I resolve this? If you propose a fix, please make it concise.| def test_deleted_event_is_processed( | ||
| self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path | ||
| ) -> None: | ||
| f = temp_repo / "gone.py" | ||
| handler.dispatch(FileDeletedEvent(str(f))) | ||
| assert mock_updater.ingestor.execute_write.call_count == 3 | ||
| mock_updater.factory.definition_processor.process_file.assert_not_called() | ||
| mock_updater.factory.structure_processor.process_generic_file.assert_not_called() |
There was a problem hiding this comment.
Missing
flush_all assertion for DELETED event
realtime_updater.dispatch() calls flush_all unconditionally at Step 5 for all three relevant event types (MODIFIED, CREATED, DELETED). While test_rapid_create_modify_delete (line 169) happens to cover this transitively, no standalone DELETE test verifies it. A regression dropping the flush_all call for DELETE events would slip past this test.
| def test_deleted_event_is_processed( | |
| self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path | |
| ) -> None: | |
| f = temp_repo / "gone.py" | |
| handler.dispatch(FileDeletedEvent(str(f))) | |
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| mock_updater.factory.definition_processor.process_file.assert_not_called() | |
| mock_updater.factory.structure_processor.process_generic_file.assert_not_called() | |
| def test_deleted_event_is_processed( | |
| self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path | |
| ) -> None: | |
| f = temp_repo / "gone.py" | |
| handler.dispatch(FileDeletedEvent(str(f))) | |
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| mock_updater.ingestor.flush_all.assert_called_once() | |
| mock_updater.factory.definition_processor.process_file.assert_not_called() | |
| mock_updater.factory.structure_processor.process_generic_file.assert_not_called() |
Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/tests/test_realtime_event_filtering.py
Line: 64-71
Comment:
**Missing `flush_all` assertion for DELETED event**
`realtime_updater.dispatch()` calls `flush_all` unconditionally at Step 5 for all three relevant event types (MODIFIED, CREATED, DELETED). While `test_rapid_create_modify_delete` (line 169) happens to cover this transitively, no standalone DELETE test verifies it. A regression dropping the `flush_all` call for DELETE events would slip past this test.
```suggestion
def test_deleted_event_is_processed(
self, handler: CodeChangeEventHandler, mock_updater: MagicMock, temp_repo: Path
) -> None:
f = temp_repo / "gone.py"
handler.dispatch(FileDeletedEvent(str(f)))
assert mock_updater.ingestor.execute_write.call_count == 3
mock_updater.ingestor.flush_all.assert_called_once()
mock_updater.factory.definition_processor.process_file.assert_not_called()
mock_updater.factory.structure_processor.process_generic_file.assert_not_called()
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Code Review
This pull request adds a comprehensive suite of tests for the realtime updater's event filtering and non-code file handling. The tests are well-structured and cover a good range of scenarios. My feedback focuses on improving the consistency and completeness of assertions across the new test file to make the test suite even more robust.
| f = temp_repo / "app.py" | ||
| f.write_text("x = 1", encoding="utf-8") | ||
| handler.dispatch(FileModifiedEvent(str(f))) | ||
| assert mock_updater.ingestor.execute_write.call_count == 3 |
There was a problem hiding this comment.
For consistency with other tests and to ensure the full event handling flow is tested, please also assert that flush_all is called. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| assert mock_updater.ingestor.execute_write.call_count == 3 | |
| mock_updater.ingestor.flush_all.assert_called_once() |
References
- When multiple valid refactoring patterns exist (e.g., using a flag vs. early
continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.
| handler.dispatch(FileDeletedEvent(str(f))) | ||
| assert mock_updater.ingestor.execute_write.call_count == 3 | ||
| mock_updater.factory.definition_processor.process_file.assert_not_called() | ||
| mock_updater.factory.structure_processor.process_generic_file.assert_not_called() |
There was a problem hiding this comment.
To ensure the test is comprehensive and consistent with other deletion tests, please also assert that flush_all is called after the event is dispatched. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.
| mock_updater.factory.structure_processor.process_generic_file.assert_not_called() | |
| mock_updater.factory.structure_processor.process_generic_file.assert_not_called() | |
| mock_updater.ingestor.flush_all.assert_called_once() |
References
- When multiple valid refactoring patterns exist (e.g., using a flag vs. early
continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.
| mock_updater.factory.structure_processor.process_generic_file.assert_called_once_with( | ||
| f, "readme.md" | ||
| ) |
There was a problem hiding this comment.
This test is a bit minimal. For better coverage and consistency with other created event tests (like test_created_event_is_processed), please also assert the number of database writes and that flush_all is called. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.
mock_updater.factory.structure_processor.process_generic_file.assert_called_once_with(
f, "readme.md"
)
assert mock_updater.ingestor.execute_write.call_count == 3
mock_updater.ingestor.flush_all.assert_called_once()References
- When multiple valid refactoring patterns exist (e.g., using a flag vs. early
continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.
| mock_updater.factory.structure_processor.process_generic_file.assert_called_once_with( | ||
| f, "config.json" | ||
| ) |
There was a problem hiding this comment.
Similar to the markdown test, this test could be more comprehensive. Please add assertions for the number of database writes and the call to flush_all to ensure the full event handling process is verified. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.
mock_updater.factory.structure_processor.process_generic_file.assert_called_once_with(
f, "config.json"
)
assert mock_updater.ingestor.execute_write.call_count == 3
mock_updater.ingestor.flush_all.assert_called_once()References
- When multiple valid refactoring patterns exist (e.g., using a flag vs. early
continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.
| delete_file_calls = [ | ||
| c | ||
| for c in mock_updater.ingestor.execute_write.call_args_list | ||
| if c.args[0] == cs.CYPHER_DELETE_FILE | ||
| ] | ||
| assert len(delete_file_calls) == 1 | ||
| assert delete_file_calls[0].args[1] == { | ||
| cs.KEY_PATH: "notes.md", | ||
| } | ||
| mock_updater.factory.structure_processor.process_generic_file.assert_not_called() |
There was a problem hiding this comment.
This test's assertions are not as comprehensive as other deletion tests in this file (e.g., test_delete_file_only_targets_specific_path). To improve consistency and test coverage, please verify all expected database operations for a file deletion event. This includes checking for CYPHER_DELETE_MODULE and CYPHER_DELETE_FILE calls with the correct path, the total number of writes, and the call to flush_all. This aligns with the principle of prioritizing consistency with existing codebase patterns, especially in testing, to ensure a uniform and robust test suite.
assert mock_updater.ingestor.execute_write.call_count == 3
mock_updater.ingestor.flush_all.assert_called_once()
delete_file_calls = [
c
for c in mock_updater.ingestor.execute_write.call_args_list
if c.args[0] == cs.CYPHER_DELETE_FILE
]
assert len(delete_file_calls) == 1
assert delete_file_calls[0].args[1] == {
cs.KEY_PATH: "notes.md",
}
delete_module_calls = [
c
for c in mock_updater.ingestor.execute_write.call_args_list
if c.args[0] == cs.CYPHER_DELETE_MODULE
]
assert len(delete_module_calls) == 1
assert delete_module_calls[0].args[1] == {cs.KEY_PATH: "notes.md"}
mock_updater.factory.structure_processor.process_generic_file.assert_not_called()References
- When multiple valid refactoring patterns exist (e.g., using a flag vs. early
continue), prioritize consistency with the existing codebase and previous review decisions over adopting a new, equally valid pattern.
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
9b9c079 to
aa354ac
Compare
|
@greptile |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request adds a comprehensive set of tests for the realtime updater, covering event filtering, non-code file handling, and various event sequences. The changes also refine the orphan node pruning logic to be project-specific by checking qualified names.
My main feedback is regarding the updated pruning logic in graph_updater.py. The change to only prune Module nodes has removed the capability to clean up orphan Folder nodes, which could lead to stale data in the graph. I've left a suggestion to reintroduce folder pruning with the necessary adjustments.
| prune_specs: list[tuple[str, str, str]] = [ | ||
| (cs.CYPHER_ALL_FILE_PATHS, cs.CYPHER_DELETE_FILE, "File"), | ||
| (cs.CYPHER_ALL_MODULE_PATHS, cs.CYPHER_DELETE_MODULE, "Module"), | ||
| (cs.CYPHER_ALL_FOLDER_PATHS, cs.CYPHER_DELETE_FOLDER, "Folder"), | ||
| ( | ||
| cs.CYPHER_ALL_MODULE_PATHS_INTERNAL, | ||
| cs.CYPHER_DELETE_MODULE, | ||
| "Module", | ||
| ), | ||
| ] | ||
|
|
||
| for query_all, delete_query, label in prune_specs: | ||
| rows = self.ingestor.fetch_all(query_all) | ||
| orphans = [ | ||
| r["path"] | ||
| for r in rows | ||
| if r.get("path") | ||
| and not (self.repo_path / r["path"]).exists() | ||
| ] | ||
| orphans = [] | ||
| for r in rows: | ||
| path = r.get("path") | ||
| qn = r.get("qualified_name", "") | ||
| if not isinstance(path, str) or not path: | ||
| continue | ||
| if isinstance(qn, str) and qn and not qn.startswith(project_prefix): | ||
| continue | ||
| if not (self.repo_path / path).exists(): | ||
| orphans.append(path) |
There was a problem hiding this comment.
This change removes the pruning logic for orphan File and Folder nodes, restricting it to only Module nodes. While the hash-cache mechanism handles deleted files, orphan Folder nodes (e.g., from a deleted empty directory) will no longer be removed from the graph. This can lead to stale data accumulating over time.
Consider reintroducing pruning for Folder nodes. The new orphan detection logic, which checks for a project-prefixed qualified_name, is specific to modules, so you'll need to adapt it for folders.
prune_specs: list[tuple[str, str, str]] = [
(
cs.CYPHER_ALL_MODULE_PATHS_INTERNAL,
cs.CYPHER_DELETE_MODULE,
"Module",
),
(
cs.CYPHER_ALL_FOLDER_PATHS,
cs.CYPHER_DELETE_FOLDER,
"Folder",
),
]
for query_all, delete_query, label in prune_specs:
rows = self.ingestor.fetch_all(query_all)
orphans = []
for r in rows:
path = r.get("path")
if not isinstance(path, str) or not path:
continue
# The project prefix check is only applicable to Modules
if label == "Module":
qn = r.get("qualified_name", "")
if isinstance(qn, str) and qn and not qn.startswith(project_prefix):
continue
if not (self.repo_path / path).exists():
orphans.append(path)|
@greptile |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive set of tests for the realtime updater and refines the orphan node pruning logic. The changes to make pruning project-specific are a solid improvement, and the new tests are well-structured, covering various event filtering scenarios, non-code file handling, and edge cases. I have one minor suggestion to remove an unused constant. Overall, this is a valuable contribution that enhances the robustness and test coverage of the codebase.
codebase_rag/constants.py
Outdated
| CYPHER_ALL_MODULE_PATHS = ( | ||
| "MATCH (m:Module) RETURN m.path AS path, m.qualified_name AS qualified_name" | ||
| ) |
|
@greptile |
|



Summary
Follow-up to PR #405 (realtime updater fixes by @bhargavchippada). Adds 13 tests covering the bug fixes and new functionality.
Tests added (13 total)
Event filtering (6):
Non-code file handling (4):
Mixed sequences (2):
Query isolation (1):
Test plan