fix: prevent duplicated codebase indexing on VSCode restart #5942

Closed
wants to merge 1 commit into from

Conversation

roomote[bot]
@roomote roomote bot commented Jul 19, 2025

This PR fixes issue #5941 where the codebase was being re-indexed every time VSCode was restarted, even when a valid index already existed.

Problem

The code indexing system did not check whether an index with data already existed before starting a full re-index, so every VSCode restart triggered one. This caused unnecessary CPU usage and delays for users.

Solution

  1. Added a hasData() method to the IVectorStore interface to check if the collection contains any indexed data
  2. Modified the orchestrator to check for existing index data before performing a full scan
  3. Updated the manager initialization logic to only trigger indexing when necessary (configuration changes or first initialization)

Changes

  • Added hasData() method to IVectorStore interface
  • Implemented hasData() in QdrantVectorStore to check collection point count
  • Modified CodeIndexOrchestrator.startIndexing() to skip full scan if valid data exists
  • Updated CodeIndexManager.initialize() to better handle existing indexes
  • Updated test mocks to include the new hasData() method
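
As a rough illustration of the change described above, here is a minimal sketch of the guard. Only the names IVectorStore, hasData(), QdrantVectorStore and startIndexing() come from this description; the signatures, the in-memory store, the runFullScan callback and the IndexingOutcome type are assumptions made for the example, not the actual implementation.

```typescript
// Hypothetical shapes: only the names IVectorStore, hasData() and
// startIndexing() appear in the PR description; the signatures are assumed.
interface IVectorStore {
  /** Resolves to true when the collection holds at least one indexed point. */
  hasData(): Promise<boolean>;
}

// In-memory stand-in for a real store such as QdrantVectorStore.
class InMemoryVectorStore implements IVectorStore {
  private readonly pointCount: number;

  constructor(pointCount: number) {
    this.pointCount = pointCount;
  }

  async hasData(): Promise<boolean> {
    return this.pointCount > 0;
  }
}

type IndexingOutcome = "skipped-full-scan" | "ran-full-scan";

// Sketch of the orchestrator guard: skip the full scan when data exists,
// otherwise run it. runFullScan is a hypothetical callback.
async function startIndexing(
  store: IVectorStore,
  runFullScan: () => Promise<void>,
): Promise<IndexingOutcome> {
  if (await store.hasData()) {
    return "skipped-full-scan";
  }
  await runFullScan();
  return "ran-full-scan";
}
```

Under these assumptions, a store with zero points leads to a full scan, and a store with any points returns early without scanning.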

Testing

  • All existing tests pass
  • The fix has been tested locally and prevents re-indexing when a valid index exists
  • File watcher still works correctly for incremental updates

Fixes #5941


Important

Fixes issue #5941 by preventing unnecessary codebase re-indexing on VSCode restart through checking existing index data.

  • Behavior:
    • Prevents unnecessary re-indexing on VSCode restart by checking existing index data.
    • CodeIndexOrchestrator.startIndexing() skips full scan if valid data exists.
    • CodeIndexManager.initialize() only triggers indexing when necessary.
  • Interfaces:
    • Added hasData() method to IVectorStore interface.
    • Implemented hasData() in QdrantVectorStore to check collection point count.
  • Testing:
    • Updated test mocks to include hasData() method.
    • Verified that re-indexing is skipped when a valid index exists.
    • Ensured file watcher still functions for incremental updates.

This description was created by Ellipsis for da4c8ec.

- Add hasData() method to IVectorStore interface and QdrantVectorStore implementation
- Check for existing index data in orchestrator before performing full scan
- Update manager initialization logic to only trigger indexing when necessary
- Skip re-indexing if valid index with data already exists

Fixes #5941
@roomote roomote bot requested review from mrubens, cte and jr as code owners July 19, 2025 03:16
@dosubot dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and bug (Something isn't working) labels Jul 19, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Jul 19, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jul 19, 2025
@hannesrudolph hannesrudolph added the PR - Needs Preliminary Review label and removed the Issue/PR - Triage label Jul 19, 2025
@daniel-lxs
Collaborator

With this change, the codebase would not be scanned when the extension starts, so any changes made while the service wasn't watching for file modifications would not be picked up and the index would become outdated.

The initial scan already skips files that have no changes, so this change isn't necessary. The actual bug must come from somewhere else in the code, so issue #5941 needs proper scoping.
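
To illustrate why a startup scan can be cheap when nothing changed, here is a minimal sketch of one common approach: keep a per-file content hash and only report files whose hash differs. Every name below (simpleHash, scanForChanges, the cache shape) is hypothetical and not taken from the codebase; a real implementation would likely use a cryptographic hash and a persisted cache.

```typescript
// Hypothetical sketch: none of these names come from the PR or the codebase.

// Small non-cryptographic string hash, used only to keep the sketch
// free of imports. Bitwise ops keep the value in 32-bit integer range.
function simpleHash(text: string): string {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) - hash + text.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

// Returns the paths whose content differs from the cached hash and
// updates the cache, so an identical follow-up scan reports nothing.
// Deleted files are not handled in this sketch.
function scanForChanges(
  files: Record<string, string>,
  cache: Record<string, string>,
): string[] {
  const changed: string[] = [];
  for (const path of Object.keys(files)) {
    const digest = simpleHash(files[path]);
    if (cache[path] !== digest) {
      changed.push(path);
      cache[path] = digest;
    }
  }
  return changed;
}
```

With a scheme like this, a scan on restart still walks the workspace but only re-processes files edited while the watcher was not running.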

@daniel-lxs daniel-lxs closed this Jul 22, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 22, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Jul 22, 2025
@daniel-lxs daniel-lxs deleted the fix/duplicated-codebase-indexing branch July 22, 2025 19:51
Labels
bug (Something isn't working), PR - Needs Preliminary Review, size:M (This PR changes 30-99 lines, ignoring generated files.)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Duplicated Codebase Indexing
3 participants